6,244 results on '"Object tracking"'
Search Results
2. Educational Computer Vision Materials for Classification and Tracking of Objects
- Author
-
Emeršič, Žiga, Hrastnik, Gregor, Peer, Nataša Meh, Kirn, Vasja Lev, Justin, Aljaž, Videnović, Jovana, Markićević, Luka, and Peer, Peter (series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti; editors: Julian, Vicente, Camacho, David, Yin, Hujun, Alberola, Juan M., Nogueira, Vitor Beires, Novais, Paulo, and Tallón-Ballesteros, Antonio)
- Published
- 2025
3. Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks
- Author
-
Wu, Zhewei, Yu, Ruilong, Liu, Qihe, Cheng, Shuying, Qiu, Shilin, and Zhou, Shijie (series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti; editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül)
- Published
- 2025
4. Elysium: Exploring Object-Level Perception in Videos via MLLM
- Author
-
Wang, Han, Ye, Yongjie, Wang, Yanjie, Nie, Yuxiang, and Huang, Can (series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti; editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül)
- Published
- 2025
5. A closer look at single object tracking under variable haze.
- Author
-
Singh, Satbir, Lamba, Nikhil, and Khosla, Arun
- Abstract
The task of monitoring the object's path as it travels within a scene has consistently been challenging. When a specific level of haze is introduced to the environment, the endeavor becomes more difficult. The most recent tracking algorithms claim to be capable of accurately monitoring objects in typical visual conditions. However, it is imperative to conduct a comprehensive analysis of their functionality in hazy conditions, as haze is a meteorological adversity that is frequently encountered and has the potential to result in severe consequences. The primary objective of this investigation is to evaluate the efficacy of prominent tracking algorithms in the presence or absence of haze. Additionally, the performance was assessed by examining it in a variety of hazy conditions that were generated using the monocular depth information of the original image. The comparison between the authentic hazy photographs and the artificially created hazy photos has also been demonstrated. Furthermore, several novel relative parameters have been developed for object tracking under obscured vision conditions. These parameters can be employed to maintain the relative tracking performances under both normal and varying hazy conditions. To emphasize the effects of haze, the results have been obtained with the help of state-of-the-art tracking algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Hybrid-Mode tracker with online SA-LSTM updater.
- Author
-
Zheng, Hongsheng, Gao, Yun, Hu, Yaqing, and Zhang, Xuejie
- Subjects
-
*TRANSFORMER models, *CONVOLUTIONAL neural networks, *VISUAL learning, *SPINE, *GLOBAL method of teaching
- Abstract
The backbone network and target template are pivotal factors influencing the performance of Siamese trackers. However, traditional approaches encounter challenges in eliminating local redundancy and establishing global dependencies when learning visual data representations. While convolutional neural networks (CNNs) and vision transformers (ViTs) are commonly employed as backbones in Siamese-based trackers, each primarily addresses only one of these challenges. Furthermore, tracking is a dynamic process. Nonetheless, in many Siamese trackers, solely a fixed initial template is employed to facilitate target state matching. This approach often proves inadequate for effectively handling scenes characterized by target deformation, occlusion, and fast motion. In this paper, we propose a Hybrid-Mode Siamese tracker featuring an online SA-LSTM updater. Distinct learning operators are tailored to exploit characteristics at different depth levels of the backbone, integrating convolution and transformers to form a Hybrid-Mode backbone. This backbone efficiently learns global dependencies among input tokens while minimizing redundant computations in local domains, enhancing feature richness for target tracking. The online SA-LSTM updater comprehensively integrates spatial–temporal context during tracking, producing dynamic template features with enhanced representations of target appearance. Extensive experiments across multiple benchmark datasets, including GOT-10K, LaSOT, TrackingNet, OTB-100, UAV123, and NFS, demonstrate that the proposed method achieves outstanding performance, running at 35 FPS on a single GPU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Infrastructure sensor-based cooperative perception for early stage connected and automated vehicle deployment.
- Author
-
Chen, Chenxi, Tang, Qing, Hu, Xianbiao, and Huang, Zhitong
- Subjects
-
*AUTONOMOUS vehicles, *DETECTORS, *LIDAR, *SYNCHRONIZATION, *AUTOMATION, *PEDESTRIANS
- Abstract
Infrastructure-based sensors provide a potentially promising solution to support the wide adoption of connected and automated vehicle (CAV) technologies at an early stage. For connected vehicles with a lower level of automation that do not have perception sensors, infrastructure sensors will significantly boost their capability to understand the driving context. Even if a full suite of sensors is available on a vehicle with a higher level of automation, infrastructure sensors can help overcome the issues of occlusion and limited sensor range. To this end, a cooperative perception modeling framework is proposed in this manuscript. In particular, the modeling focus is placed on a key technical challenge, time delay in the cooperative perception process, which is of vital importance to the synchronization, perception, and localization modules. A constant turn-rate velocity (CTRV) model is first developed to estimate the future motion states of a vehicle. A delay compensation and fusion module is presented next, to compensate for the time delay due to the computing time and communication latency. Last but not least, as the behavior of moving objects (i.e., vehicles, cyclists, and pedestrians) is nonlinear in both position and speed, an unscented Kalman filter (UKF) algorithm is developed to improve object tracking accuracy considering communication time delay between the ego vehicle and infrastructure-based LiDAR sensors. Simulation experiments are performed to test the feasibility and evaluate the performance of the proposed algorithm, and they show satisfactory results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
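As a rough illustration of the CTRV prediction step and the delay compensation described in the abstract above, the following sketch uses the textbook CTRV equations. The function names `ctrv_predict` and `compensate_delay`, the state layout, and the near-zero turn-rate threshold are assumptions for illustration; the paper's own formulation and its UKF are not reproduced here.

```python
import math

def ctrv_predict(x, y, v, yaw, yaw_rate, dt):
    """Propagate a CTRV state (x, y, speed, heading, turn rate) forward by dt seconds."""
    if abs(yaw_rate) < 1e-6:
        # Near-zero turn rate: fall back to straight-line motion to avoid dividing by ~0.
        x_new = x + v * math.cos(yaw) * dt
        y_new = y + v * math.sin(yaw) * dt
    else:
        radius = v / yaw_rate
        x_new = x + radius * (math.sin(yaw + yaw_rate * dt) - math.sin(yaw))
        y_new = y + radius * (math.cos(yaw) - math.cos(yaw + yaw_rate * dt))
    # Speed and turn rate are constant by assumption; only the heading advances.
    return (x_new, y_new, v, yaw + yaw_rate * dt, yaw_rate)

def compensate_delay(state, latency_s):
    """Hypothetical helper: shift a delayed infrastructure measurement to 'now'."""
    return ctrv_predict(*state, latency_s)
```

For example, `compensate_delay((0.0, 0.0, 10.0, 0.0, 0.0), 0.1)` moves a vehicle travelling at 10 m/s one metre ahead to account for 100 ms of latency.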
8. Object Tracking Algorithm Based on Integrated Multi-Scale Templates Guided by Judgment Mechanism.
- Author
-
Wang, Jing, Wang, Yanru, Que, Yuxiang, Huang, Weichao, and Wei, Yuan
- Subjects
CONVOLUTIONAL neural networks, TRACKING algorithms, TRANSFORMER models, ALGORITHMS
- Abstract
The object tracking algorithm TransT, based on Transformer, achieves significant improvements in accuracy and success rate by fusing the extracted features of convolutional neural networks with the structure of Transformer. However, when dealing with the deformation of the object's appearance, the algorithm exhibits issues such as insufficient tracking accuracy and drift, which directly affect the stability of the algorithm. In order to overcome this problem, this paper demonstrates how to expand scene information at the scale level during the fusion process and, on this basis, achieve accurate recognition and positioning. The predicted results are promptly fed back to the subsequent tracking process, from which temporal templates are embedded. Starting from both location and time can effectively improve the adaptive ability of the tracking model. In the final experimental comparison results, the algorithm proposed in this paper can adapt well to the situation of object deformation, and the overall performance of the tracking model has also been improved. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. INTELLIGENT MONITORING OF THE PHYSIOLOGICAL STATE OF AGRICULTURAL PRODUCTS USING UAV.
- Author
-
KUZNETSOV, PAVEL, KOTELNIKOV, DMITRY, VORONIN, DMITRIY, EVSTIGNEEV, VLADYSLAV, YAKIMOVICH, BORIS, and KELEMEN, MICHAL
- Subjects
CONVOLUTIONAL neural networks, COMPUTER vision, DRONE aircraft, FARM produce, PATIENT monitoring
- Abstract
The article discusses a technology for automated neural network monitoring of a vineyard's physiological condition. Images of leaves, obtained using an unmanned aerial vehicle (UAV), are the main indicator of the vineyard's physiological condition. The proposed solution is based on the integrated use of convolutional neural network (CNN) methods and machine vision technologies. To determine the optimal neural network (NN) model, a variant analysis was carried out. In accordance with its results, the YOLOv7 model was chosen, which satisfies the introduced time limit and provides the required detection quality. The training of the YOLOv7 neural network was implemented in the Python environment using the PyTorch framework and the OpenCV computer vision library. A dataset consisting of 6320 images of grape leaves (including healthy and diseased ones) was used for neural network training. The obtained results showed that the detection accuracy is at least 91%. Monitoring results were visualized using a heatmap, which shows how the vineyard's physiological condition changes over time. The proposed mathematical model allows calculating the vineyard area that one complex can monitor per day. The obtained results showed that the effective monitoring area using one DJI Phantom 4 UAV is 2.5 hectares per day. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. SiamAUDT: adaptive updating decision for online Siamese tracker.
- Author
-
Hu, Yaqing, Gao, Yun, and Zhang, Chi
- Subjects
EVALUATION methodology, GENERALIZATION, CONFIDENCE, TRACKING algorithms, COST
- Abstract
Most Siamese trackers use the first frame tracked object as a fixed template. To adapt to the changes in object appearances, some trackers have explored "how to update" templates. Meanwhile, to suppress negative samples from contaminating templates, some trackers utilize fixed thresholds to determine "updating or not". To further improve the decision adaptability of "updating or not", we first propose an online Siamese tracking framework with an adaptive updating decision (SiamAUDT). The decision module (AUD) can adaptively determine "updating or not" according to the confidence evaluation of the current tracking result. Second, we define a metric to assess the peak deviation degree (PDD) in the current response map. Then, we design the confidence evaluation method via the fluctuation amplitude of the PDD. The evaluation does not need any additional parameters or any training costs. Furthermore, the proposed framework can be easily migrated to existing tracking algorithms. Finally, we evaluated the effectiveness of our proposed SiamAUDT on several benchmarks and verified the generalization ability of the framework under several Siamese trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. Point of Interest Recognition and Tracking in Aerial Video during Live Cycling Broadcasts.
- Author
-
Vanhaeverbeke, Jelle, Decorte, Robbe, Slembrouck, Maarten, Hoecke, Sofie Van, and Verstockt, Steven
- Subjects
BICYCLE racing, STREAMING video & television, COMPUTER vision, SPORTSMANSHIP, SUPPLY & demand, CYCLING competitions
- Abstract
Road cycling races, such as the Tour de France, captivate millions of viewers globally, combining competitive sportsmanship with the promotion of regional landmarks. Traditionally, points of interest (POIs) are highlighted during broadcasts using manually created static overlays, a process that is both outdated and labor-intensive. This paper presents a novel, fully automated methodology for detecting and tracking POIs in live helicopter video streams, aiming to streamline the visualization workflow and enhance viewer engagement. Our approach integrates a saliency and Segment Anything-based technique to propose potential POI regions, which are then recognized using a keypoint matching method that requires only a few reference images. This system supports both automatic and semi-automatic operations, allowing video editors to intervene when necessary, thereby balancing automation with manual control. The proposed pipeline demonstrated high effectiveness, achieving over 75% precision and recall in POI detection, and offers two tracking solutions: a traditional MedianFlow tracker and an advanced SAM 2 tracker. While the former provides speed and simplicity, the latter delivers superior segmentation tracking, albeit with higher computational demands. Our findings suggest that this methodology significantly reduces manual workload and opens new possibilities for interactive visualizations, enhancing the live viewing experience of cycling races. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. Hyperspectral Attention Network for Object Tracking.
- Author
-
Yu, Shuangjiang, Ni, Jianjun, Fu, Shuai, and Qu, Tao
- Subjects
-
*DILEMMA, *VIDEOS, *SPECTRAL imaging
- Abstract
Hyperspectral video provides rich spatial and spectral information, which is crucial for object tracking in complex scenarios. Despite extensive research, existing methods often face an inherent trade-off between rich spectral information and redundant noisy information. This dilemma arises from the efficient utilization of hyperspectral image data channels. To alleviate this problem, this paper introduces a hierarchical spectral attention network for hyperspectral object tracking. We employ a spectral band attention mechanism with adaptive soft threshold to examine the correlations across spectral bands, which integrates the information available in various spectral bands and eliminates redundant information. Moreover, we integrate spectral attention into a hierarchical tracking network to improve the integration of spectral and spatial information. The experimental results on the entire public hyperspectral competition dataset WHISPER2020 show the superior performance of our proposed method compared with that of several related methods in visual effects and objective evaluation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. Gaussian-based adaptive frame skipping for visual object tracking.
- Author
-
Gao, Fei, You, Shengzhe, Ge, Yisu, and Zhang, Shifeng
- Subjects
-
*TRACKING algorithms, *COMPUTER vision, *ALGORITHMS, *OBJECT tracking (Computer vision), *VIDEO surveillance, *FORECASTING
- Abstract
Visual object tracking is a basic computer vision problem, which has been greatly developed in recent years. Although the accuracy of object tracking algorithms has been improved, the efficiency of most trackers is hard to meet practical requirements, especially for devices with limited computational power. To improve visual object tracking efficiency with no or little loss of accuracy, a frame skipping method is proposed for correlation filter-based trackers, which includes an adaptive tracking-skipping algorithm and Gaussian-based movement prediction. According to the movement state of objects in the previous frames, the position of objects in the next frame can be predicted, and whether or not the tracking process should be skipped is determined by the predicted position. Experiments are conducted on both practical video surveillance and well-known public data sets to evaluate the proposed method. Experimental results show that the proposed method can almost double the tracking efficiency of correlation filter-based trackers with no or little accuracy loss. [ABSTRACT FROM AUTHOR]
- Published
- 2024
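The abstract above does not give the exact skipping rule, so the following is only a simplified sketch of the general idea: fit a Gaussian (mean and standard deviation) to recent per-frame displacements, predict the next position from the mean, and skip the tracker when the motion is steady. The function name `predict_and_decide` and the `max_spread` threshold are hypothetical.

```python
from statistics import mean, pstdev

def predict_and_decide(centers, max_spread=2.0):
    """Predict the next object center and decide whether tracking can be skipped.

    centers: recent (x, y) object centers, oldest first (at least two entries).
    max_spread: assumed threshold (pixels) on the displacement standard deviation.
    """
    dx = [b[0] - a[0] for a, b in zip(centers, centers[1:])]
    dy = [b[1] - a[1] for a, b in zip(centers, centers[1:])]
    # Gaussian motion model: the mean displacement drives the prediction,
    # and the spread measures how trustworthy that prediction is.
    predicted = (centers[-1][0] + mean(dx), centers[-1][1] + mean(dy))
    spread = max(pstdev(dx), pstdev(dy))
    skip_tracking = spread <= max_spread
    return predicted, skip_tracking
```

When `skip_tracking` is true, a caller would reuse `predicted` as the object position for the next frame instead of running the correlation filter.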
14. SiamS3C: spatial-channel cross-correlation for visual tracking with centerness-guided regression.
- Author
-
Zhang, Jianming, Chen, Wentao, He, Yufan, Kuang, Li-Dan, and Sangaiah, Arun Kumar
- Abstract
Visual object tracking can be divided into the object classification and bounding-box regression tasks, but only one sharing correlation map leads to inaccuracy. Siamese trackers compute correlation map by cross-correlation operation with high computational cost, and this operation performed either on channels or in spatial domain results in weak perception of the global information. In addition, some Siamese trackers with a centerness branch ignore the associations between the centerness branch and the bounding-box regression branch. To alleviate these problems, we propose a visual object tracker based on Spatial-Channel Cross-Correlation and Centerness-Guided Regression. Firstly, we propose a spatial-channel cross-correlation module (SC3M) that combines the search region feature and the template feature both on channels and in spatial domain, which suppresses the interference of distractors. As a lightweight module, SC3M can compute dual independent correlation maps inputted to different subnetworks. Secondly, we propose a centerness-guided regression subnetwork consisting of the centerness branch and the bounding-box regression branch. The centerness guides the whole regression subnetwork to enhance the association of two branches and further suppress the low-quality predicted bounding boxes. Thirdly, we have conducted extensive experiments on five challenging benchmarks, including GOT-10k, VOT2018, TrackingNet, OTB100 and UAV123. The results show the excellent performance of our tracker and our tracker achieves real-time requirement at 48.52 fps. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Visual SLAM algorithm integrating dynamic object tracking.
- Author
-
白克强, 朱亚兰, 杨秀清, 向勇, 邓子犇, and 姜官武
- Abstract
Copyright of Information & Control is the property of Gai Kan Bian Wei Hui and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
16. Motion-aware object tracking for aerial images with deep features and discriminative correlation filter.
- Author
-
Delibaşoğlu, İbrahim
- Subjects
HUMAN-computer interaction, DEEP learning, SOURCE code, OBJECT tracking (Computer vision), SAND, LITERATURE, TRACKING radar
- Abstract
Object tracking is a challenging task which is required for different problems such as surveillance, traffic analysis and human-computer interaction. The problem of tracking an object can be considered in different categories such as single object tracking, multiple object tracking, short-term tracking, long-term tracking, tracking by detection and detection-free tracking. This study focuses on detection-free tracking for ground targets on aerial images. The studies in the literature show that correlation filter and deep learning based object trackers have performed well recently. This paper proposes a new correlation filter-based tracker containing a strategy for the re-detection issue. We improve the performance of the correlation filter-based tracker by adding a lightweight re-detection ability in case of a long occlusion or complete loss of target. We use deep features to train a Discriminative Correlation Filter (DCF) by integrating sub-networks from pre-trained ResNet and SAND models. The experimental results on the popular UAV123L dataset show that the proposed method (MADCF) improves the performance of the DCF tracker and has reasonable performance on the long-term tracking problem. Moreover, we prepare a new tracking dataset (PESMOD tracking) consisting of UAV images, and we evaluate the proposed method and state-of-the-art methods on this dataset. We observed that the proposed method performs much better in ground target tracking from VIVID and PESMOD aerial datasets. The proposed MADCF tracker performs better for small targets tracked by UAVs compared to the deep learning-based trackers. The source code and prepared dataset are available at http://github.com/mribrahim/MADCF [ABSTRACT FROM AUTHOR]
- Published
- 2024
17. Object tracking based on multi-stage features and asymmetric-dilated convolution.
- Author
-
孙 波, 杨春成, 徐 立, 尚海滨, and 余帅良
- Subjects
-
*CONVOLUTIONAL neural networks, *NATURAL language processing, *COMPUTER vision, *FEATURE extraction, *VISUAL fields
- Abstract
Objectives: Object tracking is a research focus in the field of computer vision. The method based on correlation filters performs well in object tracking, but artificial feature description of images has certain limitations in the process of feature extraction. Convolutional neural networks (CNNs) have been widely used in computer vision, natural language processing and other fields, and they can tune the weights of network parameters by learning from training samples to extract depth features of images. In order to obtain a more robust feature expression of images, a CNN is used to extract the features of images in object tracking. Methods: Combining CNNs with correlation filters, we propose an object tracking method based on multi-stage features and asymmetric-dilated convolution. The ResNet50 network embedded with an asymmetric-dilated convolution block is used as the feature extraction network, and it outputs feature maps from multiple stages of the network for the correlation filters to achieve object detection and localization. Results: The proposed method is tested on the OTB100 video dataset. The distance precision can reach 85.38% if the distance threshold is set as 20 pixels, and the overlap precision can reach 80.42% if the overlap threshold is set as 50%. Conclusions: The experimental results verify the accuracy of the proposed method, which is relatively robust under certain conditions such as complex backgrounds, occlusion and rotational deformation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
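The distance precision and overlap precision quoted in the abstract above are commonly computed per sequence as sketched below. This is a minimal version under assumptions: boxes are `(x, y, w, h)` tuples, the distance test uses `<=`, the overlap test uses `>`, and the function names are illustrative rather than taken from the paper or the OTB toolkit.

```python
import math

def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    return math.hypot(ax - bx, ay - by)

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def distance_precision(pred, gt, threshold_px=20.0):
    """Fraction of frames whose center error is within threshold_px."""
    hits = sum(center_error(p, g) <= threshold_px for p, g in zip(pred, gt))
    return hits / len(gt)

def overlap_precision(pred, gt, threshold=0.5):
    """Fraction of frames whose IoU exceeds threshold."""
    hits = sum(iou(p, g) > threshold for p, g in zip(pred, gt))
    return hits / len(gt)
```

With these definitions, a tracker that matches the ground truth exactly on one frame and loses the target entirely on a second frame scores 0.5 on both measures.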
18. KISS—Keep It Static SLAMMOT—The Cost of Integrating Moving Object Tracking into an EKF-SLAM Algorithm.
- Author
-
Mandel, Nicolas, Kompe, Nils, Gerwin, Moritz, and Ernst, Floris
- Subjects
-
*TRACKING algorithms, *RESEARCH personnel, *DYNAMIC models, *ROBOTICS, *ALGORITHMS
- Abstract
The treatment of moving objects in simultaneous localization and mapping (SLAM) is a key challenge in contemporary robotics. In this paper, we propose an extension of the EKF-SLAM algorithm that incorporates moving objects into the estimation process, which we term KISS. We have extended the robotic vision toolbox to analyze the influence of moving objects in simulations. Two linear and one nonlinear motion models are used to represent the moving objects. The observation model remains the same for all objects. The proposed model is evaluated against an implementation of the state-of-the-art formulation for moving object tracking, DATMO. We investigate increasing numbers of static landmarks and dynamic objects to demonstrate the impact on the algorithm and compare it with cases where a moving object is mistakenly integrated as a static landmark (false negative) and a static landmark as a moving object (false positive). In practice, distances to dynamic objects are important, and we propose the safety–distance–error metric to evaluate the difference between the true and estimated distances to a dynamic object. The results show that false positives have a negligible impact on map distortion and ATE with increasing static landmarks, while false negatives significantly distort maps and degrade performance metrics. Explicitly modeling dynamic objects not only performs comparably in terms of map distortion and ATE but also enables more accurate tracking of dynamic objects with a lower safety–distance–error than DATMO. We recommend that researchers model objects with uncertain motion using a simple constant position model, hence we name our contribution Keep it Static SLAMMOT. We hope this work will provide valuable data points and insights for future research into integrating moving objects into SLAM algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Fog-Assisted Abnormal Motion Detection System: A Semantic Ontology Approach.
- Author
-
Amshavalli, R. S. and Kalaivani, J.
- Subjects
-
*STREAMING video & television, *VIDEO surveillance, *TRACKING algorithms, *KALMAN filtering, *ACCELERATION (Mechanics), *ONTOLOGIES (Information retrieval)
- Abstract
The growing concern over high-profile violence has led to a widespread adoption of intelligent smart video surveillance systems in educational institutions. However, accurately identifying abnormal events in the video stream remains a complex task due to the absence of a definitive generic definition for abnormal actions, heavily reliant on contextual factors. In this paper, we propose a fog-assisted abnormal motion recognition system utilizing an ontology-based semantic algorithm to bolster security within university campuses under real-time surveillance. Our ontology-based semantic analysis is meticulously designed to characterize intricate spatio-temporal interactions and contextual relationships in the video scene, thereby significantly improving the sensitivity in distinguishing abnormal events based on their severity level. To ensure swift responses to specific abnormal events, the video streams captured through surveillance cameras are processed at the edge of the network. Prior to semantic analysis, two crucial steps, foreground object segmentation and object tracking, are executed to streamline the detection process. These steps involve segmenting the target foreground using a connected components labeling algorithm and tracking motion patterns based on velocity, direction, and acceleration, employing the Kalman filter. The rule-based reasoning of the ontology accurately defines abnormal conditions in the video scene, providing a clearer understanding of the decision-making process. Furthermore, we incorporate a context-aware refinement step aimed at enhancing detection accuracy. This step distinguishes abnormal events based on their severity and generates corresponding alerts. We conducted extensive experiments and evaluated the proposed system using various measures, demonstrating its potential in real-time video analysis. The results showcase an impressive prediction success ratio of 0.989, affirming the reliability and robustness of our system in detecting abnormal events. [ABSTRACT FROM AUTHOR]
- Published
- 2024
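To make the Kalman-filter tracking step in the abstract above concrete, here is a minimal one-axis sketch. It assumes a constant-velocity model with position-only measurements, which is simpler than the velocity, direction, and acceleration tracking the paper describes; the class name `Kalman1D` and the noise values are illustrative assumptions.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one image axis; state is [position, velocity]."""

    def __init__(self, q=1e-3, r=1.0, p0=1000.0):
        self.p, self.v = 0.0, 0.0           # state estimate
        self.P = [[p0, 0.0], [0.0, p0]]     # state covariance (large: unknown start)
        self.q, self.r = q, r               # assumed process and measurement noise

    def predict(self, dt=1.0):
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                      # innovation covariance (H = [1, 0])
        k0, k1 = a / s, c / s               # Kalman gain
        y = z - self.p                      # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P = (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
```

Running one such filter per axis on the centroid of a segmented foreground blob yields smoothed position and velocity estimates, from which speed and direction can be derived.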
20. Deep Learning Model-Based Real-Time Inspection System for Foreign Particles inside Flexible Fluid Bags.
- Author
-
Lim, Chae Whan and Son, Kwang Chul
- Subjects
DEEP learning, MASS production, IMAGE processing, SPATIAL resolution, FLUIDS
- Abstract
Intravenous fluid bags are essential in hospitals, but foreign particles can contaminate them during mass production, posing significant risks. Although produced in sanitary environments, contamination can cause severe problems if products reach consumers. Traditional inspection methods struggle with the flexible nature of these bags, which deform easily, complicating particle detection. Recent deep learning advancements offer promising solutions in regard to quality inspection, but high-resolution image processing remains challenging. This paper introduces a real-time deep learning-based inspection system addressing bag deformation and memory constraints for high-resolution images. The system uses object-level background rejection, filtering out objects similar to the background to isolate moving foreign particles. To further enhance performance, the method aggregates object patches, reducing unnecessary data and preserving spatial resolution for accurate detection. During aggregation, candidate objects are tracked across frames, forming tracks re-identified as bubbles or particles by the deep learning model. Ensemble detection results provide robust final decisions. Experiments demonstrate that this system effectively detects particles in real-time with over 98% accuracy, leveraging deep learning advancements to tackle the complexities of inspecting flexible fluid bags. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios.
- Author
-
Nguyen, Tuan T., Nguyen, Hoang H., Sartipi, Mina, and Fisichella, Marco
- Subjects
GRAPH neural networks, LANGUAGE models, INTELLIGENT transportation systems, TRACKING algorithms, SCARCITY, CAMERAS
- Abstract
Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have been undertaken to address this issue. Nevertheless, using the approaches in real-world situations is challenging due to the scarcity of publicly available data and the laborious process of manually annotating the new dataset and creating a tailored rule-based matching system for each camera scenario. To address this issue, we present a novel solution termed LaMMOn, an end-to-end transformer and graph neural network-based multi-camera tracking model. LaMMOn consists of three main modules: (1) Language Model Detection (LMD) for object detection; (2) Language and Graph Model Association module (LGMA) for object tracking and trajectory clustering; (3) Text-to-embedding module (T2E) that overcomes the problem of data limitation by synthesizing the object embedding from defined texts. LaMMOn can be run online in real-time scenarios and achieves competitive results on many datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%) with an acceptable FPS (from 12.20 to 13.37) for an online application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
22. Visual Detection of Traffic Incident through Automatic Monitoring of Vehicle Activities.
- Author
-
Karim, Abdul, Raza, Muhammad Amir, Alharthi, Yahya Z., Abbas, Ghulam, Othmen, Salwa, Hossain, Md. Shouquat, Nahar, Afroza, and Mercorelli, Paolo
- Subjects
INTELLIGENT transportation systems, SUSTAINABLE transportation, COMPUTER vision, TRAFFIC patterns, CITY traffic
- Abstract
Intelligent transportation systems (ITSs) derive significant advantages from advanced models like YOLOv8, which excel in predicting traffic incidents in dynamic urban environments. Roboflow plays a crucial role in organizing and preparing image data essential for computer vision models. Initially, a dataset of 1000 images is utilized for training, with an additional 500 images reserved for validation purposes. Subsequently, the Deep Simple Online and Real-time Tracking (Deep-SORT) algorithm enhances scene analyses over time, offering continuous monitoring of vehicle behavior. Following this, the YOLOv8 model is deployed to detect specific traffic incidents effectively. By combining YOLOv8 with Deep SORT, urban traffic patterns are accurately detected and analyzed with high precision. The findings demonstrate that YOLOv8 achieves an accuracy of 98.4%, significantly surpassing alternative methodologies. Moreover, the proposed approach exhibits outstanding performance in the recall (97.2%), precision (98.5%), and F1 score (95.7%), underscoring its superior capability in accurate prediction and analyses of traffic incidents with high precision and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. A Novel Approach for Privacy Preserving Object Re-Identification on Edge Devices
- Author
-
Robert Kathrein, Oliver Zeilerbauer, Johannes Georg Larcher, and Mario Döller
- Subjects
data privacy ,object location ,spatial encoding ,object tracking ,object re-identification ,Telecommunication ,TK5101-6720 - Abstract
Computer vision approaches have been widely used in mobility tasks such as visitor counting, traffic analysis, etc. The European General Data Protection Regulation (GDPR) enforces in-camera processing, as storing and transmitting such data violates this regulation. This paper introduces a novel approach for object Re-Identification (Re-ID) on edge devices using a color-based encoded virtual plane for location mapping. The method leverages the spatial coding capabilities of the RGB color space to simplify the localisation process. Assigning unique RGB values to spatial coordinates creates a multidimensional reference image that facilitates instant and accurate object localisation. This reduces computational complexity and allows global referencing across multiple cameras. We present an algorithmic framework for location mapping and demonstrate its capability through experimental validation. The technique's potential is further explored in applications such as object Re-ID, marking a significant advancement in computer vision and expanding the branch of spatial encoding methodologies. This approach represents a shift towards more privacy-oriented multi-camera object tracking and Re-ID solutions.
- Published
- 2024
- Full Text
- View/download PDF
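To make the idea of a colour-encoded virtual plane in the abstract above more concrete, here is a minimal sketch. It assumes one possible encoding, packing the row-major index of a grid cell into a 24-bit RGB triple; the function names `encode_cell`, `decode_color`, and `build_reference_image` are hypothetical, and the paper's actual encoding may differ.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def encode_cell(x: int, y: int, width: int) -> RGB:
    """Map grid cell (x, y) to a unique colour via its row-major index (assumed scheme)."""
    index = y * width + x
    if not 0 <= x < width or y < 0 or index >= 1 << 24:
        raise ValueError("cell does not fit into a 24-bit RGB value")
    return (index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF


def decode_color(color: RGB, width: int) -> Tuple[int, int]:
    """Recover the grid cell (x, y) from a colour produced by encode_cell."""
    r, g, b = color
    index = (r << 16) | (g << 8) | b
    return index % width, index // width


def build_reference_image(width: int, height: int) -> List[List[RGB]]:
    """Build a reference image in which every pixel's colour encodes its own location."""
    return [[encode_cell(x, y, width) for x in range(width)] for y in range(height)]
```

With such a reference image, localisation reduces to a single pixel lookup followed by `decode_color`, which suits resource-constrained edge devices.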
24. 3D Point Cloud Object Tracking Based on Multi-level Fusion of Transformer Features
- Author
-
LI Zhijie, LIANG Bowen, DING Xinmiao, GUO Wen
- Subjects
3d point cloud ,siamese network ,object tracking ,transformer ,feature fusion ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
During 3D point cloud object tracking, issues such as occlusion, sparsity, and random noise often arise. To address these challenges, this paper proposes a novel approach to 3D point cloud object tracking based on multi-level fusion of Transformer features. The method mainly consists of the point attention embedding module and the point attention enhancement module, which are used for feature extraction and feature matching, respectively. First, two attention mechanisms are embedded into each other to form the point attention embedding module, which is fused with the relationship-aware sampling method proposed by PTTR (point relation transformer for tracking) to extract features fully. Subsequently, the feature information is input into the point attention enhancement module, and through cross-attention, features from different levels are matched sequentially to achieve deep fusion of global and local features. Moreover, to obtain discriminative feature fusion maps, a residual network is employed to connect the fusion results from different layers. Finally, the feature fusion map is input into the target prediction module to achieve precise prediction of the final 3D target object. Experimental validation on the KITTI, nuScenes, and Waymo datasets demonstrates the effectiveness of the proposed method. Excluding few-shot data, the proposed method achieves an average improvement of 1.4 percentage points in success and 1.4 percentage points in precision in terms of object tracking.
- Published
- 2024
- Full Text
- View/download PDF
25. Enhancing automatic license plate recognition in Indian scenarios.
- Author
-
Samaga, Abhinav, Lobo, Allen Joel, Nasreen, Azra, Pattar, Ramakanth Kumar, Trivedi, Neeta, Raj, Peehu, and Sreelakshmi, Koratagere
- Abstract
Automatic license plate recognition (ALPR) technology has gained widespread use in many countries, including India. With the explosion in the number of vehicles on the roads in the past few years, automating the process of documenting vehicle license plates for use by law enforcement agencies and traffic management authorities has great significance. There have been various advancements in the object detection, object tracking, and optical character recognition domains, but integrated pipelines for ALPR in Indian scenarios are rare. This paper proposes an architecture that can track vehicles across multiple frames, detect number plates and perform optical character recognition (OCR) on them. A dataset consisting of Indian vehicles for the detection of oblique license plates is collected and a framework to increase the accuracy of OCR using the data across multiple frames is proposed. The proposed system can record license plate readings of vehicles averaging 527.99 and 2157.09 ms per frame using graphics processing unit (GPU) and central processing unit (CPU) respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
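The abstract above does not say how OCR readings from multiple frames are combined. As one plausible illustration only, the sketch below applies per-character majority voting to the readings collected for a single tracked vehicle; the function name `vote_plate` and the voting rule are assumptions, not the authors' framework.

```python
from collections import Counter
from typing import List


def vote_plate(readings: List[str]) -> str:
    """Combine per-frame OCR strings for one tracked vehicle by majority vote."""
    if not readings:
        raise ValueError("at least one OCR reading is required")
    # Keep only readings of the most common length so characters line up by position.
    common_len = Counter(len(r) for r in readings).most_common(1)[0][0]
    aligned = [r for r in readings if len(r) == common_len]
    # Vote independently at each character position.
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*aligned))
```

For example, `vote_plate(["KA01AB1234", "KA01A81234", "KA0IAB1234", "KA01AB1234"])` recovers `"KA01AB1234"` because the isolated misreads are outvoted at each position.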
26. Improved multi object tracking with locality sensitive hashing.
- Author
-
Chemmanam, Ajai John, Jose, Bijoy, and Moopan, Asif
- Abstract
Object tracking is one of the most advanced applications of computer vision algorithms. While various tracking approaches have been previously developed, they often use many approximations and assumptions to enable real-time performance within the resource constraints in terms of memory, time and computational requirements. In order to address these limitations, we investigate the bottlenecks of existing tracking frameworks and propose a solution to enhance tracking efficiency. The proposed method uses Locality Sensitive Hashing (LSH) to efficiently store and retrieve nearest neighbours and then utilizes a bipartite cost matching based on the predicted positions, size, aspect ratio, appearance description, and uncertainty in motion estimation. The LSH algorithm helps reduce the dimensionality of the data while preserving their relative distances. LSH hashes the features in constant time and facilitates rapid nearest neighbour retrieval by considering features falling into the same hash buckets as similar. The effectiveness of the method was evaluated on the MOT benchmark dataset and achieved Multiple Object Tracker Accuracy (MOTA) of 67.1% (train) and 62.7% (test). Furthermore, our framework exhibits the highest Multiple Object Tracker Precision (MOTP), mostly tracked objects, and the lowest values for mostly lost objects and identity switches among the state-of-the-art trackers. The incorporation of LSH implementation reduced identity switches by approximately 7% and fragmentation by around 13%. We used the framework for real-time tracking applications on edge devices for an industry partner. We found that the LSH integration resulted in a notable reduction in track ID switching, with only a marginal increase in computation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
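The bucketing step described in the abstract above can be sketched with random-hyperplane LSH, one common LSH family for appearance vectors. This is a minimal illustration rather than the authors' implementation: the class name `HyperplaneLSH`, the number of bits, and the choice of hash family are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors with similar directions tend to share a bucket."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each random hyperplane contributes one sign bit to the hash key.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)

    def hash(self, vec: Sequence[float]) -> Tuple[int, ...]:
        # Hashing cost depends on dim and n_bits, not on how many features are stored.
        return tuple(
            int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in self.planes
        )

    def insert(self, track_id: int, vec: Sequence[float]) -> None:
        self.buckets[self.hash(vec)].append(track_id)

    def query(self, vec: Sequence[float]) -> List[int]:
        """Return ids of stored features that fall into the same bucket as vec."""
        return list(self.buckets.get(self.hash(vec), []))
```

In a tracker, the candidates returned by `query` would then be passed to the bipartite cost matching, so that the full cost is only evaluated against features in the same bucket.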
27. Hybrid Online Visual Tracking of Non-rigid Objects.
- Author
-
Bagherzadeh, Mohammad Amin, Seyedarabi, Hadi, and Razavi, Seyed Naser
- Subjects
*COMPUTER vision , *VISUAL learning , *ONLINE education , *JOB performance , *DETECTORS - Abstract
Visual object tracking has been a fundamental topic of machine vision in recent years. Most trackers struggle to achieve top performance while running in real time. This paper presents a tracking framework based on the SiamFC network that can be trained online from the beginning of tracking and runs in real time. The SiamFC network has a high tracking speed but cannot be trained online, a limitation that makes it unable to track the target for a long time. Hybrid-Siam can be trained online to distinguish target and background by switching between traditional tracking and deep learning methods. Using the traditional tracking method together with a target detector based on saliency detection enables long-term tracking. Our method runs at more than 60 frames per second at test time and achieves state-of-the-art performance on tracking benchmarks, with robust results for long-term tracking. Hybrid-Siam improves on SiamFC and achieves an AUC score of 81.7% on LaSOT, 72.3% on OTB100, and an average overlap of 66.2% on GOT-10k. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Enhancing visual monitoring via multi-feature fusion and template update strategies.
- Author
-
Rafique, Fahad, Zheng, Liying, Benarab, Acheraf, and Javed, Muhammad Hafeez
- Abstract
Recent advancements in computer vision, particularly deep learning, have significantly influenced visual monitoring across varied scenes. However, traditional machine learning approaches, particularly those based on correlation filtering (CF), remain valuable due to their efficiency in data collection, lower computational needs and improved explainability. While CF-based tracking methods have become popular for analyzing complex scenes, they often rely on single features, limiting their ability to capture dynamic target appearances and resulting in inaccurate target tracking. Traditional template update techniques might also result in low accuracy and inaccurate feature extraction. In contrast, we introduce a location fusion mechanism incorporating multiple feature information streams to improve real-time monitoring in complex scenes. These strategies periodically extract four types of features and fuse their response maps, ensuring robust target tracking with high accuracy. Further innovations, such as dynamic spatial regularization and a multi-memory tracking framework, enable filters to focus on more reliable regions and suppress response deviations across consecutive frames. Based on the confidence score, a novel template update, storage and retrieval mechanism is implemented. Extensive testing across datasets like OTB100, VOT2016 and VOT2018 confirms that these integrated approaches outperform 26 state-of-the-art algorithms by balancing tracking success and computational efficiency in complex scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Four mathematical modeling forms for correlation filter object tracking algorithms and the fast calculation for the filter
- Author
-
Yingpin Chen and Kaiwei Chen
- Subjects
correlation filter ,object tracking ,diagonalization of circulant matrix ,convolution operator ,correlation operator ,Mathematics ,QA1-939 ,Applied mathematics. Quantitative methods ,T57-57.97 - Abstract
The correlation filter object tracking algorithm has gained extensive attention from scholars in the field of tracking because of its excellent tracking performance and efficiency. However, the mathematical modeling relationships of correlation filter tracking frameworks are unclear. Therefore, many forms of correlation filters are susceptible to confusion and misuse. To solve these problems, we attempted to review various forms of the correlation filter and discussed their intrinsic connections. First, we reviewed the basic definitions of the circulant matrix, convolution, and correlation operations. Then, the relationship among the three operations was discussed. Considering this, four mathematical modeling forms of correlation filter object tracking from the literature were listed, and the equivalence of the four modeling forms was theoretically proven. Then, the fast solution of the correlation filter was discussed from the perspective of the diagonalization property of the circulant matrix and the convolution theorem. In addition, we delved into the difference between the one-dimensional and two-dimensional correlation filter responses as well as the reasons for their generation. Numerical experiments were conducted to verify the proposed perspectives. The results showed that the filters calculated based on the diagonalization property and the convolution property of the cyclic matrix were completely equivalent. The experimental code of this paper is available at https://github.com/110500617/Correlation-filter/tree/main.
- Published
- 2024
- Full Text
- View/download PDF
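The equivalence claimed in the abstract above can be checked numerically on a small one-dimensional example. The sketch below assumes a ridge-regression formulation of the correlation filter (a common choice in this literature, not stated in the abstract) and is unrelated to the authors' published code; the function names and the regularization weight `lam` are hypothetical. Where the complex conjugate falls in the Fourier-domain formula depends on the shift and DFT conventions, and the version here follows NumPy's `fft` together with rows built by `np.roll(x, i)`.

```python
import numpy as np


def circulant_rows(x: np.ndarray) -> np.ndarray:
    """Data matrix whose i-th row is x cyclically shifted by i positions."""
    return np.stack([np.roll(x, i) for i in range(len(x))])


def ridge_direct(x: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Solve (X^T X + lam I) w = X^T y explicitly with the circulant matrix X."""
    X = circulant_rows(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)


def ridge_fft(x: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Same filter via the DFT, which diagonalizes X: element-wise division only."""
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.real(np.fft.ifft(w_hat))
```

Because the DFT diagonalizes the circulant matrix, the Fourier-domain route needs only FFTs and element-wise operations instead of a dense linear solve, which is the source of the speed of correlation filter trackers.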
30. A transformer‐based lightweight method for multiple‐object tracking
- Author
-
Qin Wan, Zhu Ge, Yang Yang, Xuejun Shen, Hang Zhong, Hui Zhang, Yaonan Wang, and Di Wu
- Subjects
computer vision ,object detection ,object tracking ,Photography ,TR1-1050 ,Computer software ,QA76.75-76.765 - Abstract
At present, multi‐object tracking methods based on transformers generally use the powerful self‐attention mechanism and global modelling ability to improve the accuracy of object tracking. However, most existing methods rely excessively on hardware devices, leading to an inconsistency between accuracy and speed in practical applications. Therefore, a lightweight transformer joint position awareness algorithm is proposed to solve the above problems. Firstly, a joint attention module to enhance the ShuffleNet V2 network is proposed. This module comprises the spatio‐temporal pyramid module and the convolutional block attention module. The spatio‐temporal pyramid module fuses multi‐scale features to capture information on different spatial and temporal scales. The convolutional block attention module aggregates channel and spatial dimension information to enhance the representation ability of the model. Then, a position encoding generator module and a dynamic template update strategy are proposed to address occlusion. Group convolution is applied to the input sequence through the position encoding generator module, with each convolution group responsible for handling the relative positional relationships of a specific range. In order to improve the reliability of the template, a dynamic template update strategy is used to update the template at the appropriate time. The effectiveness of the approach is validated on the MOT16, MOT17, and MOT20 datasets.
- Published
- 2024
- Full Text
- View/download PDF
31. Ant algorithms for finding weighted and unweighted maximum cliques in d-division graphs
- Author
-
Krzysztof Schiff
- Subjects
d-division graph ,ant algorithm ,maximum clique ,machine control ,object tracking ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 ,Telecommunication ,TK5101-6720 - Abstract
This article deals with the problem of finding the maximum number of maximum cliques in a weighted graph with all edges between vertices from different d-division of a graph with the minimum total weight of all these cliques, and the problem of finding the maximum number of maximum cliques in a nonweighted graph with not all edges between vertices from different d-division of the graph. This article presents new ant algorithms with new desire functions for these problems. These algorithms were tested for their purpose with different changing input parameters, the test results were tabulated and discussed, the best algorithms were indicated.
- Published
- 2024
- Full Text
- View/download PDF
32. Probabilistic 3D motion model for object tracking in aerial applications
- Author
-
Seyed Hojat Mirtajadini, MohammadAli Amiri Atashgah, and Mohammad Shahbazi
- Subjects
computer vision ,motion estimation ,object tracking ,Photography ,TR1-1050 ,Computer software ,QA76.75-76.765 - Abstract
Visual object tracking, crucial in aerial applications such as surveillance, cinematography, and chasing, faces challenges despite AI advancements. Current solutions lack full reliability, leading to common tracking failures in the presence of fast motions or long‐term occlusions of the subject. To tackle this issue, a 3D motion model is proposed that employs camera/vehicle states to locate a subject in the inertial coordinates. Next, a probability distribution is generated over future trajectories and they are sampled using a Monte Carlo technique to provide search regions that are fed into an online appearance learning process. This 3D motion model incorporates machine‐learning approaches for direct range estimation from monocular images. The model adapts computationally by adjusting search areas based on tracking confidence. It is integrated into DiMP, an online and deep learning‐based appearance model. The resulting tracker is evaluated on the VIOT dataset with sequences of both images and camera states, achieving a 68.9% tracking precision compared to DiMP's 49.7%. This approach demonstrates increased tracking duration, improved recovery after occlusions, and faster motions. Additionally, this strategy outperforms random searches by about 3.0%.
- Published
- 2024
- Full Text
- View/download PDF
33. Spatial feature embedding for robust visual object tracking
- Author
-
Kang Liu, Long Liu, Shangqi Yang, and Zhihao Fu
- Subjects
computer vision ,distance learning ,image motion analysis ,object tracking ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Computer software ,QA76.75-76.765 - Abstract
Recently, the offline‐trained Siamese pipeline has drawn wide attention due to its outstanding tracking performance. However, the existing Siamese trackers utilise offline training to extract ‘universal’ features, which is insufficient to effectively distinguish between the target and fluctuating interference in embedding the information of the two branches, leading to inaccurate classification and localisation. In addition, the Siamese trackers employ a pre‐defined scale for cropping the search candidate region based on the previous frame's result, which might easily introduce redundant background noise (clutter, similar objects etc.), affecting the tracker's robustness. To solve these problems, the authors propose two novel sub‐networks for spatial feature embedding for robust object tracking. Specifically, the proposed spatial remapping (SRM) network enhances the feature discrepancy between target and distractor categories by online remapping, and improves the discriminant ability of the tracker on the embedding space. The MAML is used to optimise the SRM network to ensure its adaptability to complex tracking scenarios. Moreover, a temporal information proposal‐guided (TPG) network is introduced that utilises a GRU model to dynamically predict the search scale based on temporal motion states, reducing potential background interference. The two proposed networks are integrated into two popular trackers, namely SiamFC++ and TransT, denoted SiamSRMC and SiamSRMT, respectively, which achieve superior performance on six challenging benchmarks: OTB100, VOT2019, UAV123, GOT10K, TrackingNet and LaSOT. Moreover, the proposed trackers obtain competitive tracking performance compared with the state‐of‐the‐art trackers on the attributes of background clutter and similar objects, validating the effectiveness of the method.
- Published
- 2024
- Full Text
- View/download PDF
34. SiamEFT: adaptive-time feature extraction hybrid network for RGBE multi-domain object tracking.
- Author
-
Shuqi Liu, Gang Wang, Yong Song, Jinxiang Huang, Yiqian Huang, Ya Zhou, and Shiqiang Wang
- Subjects
ARTIFICIAL neural networks ,FEATURE extraction ,CAMERAS - Abstract
Integrating RGB and Event (RGBE) multi-domain information obtained by high-dynamic-range and temporal-resolution event cameras has been considered an effective scheme for robust object tracking. However, existing RGBE tracking methods have overlooked the unique spatio-temporal features over different domains, leading to object tracking failure and inefficiency, especially for objects against complex backgrounds. To address this problem, we propose a novel tracker based on adaptive-time feature extraction hybrid networks, namely Siamese Event Frame Tracker (SiamEFT), which focuses on the effective representation and utilization of the diverse spatio-temporal features of RGBE. We first design an adaptive-time attention module to aggregate event data into frames based on adaptive-time weights to enhance information representation. Subsequently, the SiamEF module and a cross-network fusion module, built on a hybrid of artificial neural networks and spiking neural networks, are designed to effectively extract and fuse the spatio-temporal features of RGBE. Extensive experiments on two RGBE datasets (VisEvent and COESOT) show that SiamEFT achieves success rates of 0.456 and 0.574, outperforming the state-of-the-art competing methods and exhibiting a 2.3-fold enhancement in efficiency. These results validate the superior accuracy and efficiency of SiamEFT in diverse and challenging scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. FQTrack: Object Tracking Method Based on a Feature-Enhanced Memory Network and Memory Quality Selection Mechanism.
- Author
-
Zhang, Jianwei, Zhang, Mengya, Zhang, Huanlong, Cai, Zengyu, and Zhu, Liang
- Subjects
VIRTUAL reality ,TIME-varying networks ,MEMORY ,OBJECT tracking (Computer vision) - Abstract
Visual object tracking technology is widely used in intelligent security, automatic driving and other fields, and also plays an important role in frontier fields such as human–computer interactions and virtual reality. The memory network improves the stability and accuracy of tracking by using historical frame information to assist in the positioning of the current frame in object tracking. However, the memory network is still insufficient in feature mining and the accuracy and robustness of the model may be reduced when using noisy observation samples to update it. In view of the above problems, we propose a new tracking framework, which uses the attention mechanism to establish a feature-enhanced memory network and combines cross-attention to aggregate the spatial and temporal context information of the target. The former introduces spatio-temporal adaptive attention and cross-spatial attention, embeds spatial location information into channels, realizes multi-scale feature fusion, dynamically emphasizes target location information, and obtains richer feature maps. The latter guides the tracker to focus on the area with the largest amount of information in the current frame to better distinguish the foreground and background. In addition, through the memory quality selection mechanism, the accuracy and richness of the feature samples are improved, thereby enhancing the adaptability and discrimination ability of the tracking model. Experiments on benchmark test sets such as OTB2015, TrackingNet, GOT-10k, LaSOT and UAV 123 show that this method achieves comparable performance with advanced trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. GCAT: graph calibration attention transformer for robust object tracking.
- Author
-
Chen, Si, Hu, Xinxin, Wang, Da-Han, Yan, Yan, and Zhu, Shunzhi
- Subjects
*CALIBRATION , *GENERALIZATION , *NOISE - Abstract
Recent Siamese trackers have taken advantage of transformers to achieve impressive advancements. However, existing transformer trackers neglect the positional and structural information between tokens, and traditional template update strategies easily introduce noise into the dynamic templates during tracking. To alleviate these issues, this paper develops a novel end-to-end graph calibration attention transformer network (GCAT) to enhance tracking robustness and accuracy. A graph calibration attention mechanism is first designed to calibrate and aggregate template information, for effectively updating dynamic templates during the tracking process. Specifically, each token is considered as a node in the graph, and then, we calculate the weight relationships between each node and its adjacent nodes. Thus, this mechanism can aggregate the global context information of the template and search nodes and activate feature channels based on weights and biases to obtain more discriminative feature information. Moreover, we leverage a multi-level dropout mechanism to perform the data dropout, the layer dropout, and the feature dropout on the data, network, and attention levels, respectively, to avoid overfitting of local-specific information and improve the generalization ability. Extensive experiments show the proposed method achieves superior performance on seven challenging benchmark datasets, i.e., OTB100, OTB2013, UAV123, LaSOT, GOT10K, VOT2020, and TrackingNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Impact of Perception Errors in Vision-Based Detection and Tracking Pipelines on Pedestrian Trajectory Prediction in Autonomous Driving Systems.
- Author
-
Chen, Wen-Hui, Wu, Jiann-Cherng, Davydov, Yury, Yeh, Wei-Chen, and Lin, Yu-Chen
- Subjects
*OBJECT recognition (Computer vision) , *AUTONOMOUS vehicles , *FORECASTING , *ALGORITHMS - Abstract
Pedestrian trajectory prediction is crucial for developing collision avoidance algorithms in autonomous driving systems, aiming to predict the future movement of the detected pedestrians based on their past trajectories. The traditional methods for pedestrian trajectory prediction involve a sequence of tasks, including detection and tracking to gather the historical movement of the observed pedestrians. Consequently, the accuracy of trajectory prediction heavily relies on the accuracy of the detection and tracking models, making it susceptible to their performance. The prior research in trajectory prediction has mainly assessed the model performance using public datasets, which often overlook the errors originating from detection and tracking models. This oversight fails to capture the real-world scenario of inevitable detection and tracking inaccuracies. In this study, we investigate the cumulative effect of errors within integrated detection, tracking, and trajectory prediction pipelines. Through empirical analysis, we examine the errors introduced at each stage of the pipeline and assess their collective impact on the trajectory prediction accuracy. We evaluate these models across various custom datasets collected in Taiwan to provide a comprehensive assessment. Our analysis of the results derived from these integrated pipelines illuminates the significant influence of detection and tracking errors on downstream tasks, such as trajectory prediction and distance estimation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Query-Based Object Visual Tracking with Parallel Sequence Generation.
- Author
-
Liu, Chang, Zhang, Bin, Bo, Chunjuan, and Wang, Dong
- Subjects
*SPINE , *SPEED , *PLAINS - Abstract
Query decoders have been shown to achieve good performance in object detection. However, they suffer from insufficient object tracking performance. Sequence-to-sequence learning in this context has recently been explored, with the idea of describing a target as a sequence of discrete tokens. In this study, we experimentally determine that, with appropriate representation, a parallel approach for predicting a target coordinate sequence with a query decoder can achieve good performance and speed. We propose a concise query-based tracking framework for predicting a target coordinate sequence in a parallel manner, named QPSTrack. A set of queries are designed to be responsible for different coordinates of the tracked target. All the queries jointly represent a target rather than a traditional one-to-one matching pattern between the query and target. Moreover, we adopt an adaptive decoding scheme including a one-layer adaptive decoder and learnable adaptive inputs for the decoder. This decoding scheme assists the queries in decoding the template-guided search features better. Furthermore, we explore the use of the plain ViT-Base, ViT-Large, and lightweight hierarchical LeViT architectures as the encoder backbone, providing a family of three variants in total. All the trackers are found to obtain a good trade-off between speed and performance; for instance, our tracker QPSTrack-B256 with the ViT-Base encoder achieves a 69.1% AUC on the LaSOT benchmark at 104.8 FPS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. UAV Tracking via Saliency-Aware and Spatial–Temporal Regularization Correlation Filter Learning.
- Author
-
Liu, Liqiang, Feng, Tiantian, Fu, Yanfang, Yang, Lingling, Cai, Dongmei, and Cao, Zijian
- Subjects
*DISCRETE Fourier transforms , *DRONE aircraft , *SYMMETRY , *TRACKING algorithms - Abstract
Due to their balance between excellent performance and high efficiency, discriminative correlation filter (DCF) tracking methods for unmanned aerial vehicles (UAVs) have gained much attention. These correlations can be computed efficiently in the Fourier domain using the discrete Fourier transform (DFT), and the DFT of an image exhibits symmetry in the Fourier domain. However, DCF tracking methods easily generate unwanted boundary effects when the tracked object faces challenging situations, such as deformation, fast motion and occlusion. To tackle the above issue, this work proposes a novel saliency-aware and spatial–temporal regularized correlation filter (SSTCF) model for visual object tracking. First, the introduced spatial–temporal regularization helps build a more robust correlation filter (CF) and improve the temporal continuity and consistency of the model to effectively lower boundary effects and enhance tracking performance. In addition, the relevant objective function can be decomposed into three subproblems with closed-form solutions, which can be solved efficiently using the alternating direction method of multipliers (ADMM). Furthermore, utilizing a saliency detection method to acquire a saliency-aware weight enables the tracker to adjust to variations in appearance and mitigate disturbances from the surrounding environment. Finally, we conducted numerous experiments based on three different benchmarks, and the results showed that our proposed model had better performance and higher efficiency compared to the most advanced trackers. For example, the distance precision (DP) score was 0.883, and the area under the curve (AUC) score was 0.676 on the OTB2015 dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. AI-ENHANCED TRACKSEGNET AN ADVANCED MACHINE LEARNING TECHNIQUE FOR VIDEO SEGMENTATION AND OBJECT TRACKING.
- Author
-
Kushwah, Jitendra Singh, Dave, Maitriben Harshadbhai, Sharma, Ankita, Shrivastava, Keerti, Sharma, Rajeev, and Ahmed, Mohammad Nadeem
- Subjects
CONVOLUTIONAL neural networks ,COMPUTER vision ,DEEP learning ,MACHINE learning ,FEATURE extraction ,OBJECT tracking (Computer vision) - Abstract
Video segmentation and object tracking are critical tasks in computer vision with applications spanning surveillance, autonomous driving, and interactive media. Traditional methods often struggle with the dynamic nature of video data, where object occlusions, variations in illumination, and complex motion patterns present significant challenges. Existing segmentation and tracking systems frequently suffer from inaccuracies in handling real-time video sequences, particularly in distinguishing and tracking multiple overlapping objects. The limitations of current models in addressing these issues necessitate the development of more advanced techniques that can effectively manage dynamic scenes and improve tracking accuracy. To address these challenges, we propose an advanced machine learning technique, AI-Enhanced TrackSegNet, which integrates deep learning with novel attention mechanisms for improved video segmentation and object tracking. Our method utilizes a combination of Convolutional Neural Networks (CNNs) for feature extraction and Long Short-Term Memory (LSTM) networks for temporal sequence modeling. We introduce an attention-based mechanism to dynamically focus on relevant features, enhancing the model's ability to handle occlusions and varying object appearances. The model was trained on a diverse dataset of video sequences, incorporating both synthetic and real-world footage. The AI-Enhanced TrackSegNet demonstrated significant improvements in performance compared to existing techniques. Our method achieved an average Intersection over Union (IoU) score of 86.7% for segmentation and a tracking precision rate of 91.3% on the MOT17 benchmark dataset. These results represent a 10.2% improvement in IoU and a 7.5% increase in tracking precision compared to state-of-the-art methods. The model also exhibited enhanced robustness in complex scenes, handling occlusions and motion variations with greater accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Meta Learning based Object Tracking Technology: A Survey.
- Author
-
Ji-Won Baek and Kyungyong Chung
- Abstract
Recently, image analysis research has been actively conducted due to the accumulation of big image data and the development of deep learning. Image data differ from other data in characteristics such as data size, real-time requirements, image quality diversity, structural complexity, and security issues. In addition, a large amount of data is required to effectively analyze images with deep-learning models. However, in many fields, the data that can be collected is limited, so there is a need for meta-learning-based image analysis technology that can effectively train models with a small amount of data. This paper presents a comprehensive survey of meta-learning-based object-tracking techniques. It explores object tracking methods and research that can achieve high performance in data-limited situations, including key challenges and future directions, and offers researchers in the field insights into future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Video Object Tracking Algorithm Based on Dual-Branch Online Optimization and Feature Fusion.
- Author
-
李新鹏, 王 鹏, 李晓艳, 孙梦宇, 陈遵田, and 郜 辉
- Subjects
TRACKING algorithms, ELECTRONIC equipment, TEST reliability, RELIABILITY in engineering, RESEARCH institutes
- Abstract
Copyright of Chinese Journal of Liquid Crystal & Displays is the property of Chinese Journal of Liquid Crystal & Displays and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
43. A Novel Three-Stage Collision-Risk Pre-Warning Model for Construction Vehicles and Workers.
- Author
-
Gan, Wenxia, Gu, Kedi, Geng, Jing, Qiu, Canzhi, Yang, Ruqin, Wang, Huini, and Hu, Xiaodi
- Subjects
BUILDING sites, CONSTRUCTION workers, COMPUTER vision, PREDICTION models, ACQUISITION of data, WARNINGS
- Abstract
Collision accidents involving construction vehicles and workers frequently occur at construction sites. Computer vision (CV) technology presents an efficient solution for collision-risk pre-warning. However, CV-based methods are still relatively rare and need an enhancement of their performance. Therefore, a novel three-stage collision-risk pre-warning model for construction vehicles and workers is proposed in this paper. This model consists of an object-sensing module (OSM), a trajectory prediction module (TPM), and a collision-risk assessment module (CRAM). In the OSM, the YOLOv5 algorithm is applied to identify and locate construction vehicles and workers; meanwhile, the DeepSORT algorithm is applied to the real-time tracking of the construction vehicles and workers. As a result, the historical trajectories of vehicles and workers are sensed. The original coordinates of the data are transformed to common real-world coordinate systems for convenient subsequent data acquisition, comparison, and analysis. Subsequently, the data are provided to a second stage (TPM). In the TPM, the optimized transformer algorithm is used for a real-time trajectory prediction of the construction vehicles and workers. In this paper, we enhance the reliability of the general object detection and trajectory prediction methods in the construction environments. With the assistance afforded by the optimization of the model's hyperparameters, the prediction horizon is extended, and this gives the workers more time to take preventive measures. Finally, the prediction module indicates the possible trajectories of the vehicles and workers in the future and provides these trajectories to the CRAM. In the CRAM, the worker's collision-risk level is assessed by a multi-factor-based collision-risk assessment rule, which is innovatively proposed in the present work. 
The multi-factor-based assessment rule is quantitatively involved in three critical risk factors, i.e., velocity, hazardous zones, and proximity. Experiments are performed within two different construction site scenarios to evaluate the effectiveness of the collision-risk pre-warning model. The research results show that the proposed collision pre-warning model can accurately predict the collision-risk level of workers at construction sites, with good tracking and predicting effect and an efficient collision-risk pre-warning strategy. Compared to the classical models, such as social-GAN and social-LSTM, the transformer-based trajectory prediction model demonstrates a superior accuracy, with an average displacement error of 0.53 m on the construction sites. Additionally, the optimized transformer model is capable of predicting six additional time steps, which equates to approximately 1.8 s. The collision pre-warning model proposed in this paper can help improve the safety of construction vehicles and workers. [ABSTRACT FROM AUTHOR]
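The trajectory-prediction result above is reported as an average displacement error (ADE) of 0.53 m. As a rough sketch of how that metric is commonly defined (the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon), and not the authors' evaluation script, one could write the following. The function name and the sample coordinates are hypothetical.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching trajectory points.

    Both arguments are sequences of (x, y) positions in metres, one per
    time step. Hypothetical helper for illustration only.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Made-up three-step trajectories, not data from the paper.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade = average_displacement_error(predicted, ground_truth)  # errors 0, 1, 2 m
```

The abstract also states that six additional time steps correspond to about 1.8 s, which implies a step of roughly 0.3 s if the steps are evenly spaced.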
- Published
- 2024
- Full Text
- View/download PDF
44. A Diffusion Neural Network-Enhanced Object Tracking Approach Under Sports Scenarios.
- Author
-
Li, Wen, Hu, Yingyue, Liu, Ruixiang, Shi, Xiaofeng, and Su, Xuefeng
- Subjects
*NEURAL computers, *COMPUTER vision, *VISUAL fields, *DEEP learning, *SPORTS, *OBJECT tracking (Computer vision)
- Abstract
Object tracking for motion scenes is a common research concern in the field of computer vision. Its goal is to accurately track targets in different time periods and predict their future states by utilizing the motion information in video sequences. However, traditional target-tracking methods in motion scenes often face challenges such as target blur, occlusion, and changes in lighting. To deal with these issues, this paper proposes a diffusion neural network-enhanced object-tracking approach under sports scenarios. In order to further improve tracking performance, the diffusion convolution operation is introduced, which propagates features at different time steps to enhance the modeling ability of target motion. Then, suitable influencing factors are selected based on motion scene object feature parameters. Finally, a target tracking method is established by integrating these two methods. In the experiment, we used a large number of real motion scene datasets to evaluate the proposed method. The experimental results show that, compared with traditional moving object tracking methods, the proposal achieves significant improvement in tracking accuracy and robustness. In addition, we also conducted stability experiments, showing that this method has good stability for models with varying kernel numbers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. A computer vision approach to vehicle detection, classification, and tracking from UAV data for Indian traffic analysis.
- Author
-
Rathod, Vaishnavee V., Rana, Dipti P., Mehta, Rupa G., and Nath, Vijay
- Subjects
*COMPUTER vision, *TRAFFIC monitoring, *CITIES & towns, *INTELLIGENT transportation systems, *ROAD safety measures
- Abstract
Traffic surveillance is crucial for road safety and efficiency. This study examines object detection techniques tailored for Indian traffic, highlighting the challenges faced. The aim is to track vehicles, pedestrians, and cyclists using drone-captured data in Indian cities. Urban areas in India grapple with traffic problems due to rapid growth and inadequate infrastructure. Effective monitoring can address these. Recent advancements in object detection, a critical computer vision task, show promise for this application. We show that the YOLOv8 model, trained on our drone-collected dataset, improved detection accuracy for the Indian traffic scenario. The dataset, labelled with vehicle bounding boxes, underwent preprocessing using a Gaussian filter, resizing, normalization, and augmentation. Testing showcased the model's real-world applicability in areas like traffic management and autonomous driving. The proposed model thus enhances vehicle detection systems, fostering better safety and decision-making, and yields promising results, as evidenced by an mAP50 of 0.86 after training and testing the model on our own UAV dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. DOT-SLAM: A Stereo Visual Simultaneous Localization and Mapping (SLAM) System with Dynamic Object Tracking Based on Graph Optimization.
- Author
-
Zhu, Yuan, An, Hao, Wang, Huaide, Xu, Ruidong, Sun, Zhipeng, and Lu, Ke
- Subjects
*NONHOLONOMIC constraints, *DYNAMICAL systems, *SINGLE-degree-of-freedom systems, *TRACKING algorithms, *AUTONOMOUS vehicles, *PLANE geometry, *MODEL airplanes
- Abstract
Most visual simultaneous localization and mapping (SLAM) systems are based on the assumption of a static environment in autonomous vehicles. However, when dynamic objects, particularly vehicles, occupy a large portion of the image, the localization accuracy of the system decreases significantly. To mitigate this challenge, this paper unveils DOT-SLAM, a novel stereo visual SLAM system that integrates dynamic object tracking through graph optimization. By integrating dynamic object pose estimation into the SLAM system, the system can effectively utilize both foreground and background points for ego vehicle localization and obtain a static feature points map. To rectify the inaccuracies in depth estimation from stereo disparity directly on the foreground points of dynamic objects due to their self-similarity characteristics, a coarse-to-fine depth estimation method based on camera–road plane geometry is presented. This method uses rough depth to guide fine stereo matching, thereby obtaining the three-dimensional (3D) spatial positions of feature points on dynamic objects. Subsequently, constraints on the dynamic object's pose are established using the road plane and the non-holonomic constraints (NHCs) of the vehicle; this reduces the initial pose uncertainty of dynamic objects and leads to more accurate dynamic object initialization. Finally, by considering foreground points, background points, the local road plane, the ego vehicle pose, and dynamic object poses as optimization nodes, through the establishment and joint optimization of a nonlinear model based on graph optimization, accurate six-degrees-of-freedom (6-DoF) pose estimates are obtained for both the ego vehicle and dynamic objects. Experimental validation on the KITTI-360 dataset demonstrates that DOT-SLAM effectively utilizes features from the background and dynamic objects in the environment, resulting in more accurate vehicle trajectory estimation and a static environment map.
Results obtained from a real-world dataset test reinforce the effectiveness. [ABSTRACT FROM AUTHOR]
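The abstract refers to depth estimation from stereo disparity. The sketch below shows only the standard rectified-stereo relation, depth = focal length × baseline / disparity, and not the coarse-to-fine method the paper proposes. The function name and the calibration values are assumptions chosen for illustration; they are not KITTI-360 parameters.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by a rectified stereo pair.

    focal_px:     focal length in pixels (assumed value in the example)
    baseline_m:   distance between the two cameras in metres (assumed)
    disparity_px: horizontal pixel offset of the point between the views
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers only.
depth = depth_from_disparity(700.0, 0.5, 35.0)           # 10.0 m
depth_one_px_off = depth_from_disparity(700.0, 0.5, 34.0)
depth_error = depth_one_px_off - depth                    # about 0.29 m
```

Because depth is inversely proportional to disparity, a one-pixel matching error costs more depth accuracy the farther away the point is, which is one reason mismatches on self-similar vehicle surfaces are costly.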
- Published
- 2024
- Full Text
- View/download PDF
47. SiamDCFF: Dynamic Cascade Feature Fusion for Vision Tracking.
- Author
-
Lu, Jinbo, Wu, Na, and Hu, Shuo
- Subjects
*ARTIFICIAL neural networks
- Abstract
Establishing an accurate and robust feature fusion mechanism is key to enhancing the tracking performance of single-object trackers based on a Siamese network. However, the output features of the depth-wise cross-correlation feature fusion module in fully convolutional trackers based on Siamese networks cannot establish global dependencies on the feature maps of a search area. This paper proposes a dynamic cascade feature fusion (DCFF) module by introducing a local feature guidance (LFG) module and dynamic attention modules (DAMs) after the depth-wise cross-correlation module to enhance the global dependency modeling capability during the feature fusion process. In this paper, a set of verification experiments is designed to investigate whether establishing global dependencies for the features output by the depth-wise cross-correlation operation can significantly improve the performance of fully convolutional trackers based on a Siamese network, providing experimental support for rational design of the structure of a dynamic cascade feature fusion module. Secondly, we integrate the dynamic cascade feature fusion module into the tracking framework based on a Siamese network, propose SiamDCFF, and evaluate it using public datasets. Compared with the baseline model, SiamDCFF demonstrated significant improvements. [ABSTRACT FROM AUTHOR]
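The depth-wise cross-correlation mentioned above can be illustrated with a small pure-Python sketch. It assumes feature maps stored as nested lists of shape channels × height × width and slides each template channel over the matching search channel only. The function name and data layout are hypothetical, and the DCFF, LFG, and DAM modules from the paper are not reproduced here.

```python
def depthwise_xcorr(search, template):
    """Correlate each template channel with the same search channel.

    search:   C x Hs x Ws nested lists (search-region features)
    template: C x Ht x Wt nested lists (target features), Ht <= Hs, Wt <= Ws
    returns:  C x (Hs-Ht+1) x (Ws-Wt+1) response maps, one per channel
    """
    if len(search) != len(template):
        raise ValueError("search and template need the same channel count")
    responses = []
    for s_ch, t_ch in zip(search, template):
        hs, ws = len(s_ch), len(s_ch[0])
        ht, wt = len(t_ch), len(t_ch[0])
        channel_out = []
        for i in range(hs - ht + 1):
            row = []
            for j in range(ws - wt + 1):
                # Each output value sees only an Ht x Wt window of one channel.
                row.append(sum(s_ch[i + di][j + dj] * t_ch[di][dj]
                               for di in range(ht) for dj in range(wt)))
            channel_out.append(row)
        responses.append(channel_out)
    return responses


search = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
template = [[[1, 0], [0, 1]],
            [[2, 2], [2, 2]]]
response = depthwise_xcorr(search, template)
```

Every response value depends on a single local window of a single channel, which is consistent with the abstract's point that this operation alone does not relate distant positions in the search area.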
- Published
- 2024
- Full Text
- View/download PDF
48. Learning a Context-Aware Environmental Residual Correlation Filter via Deep Convolution Features for Visual Object Tracking.
- Author
-
Kuppusami Sakthivel, Sachin Sakthi, Moorthy, Sathishkumar, Arthanari, Sathiyamoorthi, Jeong, Jae Hoon, and Joo, Young Hoon
- Subjects
*MACHINE learning, *VIDEO surveillance, *AUTONOMOUS vehicles, *ROBOTS, *VIDEOS
- Abstract
Visual tracking has become widespread in swarm robots for intelligent video surveillance, navigation, and autonomous vehicles due to the development of machine learning algorithms. Discriminative correlation filter (DCF)-based trackers have gained increasing attention owing to their efficiency. This study proposes "context-aware environmental residual correlation filter tracking via deep convolution features (CAERDCF)" to enhance the performance of the tracker under ambiguous environmental changes. The objective is to address the challenges posed by intensive environment variations that confound DCF-based trackers, resulting in undesirable tracking drift. We present a selective spatial regularizer in the DCF to suppress boundary effects and use the target's context information to improve tracking performance. Specifically, a regularization term comprehends the environmental residual among video sequences, enhancing the filter's discrimination and robustness in unpredictable tracking conditions. Additionally, we propose an efficient method for acquiring environmental data using the current observation without additional computation. A multi-feature integration method is also introduced to enhance the target's presence by combining multiple metrics. We demonstrate the efficiency and feasibility of our proposed CAERDCF approach by comparing it with existing methods using the OTB2015, TempleColor128, UAV123, LASOT, and GOT10K benchmark datasets. Specifically, our method increased the precision score by 12.9% in OTB2015 and 16.1% in TempleColor128 compared to BACF. [ABSTRACT FROM AUTHOR]
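For readers unfamiliar with discriminative correlation filters, the following is a minimal one-dimensional sketch of the basic idea: a filter is fitted in the Fourier domain by ridge regression so that correlating it with a training signal produces a peak at the target position. It is a generic textbook-style formulation written with a naive DFT, not the CAERDCF method, and it omits the spatial regularizer, context terms, and deep features described in the abstract. All names and the regularization value `lam` are assumptions.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def train_filter(signal, target, lam=1e-3):
    """Return the conjugate filter H* = G F* / (F F* + lam), bin by bin."""
    spectrum = dft(signal)
    desired = dft(target)
    return [g * f.conjugate() / (f * f.conjugate() + lam)
            for f, g in zip(spectrum, desired)]


def respond(filter_conj, signal):
    """Correlation response of a new signal; the peak marks the target."""
    spectrum = dft(signal)
    product = [z * h for z, h in zip(spectrum, filter_conj)]
    return [value.real for value in dft(product, inverse=True)]
```

As a usage note: if the filter is trained with a desired response that peaks at index 5 and the signal is then circularly shifted by 3 samples, the response peak moves to index 8. The circular shifts implied by the Fourier transform are also the source of the boundary effects that the abstract says its spatial regularizer suppresses.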
- Published
- 2024
- Full Text
- View/download PDF
49. Long 3D-POT: A Long-Term 3D Drosophila-Tracking Method for Position and Orientation with Self-Attention Weighted Particle Filters.
- Author
-
Yin, Chengkai, Liu, Xiang, Zhang, Xing, Wang, Shuohong, and Su, Haifeng
- Subjects
SWARM intelligence, GRAPH neural networks, DROSOPHILA, INSECT behavior, MODEL airplanes
- Abstract
The study of the intricate flight patterns and behaviors of swarm insects, such as drosophilas, has long been a subject of interest in both the biological and computational realms. Tracking drosophilas is an essential and indispensable method for researching drosophilas' behaviors. Still, it remains a challenging task due to the highly dynamic nature of these drosophilas and their partial occlusion in multi-target environments. To address these challenges, particularly in environments where multiple targets (drosophilas) interact and overlap, we have developed a long-term Trajectory 3D Position and Orientation Tracking Method (Long 3D-POT) that combines deep learning with particle filtering. Our approach employs a detection model based on an improved Mask-RCNN to accurately detect the position and state of drosophilas from frames, even when they are partially occluded. Following detection, improved particle filtering is used to predict and update the motion of the drosophilas. To further enhance accuracy, we have introduced a prediction module based on the self-attention backbone that predicts the drosophila's next state and updates the particles' weights accordingly. Compared with previous methods by Ameni, Cheng, and Wang, our method has demonstrated a higher degree of accuracy and robustness in tracking the long-term trajectories of drosophilas, even those that are partially occluded. Specifically, Ameni employs the Interacting Multiple Model (IMM) combined with the Global Nearest Neighbor (GNN) assignment algorithm, primarily designed for tracking larger, more predictable targets like aircraft, which tends to perform poorly with small, fast-moving objects like drosophilas. The method by Cheng then integrates particle filtering with LSTM networks to predict particle weights, enhancing trajectory prediction under kinetic uncertainties. 
Wang's approach builds on Cheng's by incorporating an estimation of the orientation of drosophilas in order to refine tracking further. Compared with those methods, our method performs with higher accuracy on detection, which increases by more than 10% on the F1 Score, and tracks more long-term trajectories, showing stability. [ABSTRACT FROM AUTHOR]
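The predict, reweight, and resample cycle of particle filtering that this abstract builds on can be sketched in one dimension as follows. This is a generic bootstrap particle filter under assumed Gaussian motion and measurement noise. The Gaussian likelihood is a stand-in for the self-attention prediction module that the paper uses to update particle weights, and every name and parameter value here is hypothetical.

```python
import math
import random


def particle_filter_step(particles, weights, measurement, rng,
                         motion_std=0.5, meas_std=1.0):
    """One predict/update/resample cycle for a 1D position."""
    n = len(particles)
    # Predict: move each particle with an assumed random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Update: reweight by an assumed Gaussian measurement likelihood.
    new_weights = [w * math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
                   for p, w in zip(predicted, weights)]
    total = sum(new_weights)
    if total == 0.0:
        new_weights = [1.0 / n] * n  # all weights underflowed: reset
    else:
        new_weights = [w / total for w in new_weights]
    # Estimate: weighted mean of the particle positions.
    estimate = sum(p * w for p, w in zip(predicted, new_weights))
    # Resample: draw particles in proportion to weight, then equalize weights.
    resampled = rng.choices(predicted, weights=new_weights, k=n)
    return resampled, [1.0 / n] * n, estimate


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for _ in range(5):
    particles, weights, estimate = particle_filter_step(
        particles, weights, measurement=3.0, rng=rng)
```

After a few steps with a constant measurement, the particles concentrate around the measured position; a learned weighting module would replace the likelihood line while leaving the rest of the loop unchanged.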
- Published
- 2024
- Full Text
- View/download PDF
50. Improved Mobile Robot Manoeuvring Using Bayes Filter Algorithm Within the Planned Path.
- Author
-
Saad, Mohammed and Alazzawi, Yarub
- Subjects
*COMPUTER vision, *MOBILE robots, *CONSTRAINT algorithms, *KALMAN filtering, *COORDINATE transformations
- Abstract
This research introduces a novel approach for object tracking, capitalizing on the Bayes filter algorithm within the constraints of a single-camera setup. Object tracking is a pivotal aspect of computer vision, significantly influencing system performance in diverse applications. The integration of the Bayes filter algorithm provides a probabilistic framework, effectively addressing challenges posed by occlusions, lighting variations, and unpredictable object movements in real-world scenarios. Our methodology not only streamlines the tracking setup by utilizing a single camera but also enhances practicality, making it particularly relevant for applications with resource constraints. The paper offers a comprehensive exploration of this approach, delving into the theoretical foundations and technical intricacies that underlie the fusion of advanced object tracking techniques with the Bayes filter algorithm. Through empirical evaluations across varied tracking scenarios, our approach demonstrates superior effectiveness compared to traditional methods, showcasing the algorithm's ingenuity in improving tracking accuracy and adaptability. It achieved a dynamic simulation efficiency of 97.025%, a sensitivity of 96.2616%, and an overall system quality (F-score) of 97.0493%. This research contributes valuable insights to the evolving landscape of object tracking methodologies, presenting a practical and efficient solution that combines the Bayes filter algorithm's power with the simplicity of a single camera setup. The findings presented herein offer a nuanced perspective for researchers and practitioners seeking to elevate the precision and real-time adaptability of object tracking systems in diverse applications. [ABSTRACT FROM AUTHOR]
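The probabilistic framework named in this abstract, the Bayes filter, alternates a prediction step with a measurement update. The sketch below is a generic discrete (histogram) version over a small cyclic grid, given as an assumption-laden illustration and not as the authors' single-camera implementation. The function names, the cyclic world, and the likelihood values are all hypothetical.

```python
def predict(belief, motion_kernel):
    """Prediction step: spread belief according to a motion model.

    motion_kernel maps a cell offset to its probability, for example
    {0: 0.2, 1: 0.8}. The grid is assumed to wrap around.
    """
    n = len(belief)
    out = [0.0] * n
    for i, b in enumerate(belief):
        for offset, prob in motion_kernel.items():
            out[(i + offset) % n] += b * prob
    return out


def update(belief, likelihood):
    """Measurement step: multiply by the likelihood and renormalize."""
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("measurement is impossible under the current belief")
    return [p / total for p in posterior]


belief = [0.25, 0.25, 0.25, 0.25]                  # object could be anywhere
belief = update(belief, [0.9, 0.1, 0.1, 0.1])      # detector favours cell 0
belief = predict(belief, {1: 1.0})                 # object moves one cell right
```

After the update, cell 0 holds three quarters of the probability mass, and the prediction step shifts that mass to cell 1. Kalman and particle filters follow the same two-step structure with different belief representations.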
- Published
- 2024