64 results
Search Results
2. A Lightweight Remote Sensing Small Target Image Detection Algorithm Based on Improved YOLOv8.
- Author
- Nie, Haijiao, Pang, Huanli, Ma, Mingyang, and Zheng, Ruikai
- Subjects
- OBJECT recognition (Computer vision), ALGORITHMS, REMOTE-sensing images, REMOTE sensing
- Abstract
In response to the challenges posed by small objects in remote sensing images, such as low resolution, complex backgrounds, and severe occlusions, this paper proposes a lightweight improved model based on YOLOv8n. During small object detection, the feature fusion part of the YOLOv8n algorithm retrieves relatively fewer features of small objects from the backbone network than of large objects, resulting in low detection accuracy for small objects. To address this issue, this paper first adds a dedicated small object detection layer in the feature fusion network to better integrate the features of small objects into the feature fusion part of the model. Secondly, the SSFF module is introduced to facilitate multi-scale feature fusion, enabling the model to capture more gradient paths and further improve accuracy while reducing model parameters. Finally, the HPANet structure is proposed, replacing the Path Aggregation Network. Compared to the original YOLOv8n algorithm, mAP@0.5 on the VisDrone and AI-TOD datasets increases by 14.3% and 17.9%, respectively, while mAP@0.5:0.95 increases by 17.1% and 19.8%, respectively. The proposed method reduces the parameter count by 33% and the model size by 31.7% compared to the original model. Experimental results demonstrate that the proposed method can quickly and accurately identify small objects in complex backgrounds. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
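The abstract above hinges on routing higher-resolution backbone features into the fusion stage. As a rough, hedged illustration (not the paper's SSFF implementation; the pyramid shapes and nearest-neighbour upsampling are assumptions), multi-scale fusion amounts to bringing deeper pyramid levels up to the shallow level's resolution and concatenating along the channel axis:

```python
import numpy as np

def upsample2x(fm):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return fm.repeat(2, axis=1).repeat(2, axis=2)

def fuse_scales(p3, p4, p5):
    """Concatenate three pyramid levels at the resolution of the
    shallowest (highest-resolution) map, a stand-in for the kind of
    multi-scale fusion an SSFF-style module performs."""
    p4_up = upsample2x(p4)               # H/16 -> H/8
    p5_up = upsample2x(upsample2x(p5))   # H/32 -> H/8
    return np.concatenate([p3, p4_up, p5_up], axis=0)

p3 = np.zeros((64, 80, 80))    # shallow, high resolution: small objects live here
p4 = np.zeros((128, 40, 40))
p5 = np.zeros((256, 20, 20))
fused = fuse_scales(p3, p4, p5)
print(fused.shape)  # (448, 80, 80)
```

A dedicated small-object detection head would then consume this high-resolution fused map rather than only the coarser levels.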
3. Deep Learning-Based Intelligent Detection Device for Insulation Pull Rod Defects.
- Author
- Yu, Hua, Niu, Shu, Li, Shuai, Yang, Gang, Wang, Xuan, Luo, Hanhua, Fan, Xianhao, and Li, Chuanyang
- Subjects
- OBJECT recognition (Computer vision), INTELLIGENT buildings, DEEP learning, ALGORITHMS, SPEED, HARDWARE
- Abstract
This paper proposes a deep learning-based intelligent detection device for insulation pull rod defects, addressing the issues of low detection accuracy, poor timeliness of intelligent analysis, and the difficulty of preserving detection results. Firstly, by constructing a pull rod defect dataset and training the YOLOv5s network alongside object detection algorithms commonly used in industrial defect detection, the feasibility of deep learning networks for insulation pull rod defect detection is explored. Secondly, the trained model is integrated into an intelligent detection device that unifies insulation pull rod image acquisition and defect detection in a single system. The research results demonstrate that the YOLOv5s network can quickly and accurately detect pull rod defects. On the test set constructed in this paper, the trained model reached an mAP@0.5:0.95 of 54.7% and an mAP@0.5 of 86.9%. The detection speed reached 169.5 FPS, significantly improving detection efficiency and accuracy compared to traditional object detection algorithms. By establishing an organic connection between the image acquisition hardware and the deep learning network, the existing problems of inefficient detection and difficult storage of detection results in pull rod defect detection methods are effectively addressed. This research provides new insights for detecting insulation pull rod defects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
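This entry, like most in the list, reports mAP@0.5 and mAP@0.5:0.95. For readers unfamiliar with the metric, here is a minimal sketch of the per-class average precision that mAP averages over classes (Pascal-VOC-style all-point interpolation; the input curve values are illustrative, not from any paper above):

```python
def voc_ap(recalls, precisions):
    """All-point interpolated average precision (Pascal VOC style).

    `recalls`/`precisions` are the curve values at successive detection
    score thresholds, with recall in increasing order."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the stepwise precision-recall curve.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))

print(voc_ap([0.5, 1.0], [1.0, 0.5]))  # 0.75
```

mAP@0.5 averages this quantity over classes with true positives defined at IoU ≥ 0.5; mAP@0.5:0.95 further averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05.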
4. IRBEVF-Q: Optimization of Image–Radar Fusion Algorithm Based on Bird's Eye View Features.
- Author
- Cai, Ganlin, Chen, Feng, and Guo, Ente
- Subjects
- OBJECT recognition (Computer vision), ALGORITHMS, VIDEO coding, AUTONOMOUS vehicles, CAMERAS, PROBLEM solving
- Abstract
In autonomous driving, the fusion of multiple sensors is considered essential to improve the accuracy and safety of 3D object detection. Currently, a fusion scheme combining low-cost cameras with highly robust radars can counteract the performance degradation caused by harsh environments. In this paper, we propose the IRBEVF-Q model, which mainly consists of a BEV (Bird's Eye View) fusion coding module and an object decoder module. The BEV fusion coding module solves the problem of unified representation of different modal information by fusing image and radar features through 3D spatial reference points as a medium. The query in the object decoder, as a core component, plays an important role in detection. In this paper, Heat Map-Guided Query Initialization (HGQI) and Dynamic Position Encoding (DPE) are proposed in query construction to enrich the query's a priori information, and an Auxiliary Noise Query (ANQ) helps to stabilize matching. The experimental results demonstrate that the proposed fusion model IRBEVF-Q achieves an NDS of 0.575 and an mAP of 0.476 on the nuScenes test set. Compared to recent state-of-the-art methods, our model shows significant advantages, indicating that our approach contributes to improving detection accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. HeMoDU: High-Efficiency Multi-Object Detection Algorithm for Unmanned Aerial Vehicles on Urban Roads.
- Author
- Shi, Hanyi, Wang, Ningzhi, Xu, Xinyao, Qian, Yue, Zeng, Lingbin, and Zhu, Yi
- Subjects
- OBJECT recognition (Computer vision), ALGORITHMS, DEEP learning, TRAFFIC monitoring
- Abstract
Unmanned aerial vehicle (UAV)-based object detection methods are widely used in traffic detection due to their high flexibility and extensive coverage. In recent years, with the increasing complexity of the urban road environment, UAV object detection algorithms based on deep learning have gradually become a research hotspot. However, how to further improve algorithmic efficiency in response to the numerous and rapidly changing road elements, and thus achieve high-speed and accurate road object detection, remains a challenging issue. Given this context, this paper proposes the high-efficiency multi-object detection algorithm for UAVs (HeMoDU). HeMoDU reconstructs a state-of-the-art, deep-learning-based object detection model and optimizes several aspects to improve computational efficiency and detection accuracy. To validate the performance of HeMoDU in urban road environments, this paper uses the public urban road datasets VisDrone2019 and UA-DETRAC for evaluation. The experimental results show that the HeMoDU model effectively improves the speed and accuracy of UAV object detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Intelligent Gangue Sorting System Based on Dual-Energy X-ray and Improved YOLOv5 Algorithm.
- Author
- Qin, Yuchen, Kou, Ziming, Han, Cong, and Wang, Yutong
- Subjects
- ENERGY consumption, ALGORITHMS, OBJECT recognition (Computer vision), COAL, X-ray imaging
- Abstract
Intelligent gangue sorting with high precision is of vital importance for improving coal quality. To tackle the challenges associated with coal gangue target detection, including algorithm performance imbalance and hardware deployment difficulties, this paper proposes an intelligent gangue separation system that adopts an improved YOLOv5 algorithm and dual-energy X-rays. Firstly, images of a dual-energy X-ray transmission coal gangue mixture were collected under actual coal mine operation, and datasets for training and validation were self-constructed. Then, in the YOLOv5 backbone network, EfficientNetv2 was used to replace the original cross stage partial darknet (CSPDarknet) to make the backbone lightweight; in the neck, a light path aggregation network (LPAN) was designed based on PAN, and a convolutional block attention module (CBAM) was integrated into the BottleneckCSP of the feature fusion block to raise the feature acquisition capability of the network and maximize the learning effect. Subsequently, to accelerate convergence, an efficient intersection over union (EIOU) loss was used instead of the complete intersection over union (CIOU) loss function. Finally, to address missed detections caused by the low resolution of small targets, an L2 detection head was introduced in the head section to improve the multi-scale target detection performance of the algorithm. The experimental results indicate that, compared with YOLOv5-S, the algorithm proposed in this paper improves mAP@0.5 and mAP@0.5:0.95 by 19.2% and 32.4%, respectively. The number of parameters declines by 51.5%, and the computational complexity declines by 14.7%.
The algorithm suggested in this article offers new ideas for the design of identification algorithms for coal gangue sorting systems; it is expected to save energy, reduce consumption and labor, improve efficiency, and be more friendly to embedded platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
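The abstract above swaps the CIOU loss for EIOU to speed up convergence. A hedged sketch of the EIoU penalty follows the published formulation rather than this paper's code: EIoU keeps the IoU and normalised centre-distance terms of CIoU but replaces its aspect-ratio term with separate width and height penalties, each normalised by the enclosing box. Boxes here are assumed to be (x1, y1, x2, y2) corner tuples.

```python
def eiou_loss(box_a, box_b):
    """EIoU loss between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # Smallest enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    # Normalised centre-distance term (shared with CIoU).
    dx = ((ax1 + ax2) - (bx1 + bx2)) / 2.0
    dy = ((ay1 + ay2) - (by1 + by2)) / 2.0
    dist = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    # Separate width/height terms: this is what EIoU adds over
    # CIoU's single aspect-ratio term.
    dw = ((ax2 - ax1) - (bx2 - bx1)) ** 2 / (cw * cw)
    dh = ((ay2 - ay1) - (by2 - by1)) ** 2 / (ch * ch)
    return 1.0 - iou + dist + dw + dh
```

Because width and height errors are penalised directly instead of through a coupled ratio, the gradients stay informative when the predicted box has the right shape but the wrong size, which is the usual argument for faster convergence.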
7. LRMSNet: A New Lightweight Detection Algorithm for Multi-Scale SAR Objects.
- Author
- Wu, Hailang, Sang, Hanbo, Zhang, Zenghui, and Guo, Weiwei
- Subjects
- OBJECT recognition (Computer vision), DEEP learning, ALGORITHMS, SENSOR networks, FEATURE extraction, SYNTHETIC aperture radar
- Abstract
In recent years, deep learning has found widespread application in SAR image object detection. However, when detecting multi-scale targets against complex backgrounds, these models often struggle to strike a balance between accuracy and speed. Furthermore, there is a continuous need to enhance the performance of current models. Hence, this paper proposes LRMSNet, a new multi-scale target detection model designed specifically for SAR images in complex backgrounds. Firstly, the paper introduces an attention module designed to enhance contextual information aggregation and capture global features, which is integrated into a backbone network with an expanded receptive field for improving SAR image feature extraction. Secondly, this paper develops an information aggregation module to effectively fuse different feature layers of the backbone network. Lastly, to better integrate feature information at various levels, this paper designs a multi-scale aggregation network. We validate the effectiveness of our method on three different SAR object detection datasets (MSAR-1.0, SSDD, and HRSID). Experimental results demonstrate that LRMSNet achieves outstanding performance with a mean average precision (mAP) of 95.2%, 98.9%, and 93.3% on the MSAR-1.0, SSDD, and HRSID datasets, respectively, with only 3.46 M parameters and 12.6 G floating-point operations (FLOPs). When compared with existing SAR object detection models on the MSAR-1.0 dataset, LRMSNet achieves state-of-the-art (SOTA) performance, showcasing its superiority in addressing SAR detection challenges in large-scale complex environments and across various object scales. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. MRD-YOLO: A Multispectral Object Detection Algorithm for Complex Road Scenes.
- Author
- Sun, Chaoyue, Chen, Yajun, Qiu, Xiaoyang, Li, Rongzhen, and You, Longxiang
- Subjects
- OBJECT recognition (Computer vision), FEATURE extraction, INFRARED imaging, ALGORITHMS, DETECTION alarms
- Abstract
Object detection is one of the core technologies for autonomous driving. Current road object detection mainly relies on visible light, which is prone to missed detections and false alarms in rainy, night-time, and foggy scenes. Multispectral object detection based on the fusion of RGB and infrared images can effectively address the challenges of complex and changing road scenes, improving the detection performance of current algorithms in complex scenarios. However, previous multispectral detection algorithms suffer from issues such as poor fusion of dual-mode information, poor detection performance for multi-scale objects, and inadequate utilization of semantic information. To address these challenges and enhance the detection performance in complex road scenes, this paper proposes a novel multispectral object detection algorithm called MRD-YOLO. In MRD-YOLO, we utilize interaction-based feature extraction to effectively fuse information and introduce the BIC-Fusion module with attention guidance to fuse different modal information. We also incorporate the SAConv module to improve the model's detection performance for multi-scale objects and utilize the AIFI structure to enhance the utilization of semantic information. Finally, we conduct experiments on two major public datasets, FLIR_Aligned and M3FD. The experimental results demonstrate that compared to other algorithms, the proposed algorithm achieves superior detection performance in complex road scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Lightweight Underwater Object Detection Algorithm for Embedded Deployment Using Higher-Order Information and Image Enhancement.
- Author
- Liu, Changhong, Wen, Jiawen, Huang, Jinshan, Lin, Weiren, Wu, Bochun, Xie, Ning, and Zou, Tao
- Subjects
- OBJECT recognition (Computer vision), IMAGE intensifiers, COMPUTER vision, ALGORITHMS, ATTENUATION of light
- Abstract
Underwater object detection is crucial in marine exploration, presenting a challenging problem in computer vision due to factors like light attenuation, scattering, and background interference. Existing underwater object detection models face challenges such as low robustness, heavy parameter and computation costs, and high false detection rates. To address these challenges, this paper proposes a lightweight underwater object detection method integrating deep learning and image enhancement. Firstly, FUnIE-GAN is employed to enhance the data by restoring the authentic colors of underwater images, and the restored images are then fed into an enhanced object detection network, YOLOv7-GN, proposed in this paper. Secondly, a lightweight higher-order attention layer aggregation network (ACC3-ELAN) is designed to improve the fusion perception of higher-order features in the backbone network. Moreover, the head network is enhanced by leveraging the interaction of multi-scale higher-order information, additionally fusing higher-order semantic information from features at different scales. To further streamline the entire network, we also introduce the AC-ELAN-t module, which is derived from pruning based on ACC3-ELAN. Finally, the algorithm underwent practical testing on a biomimetic sea flatworm underwater robot. The experimental results on the DUO dataset show that our proposed method improves the performance of object detection in underwater environments. It provides a valuable reference for realizing object detection in underwater embedded devices with great practical potential. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Lightweight Meter Pointer Recognition Method Based on Improved YOLOv5.
- Author
- Zhang, Chi, Wang, Kai, Zhang, Jie, Zhou, Fan, and Zou, Le
- Subjects
- OBJECT recognition (Computer vision), CIRCLE, DEEP learning, ALGORITHMS
- Abstract
When taking readings from substation lightning rod meters, the classical object detection model is not suitable for deployment on substation monitoring hardware devices due to its large size, large number of parameters, and slow detection speed, while it is difficult to balance detection accuracy and real-time requirements with existing lightweight object detection models. To address this problem, this paper constructs a lightweight object detection algorithm, YOLOv5-Meter Reading Lighting (YOLOv5-MRL), based on an improved YOLOv5 model, to improve speed while maintaining accuracy. The YOLOv5s model is then pruned using a convolutional kernel channel soft pruning algorithm, which greatly reduces the number of parameters in the YOLOv5-MRL model while keeping the accuracy loss small. Finally, to facilitate dial reading, a dial external circle fitting method is proposed to calculate the dial reading using a circular angle algorithm. The experimental results on the self-built dataset show that the YOLOv5-MRL object detection model achieves a mean average precision of 96.9%, a detection speed of 5 ms/frame, and a model weight size of 5.5 MB, outperforming other advanced dial reading models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
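The "dial external circle fitting" plus "circular angle" step in the abstract above can be pictured as follows: fit a circle to points detected on the dial rim, then map the pointer-tip angle around that circle to a reading. This is only an illustrative sketch under assumed conventions (a Kåsa algebraic least-squares fit, counter-clockwise angles, and a hypothetical zero angle, sweep span, and full-scale value), not the paper's method.

```python
import numpy as np

def fit_circle(pts):
    """Kasa algebraic least-squares circle fit: returns (cx, cy, r).

    Solves 2*cx*x + 2*cy*y + (r^2 - cx^2 - cy^2) = x^2 + y^2 in the
    least-squares sense for points (x, y) on the circle."""
    pts = np.asarray(pts, dtype=float)
    A = np.c_[2 * pts[:, 0], 2 * pts[:, 1], np.ones(len(pts))]
    b = (pts ** 2).sum(axis=1)
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.sqrt(c + cx ** 2 + cy ** 2)
    return cx, cy, r

def dial_reading(center, tip, zero_angle_deg, span_deg, full_scale):
    """Convert the pointer-tip angle around the fitted circle into a reading."""
    ang = np.degrees(np.arctan2(tip[1] - center[1], tip[0] - center[0]))
    swept = (ang - zero_angle_deg) % 360.0
    return swept / span_deg * full_scale

# Four rim points lying on the circle with centre (1, 2) and radius 3.
cx, cy, r = fit_circle([(4, 2), (1, 5), (-2, 2), (1, -1)])
reading = dial_reading((cx, cy), tip=(1, 5), zero_angle_deg=0.0,
                       span_deg=270.0, full_scale=100.0)
```

A real gauge would also need the sweep direction and the detected zero-mark angle, which the detection network provides via its keypoint/box outputs.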
11. RCDAM-Net: A Foreign Object Detection Algorithm for Transmission Tower Lines Based on RevCol Network.
- Author
- Zhang, Wenli, Li, Yingna, and Liu, Ailian
- Subjects
- OBJECT recognition (Computer vision), FOREIGN bodies, ELECTRIC lines, FEATURE extraction, ALGORITHMS, ASPECT ratio (Images)
- Abstract
Transmission lines are an important part of the power system, and their safe and stable operation must be ensured. Due to long-term exposure to the outdoors, the lines face many insecurity factors, and foreign object intrusion is one of them. Traditional foreign object (bird's nest, kite, balloon, trash bag) detection algorithms suffer from low efficiency, poor accuracy, and small coverage. To address the above problems, this paper introduces RCDAM-Net. In order to prevent feature loss or useful feature compression, RevCol (Reversible Column Networks) is used as the backbone network to ensure that the total information remains unchanged during feature decoupling. DySnakeConv (Dynamic Snake Convolution) is adopted and embedded into the C2f structure, which is named C2D and integrates low-level features and high-level features. Compared to the original BottleNeck structure of C2f, DySnakeConv enhances the feature extraction ability for elongated and weak targets. In addition, MPDIoU (Minimum Point Distance Intersection over Union) is used to improve the regression performance of model bounding boxes, solving the problem of predicted bounding boxes having the same aspect ratio as true bounding boxes but different values. Further, we adopt a Decoupled Head for detection and add auxiliary training heads to improve the detection accuracy of the model. The experimental results show that the model achieves mAP50, Precision, and Recall of 97.98%, 98.15%, and 95.16% on the transmission tower line foreign object dataset, outperforming existing multi-target detection algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. GMS-YOLO: An Algorithm for Multi-Scale Object Detection in Complex Environments in Confined Compartments.
- Author
- Ding, Qixiang, Li, Weichao, Xu, Chengcheng, Zhang, Mingyuan, Sheng, Changchong, He, Min, and Shan, Nanliang
- Subjects
- OBJECT recognition (Computer vision), COMPUTATIONAL complexity, ALGORITHMS, FASTENERS, HAZARDS
- Abstract
Many compartments are prone to safety hazards such as loose fasteners or object intrusion due to their confined space, making manual inspection challenging. To address the challenges of complex inspection environments, diverse target categories, and variable scales in confined compartments, this paper proposes a novel GMS-YOLO network based on the improved YOLOv8 framework. To handle complex environments, the backbone employs GhostHGNetv2 to capture more accurate high-level and low-level feature representations, facilitating better distinction between background and targets while significantly reducing both network parameter size and computational complexity. To address varying target scales, the first layer of the feature fusion module introduces Multi-Scale Convolutional Attention (MSCA) to capture multi-scale contextual information and guide the feature fusion process. A new lightweight detection head, the Shared Convolutional Detection Head (SCDH), is designed to enable the model to achieve higher accuracy while being lighter. To evaluate the performance of this algorithm, a dataset for object detection in this scenario was constructed. The experimental results indicate that, compared to the original model, the parameter count of the improved model decreased by 37.8%, the GFLOPs decreased by 27.7%, and the average accuracy increased from 82.7% to 85.0%. This validates the accuracy and applicability of the proposed GMS-YOLO network. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Lightweight Single-Stage Ship Object Detection Algorithm for Unmanned Surface Vessels Based on Improved YOLOv5.
- Author
- Sun, Hui, Zhang, Weizhe, Yang, Shu, and Wang, Hongbo
- Subjects
- OBJECT recognition (Computer vision), ALGORITHMS, WAREHOUSES, SHIPS
- Abstract
Object detection is applied extensively in various domains, including industrial manufacturing, road traffic management, warehousing and logistics, and healthcare. In ship object detection tasks, detection networks are frequently deployed on devices with limited computational resources, e.g., unmanned surface vessels. This creates a need to balance accuracy with a low parameter count and low computational load. This paper proposes an improved object detection network based on YOLOv5. To reduce the model parameter count and computational load, we utilize an enhanced ShuffleNetV2 network as the backbone. In addition, a split-DLKA module is devised and implemented in the small object detection layer to improve detection accuracy. Finally, we introduce the WIOUv3 loss function to minimize the impact of low-quality samples on the model. Experiments conducted on the SeaShips dataset demonstrate that the proposed method reduces parameters by 71% and computational load by 58% compared to YOLOv5s. In addition, the proposed method increases the mAP@0.5 and mAP@0.5:0.95 values by 3.9% and 3.3%, respectively. Thus, the proposed method exhibits excellent performance in both real-time processing and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Improved Architecture and Training Strategies of YOLOv7 for Remote Sensing Image Object Detection.
- Author
- Zhao, Dewei, Shao, Faming, Liu, Qiang, Zhang, Heng, Zhang, Zihan, and Yang, Li
- Subjects
- OBJECT recognition (Computer vision), REMOTE sensing, FEATURE extraction, NETWORK performance, ALGORITHMS
- Abstract
The technology for object detection in remote sensing images finds extensive applications in production and people's lives, and improving the accuracy of image detection is a pressing need. With that goal, this paper proposes a range of improvements, rooted in the widely used YOLOv7 algorithm, after analyzing the requirements and difficulties in the detection of remote sensing images. Specifically, we strategically remove some standard convolution and pooling modules from the bottom of the network, adopting stride-free convolution to minimize the loss of information for small objects in the transmission. Simultaneously, we introduce a new, more efficient attention mechanism module for feature extraction, significantly enhancing the network's semantic extraction capabilities. Furthermore, by adding multiple cross-layer connections in the network, we more effectively utilize the feature information of each layer in the backbone network, thereby enhancing the network's overall feature extraction capability. During the training phase, we introduce an auxiliary network to intensify the training of the underlying network and adopt a new activation function and a more efficient loss function to ensure more effective gradient feedback, thereby elevating the network performance. In the experimental results, our improved network achieves impressive mAP scores of 91.2% and 80.8% on the DIOR and DOTA version 1.0 remote sensing datasets, respectively. These represent notable improvements of 4.5% and 7.0% over the original YOLOv7 network, significantly enhancing the efficiency of detecting small objects in particular. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
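The "stride-free convolution" idea in the abstract above, i.e. keeping the small-object detail that strided convolution or pooling would discard, is commonly realised by a space-to-depth rearrangement followed by a stride-1 convolution. A minimal sketch of the rearrangement alone (the paper's exact module is not specified here, so this is a generic illustration):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) map into (C*block*block, H//block, W//block).

    Resolution drops by `block` in each spatial dimension, but every
    sample is kept by moving it into the channel axis, unlike a strided
    convolution or pooling, which throws samples away."""
    c, h, w = x.shape
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)
    return x.reshape(c * block * block, h // block, w // block)

x = np.arange(16).reshape(1, 4, 4)
out = space_to_depth(x)
print(out.shape)  # (4, 2, 2)
```

A 1x1 or 3x3 convolution with stride 1 applied to the output then plays the role the strided convolution played, without the information loss.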
15. KCS-YOLO: An Improved Algorithm for Traffic Light Detection under Low Visibility Conditions.
- Author
- Zhou, Qinghui, Zhang, Diyi, Liu, Haoshi, and He, Yuping
- Subjects
- OBJECT recognition (Computer vision), TRAFFIC monitoring, TRAFFIC signs & signals, AUTONOMOUS vehicles, ALGORITHMS
- Abstract
Autonomous vehicles face challenges in small-target detection and, in particular, in accurately identifying traffic lights under low visibility conditions, e.g., fog, rain, and blurred night-time lighting. To address these issues, this paper proposes an improved algorithm, namely KCS-YOLO (you only look once), to increase the accuracy of detecting and recognizing traffic lights under low visibility conditions. First, a comparison was made to assess different YOLO algorithms. The benchmark indicates that the YOLOv5n algorithm achieves the highest mean average precision (mAP) with fewer parameters. To enhance the capability for detecting small targets, the algorithm built upon YOLOv5n, namely KCS-YOLO, was developed using the K-means++ algorithm for clustering marked multi-dimensional target frames, embedding the convolutional block attention module (CBAM) attention mechanism, and constructing a small-target detection layer. Second, an image dataset of traffic lights was generated, which was preprocessed using the dark channel prior dehazing algorithm to enhance the proposed algorithm's recognition capability and robustness. Finally, KCS-YOLO was evaluated through comparison and ablation experiments. The experimental results showed that the mAP of KCS-YOLO reaches 98.87%, an increase of 5.03% over its counterpart of YOLOv5n. This indicates that KCS-YOLO features high accuracy in object detection and recognition, thereby enhancing the capability of traffic light detection and recognition for autonomous vehicles in low visibility conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
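The K-means++ clustering of marked target frames mentioned above is the standard trick for choosing anchor sizes that match a dataset's box statistics. A hedged sketch of just the k-means++ seeding step, using the customary 1 − IoU distance on (width, height) pairs (illustrative, not the paper's code; the box values below are made up):

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given as (w, h), assumed to share a corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeanspp_seeds(boxes, k, rng=random.Random(0)):
    """k-means++ initialisation with 1 - IoU as the distance: each new
    seed is drawn with probability proportional to the squared distance
    to the nearest existing seed, spreading seeds across box scales."""
    seeds = [rng.choice(boxes)]
    while len(seeds) < k:
        d = [min(1 - iou_wh(b, s) for s in seeds) ** 2 for b in boxes]
        r, acc = rng.random() * sum(d), 0.0
        for b, w in zip(boxes, d):
            acc += w
            if acc >= r:
                seeds.append(b)
                break
    return seeds

boxes = [(10, 10), (12, 11), (100, 90), (95, 100)]
seeds = kmeanspp_seeds(boxes, 2)
```

Standard Lloyd iterations with the same 1 − IoU distance would then refine these seeds into the final anchor set.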
16. A Novel Grasp Detection Algorithm with Multi-Target Semantic Segmentation for a Robot to Manipulate Cluttered Objects.
- Author
- Zhong, Xungao, Chen, Yijun, Luo, Jiaguo, Shi, Chaoquan, and Hu, Huosheng
- Subjects
- TRANSFORMER models, OBJECT recognition (Computer vision), ROBOTS, ALGORITHMS, GENERALIZATION, ROBOT hands
- Abstract
Objects in cluttered environments may have similar sizes and shapes, which remains a huge challenge for robot grasping manipulation. Existing segmentation methods, such as Mask R-CNN and YOLOv8, tend to lose the shape details of objects when dealing with messy scenes, and this loss of detail limits the grasp performance of robots in complex environments. This paper proposes a high-performance grasp detection algorithm with a multi-target semantic segmentation model, which can effectively improve a robot's grasp success rate in cluttered environments. The algorithm consists of two cascaded modules, Semantic Segmentation and Grasp Detection (SS-GD), in which the backbone network of the semantic segmentation module is built on the state-of-the-art Swin Transformer structure. It can extract the detailed features of objects in cluttered environments and enable a robot to understand the position and shape of candidate objects. To make SS-GD focus on important visual features when constructing the grasp schema, the grasp detection module is designed on the basis of the Squeeze-and-Excitation (SE) attention mechanism to predict the corresponding grasp configuration accurately. Grasp detection experiments were conducted on an actual UR5 robot platform to verify the robustness and generalization of the proposed SS-GD method in cluttered environments. A best grasp success rate of 91.7% was achieved for cluttered multi-target workspaces. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
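The Squeeze-and-Excitation (SE) mechanism named in the abstract above is compact enough to sketch directly: global-average-pool each channel ("squeeze"), pass the channel vector through a small two-layer bottleneck ("excitation"), and rescale the channels with the resulting sigmoid gates. The weight shapes below are illustrative (reduction ratio 2), not the paper's configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) map.

    w1: (C//r, C) reduction weights, w2: (C, C//r) expansion weights."""
    s = x.mean(axis=(1, 2))                   # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(0, w1 @ s))   # excitation gates: (C,)
    return x * e[:, None, None]               # channel-wise rescaling

x = np.ones((4, 3, 3))
out = se_block(x, np.zeros((2, 4)), np.zeros((4, 2)))  # zero weights -> all gates 0.5
```

The gates let the network amplify channels carrying grasp-relevant features and suppress background channels at negligible parameter cost.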
17. A Detection Algorithm for Citrus Huanglongbing Disease Based on an Improved YOLOv8n.
- Author
- Xie, Wu, Feng, Feihong, and Zhang, Huimin
- Subjects
- CITRUS greening disease, OBJECT recognition (Computer vision), CITRUS, FEATURE extraction, ALGORITHMS, ORCHARD management, ORCHARDS
- Abstract
Given the severe impact of citrus Huanglongbing on orchard production, accurate detection of the disease is crucial in orchard management. In natural environments, due to factors such as varying light intensities, mutual occlusion of citrus leaves, the extremely small size of Huanglongbing leaves, and the high similarity between Huanglongbing and other citrus diseases, detection accuracy remains low when existing mainstream object detection models are used for citrus Huanglongbing. To address this issue, we propose YOLO-EAF (You Only Look Once–Efficient Asymptotic Fusion), an improved model based on YOLOv8n. Firstly, the Efficient Multi-Scale Attention module with cross-spatial learning (EMA) is integrated into the backbone feature extraction network to enhance the feature extraction and integration capabilities of the model. Secondly, the adaptive spatial feature fusion (ASFF) module is used to enhance the feature fusion ability of the model's different levels so as to improve its generalization ability. Finally, focal and efficient intersection over union (Focal–EIOU) is utilized as the loss function, which accelerates the convergence of the model and improves its regression precision and robustness. In order to verify the performance of the YOLO-EAF method, we tested it on a self-built citrus Huanglongbing image dataset. The experimental results showed that YOLO-EAF achieved an 8.4% higher precision than YOLOv8n on the self-built dataset, reaching 82.7%. The F1-score increased by 3.33% to 77.83%, and the mAP@0.5 increased by 3.3% to 84.7%. Through experimental comparisons, the YOLO-EAF model proposed in this paper offers a new technical route for the monitoring and management of Huanglongbing in smart orange orchards. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Development of a Lightweight Floating Object Detection Algorithm.
- Author
- Xian, Rundong, Tang, Lijun, and Liu, Shenbo
- Subjects
- OBJECT recognition (Computer vision), ALGORITHMS, NETWORK performance, GRAPHICS processing units, PARAMETERIZATION, EXTRACTION techniques
- Abstract
YOLOv5 is currently one of the mainstream algorithms for object detection. In this paper, we propose the FRL-YOLO model specifically for river floating object detection. The algorithm integrates the FasterNet block into the C3 module, conducting convolutions only on a subset of input channels to reduce computational load while still capturing spatial features effectively. It also incorporates reparameterization techniques into the feature extraction network, introducing the RepConv design to enhance model training efficiency. To further optimize network performance, the ACON-C activation function is employed. Finally, by employing a structured non-destructive pruning approach, redundant channels in the model are trimmed, significantly reducing the model's size. Experimental results indicate that the algorithm achieves a mean average precision (mAP) of 79.3%, a 0.4% improvement over YOLOv5s. The detection speed on an NVIDIA GeForce RTX 4070 graphics card reaches 623.5 fps, a 22.8% increase over YOLOv5s. The improved model is compressed to 2 MB, only 14.7% of the size of YOLOv5s. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Research on a Fast Image-Matching Algorithm Based on Nonlinear Filtering.
- Author
-
Yin, Chenglong, Zhang, Fei, Hao, Bin, Fu, Zijian, and Pang, Xiaoyu
- Subjects
OBJECT recognition (Computer vision) ,FILTERS & filtration ,TIME complexity ,COMPUTER vision ,ALGORITHMS - Abstract
Computer vision technology is being applied at an unprecedented speed in various fields such as 3D scene reconstruction, object detection and recognition, video content tracking, pose estimation, and motion estimation. To address the low accuracy and high time complexity of traditional image feature point matching, a fast image-matching algorithm based on nonlinear filtering is proposed. By applying nonlinear diffusion filtering to scene images, detail and edge information can be effectively extracted. The feature descriptors of the feature points are transformed into binary form, occupying less storage space and thus reducing matching time. The adaptive RANSAC algorithm is utilized to eliminate mismatched feature points, thereby improving matching accuracy. Experimental results on the Mikolajczyk image dataset, comparing the proposed algorithm with SIFT and with SURF-, BRISK-, and ORB-based improvements of SIFT, show that the fast image-matching algorithm based on nonlinear filtering reduces matching time by three-quarters, with an overall average accuracy more than 7% higher than the other algorithms. These experiments demonstrate that the fast image-matching algorithm based on nonlinear filtering has better robustness and real-time performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
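Binary descriptors cut matching time because comparing two of them reduces to an XOR plus a popcount (the Hamming distance), rather than a floating-point vector distance. A minimal brute-force matcher sketch, with descriptors packed as Python ints; `max_dist` is an illustrative threshold, not a value from the paper:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors packed as ints."""
    return bin(a ^ b).count("1")

def match_descriptors(query, train, max_dist=16):
    """Brute-force nearest-neighbour matching on binary descriptors.
    Returns (query_idx, train_idx) pairs whose Hamming distance does
    not exceed max_dist; unmatched queries are dropped."""
    matches = []
    for qi, q in enumerate(query):
        ti, d = min(((i, hamming(q, t)) for i, t in enumerate(train)),
                    key=lambda pair: pair[1])
        if d <= max_dist:
            matches.append((qi, ti))
    return matches
```

In a full pipeline the surviving pairs would then be handed to RANSAC, which discards the geometrically inconsistent ones.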
20. Deep Learning-Based Multiple Droplet Contamination Detector for Vision Systems Using a You Only Look Once Algorithm.
- Author
-
Kim, Youngkwang, Kim, Woochan, Yoon, Jungwoo, Chung, Sangkug, and Kim, Daegeun
- Subjects
DEEP learning ,OBJECT recognition (Computer vision) ,DIGITAL cameras ,ALGORITHMS ,DRONE surveillance ,DETECTORS ,PHOTOGRAPHIC lenses - Abstract
This paper presents a practical contamination detection system for camera lenses using image analysis with deep learning. The proposed system can detect contamination in camera digital images through contamination learning utilizing deep learning, and it aims to prevent performance degradation of intelligent vision systems due to lens contamination in cameras. This system is based on the object detection algorithm YOLO (v5n, v5s, v5m, v5l, and v5x), which is trained with 4000 images captured under different lighting and background conditions. The trained models showed that the average precision improves as the algorithm size increases, especially for YOLOv5x, which showed excellent efficiency in detecting droplet contamination within 23 ms. They also achieved a mAP@0.5 of 87.46%, a mAP@0.5:0.95 of 51.90%, a precision of 90.28%, a recall of 81.47%, and an F1 score of 85.64%. As a proof of concept, we demonstrated the identification and removal of contamination on camera lenses by integrating a contamination detection system and a transparent heater-based cleaning system. The proposed system is anticipated to be applied to autonomous driving systems, public safety surveillance cameras, environmental monitoring drones, etc., to increase operational safety and reliability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
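The precision (90.28%) and recall (81.47%) reported above are consistent, to within rounding, with the reported F1 score, since F1 is their harmonic mean. A quick arithmetic check:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Plugging in the abstract's reported precision and recall reproduces
# its reported F1 of ~85.6% up to rounding of the published figures.
f1 = f1_score(0.9028, 0.8147)
```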
21. Ship-Fire Net: An Improved YOLOv8 Algorithm for Ship Fire Detection.
- Author
-
Zhang, Ziyang, Tan, Lingye, and Tiong, Robert Lee Kong
- Subjects
FIRE detectors ,MACHINE learning ,OBJECT recognition (Computer vision) ,DEEP learning ,ALGORITHMS ,COMPUTATIONAL complexity ,SHIPS - Abstract
Ship fires may cause significant structural damage and large economic losses. Hence, prompt identification of fires is essential for rapid response and effective mitigation. However, conventional detection systems exhibit limited efficacy and accuracy in detecting targets, mostly owing to distance constraints and the motion of ships. Although deep learning algorithms offer a potential solution, the computational complexity of ship fire detection algorithms poses significant challenges. To solve this, this paper proposes a lightweight ship fire detection algorithm based on YOLOv8n. Initially, a dataset including more than 4000 unduplicated images and their labels is established before training. To ensure algorithm performance, both fires inside ship rooms and fires on board are considered. After tests, YOLOv8n is selected as the model with the best performance and fastest speed from among several advanced object detection algorithms. GhostnetV2-C2F is then inserted into the backbone of the algorithm to provide long-range attention at low computational cost. In addition, spatial and channel reconstruction convolution (SCConv) is used to reduce redundant features with significantly lower complexity and computational costs for real-time ship fire detection. For the neck part, omni-dimensional dynamic convolution provides a multi-dimensional attention mechanism while also lowering the parameter count. After these improvements, a lighter and more accurate YOLOv8n variant, called Ship-Fire Net, is proposed. The proposed method exceeds 0.93 in both precision and recall for fire and smoke detection on ships, and its mAP@0.5 reaches about 0.9. Despite the improvement in accuracy, Ship-Fire Net also has fewer parameters and lower FLOPs than the original, which increases its detection speed. The FPS of Ship-Fire Net reaches 286, which is helpful for real-time ship fire monitoring. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Transmission Tower Re-Identification Algorithm Based on Machine Vision.
- Author
-
Chen, Lei, Yang, Zuowei, Huang, Fengyun, Dai, Yiwei, Liu, Rui, and Li, Jiajia
- Subjects
COMPUTER vision ,OBJECT recognition (Computer vision) ,DEEP learning ,IMAGE transmission ,TRAFFIC signs & signals ,ALGORITHMS - Abstract
Featured Application: This work can potentially be applied to the recognition of traffic signs by intelligent driving vehicles and to the automatic inspection of power systems. Transmission tower re-identification refers to recognizing the location and identity of transmission towers, facilitating their rapid localization during power system inspection. Although there are established methods for defect detection on transmission towers and their accessories (such as crossarms and insulators), there is a lack of automated methods for transmission tower identity matching. This paper proposes an identity-matching method for transmission towers that integrates machine vision and deep learning. The method first requires the creation of a template library: the YOLOv8 object detection algorithm extracts transmission tower images, which are mapped into d-dimensional feature vectors by a matching network; during the training of the matching network, a strategy for the online generation of triplet samples is introduced; a template library is then built from these d-dimensional feature vectors, forming the basis of transmission tower re-identification. Subsequently, the method re-identifies input images: the proposed YOLOv5n-conv head detects and crops the transmission towers in the images; images without transmission towers are skipped, while for those with transmission towers, the matching network maps each transmission tower instance into a feature vector; finally, re-identification is realized by comparing feature vectors with those in the template library using Euclidean distance. This comparison can also be combined with GPS information to narrow the search range. 
Experiments show that the YOLOv5n-conv head model achieved a mean Average Precision at an Intersection over Union threshold of 0.5 (mAP@0.5) of 0.974 in transmission tower detection, while adding 2.4 ms to the detection time of the original YOLOv5n. Integrating online triplet sample generation into the matching network training, with Inception-ResNet-v1 (d = 128) as the backbone, improved the network's rank-1 performance by 3.86%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
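The re-identification step described above reduces to a nearest-neighbour search over the template library in Euclidean space. A minimal sketch; the rejection threshold for unknown towers is an assumption, and the library is modeled as a plain dict from tower identity to feature vector:

```python
import math

def reidentify(query_vec, template_lib, threshold=1.0):
    """Compare a query feature vector against a template library by
    Euclidean distance; return the identity of the closest template,
    or None when even the best match exceeds the threshold (i.e. the
    tower is not in the library)."""
    best_id, best_d = None, float("inf")
    for tower_id, vec in template_lib.items():
        d = math.dist(query_vec, vec)
        if d < best_d:
            best_id, best_d = tower_id, d
    return best_id if best_d <= threshold else None
```

In the paper's setting, GPS information would first restrict `template_lib` to nearby towers, shrinking the comparison set.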
23. YOLOv7-Ship: A Lightweight Algorithm for Ship Object Detection in Complex Marine Environments.
- Author
-
Jiang, Zhikai, Su, Li, and Sun, Yuxin
- Subjects
OBJECT recognition (Computer vision) ,FEATURE extraction ,ALGORITHMS ,SHIPS ,MARITIME management - Abstract
Accurate ship object detection ensures navigation safety and effective maritime traffic management. Existing ship target detection models often miss detections in complex marine environments, and it is hard to achieve high accuracy and real-time performance simultaneously. To address these issues, this paper proposes a lightweight ship object detection model called YOLOv7-Ship to perform end-to-end ship detection in complex marine environments. First, we insert the improved coordinate attention mechanism (CA-M) at appropriate locations in the backbone of the YOLOv7-Tiny model. Then, the feature extraction capability of the convolution module is enhanced by embedding omni-dimensional dynamic convolution (ODConv) into the efficient layer aggregation network (ELAN). Furthermore, content-aware feature reorganization (CARAFE) and SIoU are introduced into the model to improve its convergence speed and detection precision for small targets. Finally, to handle the scarcity of ship data in complex marine environments, we build a ship dataset containing 5100 real ship images. Experimental results show that, compared with the baseline YOLOv7-Tiny model, YOLOv7-Ship improves the mean average precision (mAP) by 2.2% on the self-built dataset. The model is also lightweight, with a detection speed of 75 frames per second, which can meet the need for real-time detection in complex marine environments to a certain extent, highlighting its advantages for the safety of maritime navigation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. DetTrack: An Algorithm for Multiple Object Tracking by Improving Occlusion Object Detection.
- Author
-
Gao, Xinyue, Wang, Zhengyou, Wang, Xiaofan, Zhang, Shuo, Zhuang, Shanna, and Wang, Hui
- Subjects
OBJECT recognition (Computer vision) ,OBJECT tracking (Computer vision) ,TRACKING algorithms ,COMPUTER vision ,ALGORITHMS ,KALMAN filtering - Abstract
Multi-object tracking (MOT) is an important problem in computer vision with a wide range of applications. Currently, detecting occluded objects is still a serious challenge in multi-object tracking tasks. In this paper, we propose a method that simultaneously improves occluded object detection and occluded object tracking, as well as a tracking method for when the object is completely occluded. First, motion track prediction is utilized to raise the upper limit of occluded object detection. Then, spatio-temporal feature information between the object and its surrounding environment is used for multi-object tracking. Finally, we use hypothesis frames to continuously track completely occluded objects. Our study shows that we achieve competitive performance compared to current state-of-the-art methods on popular multi-object tracking benchmarks such as MOT16, MOT17, and MOT20. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
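The motion-track prediction used to bridge full occlusions can be as simple as extrapolating a constant-velocity state, as in the predict step of a Kalman filter (Kalman filtering appears in the record's subject headings). A hypothetical sketch; the state layout `(x, y, vx, vy)` and the time step are assumptions, not the paper's formulation:

```python
def predict_cv(state, dt=1.0):
    """Constant-velocity prediction: while a track is fully occluded,
    its box centre is extrapolated from the last estimated velocity.
    state = (x, y, vx, vy); returns the state advanced by dt."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)
```

Applied repeatedly, this yields the "hypothesis" positions at which the tracker keeps looking for the object until it reappears.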
25. HP3D-V2V: High-Precision 3D Object Detection Vehicle-to-Vehicle Cooperative Perception Algorithm.
- Author
-
Chen, Hongmei, Wang, Haifeng, Liu, Zilong, Gu, Dongbing, and Ye, Wen
- Subjects
OBJECT recognition (Computer vision) ,ALGORITHMS ,DATA scrubbing ,POINT cloud ,AUTONOMOUS vehicles ,FEATURE extraction - Abstract
Cooperative perception in the field of connected autonomous vehicles (CAVs) aims to overcome the inherent limitations of single-vehicle perception systems, including long-range occlusion, low resolution, and susceptibility to weather interference. In this regard, we propose a high-precision 3D object detection V2V cooperative perception algorithm. The algorithm utilizes a voxel grid-based statistical filter to effectively denoise point cloud data to obtain clean and reliable data. In addition, we design a feature extraction network based on the fusion of voxels and PointPillars and encode it to generate BEV features, which solves the spatial feature interaction problem lacking in the PointPillars approach and enhances the semantic information of the extracted features. A maximum pooling technique is used to reduce the dimensionality and generate pseudo-images, thereby skipping complex 3D convolutional computation. To facilitate effective feature fusion, we design a feature-level cross-vehicle feature fusion module. Experimental validation is conducted using the OPV2V dataset to assess vehicle co-perception performance and compare it with existing mainstream co-perception algorithms. Ablation experiments are also carried out to confirm the contributions of this approach. Experimental results show that our architecture achieves a lightweight design with higher average precision (AP) than other existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
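The voxel grid-based statistical filter mentioned above can be sketched as bucketing points into voxels and discarding points that fall in sparsely occupied voxels, which tend to be noise. A crude illustration; the voxel size and occupancy threshold are assumptions, and a production filter would typically also use per-voxel distance statistics:

```python
from collections import defaultdict

def voxel_statistical_filter(points, voxel=0.5, min_points=2):
    """Crude voxel-grid denoiser: bucket 3D points into cubic voxels
    and drop every point whose voxel holds fewer than min_points
    returns (isolated returns are likely noise)."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(int(c // voxel) for c in p)  # integer voxel index
        buckets[key].append(p)
    return [p for pts in buckets.values() if len(pts) >= min_points
            for p in pts]
```

Real LiDAR frames contain far denser clusters, so the threshold would be tuned to the sensor's point density rather than fixed at 2.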
26. Lightweight YOLOv7 Algorithm for Multi-Object Recognition on Contrabands in Terahertz Images.
- Author
-
Ge, Zihao, Zhang, Yuan, Jiang, Yuying, Ge, Hongyi, Wu, Xuyang, Jia, Zhiyuan, Wang, Heng, and Jia, Keke
- Subjects
TERAHERTZ technology ,OBJECT recognition (Computer vision) ,FLAMMABLE materials ,FEATURE extraction ,ALGORITHMS ,SELECTIVITY (Psychology) ,IMAGE processing - Abstract
With the strengthening of worldwide counter-terrorism initiatives, it is increasingly important to detect contraband such as controlled knives and flammable materials hidden in clothes and bags. Terahertz (THz) imaging technology is widely used in the field of contraband detection due to its high imaging speed and strong penetration. However, terahertz images are of poor quality and lack texture detail, and traditional target detection methods suffer from low detection speeds, misdetection, and omission of contraband. This work pre-processes the original dataset using a variety of image processing methods and validates the effect of these methods on the detection results of YOLOv7. Meanwhile, the lightweight and multi-object detection YOLOv7 (LWMD-YOLOv7) algorithm is proposed. Firstly, to meet the real-time demands of multi-target detection, we propose the space-to-depth mobile (SPD_Mobile) network as the lightweight feature extraction network. Secondly, the large selective kernel (LSK) selective-attention module is integrated into the multi-scale feature map outputs of the LWMD-YOLOv7 network, which enhances feature fusion and strengthens the network's attention to salient features. Finally, Distance Intersection over Union (DIOU) is used as the loss function to accelerate the convergence of the model and to achieve better localisation of small targets. The experimental results show that the YOLOv7 algorithm achieves the best detection results on the terahertz image dataset after non-local means filtering. The LWMD-YOLOv7 algorithm achieves a detection accuracy P of 98.5%, a recall R of 97.5%, and a detection speed of 112.4 FPS, which is 26.9 FPS higher than that of the YOLOv7 base network. LWMD-YOLOv7 achieves a better balance between detection accuracy and detection speed, providing a technological reference for the automated detection of contraband in terahertz images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. GOI-YOLOv8 Grouping Offset and Isolated GiraffeDet Low-Light Target Detection.
- Author
-
Mei, Mengqing, Zhou, Ziyu, Liu, Wei, and Ye, Zhiwei
- Subjects
OBJECT recognition (Computer vision) ,COMPUTER vision ,DETECTORS ,ALGORITHMS ,NECK - Abstract
In the realm of computer vision, object detection holds significant importance and has demonstrated commendable performance across various scenarios. However, it typically requires favorable visibility conditions within the scene, so it is imperative to explore methodologies for conducting object detection under low-visibility circumstances. With its balanced combination of speed and accuracy, the state-of-the-art YOLOv8 framework has been recognized as one of the top algorithms for object detection, demonstrating outstanding performance across a range of standard datasets. Nonetheless, current YOLO-series detection algorithms still face a significant challenge in detecting objects under low-light conditions, primarily because detectors trained on well-illuminated data degrade significantly when applied to low-light datasets with limited visibility. To tackle this problem, we propose a new model, Grouping Offset and Isolated GiraffeDet Target Detection-YOLO, based on the YOLOv8 architecture, which performs exceptionally well under low-light conditions. We employ the repGFPN feature pyramid network in the design of the feature fusion neck to enhance hierarchical fusion and deepen the integration of low-light information. Furthermore, we refine the repGFPN feature fusion layer by introducing a sampling map offset to address its limitations in weight and efficiency, thereby better adapting it to real-time low-light applications and emphasizing the potential features of such scenes. Additionally, we utilize group convolution to isolate interference information from detected object edges, improving both detection performance and model efficiency. Experimental results demonstrate that our GOI-YOLO reduces the parameter count by 11% compared to YOLOv8 while decreasing computational requirements by 28%. This optimization significantly enhances real-time performance while achieving a competitive increase of 2.1% in mAP50 and 0.6% in mAP95 on the ExDark dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. MSPV3D: Multi-Scale Point-Voxels 3D Object Detection Net.
- Author
-
Zhang, Zheng, Bao, Zhiping, Wei, Yun, Zhou, Yongsheng, Li, Ming, and Tian, Qing
- Subjects
OBJECT recognition (Computer vision) ,POINT cloud ,DEEP learning ,ALGORITHMS ,CYCLISTS - Abstract
Autonomous vehicle technology is advancing, with 3D object detection based on point clouds being crucial. However, point clouds' irregularity, sparsity, and large data volume, coupled with irrelevant background points, hinder detection accuracy. We propose a two-stage multi-scale 3D object detection network. Firstly, considering that a large number of useless background points are usually generated by the ground during detection, we propose a new ground filtering algorithm to increase the proportion of foreground points and enhance the accuracy and efficiency of the two-stage detection. Secondly, given that different types of targets to be detected vary in size, and the use of a single-scale voxelization may result in excessive loss of detailed information, the voxels of different scales are introduced to extract relevant features of objects of different scales in the point clouds and integrate them into the second-stage detection. Lastly, a multi-scale feature fusion module is proposed, which simultaneously enhances and integrates features extracted from voxels of different scales. This module fully utilizes the valuable information present in the point cloud across various scales, ultimately leading to more precise 3D object detection. The experiment is conducted on the KITTI dataset and the nuScenes dataset. Compared with our baseline, "Pedestrian" detection improved by 3.37–2.72% and "Cyclist" detection by 3.79–1.32% across difficulty levels on KITTI, and was boosted by 2.4% in NDS and 3.6% in mAP on nuScenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. A Small-Object-Detection Algorithm Based on LiDAR Point-Cloud Clustering for Autonomous Vehicles.
- Author
-
Duan, Zhibing, Shao, Jinju, Zhang, Meng, Zhang, Jinlei, and Zhai, Zhipeng
- Subjects
OBJECT recognition (Computer vision) ,POINT cloud ,LIDAR ,ALGORITHMS ,CYCLISTS ,PEDESTRIANS - Abstract
3D object detection based on LiDAR point clouds can help driverless vehicles detect obstacles. However, existing point-cloud-based object detection methods are generally ineffective at detecting small objects such as pedestrians and cyclists. Therefore, a small-object-detection algorithm based on clustering is proposed. Firstly, a new segmented ground-segmentation algorithm is proposed, which filters out object point clouds according to heuristic rules and realizes ground segmentation by multi-region plane fitting. Then, the small-object point clouds are clustered using an improved DBSCAN algorithm: the K-means++ algorithm is used for pre-clustering, the neighborhood radius is adaptively adjusted according to distance, and the core-point search method of the original algorithm is improved. Finally, the detection of small objects is completed using an oriented bounding-box model. Extensive experiments showed that the precision and recall of our proposed ground-segmentation algorithm reached 91.86% and 92.70%, respectively, and the improved DBSCAN clustering algorithm improved the recall of pedestrians and cyclists by 15.89% and 9.50%, respectively. In addition, visualization experiments confirmed that our proposed small-object-detection algorithm based on point-cloud clustering can accurately detect small objects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
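The distance-adaptive neighborhood radius described above can be sketched as a radius that grows with sensor range, so that far-field LiDAR returns, which thin out with distance, still form clusters. A toy 2D illustration; the linear model and its coefficients are assumptions, not the paper's formula:

```python
import math

def adaptive_eps(r, base_eps=0.4, scale=0.02):
    """DBSCAN neighbourhood radius that grows with range r, since
    LiDAR returns become sparser farther from the sensor
    (a hypothetical linear model)."""
    return base_eps + scale * r

def neighbours(points, i):
    """Indices of points within the adaptive radius of point i
    (2D points for brevity; the sensor sits at the origin)."""
    p = points[i]
    eps = adaptive_eps(math.hypot(*p))
    return [j for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps]
```

With a fixed radius of 0.4 m, two far-away returns 0.9 m apart would never be neighbours; with the range-adaptive radius they are, so a distant pedestrian still yields a cluster.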
30. A UAV Aerial Image Target Detection Algorithm Based on YOLOv7 Improved Model.
- Author
-
Qin, Jie, Yu, Weihua, Feng, Xiaoxi, Meng, Zuqiang, and Tan, Chaohong
- Subjects
OBJECT recognition (Computer vision) ,FEATURE extraction ,ALGORITHMS - Abstract
To address the challenges of multi-scale objects, dense distributions, occlusions, and numerous small targets in UAV image detection, we present CMS-YOLOv7, a real-time target detection method based on an enhanced YOLOv7 model. Firstly, a detection layer P2 for small targets was added to YOLOv7 to enhance the detection of small and medium-sized targets, and the deep detection head P5 was removed to mitigate the influence of excessive downsampling on small-target images. Anchor boxes were recomputed with the K-means++ method. Using the concept of Inner-IoU, the Inner-MPDIoU loss function was constructed to control the range of the auxiliary border and improve detection performance. Furthermore, the CARAFE module was introduced to replace traditional upsampling methods, offering improved integration of semantic information during upsampling and enhancing feature mapping accuracy. Simultaneously, during the feature extraction stage, a non-strided convolutional SPD-Conv module was constructed using space-to-depth techniques; this module replaced certain convolutional operations to minimize the loss of fine-grained information and improve the model's ability to extract features from small targets. Experiments on the UAV aerial photo dataset VisDrone2019 demonstrated that, compared with the baseline YOLOv7 object detection algorithm, CMS-YOLOv7 achieved an improvement of 3.5% mAP@0.5 and 3.0% mAP@0.5:0.95 while the number of parameters decreased by 18.54 M. The ability to detect small targets was significantly enhanced. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
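The K-means++ anchor computation mentioned above seeds each new cluster centre with probability proportional to its squared distance from the nearest centre already chosen, which spreads the initial anchors across the range of box sizes. A compact sketch over (width, height) box sizes; this shows the seeding only, and the subsequent k-means refinement is omitted (production anchor tools also typically cluster by IoU distance rather than squared Euclidean distance):

```python
import random

def kmeanspp_seeds(boxes, k, rng=random.Random(0)):
    """K-means++ seeding over (w, h) box sizes: each new centre is
    drawn with probability proportional to its squared distance from
    the nearest centre chosen so far. Fixed seed for reproducibility."""
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        # Squared distance of every box to its nearest current centre.
        d2 = [min((w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centres)
              for w, h in boxes]
        r, acc = rng.random() * sum(d2), 0.0
        for box, weight in zip(boxes, d2):
            acc += weight
            if weight > 0 and acc >= r:  # weighted roulette selection
                centres.append(box)
                break
    return centres
```

Because an existing centre has zero distance to itself, it can never be drawn again, so the seeds always land on distinct box sizes.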
31. Impact of Perception Errors in Vision-Based Detection and Tracking Pipelines on Pedestrian Trajectory Prediction in Autonomous Driving Systems.
- Author
-
Chen, Wen-Hui, Wu, Jiann-Cherng, Davydov, Yury, Yeh, Wei-Chen, and Lin, Yu-Chen
- Subjects
OBJECT recognition (Computer vision) ,AUTONOMOUS vehicles ,FORECASTING ,ALGORITHMS - Abstract
Pedestrian trajectory prediction is crucial for developing collision avoidance algorithms in autonomous driving systems, aiming to predict the future movement of the detected pedestrians based on their past trajectories. The traditional methods for pedestrian trajectory prediction involve a sequence of tasks, including detection and tracking to gather the historical movement of the observed pedestrians. Consequently, the accuracy of trajectory prediction heavily relies on the accuracy of the detection and tracking models, making it susceptible to their performance. The prior research in trajectory prediction has mainly assessed the model performance using public datasets, which often overlook the errors originating from detection and tracking models. This oversight fails to capture the real-world scenario of inevitable detection and tracking inaccuracies. In this study, we investigate the cumulative effect of errors within integrated detection, tracking, and trajectory prediction pipelines. Through empirical analysis, we examine the errors introduced at each stage of the pipeline and assess their collective impact on the trajectory prediction accuracy. We evaluate these models across various custom datasets collected in Taiwan to provide a comprehensive assessment. Our analysis of the results derived from these integrated pipelines illuminates the significant influence of detection and tracking errors on downstream tasks, such as trajectory prediction and distance estimation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Improved YOLOv8 Algorithm for Water Surface Object Detection.
- Author
-
Wang, Jie and Zhao, Hong
- Subjects
OBJECT recognition (Computer vision) ,WATER waves ,GEOMETRIC shapes ,ALGORITHMS ,NECK - Abstract
To address the decreased detection accuracy, false detections, and missed detections caused by scale differences between near and distant targets and by environmental factors (such as lighting and water waves) in surface target detection for uncrewed vessels, we propose the YOLOv8-MSS algorithm to optimize the detection of water surface targets. By adding a small-target detection head, the model becomes more sensitive and accurate in recognizing small targets. To reduce noise interference from complex water surface environments during downsampling in the backbone network, C2f_MLCA is used to enhance the robustness and stability of the model. The lightweight SENetV2 module is employed in the neck to improve the model's small-target detection and anti-interference capability. The SIoU loss function enhances detection accuracy and bounding box regression precision through shape awareness and the integration of geometric information. Experiments on the publicly available FloW-Img dataset show that the improved algorithm achieves a mAP@0.5 of 87.9% and a mAP@0.5:0.95 of 47.6%, improvements of 5% and 2.6%, respectively, over the original model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Accurate UAV Small Object Detection Based on HRFPN and EfficentVMamba.
- Author
-
Wu, Shixiao, Lu, Xingyuan, Guo, Chengcheng, and Guo, Hong
- Subjects
OBJECT recognition (Computer vision) ,FEATURE extraction ,DEEP learning ,PYRAMIDS ,ALGORITHMS - Abstract
(1) Background: Small objects in Unmanned Aerial Vehicle (UAV) images are often scattered throughout various regions of the image, such as the corners, and may be blocked by larger objects, as well as susceptible to image noise. Moreover, due to their small size, these objects occupy a limited area in the image, resulting in a scarcity of effective features for detection. (2) Methods: To address the detection of small objects in UAV imagery, we introduce a novel algorithm called High-Resolution Feature Pyramid Network Mamba-Based YOLO (HRMamba-YOLO). This algorithm leverages the strengths of a High-Resolution Network (HRNet), EfficientVMamba, and YOLOv8, integrating a Double Spatial Pyramid Pooling (Double SPP) module, an Efficient Mamba Module (EMM), and a Fusion Mamba Module (FMM) to enhance feature extraction and capture contextual information. Additionally, a new multi-scale feature fusion network, the High-Resolution Feature Pyramid Network (HRFPN), together with the FMM, improved feature interactions and enhanced small object detection performance. (3) Results: On the VisDroneDET dataset, the proposed algorithm achieved a 4.4% higher Mean Average Precision (mAP) than YOLOv8-m. The experimental results showed that HRMamba achieved a mAP of 37.1% on the Dota1.5 dataset, surpassing YOLOv8-m by 3.8%. On the UCAS_AOD and DIOR datasets, our model's mAP was 1.5% and 0.3% higher than YOLOv8-m, respectively. For a fair comparison, all models were trained without pre-trained weights. (4) Conclusions: This study not only highlights the exceptional performance and efficiency of HRMamba-YOLO in small object detection tasks but also provides innovative solutions and valuable insights for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. HP-YOLOv8: High-Precision Small Object Detection Algorithm for Remote Sensing Images.
- Author
-
Yao, Guangzhen, Zhu, Sandong, Zhang, Long, and Qi, Miao
- Subjects
OBJECT recognition (Computer vision) ,REMOTE sensing ,ALGORITHMS ,NOISE - Abstract
YOLOv8, as an efficient object detection method, can swiftly and precisely identify objects within images. However, traditional algorithms encounter difficulties when detecting small objects in remote sensing images, such as missing information, background noise, and interactions among multiple objects in complex scenes, which may affect performance. To tackle these challenges, we propose an enhanced algorithm optimized for detecting small objects in remote sensing images, named HP-YOLOv8. Firstly, we design the C2f-D-Mixer (C2f-DM) module as a replacement for the original C2f module. This module integrates both local and global information, significantly improving the ability to detect features of small objects. Secondly, we introduce a feature fusion technique based on attention mechanisms, named Bi-Level Routing Attention in Gated Feature Pyramid Network (BGFPN). This technique utilizes an efficient feature aggregation network and reparameterization technology to optimize information interaction between different scale feature maps, and through the Bi-Level Routing Attention (BRA) mechanism, it effectively captures critical feature information of small objects. Finally, we propose the Shape Mean Perpendicular Distance Intersection over Union (SMPDIoU) loss function. The method comprehensively considers the shape and size of detection boxes, enhances the model's focus on the attributes of detection boxes, and provides a more accurate bounding box regression loss calculation method. To demonstrate our approach's efficacy, we conducted comprehensive experiments across the RSOD, NWPU VHR-10, and VisDrone2019 datasets. The experimental results show that the HP-YOLOv8 achieves 95.11%, 93.05%, and 53.49% in the mAP@0.5 metric, and 72.03%, 65.37%, and 38.91% in the more stringent mAP@0.5:0.95 metric, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. SPA: Annotating Small Object with a Single Point in Remote Sensing Images.
- Author
-
Zhao, Wenjie, Fang, Zhenyu, Cao, Jun, and Ju, Zhangfeng
- Subjects
OBJECT recognition (Computer vision) ,REMOTE sensing ,DETECTORS ,ANNOTATIONS ,ALGORITHMS - Abstract
Detecting oriented small objects is a critical task in remote sensing, but the development of high-performance deep learning-based detectors is hindered by the need for large-scale and well-annotated datasets. The high cost of creating these datasets, due to the dense and numerous distribution of small objects, significantly limits the application and development of such detectors. To address this problem, we propose a single-point-based annotation approach (SPA) based on the graph cut method. In this framework, user annotations act as the origin of positive sample points, and a similarity matrix, computed from feature maps extracted by deep learning networks, facilitates an intuitive and efficient annotation process for building graph elements. Utilizing the Maximum Flow algorithm, SPA derives positive sample regions from these points and generates oriented bounding boxes (OBBOXs). Experimental results demonstrate the effectiveness of SPA, with at least a 50% improvement in annotation efficiency. Furthermore, the intersection-over-union (IoU) metric of our OBBOX is 3.6% higher than existing methods such as the "Segment Anything Model". When applied in training, the model annotated with SPA shows a 4.7% higher mean average precision (mAP) compared to models using traditional annotation methods. These results confirm the technical advantages and practical impact of SPA in advancing small object detection in remote sensing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
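The Maximum Flow step at the core of SPA's graph-cut labelling can be illustrated with a minimal Edmonds–Karp implementation. This is a generic sketch of the classic algorithm on a tiny hypothetical graph, not the authors' code; in SPA the graph would be built from the pixel-similarity matrix, with the annotated point feeding the source side.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow via BFS augmenting paths; cap[u][v] is edge capacity.
    In graph-cut annotation, s/t play the roles of foreground/background
    terminals and the resulting min cut separates object pixels from
    background pixels."""
    n = len(cap)
    flow = 0
    residual = [row[:] for row in cap]   # work on a copy of the capacities
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:              # no augmenting path left: done
            return flow
        # find the bottleneck capacity along the path
        bottleneck = float("inf")
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # apply the augmentation
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

On a 4-node graph with edges 0→1 (3), 1→3 (2), 0→2 (2), 2→3 (3), the max flow from node 0 to node 3 is 4; the saturated edges of the final residual graph define the min cut that a SPA-style annotator would read off as the positive sample region.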
36. ESE-YOLOv8: A Novel Object Detection Algorithm for Safety Belt Detection during Working at Heights.
- Author
-
Zhou, Qirui, Liu, Dandan, and An, Kang
- Subjects
OBJECT recognition (Computer vision) ,ALGORITHMS ,FEATURE extraction ,BELTS (Clothing) - Abstract
To address the challenges associated with supervising workers who wear safety belts while working at heights, this study proposes a solution involving the utilization of an object detection model to replace manual supervision. A novel object detection model, named ESE-YOLOv8, is introduced. The integration of the Efficient Multi-Scale Attention (EMA) mechanism within this model enhances information entropy through cross-channel interaction and encodes spatial information into the channels, thereby enabling the model to obtain rich and significant information during feature extraction. By employing GSConv to reconstruct the neck into a slim-neck configuration, the computational load of the neck is reduced without the loss of information entropy, allowing the attention mechanism to function more effectively, thereby improving accuracy. During the model training phase, a regression loss function named the Efficient Intersection over Union (EIoU) is employed to further refine the model's object localization capabilities. Experimental results demonstrate that the ESE-YOLOv8 model achieves an average precision of 92.7% at an IoU threshold of 50% and an average precision of 75.7% within the IoU threshold range of 50% to 95%. These results surpass the performance of the baseline model, the widely utilized YOLOv5 and demonstrate competitiveness among state-of-the-art models. Ablation experiments further confirm the effectiveness of the model's enhancements. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
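The abstract names the Efficient IoU (EIoU) regression loss without giving its form. As a sketch, the published EIoU formulation combines the IoU term with normalized penalties on centre distance and on the width/height gaps; the function below is an illustrative pure-Python version, not the paper's implementation.

```python
def eiou_loss(box_p, box_g, eps=1e-9):
    """Efficient IoU loss for axis-aligned boxes given as (x1, y1, x2, y2):
    1 - IoU, plus a centre-distance penalty normalized by the smallest
    enclosing box diagonal, plus separate width and height penalties."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # intersection and union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + eps)
    # smallest enclosing box
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    # squared centre distance, normalized by the enclosing diagonal
    d2 = ((px1 + px2) - (gx1 + gx2)) ** 2 / 4 + ((py1 + py2) - (gy1 + gy2)) ** 2 / 4
    dist_term = d2 / (cw ** 2 + ch ** 2 + eps)
    # width and height gap penalties, each normalized separately
    w_term = ((px2 - px1) - (gx2 - gx1)) ** 2 / (cw ** 2 + eps)
    h_term = ((py2 - py1) - (gy2 - gy1)) ** 2 / (ch ** 2 + eps)
    return 1.0 - iou + dist_term + w_term + h_term
```

A perfectly matching prediction gives a loss of essentially zero, while disjoint boxes are penalized beyond the plain 1 - IoU = 1 ceiling, which is what gives EIoU its faster convergence on poorly localized boxes.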
37. Recognition Method of Cabbage Heads at Harvest Stage under Complex Background Based on Improved YOLOv8n.
- Author
-
Tian, Yongqiang, Zhao, Chunjiang, Zhang, Taihong, Wu, Huarui, and Zhao, Yunjie
- Subjects
OBJECT recognition (Computer vision) ,DEEP learning ,CABBAGE ,SPINE ,ALGORITHMS - Abstract
To address the problems of low recognition accuracy and slow processing speed when identifying harvest-stage cabbage heads in complex environments, this study proposes a lightweight harvesting period cabbage head recognition algorithm that improves upon YOLOv8n. We propose a YOLOv8n-Cabbage model, integrating an enhanced backbone network, the DyHead (Dynamic Head) module insertion, loss function optimization, and model light-weighting. To assess the proposed method, a comparison with extant mainstream object detection models is conducted. The experimental results indicate that the improved cabbage head recognition model proposed in this study can adapt to cabbage head recognition under different lighting conditions and complex backgrounds. With a compact size of 4.8 MB, this model achieves 91% precision, 87.2% recall, and a mAP@50 of 94.5%—the model volume has been reduced while the evaluation metrics have all been improved over the baseline model. The results demonstrate that this model can be applied to the real-time recognition of harvest-stage cabbage heads under complex field environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. YOLO-Based 3D Perception for UVMS Grasping.
- Author
-
Chen, Yanhu, Zhao, Fuqiang, Ling, Yucheng, and Zhang, Suohang
- Subjects
OBJECT recognition (Computer vision) ,DEEP learning ,MARINE organisms ,ROBOTICS ,ALGORITHMS - Abstract
This study develops a YOLO (You Only Look Once)-based 3D perception algorithm for UVMS (Underwater Vehicle-Manipulator Systems) for precise object detection and localization, crucial for enhanced grasping tasks. The object detection algorithm, YOLOv5s-CS, integrates an enhanced YOLOv5s model with C3SE attention and SPPFCSPC feature fusion, optimized for precise detection and two-dimensional localization in underwater environments with sparse features. Distance measurement is further improved by refining the SGBM (Semi-Global Block Matching) algorithm with Census transform and subpixel interpolation. Ablation studies highlight the YOLOv5s-CS model's enhanced performance, with a 3.5% increase in mAP and a 6.4% rise in F1 score over the base YOLOv5s, and a 2.1% mAP improvement with 15% faster execution than YOLOv8s. Implemented on a UVMS, the algorithm successfully conducted pool grasping experiments, proving its applicability for autonomous underwater robotics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
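The Census transform mentioned in the SGBM refinement has a compact standard definition: each pixel is encoded by a bit string recording which window neighbours are darker than the centre, and matching costs become Hamming distances between codes, which is robust to radiometric differences between the two underwater cameras. A minimal pure-Python sketch (not the authors' implementation):

```python
def census_transform(img, win=3):
    """Census transform over a win x win window: each interior pixel is
    replaced by a bit code, one bit per neighbour, set when the
    neighbour's intensity is below the window centre's intensity."""
    h, w = len(img), len(img[0])
    r = win // 2
    out = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            code = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if dy == 0 and dx == 0:
                        continue          # the centre compares to itself
                    code = (code << 1) | (1 if img[y + dy][x + dx] < img[y][x] else 0)
            out[y][x] = code
    return out

def hamming(a, b):
    """Matching cost between two census codes: count of differing bits."""
    return bin(a ^ b).count("1")
```

For a 3x3 window the code has 8 bits; on the monotone patch [[1,2,3],[4,5,6],[7,8,9]] the centre pixel's code is 0b11110000, since exactly its first four raster-order neighbours are darker than 5.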
39. ICDW-YOLO: An Efficient Timber Construction Crack Detection Algorithm.
- Author
-
Zhou, Jieyang, Ning, Jing, Xiang, Zhiyang, and Yin, Pengfei
- Subjects
WOODEN building ,BUILDING protection ,OBJECT recognition (Computer vision) ,ALGORITHMS ,REMOTE sensing ,FUSION reactors - Abstract
A robust wood material crack detection algorithm, sensitive to small targets, is indispensable for production and building protection. However, the precise identification and localization of cracks in wooden materials present challenges owing to significant scale variations among cracks and the irregular quality of existing data. In response, we propose a crack detection algorithm tailored to wooden materials, leveraging advancements in the YOLOv8 model, named ICDW-YOLO (improved crack detection for wooden material-YOLO). The ICDW-YOLO model introduces novel designs for the neck network and layer structure, along with an anchor algorithm, which features a dual-layer attention mechanism and dynamic gradient gain characteristics to optimize and enhance the original model. Initially, a new layer structure was crafted using GSConv and GS bottleneck, improving the model's recognition accuracy by maximizing the preservation of hidden channel connections. Subsequently, enhancements to the network are achieved through the gather–distribute mechanism, aimed at augmenting the fusion capability of multi-scale features and introducing a higher-resolution input layer to enhance small target recognition. Empirical results obtained from a customized wooden material crack detection dataset demonstrate the efficacy of the proposed ICDW-YOLO algorithm in effectively detecting targets. Without significant augmentation in model complexity, the mAP50–95 metric attains 79.018%, marking a 1.869% improvement over YOLOv8. Further validation of our algorithm's effectiveness is conducted through experiments on fire and smoke detection datasets, aerial remote sensing image datasets, and the coco128 dataset. The results showcase that ICDW-YOLO achieves a mAP50 of 69.226% and a mAP50–95 of 44.210%, indicating robust generalization and competitiveness vis-à-vis state-of-the-art detectors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Path Planning for Autonomous Mobile Robot Using Intelligent Algorithms.
- Author
-
Galarza-Falfan, Jorge, García-Guerrero, Enrique Efrén, Aguirre-Castro, Oscar Adrian, López-Bonilla, Oscar Roberto, Tamayo-Pérez, Ulises Jesús, Cárdenas-Valdez, José Ricardo, Hernández-Mejía, Carlos, Borrego-Dominguez, Susana, and Inzunza-Gonzalez, Everardo
- Subjects
MOBILE robots ,AUTONOMOUS robots ,ARTIFICIAL intelligence ,PATTERN recognition systems ,OBJECT recognition (Computer vision) ,ALGORITHMS - Abstract
Machine learning technologies are being integrated into robotic systems faster to enhance their efficacy and adaptability in dynamic environments. The primary goal of this research was to propose a method to develop an Autonomous Mobile Robot (AMR) that integrates Simultaneous Localization and Mapping (SLAM), odometry, and artificial vision based on deep learning (DL). All are executed on a high-performance Jetson Nano embedded system, specifically emphasizing SLAM-based obstacle avoidance and path planning using the Adaptive Monte Carlo Localization (AMCL) algorithm. Two Convolutional Neural Networks (CNNs) were selected due to their proven effectiveness in image and pattern recognition tasks. The ResNet18 and YOLOv3 algorithms facilitate scene perception, enabling the robot to interpret its environment effectively. Both algorithms were implemented for real-time object detection, identifying and classifying objects within the robot's environment. These algorithms were selected to evaluate their performance metrics, which are critical for real-time applications. A comparative analysis of the proposed DL models focused on enhancing vision systems for autonomous mobile robots. Several simulations and real-world trials were conducted to evaluate the performance and adaptability of these models in navigating complex environments. The proposed vision system with CNN ResNet18 achieved an average accuracy of 98.5%, a precision of 96.91%, a recall of 97%, and an F1-score of 98.5%. However, the YOLOv3 model achieved an average accuracy of 96%, a precision of 96.2%, a recall of 96%, and an F1-score of 95.99%. These results underscore the effectiveness of the proposed intelligent algorithms, robust embedded hardware, and sensors in robotic applications. This study proves that advanced DL algorithms work well in robots and could be used in many fields, such as transportation and assembly. 
As a consequence of the findings, intelligent systems could be implemented more widely in the operation and development of AMRs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Object Detection in Hazy Environments, Based on an All-in-One Dehazing Network and the YOLOv5 Algorithm.
- Author
-
Li, Aijuan, Xu, Guangpeng, Yue, Wenpeng, Xu, Chuanyan, Gong, Chunpeng, and Cao, Jiaping
- Subjects
OBJECT recognition (Computer vision) ,FEATURE extraction ,ALGORITHMS ,GEOGRAPHICAL perception ,AUTONOMOUS vehicles ,INTELLIGENT transportation systems - Abstract
This study introduces an advanced algorithm for intelligent vehicle target detection in hazy conditions, aiming to bolster the environmental perception capabilities of autonomous vehicles. The proposed approach integrates a hybrid convolutional module (HDC) into an all-in-one dehazing network, AOD-Net, to expand the perceptual domain for image feature extraction and refine the clarity of dehazed images. To accelerate model convergence and enhance generalization, the loss function has been optimized. For practical deployment in intelligent vehicle systems, the ShuffleNetv2 lightweight network module is incorporated into the YOLOv5s network backbone, and the feature pyramid network (FPN) within the neck network has been refined. Additionally, the network employs a global shuffle convolution (GSconv) to balance accuracy with parameter count. To further focus on the target, a convolutional block attention module (CBAM) is introduced, which helps in reducing the network's parameter count without compromising accuracy. A comparative experiment was conducted, and the results indicated that our algorithm achieved an impressive mean average precision (mAP) of 76.8% at an intersection-over-union (IoU) threshold of 0.5 in hazy conditions, outperforming YOLOv5 by 7.4 percentage points. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. MCF-YOLOv5: A Small Target Detection Algorithm Based on Multi-Scale Feature Fusion Improved YOLOv5.
- Author
-
Gao, Song, Gao, Mingwang, and Wei, Zhihui
- Subjects
OBJECT recognition (Computer vision) ,ALGORITHMS ,DATA augmentation ,COMPUTATIONAL complexity ,FEATURE extraction ,DEEP learning - Abstract
In recent years, many deep learning-based object detection methods have performed well in various applications, especially in large-scale object detection. However, when detecting small targets, previous object detection algorithms cannot achieve good results due to the characteristics of the small targets themselves. To address the aforementioned issues, we propose the small object detection model MCF-YOLOv5, which incorporates three improvements over YOLOv5. Firstly, a data augmentation strategy combining Mixup and Mosaic is used to increase the number of small targets in the image and reduce the interference of noise and changes in detection. Secondly, in order to accurately locate the position of small targets and reduce the impact of unimportant information on small targets in the image, the coordinate attention mechanism is introduced in YOLOv5's neck network. Finally, we improve the Feature Pyramid Network (FPN) structure and add a small object detection layer to enhance the feature extraction ability for small objects and improve the detection accuracy of small objects. The experimental results show that, with a small increase in computational complexity, the proposed MCF-YOLOv5 achieves better performance than the baseline on both the VisDrone2021 dataset and the Tsinghua-Tencent 100K dataset. Compared with YOLOv5, MCF-YOLOv5 improves the detection AP_small by 3.3% and 3.6%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
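The Mixup half of the Mixup-plus-Mosaic augmentation strategy mentioned above has a simple closed form: two training images and their label vectors are blended with a coefficient drawn from a Beta distribution. The sketch below is a generic illustration with hypothetical helper names, not the MCF-YOLOv5 code; real pipelines operate on full tensors and on YOLO box labels rather than one-hot vectors.

```python
import random

def sample_lambda(alpha=0.2):
    """Draw the Mixup mixing coefficient lam from Beta(alpha, alpha);
    small alpha concentrates lam near 0 or 1 (mostly-one-image mixes)."""
    return random.betavariate(alpha, alpha)

def mixup(img_a, img_b, label_a, label_b, lam):
    """Blend two same-sized grayscale images and their one-hot labels:
    x = lam * x_a + (1 - lam) * x_b, and likewise for the labels."""
    mixed_img = [
        [lam * pa + (1 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    mixed_label = [lam * la + (1 - lam) * lb for la, lb in zip(label_a, label_b)]
    return mixed_img, mixed_label
```

With lam = 0.25, a black pixel mixed with a white pixel yields 0.75, and the label mass splits 25/75 between the two classes, so the detector is trained on soft targets rather than hard ones.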
43. An Oracle Bone Inscriptions Detection Algorithm Based on Improved YOLOv8.
- Author
-
Zhen, Qianqian, Wu, Liang, and Liu, Guoying
- Subjects
INSCRIPTIONS ,ALGORITHMS ,CHINESE characters ,DEEP learning ,OBJECT recognition (Computer vision) - Abstract
Ancient Chinese characters known as oracle bone inscriptions (OBIs) were inscribed on turtle shells and animal bones, and they boast a rich history dating back over 3600 years. The detection of OBIs is one of the most basic tasks in OBI research. The current research aimed to determine the precise location of OBIs within rubbing images. Given the low clarity, severe noise, and cracks in oracle bone inscriptions, mainstream deep learning networks achieve low detection accuracy on the OBI detection dataset. To address this issue, this study analyzed the significant research progress in oracle bone script detection both domestically and internationally. Then, based on the YOLOv8 algorithm and the characteristics of OBI rubbing images, the algorithm was improved accordingly. The proposed algorithm added a small target detection head, modified the loss function, and embedded a CBAM. The results show that the improved model achieves an F-measure of 84.3%, surpassing the baseline model by approximately 1.8%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. EF-UODA: Underwater Object Detection Based on Enhanced Feature.
- Author
-
Zu, Yunqin, Zhang, Lixun, Li, Siqi, Fan, Yuhe, and Liu, Qijia
- Subjects
OBJECT recognition (Computer vision) ,MARINE engineering ,FEATURE extraction ,PYRAMIDS ,ENVIRONMENTAL engineering ,ALGORITHMS - Abstract
The ability to detect underwater objects accurately is important in marine environmental engineering. Although many kinds of underwater object detection algorithms with relatively high accuracy have been proposed, they involve a large number of parameters and floating point operations (FLOPs), and often fail to yield satisfactory results in complex underwater environments. In light of the demand for an algorithm with the capability to extract high-quality features in complex underwater environments, we proposed a one-stage object detection algorithm called the enhanced feature-based underwater object detection algorithm (EF-UODA), which was based on the architecture of Next-ViT, the loss function of YOLOv8, and Ultralytics. First, we developed a highly efficient module for convolutions, called efficient multi-scale pointwise convolution (EMPC). Second, we proposed a feature pyramid architecture called the multipath fast fusion-feature pyramid network (M2F-FPN) based on different modes of feature fusion. Finally, we integrated the Next-ViT and the minimum point distance intersection over union loss functions in our proposed algorithm. Specifically, on the URPC2020 dataset, EF-UODA surpasses the state-of-the-art (SOTA) convolution-based object detection algorithm YOLOv8X by 2.9% mean average precision (mAP), and surpasses the SOTA ViT-based object detection algorithm real-time detection transformer (RT-DETR) by 2.1%. Meanwhile, it achieves the lowest FLOPs and parameters. The results of extensive experiments showed that EF-UODA had excellent feature extraction capability, and was adequately balanced in terms of the number of FLOPs and parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. PDT-YOLO: A Roadside Object-Detection Algorithm for Multiscale and Occluded Targets.
- Author
-
Liu, Ruoying, Huang, Miaohua, Wang, Liangzi, Bi, Chengcheng, and Tao, Ye
- Subjects
OBJECT recognition (Computer vision) ,ROADSIDE improvement ,FEATURE extraction ,ALGORITHMS ,TRACKING algorithms ,QUALITY function deployment - Abstract
To tackle the challenges of weak sensing capacity for multi-scale objects, high missed-detection rates for occluded targets, and difficulties in model deployment in detection tasks of intelligent roadside perception systems, the PDT-YOLO algorithm based on YOLOv7-tiny is proposed. Firstly, we introduce the intra-scale feature interaction module (AIFI) and reconstruct the feature pyramid structure to enhance the detection accuracy of multi-scale targets. Secondly, a lightweight convolution module (GSConv) is introduced to construct a multi-scale efficient layer aggregation network module (ETG), enhancing the network's feature extraction ability while keeping the model lightweight. Thirdly, multi-attention mechanisms are integrated to optimize the feature expression ability for occluded targets in complex scenarios. Finally, Wise-IoU with a dynamic non-monotonic focusing mechanism improves the accuracy and generalization ability of model sensing. Compared with YOLOv7-tiny, PDT-YOLO improves mAP50 and mAP50:95 on the DAIR-V2X-C dataset by 4.6% and 12.8%, with a parameter count of 6.1 million, and on the IVODC dataset by 15.7% and 11.1%. We deployed PDT-YOLO in an actual traffic environment based on a robot operating system (ROS), with a detection frame rate of 90 FPS, which can meet the needs of roadside object detection and edge deployment in complex traffic scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Defect Detection Algorithm for Battery Cell Casings Based on Dual-Coordinate Attention and Small Object Loss Feedback.
- Author
-
Li, Tianjian, Ren, Jiale, Yang, Qingping, Chen, Long, and Sun, Xizhi
- Subjects
OBJECT recognition (Computer vision) ,WRINKLES (Skin) ,FEATURE extraction ,ALGORITHMS ,PSYCHOLOGICAL feedback - Abstract
To address the issue of low accuracy in detecting defects of battery cell casings with low space ratio and small object characteristics, the low space ratio feature and small object feature are studied, and an object detection algorithm based on dual-coordinate attention and small object loss feedback is proposed. Firstly, the EfficientNet-B1 backbone network is employed for feature extraction. Secondly, a dual-coordinate attention module is introduced to preserve more positional information through dual branches and embed the positional information into channel attention for precise localization of the low space ratio features. Finally, a small object loss feedback module is incorporated after the bidirectional feature pyramid network (BiFPN) for feature fusion, balancing the contribution of small object loss to the overall loss. Experimental comparisons on a battery cell casing dataset demonstrate that the proposed algorithm outperforms the EfficientDet-D1 object detection algorithm, with an average precision improvement of 4.23%. Specifically, for scratches with low space ratio features, the improvement is 13.21%; for wrinkles with low space ratio features, the improvement is 9.35%; and for holes with small object features, the improvement is 3.81%. Moreover, the detection time of 47.6 ms meets the requirements of practical production. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Improved YOLOv7 Algorithm for Small Object Detection in Unmanned Aerial Vehicle Image Scenarios.
- Author
-
Li, Xinmin, Wei, Yingkun, Li, Jiahui, Duan, Wenwen, Zhang, Xiaoqiang, and Huang, Yi
- Subjects
OBJECT recognition (Computer vision) ,DEEP learning ,ALGORITHMS ,DRONE aircraft ,ENERGY consumption - Abstract
Object detection in unmanned aerial vehicle (UAV) images has become a popular research topic in recent years. However, UAV images are captured from high altitudes with a large proportion of small objects and dense object regions, posing a significant challenge to small object detection. To solve this issue, we propose an efficient YOLOv7-UAV algorithm in which a low-level prediction head (P2) is added to detect small objects from the shallow feature map, and a deep-level prediction head (P5) is removed to reduce the effect of excessive down-sampling. Furthermore, we modify the bidirectional feature pyramid network (BiFPN) structure with a weighted cross-level connection to enhance the fusion effectiveness of multi-scale feature maps in UAV images. To mitigate the mismatch between the prediction box and ground-truth box, the SCYLLA-IoU (SIoU) function is employed in the regression loss to accelerate the training convergence process. Moreover, the proposed YOLOv7-UAV algorithm has been quantified and compiled in the Vitis-AI development environment and validated in terms of power consumption and hardware resources on the FPGA platform. The experiments show that the resource consumption of YOLOv7-UAV is reduced by 28%, the mAP is improved by 3.9% compared to YOLOv7, and the FPGA implementation improves the energy efficiency by 12 times compared to the GPU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
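The weighted cross-level connection added to BiFPN above presumably builds on the standard "fast normalized fusion" rule from the original EfficientDet BiFPN, in which each input feature map carries a learnable scalar weight, clamped non-negative and normalized before the weighted sum. A minimal sketch under that assumption, with flat lists standing in for feature tensors:

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """EfficientDet-style BiFPN fusion: ReLU-clamp the per-input scalar
    weights, normalize them to sum to ~1, and take the weighted sum of
    the input feature maps (flat lists here for simplicity; real
    networks fuse full C x H x W tensors element-wise)."""
    pos = [max(0.0, w) for w in weights]   # ReLU keeps weights non-negative
    total = sum(pos) + eps                 # eps avoids division by zero
    fused = [0.0] * len(features[0])
    for f, w in zip(features, pos):
        for i, v in enumerate(f):
            fused[i] += (w / total) * v
    return fused
```

Two equally weighted inputs simply average; a negative learned weight is clamped to zero, letting the network effectively prune a cross-level connection during training, which is what makes this cheaper and more stable than a softmax over the weights.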
48. Traffic Sign Detection and Recognition Using YOLO Object Detection Algorithm: A Systematic Review.
- Author
-
Flores-Calero, Marco, Astudillo, César A., Guevara, Diego, Maza, Jessica, Lita, Bryan S., Defaz, Bryan, Ante, Juan S., Zabala-Blanco, David, and Armingol Moreno, José María
- Subjects
TRAFFIC monitoring ,TRAFFIC signs & signals ,ARTIFICIAL neural networks ,INTELLIGENT transportation systems ,ALGORITHMS ,OBJECT recognition (Computer vision) ,MOBILE operating systems ,IRIS recognition - Abstract
Context: YOLO (You Look Only Once) is an algorithm based on deep neural networks with real-time object detection capabilities. This state-of-the-art technology is widely available, mainly due to its speed and precision. Since its conception, YOLO has been applied to detect and recognize traffic signs, pedestrians, traffic lights, vehicles, and so on. Objective: The goal of this research is to systematically analyze the YOLO object detection algorithm, applied to traffic sign detection and recognition systems, from five relevant aspects of this technology: applications, datasets, metrics, hardware, and challenges. Method: This study performs a systematic literature review (SLR) of studies on traffic sign detection and recognition using YOLO published in the years 2016–2022. Results: The search found 115 primary studies relevant to the goal of this research. After analyzing these investigations, the following relevant results were obtained. The most common applications of YOLO in this field are vehicular security and intelligent and autonomous vehicles. The majority of the sign datasets used to train, test, and validate YOLO-based systems are publicly available, with an emphasis on datasets from Germany and China. It has also been discovered that most works present sophisticated detection, classification, and processing speed metrics for traffic sign detection and recognition systems using different versions of YOLO. In addition, the most popular desktop data processing hardware platforms are the Nvidia RTX 2080 and Titan Tesla V100 and, in the case of embedded or mobile GPU platforms, the Jetson Xavier NX. Finally, seven relevant challenges that these systems face when operating in real road conditions have been identified. With this in mind, research has been reclassified to address these challenges in each case.
Conclusions: This SLR is the most relevant and current work in the field of technology development applied to the detection and recognition of traffic signs using YOLO. In addition, insights are provided about future work that could be conducted to improve the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Equal Emphasis on Data and Network: A Two-Stage 3D Point Cloud Object Detection Algorithm with Feature Alignment.
- Author
-
Xiao, Kai, Li, Teng, Li, Jun, Huang, Da, and Peng, Yuanxi
- Subjects
OBJECT recognition (Computer vision) ,POINT cloud ,COMPUTER vision ,ALGORITHMS ,AERONAUTICAL navigation ,DEEP learning ,MULTISPECTRAL imaging - Abstract
Three-dimensional object detection is a pivotal research topic in computer vision, aiming to identify and locate objects in three-dimensional space. It has wide applications in various fields such as geoscience, autonomous driving, and drone navigation. The rapid development of deep learning techniques has led to significant advancements in 3D object detection. However, with the increasing complexity of applications, 3D object detection faces a series of challenges such as data imbalance and the effectiveness of network models. Specifically, in an experiment, our investigation revealed a notable discrepancy in the LiDAR reflection intensity within a point cloud scene, with stronger intensities observed in proximity and weaker intensities observed at a distance. Furthermore, we have also noted a substantial disparity in the number of foreground points compared to the number of background points. Especially in 3D object detection, the foreground point is more important than the background point, but it is usually downsampled without discrimination in the subsequent processing. With the objective of tackling these challenges, we work from both data and network perspectives, designing a feature alignment filtering algorithm and a two-stage 3D object detection network. Firstly, in order to achieve feature alignment, we introduce a correction equation to decouple the relationship between distance and intensity and eliminate the attenuation effect of intensity caused by distance. Then, a background point filtering algorithm is designed by using the aligned data to alleviate the problem of data imbalance. At the same time, we take into consideration the fact that the accuracy of semantic segmentation plays a crucial role in 3D object detection. Therefore, we propose a two-stage deep learning network that integrates spatial and spectral information, in which a feature fusion branch is designed and embedded in the semantic segmentation backbone. 
Through a series of experiments on the KITTI dataset, it is proven that the proposed method achieves the following average precision (AP_R40) values for the easy, moderate, and hard difficulties, respectively: car (IoU 0.7)—89.23%, 80.14%, and 77.89%; pedestrian (IoU 0.5)—52.32%, 45.47%, and 38.78%; and cyclist (IoU 0.5)—76.41%, 61.92%, and 56.39%. By emphasizing both data quality optimization and efficient network architecture, the performance of the proposed method is made comparable to other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
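The abstract describes a correction equation that decouples LiDAR intensity from distance but does not give its form. Assuming the standard inverse-square radiometric fall-off model (an assumption, not the paper's equation), the alignment and the subsequent background-point filtering might be sketched as:

```python
def normalize_intensity(intensity, distance, d_ref=10.0):
    """Compensate the roughly 1/d^2 fall-off of LiDAR return intensity
    by rescaling every return to a common reference distance d_ref, so
    near and far points become directly comparable."""
    return intensity * (distance / d_ref) ** 2

def filter_background(points, thresh, d_ref=10.0):
    """points: iterable of (x, y, z, intensity) tuples; keep the likely
    foreground returns whose distance-normalized intensity clears the
    threshold, mitigating the foreground/background imbalance before
    the detection network sees the cloud."""
    kept = []
    for x, y, z, inten in points:
        d = (x * x + y * y + z * z) ** 0.5   # range from the sensor origin
        if normalize_intensity(inten, d, d_ref) >= thresh:
            kept.append((x, y, z, inten))
    return kept
```

Under this model a return of raw intensity 0.5 at 20 m normalizes to 2.0 at a 10 m reference, so a distant but strongly reflective foreground point survives a threshold that would otherwise discard it.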
50. A Lightweight Detection Method for Blueberry Fruit Maturity Based on an Improved YOLOv5 Algorithm.
- Author
-
Xiao, Feng, Wang, Haibin, Xu, Yueqin, and Shi, Zhen
- Subjects
FRUIT ,ALGORITHMS ,OBJECT recognition (Computer vision) ,BLUEBERRIES ,COMPUTER vision - Abstract
In order to achieve accurate, fast, and robust recognition of blueberry fruit maturity stages for edge devices such as orchard inspection robots, this research proposes a lightweight detection method based on an improved YOLOv5 algorithm. In the improved YOLOv5 algorithm, the ShuffleNet module is used to achieve lightweight deep-convolutional neural networks. The Convolutional Block Attention Module (CBAM) is also used to enhance the feature fusion capability of lightweight deep-convolutional neural networks. The effectiveness of this method is evaluated using the blueberry fruit dataset. The experimental results demonstrate that this method can effectively detect blueberry fruits and recognize their maturity stages in orchard environments. The average recall (R) of the detection is 92.0%. The mean average precision (mAP) of the detection at a threshold of 0.5 is 91.5%. The average speed of the detection is 67.1 frames per second (fps). Compared to other detection algorithms, such as YOLOv5, SSD, and Faster R-CNN, this method has a smaller model size, smaller network parameters, lower memory usage, lower computation usage, and faster detection speed while maintaining high detection performance. It is more suitable for migration and deployment on edge devices. This research can serve as a reference for the development of fruit detection systems for intelligent orchard devices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF