549 results
Search Results
2. Multitask Learning for Extensive Object Description to Improve Scene Understanding on Monocular Video
- Author
-
Basharov, Ilya, Yudin, Dmitry, Kacprzyk, Janusz, Series Editor, Kryzhanovsky, Boris, editor, Dunin-Barkowski, Witali, editor, Redko, Vladimir, editor, and Tiumentsev, Yury, editor
- Published
- 2023
- Full Text
- View/download PDF
3. Semi-automated Generation of Accurate Ground-Truth for 3D Object Detection
- Author
-
Zwemer, M. H., Scholte, D., de With, P. H. N., Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, de Sousa, A. Augusto, editor, Debattista, Kurt, editor, Paljic, Alexis, editor, Ziat, Mounia, editor, Hurter, Christophe, editor, Purchase, Helen, editor, Farinella, Giovanni Maria, editor, Radeva, Petia, editor, and Bouatouch, Kadi, editor
- Published
- 2023
- Full Text
- View/download PDF
4. GridPointNet: Grid and Point-Based 3D Object Detection from Point Cloud
- Author
-
Wu, Quanming, Yu, Yuanlong, Luo, Tao, Lu, Peiyuan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Sun, Fuchun, editor, Hu, Dewen, editor, Wermter, Stefan, editor, Yang, Lei, editor, Liu, Huaping, editor, and Fang, Bin, editor
- Published
- 2022
- Full Text
- View/download PDF
5. Dynamic Depth Fusion and Transformation for Monocular 3D Object Detection
- Author
-
Ouyang, Erli, Zhang, Li, Chen, Mohan, Arnab, Anurag, Fu, Yanwei, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ishikawa, Hiroshi, editor, Liu, Cheng-Lin, editor, Pajdla, Tomas, editor, and Shi, Jianbo, editor
- Published
- 2021
- Full Text
- View/download PDF
6. Visual Graphs from Motion (VGfM): Scene Understanding with Object Geometry Reasoning
- Author
-
Gay, Paul, Stuart, James, Del Bue, Alessio, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Jawahar, C. V., editor, Li, Hongdong, editor, Mori, Greg, editor, and Schindler, Konrad, editor
- Published
- 2019
- Full Text
- View/download PDF
7. Dynamic categorization of 3D objects for mobile service robots
- Author
-
Rios-Cabrera, Reyes, Lopez-Juarez, Ismael, Maldonado-Ramirez, Alejandro, Alvarez-Hernandez, Arturo, and Maldonado-Ramirez, Alan de Jesus
- Published
- 2021
- Full Text
- View/download PDF
8. EFMF-pillars: 3D object detection based on enhanced features and multi-scale fusion.
- Author
-
Zhang, Wenbiao, Chen, Gang, Wang, Hongyan, Yang, Lina, and Sun, Tao
- Subjects
OBJECT recognition (Computer vision), FEATURE extraction, TRAFFIC safety, AUTONOMOUS vehicles, ALGORITHMS - Abstract
As unmanned vehicle technology advances rapidly, obstacle recognition and target detection are crucial links, which directly affect the driving safety and efficiency of unmanned vehicles. In response to the inaccurate localization of small targets such as pedestrians in current object detection tasks and the problem of losing local features in PointPillars, this paper proposes a three-dimensional object detection method based on improved PointPillars. Firstly, addressing the issue of lost spatial and local information in PointPillars, the feature encoding part of PointPillars is improved, and a new pillar feature enhancement extraction module, CSM-Module, is proposed. Channel encoding and spatial encoding are introduced in the new pillar feature enhancement extraction module, fully considering the spatial information and local detailed geometric information of each pillar, thereby enhancing the feature representation capability of each pillar. Secondly, based on the fusion of CSPDarknet and SENet, a new backbone network CSE-Net is designed in this paper, enabling the extraction of rich contextual semantic information and multi-scale global features, thereby enhancing the feature extraction capability. Our method achieves higher detection accuracy when validated on the KITTI dataset. Compared to the original network, the improved algorithm's average detection accuracy is increased by 3.42%, which shows that the method is reasonable and valuable. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
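The CSM-Module in the EFMF-pillars abstract above is only described at a high level (channel plus spatial encoding of each pillar before pooling). As a rough, generic illustration of that idea, the following PyTorch sketch applies a per-point MLP and an SE-style channel weighting to pillar features; the class name `SimplePillarEncoder` and all dimensions are made up for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class SimplePillarEncoder(nn.Module):
    """Toy pillar encoder: per-point MLP + channel attention + max-pool per pillar.
    Illustrative only; not the CSM-Module from the paper."""
    def __init__(self, in_dim=9, out_dim=64):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        # SE-style channel encoding over the pillar's pooled descriptor
        self.channel_att = nn.Sequential(
            nn.Linear(out_dim, out_dim // 4), nn.ReLU(),
            nn.Linear(out_dim // 4, out_dim), nn.Sigmoid())

    def forward(self, pillars):            # pillars: (P, N, in_dim) points per pillar
        feats = self.point_mlp(pillars)    # (P, N, out_dim)
        pooled = feats.max(dim=1).values   # (P, out_dim) pillar descriptor
        weights = self.channel_att(pooled) # per-channel importance
        return pooled * weights            # re-weighted pillar features

# Usage: 100 pillars, 32 points each, 9 input features (x, y, z, intensity, offsets, ...)
enc = SimplePillarEncoder()
out = enc(torch.randn(100, 32, 9))
print(out.shape)  # torch.Size([100, 64])
```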
9. OA-Net: outlier weakening and adaptive voxel encoding-based 3d object detection network.
- Author
-
Wang, Chuanxu, Qin, Jianwei, and Fu, Xiaoshan
- Abstract
This paper focuses on the adverse impact of outlier points and the ambiguity of candidate localizations in 3D object detection on point cloud datasets. First, because outlier points can disperse genuine feature extraction and mislead object detection, we propose an outlier weakening strategy. The neighborhood points of each point in the point set are established via a multi-directional search algorithm, and the correlations among points in the neighborhood are computed via a self-attention mechanism; each point representation can then be enhanced with the key information from its neighborhood, so the negative impact of outlier points is weakened because real knowledge of the object is obtained from the neighborhood context. Second, multiple proposed boxes for object localization usually contain the same sampling points, which makes them hard to distinguish from each other and leads to incorrect object positioning. This paper proposes a voxel coding strategy with adaptive pooling: the candidate boxes are divided into voxels, each voxel is further divided into multiple columns, and these are weighted and aggregated according to the importance of each column, so the most confident spatial voxel encodings stand out as reliable object localization nominees. This algorithm achieves an average accuracy of 82.98% and 93.2% on the KITTI dataset Car category and ModelNet40 dataset, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
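The outlier weakening strategy in the OA-Net abstract above (neighborhood search plus self-attention) can be pictured with a minimal sketch: each point's feature is replaced by an attention-weighted mix of its k nearest neighbors, so isolated outliers absorb context from real structure. This is plain dot-product attention without learned projections and is only an assumption-laden illustration of the general idea, not the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def neighborhood_smoothing(points, feats, k=8):
    """Re-express each point feature as an attention-weighted mix of its k nearest
    neighbors, so isolated outliers inherit context from nearby points.
    Illustrative sketch only (plain dot-product attention, no learned projections)."""
    dists = torch.cdist(points, points)              # (N, N) pairwise distances
    knn = dists.topk(k + 1, largest=False).indices   # (N, k+1) incl. the point itself
    neigh = feats[knn]                                # (N, k+1, C) neighbor features
    attn = torch.einsum('nc,nkc->nk', feats, neigh)   # similarity to each neighbor
    attn = F.softmax(attn / feats.shape[-1] ** 0.5, dim=-1)
    return torch.einsum('nk,nkc->nc', attn, neigh)    # context-enhanced features

pts = torch.randn(256, 3)       # toy point cloud
ft = torch.randn(256, 32)       # per-point features
print(neighborhood_smoothing(pts, ft).shape)  # torch.Size([256, 32])
```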
10. MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection.
- Author
-
Shi, Peicheng, Liu, Zhiqiang, Qi, Heng, and Yang, Aixi
- Subjects
OBJECT recognition (Computer vision), MAP projection, POINT cloud, AUTONOMOUS vehicles - Abstract
In complex traffic environment scenarios, it is very important for autonomous vehicles to accurately perceive the dynamic information of other vehicles around them in advance. The accuracy of 3D object detection is affected by problems such as illumination changes, object occlusion, and object detection distance. To this end, we address these challenges by proposing a multimodal feature fusion network for 3D object detection (MFF-Net). This paper first uses the spatial transformation projection algorithm to map the image features into the feature space, so that the image features are in the same spatial dimension when fused with the point cloud features. Then, feature channel weighting is performed using an adaptive expression augmentation fusion network to enhance important network features, suppress useless features, and increase the directionality of the network toward features. Finally, this paper reduces false and missed detections in the non-maximum suppression algorithm by increasing the one-dimensional threshold. In this way, a complete 3D target detection network based on multimodal feature fusion is constructed. The experimental results show that the proposed method achieves an average accuracy of 82.60% on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, outperforming previous state-of-the-art multimodal fusion networks. On the Easy, Moderate, and Hard evaluation indicators, the accuracy reaches 90.96%, 81.46%, and 75.39%, respectively. This shows that the MFF-Net network has good performance in 3D object detection. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
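The spatial transformation projection step in the MFF-Net abstract above maps image features into the same space as point cloud features before fusion. A common way to realize this kind of alignment is to project each LiDAR point into the image with the camera intrinsics and look up the image feature at that pixel; the sketch below assumes points are already in the camera frame and uses nearest-pixel lookup, which is a simplification of whatever the paper actually does.

```python
import numpy as np

def gather_image_features_for_points(points, img_feats, K):
    """Project LiDAR points (camera frame) onto the image and look up a feature
    vector per point, so image and point-cloud features live in one space.
    Illustrative sketch: nearest-pixel lookup, 3x3 intrinsics K, points already
    in the camera coordinate frame."""
    uvw = (K @ points.T).T                       # (N, 3) homogeneous pixel coords
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, img_feats.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, img_feats.shape[0] - 1)
    valid = points[:, 2] > 0                     # keep points in front of the camera
    point_img_feats = np.zeros((points.shape[0], img_feats.shape[2]))
    point_img_feats[valid] = img_feats[v[valid], u[valid]]
    return point_img_feats                       # (N, C) image feature per point

K = np.array([[700.0, 0, 320], [0, 700.0, 180], [0, 0, 1]])
pts = np.random.randn(1000, 3) + np.array([0, 0, 10.0])   # points ~10 m ahead
feat_map = np.random.rand(360, 640, 16)                    # H x W x C image features
print(gather_image_features_for_points(pts, feat_map, K).shape)  # (1000, 16)
```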
11. IRBEVF-Q: Optimization of Image–Radar Fusion Algorithm Based on Bird's Eye View Features.
- Author
-
Cai, Ganlin, Chen, Feng, and Guo, Ente
- Subjects
OBJECT recognition (Computer vision), ALGORITHMS, VIDEO coding, AUTONOMOUS vehicles, CAMERAS, PROBLEM solving - Abstract
In autonomous driving, the fusion of multiple sensors is considered essential to improve the accuracy and safety of 3D object detection. Currently, a fusion scheme combining low-cost cameras with highly robust radars can counteract the performance degradation caused by harsh environments. In this paper, we propose the IRBEVF-Q model, which mainly consists of a BEV (Bird's Eye View) fusion coding module and an object decoder module. The BEV fusion coding module solves the problem of unified representation of different modal information by fusing the image and radar features through 3D spatial reference points as a medium. The query in the object decoder, as a core component, plays an important role in detection. In this paper, Heat Map-Guided Query Initialization (HGQI) and Dynamic Position Encoding (DPE) are proposed in query construction to increase the a priori information of the query. The Auxiliary Noise Query (ANQ) then helps to stabilize the matching. The experimental results demonstrate that the proposed fusion model IRBEVF-Q achieves an NDS of 0.575 and a mAP of 0.476 on the nuScenes test set. Compared to recent state-of-the-art methods, our model shows significant advantages, thus indicating that our approach contributes to improving detection accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Rethinking the Non-Maximum Suppression Step in 3D Object Detection from a Bird's-Eye View.
- Author
-
Li, Bohao, Song, Shaojing, and Ai, Luxia
- Abstract
In camera-based bird's-eye view (BEV) 3D object detection, non-maximum suppression (NMS) plays a crucial role. However, traditional NMS methods become ineffective in BEV scenarios where the predicted bounding boxes of small object instances often have no overlapping areas. To address this issue, this paper proposes a BEV intersection over union (IoU) computation method based on relative position and absolute spatial information, referred to as B-IoU. Additionally, a BEV circular search method, called B-Grouping, is introduced to handle prediction boxes of varying scales. Utilizing these two methods, a novel NMS strategy called BEV-NMS is developed to handle the complex prediction boxes in BEV perspectives. This BEV-NMS strategy is implemented in several existing algorithms. Based on the results from the nuScenes validation set, there was an average increase of 7.9% in mAP when compared to the strategy without NMS. The NDS also showed an average increase of 7.9% under the same comparison. Furthermore, compared to the Scale-NMS strategy, the mAP increased by an average of 3.4%, and the NDS saw an average improvement of 3.1%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
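The B-IoU / B-Grouping strategy in the abstract above addresses the fact that BEV boxes of small objects often have zero overlap, so overlap-based NMS never suppresses anything. A simple stand-in for that idea is greedy NMS keyed on BEV center distance rather than IoU, as in the sketch below; the radius parameter and the function itself are illustrative, not the paper's method.

```python
import numpy as np

def bev_center_distance_nms(centers, scores, radius=1.0):
    """Greedy NMS in bird's-eye view that suppresses by center distance instead of
    box overlap, which is useful when small-object boxes rarely overlap at all.
    Illustrative stand-in for the paper's B-IoU / B-Grouping strategy, not its code."""
    order = np.argsort(-scores)          # highest score first
    keep, suppressed = [], np.zeros(len(scores), dtype=bool)
    for i in order:
        if suppressed[i]:
            continue
        keep.append(i)
        d = np.linalg.norm(centers - centers[i], axis=1)
        suppressed |= d < radius         # drop everything within the radius
        suppressed[i] = True             # mark the kept box as handled
    return keep

centers = np.array([[0, 0], [0.3, 0.2], [5.0, 5.0], [5.2, 4.9]])
scores = np.array([0.9, 0.8, 0.7, 0.95])
print(bev_center_distance_nms(centers, scores))  # [3, 0]
```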
13. Recent Advances in 3D Object Detection for Self-Driving Vehicles: A Survey.
- Author
-
Fawole, Oluwajuwon A. and Rawat, Danda B.
- Subjects
OBJECT recognition (Computer vision), COMPUTER vision, MULTISENSOR data fusion, DEEP learning, ALGORITHMS - Abstract
The development of self-driving or autonomous vehicles has led to significant advancements in 3D object detection technologies, which are critical for the safety and efficiency of autonomous driving. Despite recent advances, several challenges remain in sensor integration, handling sparse and noisy data, and ensuring reliable performance across diverse environmental conditions. This paper comprehensively surveys state-of-the-art 3D object detection techniques for autonomous vehicles, emphasizing the importance of multi-sensor fusion techniques and advanced deep learning models. Furthermore, we present key areas for future research, including enhancing sensor fusion algorithms, improving computational efficiency, and addressing ethical, security, and privacy concerns. The integration of these technologies into real-world applications for autonomous driving is presented by highlighting potential benefits and limitations. We also present a side-by-side comparison of different techniques in a tabular form. Through a comprehensive review, this paper aims to provide insights into the future directions of 3D object detection and its impact on the evolution of autonomous driving. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Real-Time Multimodal 3D Object Detection with Transformers.
- Author
-
Liu, Hengsong and Duan, Tongle
- Subjects
CONVOLUTIONAL neural networks, OBJECT recognition (Computer vision), TRANSFORMER models, LIDAR, CAMERAS - Abstract
The accuracy and real-time performance of 3D object detection are key factors limiting its widespread application. While cameras capture detailed color and texture features, they lack depth information compared to LiDAR. Multimodal detection combining both can improve results but incurs significant computational overhead, affecting real-time performance. To address these challenges, this paper presents a real-time multimodal fusion model called Fast Transfusion that combines the benefits of LiDAR and camera sensors and reduces the computational burden of their fusion. Specifically, our Fast Transfusion method uses QConv (Quick Convolution) to replace the convolutional backbones used in other models. QConv concentrates the convolution operations at the feature map center, where the most information resides, to expedite inference. It also utilizes deformable convolution to better match the actual shapes of detected objects, enhancing accuracy. The model also incorporates an EH Decoder (Efficient and Hybrid Decoder), which decouples multiscale fusion into intra-scale interaction and cross-scale fusion, efficiently decoding and integrating features extracted from multimodal data. Furthermore, our proposed semi-dynamic query selection refines the initialization of object queries. On the KITTI 3D object detection dataset, our proposed approach reduced the inference time by 36 ms and improved 3D AP by 1.81% compared to state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Enhancing urban landscape analysis through combined LiDAR and visual image data preprocessing
- Author
-
Saravanarajan, Vani Suthamathi, Chen, Rung-Ching, and Manongga, William Eric
- Published
- 2024
- Full Text
- View/download PDF
16. Geometric relation-based feature aggregation for 3D small object detection
- Author
-
Yang, Wenbin, Yu, Hang, Luo, Xiangfeng, and Xie, Shaorong
- Published
- 2024
- Full Text
- View/download PDF
17. 图像语义特征引导与点云跨模态融合的三维目标检测方法 (3D Object Detection Method Based on Image Semantic Feature Guidance and Cross-Modal Point Cloud Fusion).
- Author
-
李辉, 王俊印, 程远志, 刘健, 赵国伟, and 陈双敏
- Published
- 2024
- Full Text
- View/download PDF
18. Three-Dimensional Point Cloud Object Detection Based on Feature Fusion and Enhancement.
- Author
-
Li, Yangyang, Ou, Zejun, Liu, Guangyuan, Yang, Zichen, Chen, Yanqiao, Shang, Ronghua, and Jiao, Licheng
- Subjects
OBJECT recognition (Computer vision), POINT cloud, OPTICAL radar, LIDAR, FEATURE extraction - Abstract
With the continuous emergence and development of 3D sensors in recent years, it has become increasingly convenient to collect point cloud data for 3D object detection tasks, such as in the field of autonomous driving. But when using these existing methods, there are two problems that cannot be ignored: (1) The bird's eye view (BEV) is a widely used method in 3D object detection; however, the BEV usually compresses dimensions by combining the height and channel dimensions, which makes the process of feature extraction in feature fusion more difficult. (2) Light detection and ranging (LiDAR) has a much larger effective scanning depth, which causes the scanned sector to become sparse at long range and the point cloud data to be unevenly distributed. This results in few features in the distribution of neighboring points around the key points of interest. The following is the solution proposed in this paper: (1) This paper proposes multi-scale feature fusion composed of feature maps at different levels made of Deep Layer Aggregation (DLA) and a feature fusion module for the BEV. (2) A point completion network is used to improve the prediction results by completing the feature points inside the candidate boxes in the second stage, thereby strengthening their position features. Supervised contrastive learning is applied to enhance the segmentation results, improving the discrimination capability between the foreground and background. Experiments show these new additions can achieve improvements of 2.7%, 2.4%, and 2.5%, respectively, on KITTI easy, moderate, and hard tasks. Further ablation experiments show that each addition has promising improvement over the baseline. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. S2S-Sim: A Benchmark Dataset for Ship Cooperative 3D Object Detection.
- Author
-
Yang, Wenbin, Wang, Xinzhi, Luo, Xiangfeng, Xie, Shaorong, and Chen, Junxi
- Subjects
OBJECT recognition (Computer vision), CONTAINER ships, NAVIGATION in shipping, SHIP models, SHIPS, CRUISE ships, AUTONOMOUS vehicles - Abstract
The rapid development of vehicle cooperative 3D object-detection technology has significantly improved the perception capabilities of autonomous driving systems. However, ship cooperative perception technology has received limited research attention compared to autonomous driving, primarily due to the lack of appropriate ship cooperative perception datasets. To address this gap, this paper proposes S2S-sim, a novel ship cooperative perception dataset. Ship navigation scenarios were constructed using Unity3D, and accurate ship models were incorporated while simulating sensor parameters of real LiDAR sensors to collect data. The dataset comprises three typical ship navigation scenarios, including ports, islands, and open waters, featuring common ship classes such as container ships, bulk carriers, and cruise ships. It consists of 7000 frames with 96,881 annotated ship bounding boxes. Leveraging this dataset, we assess the performance of mainstream vehicle cooperative perception models when transferred to ship cooperative perception scenes. Furthermore, considering the characteristics of ship navigation data, we propose a regional clustering fusion-based ship cooperative 3D object-detection method. Experimental results demonstrate that our approach achieves state-of-the-art performance in 3D ship object detection, indicating its suitability for ship cooperative perception. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Three-Dimensional Outdoor Object Detection in Quadrupedal Robots for Surveillance Navigations.
- Author
-
Tanveer, Muhammad Hassan, Fatima, Zainab, Mariam, Hira, Rehman, Tanazzah, and Voicu, Razvan Cristian
- Abstract
Quadrupedal robots are confronted with the intricate challenge of navigating dynamic environments fraught with diverse and unpredictable scenarios. Effectively identifying and responding to obstacles is paramount for ensuring safe and reliable navigation. This paper introduces a pioneering method for 3D object detection, termed viewpoint feature histograms, which leverages the established paradigm of 2D detection in projection. By translating 2D bounding boxes into 3D object proposals, this approach not only enables the reuse of existing 2D detectors but also significantly increases the performance with less computation required, allowing for real-time detection. Our method is versatile, targeting both bird's eye view objects (e.g., cars) and frontal view objects (e.g., pedestrians), accommodating various types of 2D object detectors. We showcase the efficacy of our approach through the integration of YOLO3D, utilizing LiDAR point clouds on the KITTI dataset, to achieve real-time efficiency aligned with the demands of autonomous vehicle navigation. Our model selection process, tailored to the specific needs of quadrupedal robots, emphasizes considerations such as model complexity, inference speed, and customization flexibility, achieving an accuracy of up to 99.93%. This research represents a significant advancement in enabling quadrupedal robots to navigate complex and dynamic environments with heightened precision and safety. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
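The 2D-to-3D lifting described in the abstract above (translating 2D bounding boxes into 3D object proposals) can be illustrated with basic pinhole geometry: given an assumed physical object height, the box's pixel height yields a depth estimate, and the box center back-projects to a 3D position. The sketch below is that generic construction, not the viewpoint-feature-histogram pipeline itself.

```python
import numpy as np

def lift_2d_box_to_3d_center(box2d, K, real_height=1.6):
    """Estimate a 3D object center from a single 2D box via pinhole geometry:
    depth ~ f * real_height / pixel_height. A rough illustration of lifting 2D
    detections into 3D proposals; real_height (metres) is an assumed class prior."""
    u1, v1, u2, v2 = box2d                       # pixel corners (left, top, right, bottom)
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    depth = fy * real_height / max(v2 - v1, 1e-6)
    u_c, v_c = (u1 + u2) / 2.0, (v1 + v2) / 2.0
    x = (u_c - cx) * depth / fx                  # back-project the box centre
    y = (v_c - cy) * depth / fy
    return np.array([x, y, depth])

K = np.array([[720.0, 0, 640], [0, 720.0, 360], [0, 0, 1]])
print(lift_2d_box_to_3d_center([600, 300, 700, 420], K))  # ~[0.13, 0.0, 9.6]
```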
21. CPU 环境下多传感器数据融合的机器人 3D 目标检测方法 (A Robot 3D Object Detection Method Based on Multi-Sensor Data Fusion in a CPU Environment).
- Author
-
楼 进, 刘恩博, 唐 炜, and 张仁远
- Subjects
OBJECT recognition (Computer vision), OPTICAL radar, MULTISENSOR data fusion, MOBILE robots, DATA mining
- Published
- 2024
- Full Text
- View/download PDF
22. DMFusion: LiDAR-camera fusion framework with depth merging and temporal aggregation.
- Author
-
Yu, Xinyi, Lu, Ke, Yang, Yang, and Ou, Linlin
- Subjects
OBJECT recognition (Computer vision), THREE-dimensional imaging, POINT cloud, AUTONOMOUS vehicles, PROBLEM solving - Abstract
Multimodal 3D object detection is an active research topic in the field of autonomous driving. Most existing methods utilize both camera and LiDAR modalities but fuse their features through simple and insufficient mechanisms. Additionally, these approaches lack reliable positional and temporal information due to their reliance on single-frame camera data. In this paper, a novel end-to-end framework for 3D object detection is proposed to solve these problems through spatial and temporal fusion. The spatial information of bird's-eye view (BEV) features is enhanced by integrating depth features from point clouds during the conversion of image features into 3D space. Moreover, positional and temporal information is augmented by aggregating multi-frame features. This framework is named DMFusion and consists of the following components: (i) a novel depth fusion view transform module (referred to as DFLSS), (ii) a simple and easily adjustable temporal fusion module based on 3D convolution (referred to as 3DMTF), and (iii) a LiDAR-temporal fusion module based on a channel attention mechanism. On the nuScenes benchmark, DMFusion improves mAP by 1.42% and NDS by 1.26% compared with the baseline model, which demonstrates the effectiveness of our proposed method. The code will be released at https://github.com/lilkeker/DMFusion. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. CenRadfusion: fusing image center detection and millimeter wave radar for 3D object detection.
- Author
-
Shi, Peicheng, Jiang, Tong, Yang, Aixi, and Liu, Zhiqiang
- Abstract
The fusion of visual and millimeter-wave radar data has emerged as a prominent solution for precise 3D object detection. This paper focuses on the fusion of visual and mmWave radar information and presents an enhanced fusion method called CenRadfusion. This method represents an evolution and improvement over the classic CenterFusion network by leveraging the fused features from mmWave radar and camera data to achieve accurate 3D object detection. The key features of this method are as follows: To ensure the integrity of the fusion architecture, mmWave radar point clouds are initially projected onto the image plane and added as an additional channel to the input of the CenterNet image detection network. This process forms preliminary 3D detection boxes. Subsequently, mmWave radar point clouds are subjected to density-based clustering, which results in the acquisition of labels and the elimination of irrelevant point clouds and white noise. This step enhances data quality and the reliability of object detection. Finally, an attention module, known as the Squeeze-and-Excitation Networks, is incorporated to weight each feature channel, thereby enhancing the importance of crucial features in the network. Experimental results demonstrate that compared to the original CenterFusion algorithm, the detection Average Precision (AP) values for cars, trucks, and motorcycles have improved by 7.8%, 5.5%, and 5.4%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
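The first step in the CenRadfusion abstract above, projecting mmWave radar points onto the image plane and feeding them as an extra input channel, can be sketched as a simple rasterization of radar ranges into an image-sized array. The code below assumes radar points already expressed in the camera frame and a known 3×3 intrinsics matrix; it illustrates the general idea only.

```python
import numpy as np

def radar_depth_channel(radar_xyz, K, img_hw):
    """Rasterize radar returns into an extra image channel holding the radial
    distance of the nearest return per pixel (0 where no radar hit). Illustrative
    sketch only: radar points assumed already in the camera coordinate frame."""
    h, w = img_hw
    channel = np.zeros((h, w), dtype=np.float32)
    front = radar_xyz[radar_xyz[:, 2] > 0]                 # points in front of camera
    uvw = (K @ front.T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
    rng = np.linalg.norm(front, axis=1)                    # radial distance per return
    for (u, v), r in zip(uv, rng):
        if 0 <= u < w and 0 <= v < h:
            channel[v, u] = r if channel[v, u] == 0 else min(channel[v, u], r)
    return channel                                         # stack onto the RGB input

K = np.array([[800.0, 0, 800], [0, 800.0, 450], [0, 0, 1]])
pts = np.random.uniform([-10, -2, 5], [10, 2, 60], size=(200, 3))
extra = radar_depth_channel(pts, K, (900, 1600))
print(extra.shape, (extra > 0).sum())   # (900, 1600) and the number of hit pixels
```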
24. Depth-Enhanced Deep Learning Approach For Monocular Camera Based 3D Object Detection.
- Author
-
Wang, Chuyao and Aouf, Nabil
- Abstract
Automatic 3D object detection using monocular cameras presents significant challenges in the context of autonomous driving. Precise labeling of 3D object scales requires accurate spatial information, which is difficult to obtain from a single image due to the inherent lack of depth information in monocular images, compared to LiDAR data. In this paper, we propose a novel approach to address this issue by enhancing deep neural networks with depth information for monocular 3D object detection. The proposed method comprises three key components: 1) Feature Enhancement Pyramid Module: We extend the conventional Feature Pyramid Networks (FPN) by introducing a feature enhancement pyramid network. This module fuses feature maps from the original pyramid and captures contextual correlations across multiple scales. To increase the connectivity between low-level and high-level features, additional pathways are incorporated. 2) Auxiliary Dense Depth Estimator: We introduce an auxiliary dense depth estimator that generates dense depth maps to enhance the spatial perception capabilities of the deep network model without adding computational burden. 3) Augmented Center Depth Regression: To aid center depth estimation, we employ additional bounding box vertex depth regression based on geometry. Our experimental results demonstrate the superiority of the proposed technique over existing competitive methods reported in the literature. The approach showcases remarkable performance improvements in monocular 3D object detection, making it a promising solution for autonomous driving applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. 3D Point Cloud Object Detection Algorithm Based on Temporal Information Fusion and Uncertainty Estimation.
- Author
-
Xie, Guangda, Li, Yang, Wang, Yanping, Li, Ziyi, and Qu, Hongquan
- Subjects
OBJECT recognition (Computer vision), POINT cloud, OPTICAL radar, LIDAR, DISTRIBUTION (Probability theory), TRACKING algorithms, COORDINATES - Abstract
In autonomous driving, LiDAR (light detection and ranging) data are acquired over time. Most existing 3D object detection algorithms propose the object bounding box by processing each frame of data independently, which ignores the temporal sequence information. However, the temporal sequence information is usually helpful to detect the object with missing shape information due to long distance or occlusion. To address this problem, we propose a temporal sequence information fusion 3D point cloud object detection algorithm based on the Ada-GRU (adaptive gated recurrent unit). In this method, the feature of each frame for the LiDAR point cloud is extracted through the backbone network and is fed to the Ada-GRU together with the hidden features of the previous frames. Compared to the traditional GRU, the Ada-GRU can adjust the gating mechanism adaptively during the training process by introducing the adaptive activation function. The Ada-GRU outputs the temporal sequence fusion features to predict the 3D object in the current frame and transmits the hidden features of the current frame to the next frame. At the same time, the label uncertainty of the distant and occluded objects affects the training effect of the model. For this problem, this paper proposes a probability distribution model of 3D bounding box coordinates based on the Gaussian distribution function and designs the corresponding bounding box loss function to enable the model to learn and estimate the uncertainty of the positioning of the bounding box coordinates, so as to remove the bounding box with large positioning uncertainty in the post-processing stage to reduce the false positive rate. Finally, the experiments show that the methods proposed in this paper improve the accuracy of the object detection without significantly increasing the complexity of the algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
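The Gaussian bounding-box model in the abstract above, where the network learns a positioning uncertainty per box coordinate, is commonly trained with a Gaussian negative log-likelihood in which a predicted log-variance down-weights coordinates the network is unsure about. The sketch below shows that generic loss; the exact parameterization used in the paper may differ.

```python
import torch

def gaussian_box_nll(pred_mean, pred_log_var, target):
    """Negative log-likelihood of box coordinates under a per-coordinate Gaussian,
    so the network learns a variance (localization uncertainty) alongside each
    coordinate. Boxes whose predicted variance is large can then be down-weighted
    or filtered in post-processing. Generic sketch of the idea, not the paper's loss."""
    inv_var = torch.exp(-pred_log_var)
    return (0.5 * inv_var * (pred_mean - target) ** 2 + 0.5 * pred_log_var).mean()

pred_mean = torch.randn(16, 7, requires_grad=True)     # (x, y, z, w, l, h, yaw)
pred_log_var = torch.zeros(16, 7, requires_grad=True)  # log sigma^2 per coordinate
target = torch.randn(16, 7)
loss = gaussian_box_nll(pred_mean, pred_log_var, target)
loss.backward()
print(float(loss))
```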
26. Sparse Embedded Convolution Based Dual Feature Aggregation 3D Object Detection Network.
- Author
-
Li, Hai-Sheng and Lu, Yan-Ling
- Abstract
Designing an algorithm that balances detection speed and accuracy on LiDAR point clouds is a challenging issue in various practical applications of 3D object detection, including the field of autonomous driving. To address this issue, this paper designs a lightweight single-stage object detection algorithm that balances detection speed and accuracy. To achieve these objectives, we propose a framework for a 3D object detection algorithm using a single-stage detection network as the backbone network. Firstly, we design a dual feature extraction module to reduce the occurrence of vehicle miss and false detections. Then, we use a multi-scale feature fusion scheme to fuse feature information at different scales. Furthermore, we design a data enhancement scheme suitable for this network architecture. Experimental results on the KITTI dataset show that the proposed method achieves improvements of 38.5% in detection speed and 2.88% ∼ 13.65% in the average precision of vehicle detection compared to the existing single-stage object detection algorithm (SECOND). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Singular and Multimodal Techniques of 3D Object Detection: Constraints, Advancements and Research Direction.
- Author
-
Karim, Tajbia, Mahayuddin, Zainal Rasyid, and Hasan, Mohammad Kamrul
- Subjects
OBJECT recognition (Computer vision), STEREOSCOPIC cameras, ROBOT vision, AUTONOMOUS vehicles, RESEARCH personnel, CRITICAL analysis - Abstract
Two-dimensional object detection techniques can detect multiscale objects in images. However, they lack depth information. Three-dimensional object detection provides the location of the object in the image along with depth information. To provide depth information, 3D object detection involves the application of depth-perceiving sensors such as LiDAR, stereo cameras, RGB-D, RADAR, etc. The existing review articles on 3D object detection techniques are found to be focusing on either a singular modality (e.g., only LiDAR point cloud-based) or a singular application field (e.g., autonomous vehicle navigation). However, to the best of our knowledge, there is no review paper that discusses the applicability of 3D object detection techniques in other fields such as agriculture, robot vision or human activity detection. This study analyzes both singular and multimodal techniques of 3D object detection techniques applied in different fields. A critical analysis comprising strengths and weaknesses of the 3D object detection techniques is presented. The aim of this study is to facilitate future researchers and practitioners to provide a holistic view of 3D object detection techniques. The critical analysis of the singular and multimodal techniques is expected to help the practitioners find the appropriate techniques based on their requirement. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. NV2P-RCNN: Feature Aggregation Based on Voxel Neighborhood for 3D Object Detection.
- Author
-
Huo, Weile, Jing, Tao, and Ren, Shuang
- Subjects
OBJECT recognition (Computer vision), POINT cloud, NEIGHBORHOODS, AUTONOMOUS vehicles - Abstract
In this paper, we propose a two-stage framework based on voxel neighborhood feature aggregation for 3D object detection in autonomous driving, named Neighbor Voxels to Point-RCNN (NV2P-RCNN). The point representation of point clouds can encode refined features, and the voxel representation provides an efficient processing framework, so we take advantage of both point representation and voxel representation of the point cloud in this paper. In the first stage, we add point density to the voxel feature encoding and extract voxel features by a 3D sparse convolutional network. In the second stage, the features of the raw point cloud are extracted and fused with the voxel features. To achieve the fast aggregation of voxel-to-point features, we design a neighbor voxels query method named NV-Query to find neighbor voxels directly through the voxel spatial coordinates of the points. The results on the KITTI and ONCE datasets show that NV2P-RCNN achieves higher detection precision compared with other existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
29. Improved 3D Object Detection Based on PointPillars.
- Author
-
Kong, Weiwei, Du, Yusheng, He, Leilei, and Li, Zejiang
- Subjects
OBJECT recognition (Computer vision), TRANSFORMER models, FEATURE extraction, POINT cloud, POINT processes - Abstract
Despite the recent advancements in 3D object detection, the conventional 3D point cloud object detection algorithms have been found to exhibit limited accuracy for the detection of small objects. To address the challenge of poor detection of small-scale objects, this paper adopts the PointPillars algorithm as the baseline model and proposes a two-stage 3D target detection approach. As a cutting-edge solution, point cloud processing is performed using Transformer models. Additionally, a redefined attention mechanism is introduced to further enhance the detection capabilities of the algorithm. In the first stage, the algorithm uses PointPillars as the baseline model. The central concept of this algorithm is to transform the point cloud space into equal-sized columns. During the feature extraction stage, when the features from all columns are transformed into pseudo-images, the proposed algorithm incorporates attention mechanisms adapted from the Squeeze-and-Excitation (SE) method to emphasize and suppress feature information. Furthermore, the 2D convolution of the traditional backbone network is replaced by dynamic convolution. Concurrently, the addition of the attention mechanism further improves the feature representation ability of the network. In the second phase, the candidate frames generated in the first phase are refined using a Transformer-based approach. The encoder constructs the initial point features from the candidate frames for encoding, while the decoder applies channel weighting to enhance the channel information, thereby improving the detection accuracy and reducing false detections. On the KITTI dataset, the experimental results verify the effectiveness of this method for small object detection. Experimental results show that the proposed method significantly improves the detection capability of small objects compared with the baseline PointPillars. In concrete terms, in the moderate difficulty detection category, the average precision (AP) values for cars, pedestrians, and cyclists increased by 5.30%, 8.1%, and 10.6%, respectively. Moreover, the proposed method surpasses existing mainstream approaches in the cyclist category. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
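The abstract above describes inserting Squeeze-and-Excitation (SE) attention when the pillar features are scattered into a pseudo-image. The standard SE block for a (B, C, H, W) tensor looks like the sketch below; where exactly the paper places it and with what reduction ratio is not specified here, so treat the placement and sizes as assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation applied to a pillar pseudo-image (B, C, H, W):
    global-average-pool each channel, predict a weight per channel, and rescale.
    Standard SE block for illustration; placement inside PointPillars follows the
    abstract's description, not released code."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                  # squeeze: per-channel statistics
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # excite: channel-wise re-weighting

pseudo_image = torch.randn(2, 64, 248, 216)     # typical PointPillars BEV grid size
print(SEBlock(64)(pseudo_image).shape)          # torch.Size([2, 64, 248, 216])
```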
30. SaBi3d—A LiDAR Point Cloud Data Set of Car-to-Bicycle Overtaking Maneuvers.
- Author
-
Odenwald, Christian and Beeking, Moritz
- Subjects
OBJECT recognition (Computer vision), CYCLING, CYCLING safety, CITY traffic, POINT cloud, TRAFFIC safety - Abstract
While cycling presents environmental benefits and promotes a healthy lifestyle, the risks associated with overtaking maneuvers by motorized vehicles represent a significant barrier for many potential cyclists. A large-scale analysis of overtaking maneuvers could inform traffic researchers and city planners how to reduce these risks by better understanding these maneuvers. Drawing from the fields of sensor-based cycling research and from LiDAR-based traffic data sets, this paper provides a step towards addressing these safety concerns by introducing the Salzburg Bicycle 3d (SaBi3d) data set, which consists of LiDAR point clouds capturing car-to-bicycle overtaking maneuvers. The data set, collected using a LiDAR-equipped bicycle, facilitates the detailed analysis of a large quantity of overtaking maneuvers without the need for manual annotation through enabling automatic labeling by a neural network. Additionally, a benchmark result for 3D object detection using a competitive neural network is provided as a baseline for future research. The SaBi3d data set is structured identically to the nuScenes data set, and therefore offers compatibility with numerous existing object detection systems. This work provides valuable resources for future researchers to better understand cycling infrastructure and mitigate risks, thus promoting cycling as a viable mode of transportation. Dataset: https://osf.io/k7cg9 (accessed on 18 July 2024). Dataset License: CC-By Attribution 4.0 International. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. BAFusion: Bidirectional Attention Fusion for 3D Object Detection Based on LiDAR and Camera.
- Author
-
Liu, Min, Jia, Yuanjun, Lyu, Youhao, Dong, Qi, and Yang, Yanyu
- Subjects
OBJECT recognition (Computer vision), LIDAR, LASER based sensors, MULTISENSOR data fusion, MATHEMATICAL optimization, POINT cloud, CAMERAS - Abstract
3D object detection is a challenging and promising task for autonomous driving and robotics, benefiting significantly from multi-sensor fusion, such as LiDAR and cameras. Conventional methods for sensor fusion rely on a projection matrix to align the features from LiDAR and cameras. However, these methods often suffer from inadequate flexibility and robustness, leading to lower alignment accuracy under complex environmental conditions. Addressing these challenges, in this paper, we propose a novel Bidirectional Attention Fusion module, named BAFusion, which effectively fuses the information from LiDAR and cameras using cross-attention. Unlike the conventional methods, our BAFusion module can adaptively learn the cross-modal attention weights, making the approach more flexible and robust. Moreover, drawing inspiration from advanced attention optimization techniques in 2D vision, we developed the Cross Focused Linear Attention Fusion Layer (CFLAF Layer) and integrated it into our BAFusion pipeline. This layer optimizes the computational complexity of attention mechanisms and facilitates advanced interactions between image and point cloud data, showcasing a novel approach to addressing the challenges of cross-modal attention calculations. We evaluated our method on the KITTI dataset using various baseline networks, such as PointPillars, SECOND, and Part-A2, and demonstrated consistent improvements in 3D object detection performance over these baselines, especially for smaller objects like cyclists and pedestrians. Our approach achieves competitive results on the KITTI benchmark. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
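The cross-attention fusion in the BAFusion abstract above can be pictured with one direction of a generic cross-attention layer: point-cloud (BEV) tokens act as queries over image tokens and absorb the attended context through a residual connection; applying it again with the roles swapped gives a bidirectional exchange. The sketch below uses PyTorch's built-in multi-head attention and is not the CFLAF layer from the paper.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """One direction of cross-attention: point-cloud (BEV) tokens query image tokens
    and absorb the attended image context via a residual connection. Running it a
    second time with the roles swapped gives a bidirectional fusion in the spirit of
    the abstract; this is a generic sketch, not the BAFusion implementation."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_tokens, image_tokens):
        fused, _ = self.attn(query=lidar_tokens, key=image_tokens, value=image_tokens)
        return self.norm(lidar_tokens + fused)

lidar = torch.randn(2, 400, 128)   # flattened BEV cells as tokens
image = torch.randn(2, 600, 128)   # flattened image patches as tokens
print(CrossModalAttentionFusion()(lidar, image).shape)  # torch.Size([2, 400, 128])
```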
32. LiDAR-Based 3D Temporal Object Detection via Motion-Aware LiDAR Feature Fusion.
- Author
-
Park, Gyuhee, Koh, Junho, Kim, Jisong, Moon, Jun, and Choi, Jun Won
- Subjects
OBJECT recognition (Computer vision), LIDAR, DOPPLER lidar, POINT set theory, MOTION capture (Human mechanics), AUTONOMOUS vehicles - Abstract
Recently, the growing demand for autonomous driving in the industry has led to a lot of interest in 3D object detection, resulting in many excellent 3D object detection algorithms. However, most 3D object detectors focus only on a single set of LiDAR points, ignoring their potential ability to improve performance by leveraging the information provided by the consecutive set of LIDAR points. In this paper, we propose a novel 3D object detection method called temporal motion-aware 3D object detection (TM3DOD), which utilizes temporal LiDAR data. In the proposed TM3DOD method, we aggregate LiDAR voxels over time and the current BEV features by generating motion features using consecutive BEV feature maps. First, we present the temporal voxel encoder (TVE), which generates voxel representations by capturing the temporal relationships among the point sets within a voxel. Next, we design a motion-aware feature aggregation network (MFANet), which aims to enhance the current BEV feature representation by quantifying the temporal variation between two consecutive BEV feature maps. By analyzing the differences and changes in the BEV feature maps over time, MFANet captures motion information and integrates it into the current feature representation, enabling more robust and accurate detection of 3D objects. Experimental evaluations on the nuScenes benchmark dataset demonstrate that the proposed TM3DOD method achieved significant improvements in 3D detection performance compared with the baseline methods. Additionally, our method achieved comparable performance to state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Robust BEV 3D Object Detection for Vehicles with Tire Blow-Out.
- Author
-
Yang, Dongsheng, Fan, Xiaojie, Dong, Wei, Huang, Chaosheng, and Li, Jun
- Subjects
OBJECT recognition (Computer vision), REAL-time computing, CAMERA calibration, TRANSFORMER models, VEHICLE models - Abstract
The bird's-eye view (BEV) method, which is a vision-centric representation-based perception task, is essential and promising for future Autonomous Vehicle perception. It is fusion-friendly, intuitive, allows end-to-end optimization, and is cheaper than LiDAR. The performance of existing BEV methods, however, deteriorates in the event of a tire blow-out. This is because they rely heavily on accurate camera calibration, which may be invalidated by noisy camera parameters during a blow-out. Therefore, it is extremely unsafe to use existing BEV methods in the tire blow-out situation. In this paper, we propose a geometry-guided auto-resizable kernel transformer (GARKT) method, which is designed especially for vehicles with tire blow-out. Specifically, we establish a camera deviation model for vehicles with tire blow-out. Then we use the geometric priors to attain the prior position in perspective view with auto-resizable kernels. The resizable perception areas are encoded and flattened to generate the BEV representation. GARKT achieves a nuScenes detection score (NDS) of 0.439 on a newly created blow-out dataset based on nuScenes. The NDS remains 0.431 when the tire is completely flat, which is much more robust compared to other transformer-based BEV methods. Moreover, the GARKT method has almost real-time computing speed, with about 20.5 fps on one GPU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Impact of LiDAR point cloud compression on 3D object detection evaluated on the KITTI dataset.
- Author
-
Martins, Nuno A. B., Cruz, Luís A. da Silva, and Lopes, Fernando
- Subjects
OBJECT recognition (Computer vision), MPEG (Video coding standard), POINT cloud, LIDAR, OPTICAL radar, COMPUTER vision, HUFFMAN codes - Abstract
The rapid growth on the amount of generated 3D data, particularly in the form of Light Detection And Ranging (LiDAR) point clouds (PCs), poses very significant challenges in terms of data storage, transmission, and processing. Point cloud (PC) representation of 3D visual information has shown to be a very flexible format with many applications ranging from multimedia immersive communication to machine vision tasks in the robotics and autonomous driving domains. In this paper, we investigate the performance of four reference 3D object detection techniques, when the input PCs are compressed with varying levels of degradation. Compression is performed using two MPEG standard coders based on 2D projections and octree decomposition, as well as two coding methods based on Deep Learning (DL). For the DL coding methods, we used a Joint Photographic Experts Group (JPEG) reference PC coder, that we adapted to accept LiDAR PCs in both Cartesian and cylindrical coordinate systems. The detection performance of the four reference 3D object detection methods was evaluated using both pre-trained models and models specifically trained using degraded PCs reconstructed from compressed representations. It is shown that LiDAR PCs can be compressed down to 6 bits per point with no significant degradation on the object detection precision. Furthermore, employing specifically trained detection models improves the detection capabilities even at compression rates as low as 2 bits per point. These results show that LiDAR PCs can be coded to enable efficient storage and transmission, without significant object detection performance loss. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
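The paper above evaluates sophisticated MPEG and learned coders, but the basic trade-off it measures, bit rate versus geometric fidelity of the LiDAR cloud, can be felt with a toy uniform quantizer: fewer bits per coordinate means larger reconstruction error, which in turn can affect a downstream detector. The sketch below is that toy experiment only and says nothing about the actual codecs tested.

```python
import numpy as np

def quantize_point_cloud(points, bits=6):
    """Uniformly quantize each coordinate to `bits` bits inside the cloud's bounding
    box and reconstruct, returning the degraded cloud and its mean geometric error.
    A toy stand-in for the much more sophisticated MPEG / learned coders evaluated
    in the paper, useful only to see how bit depth trades off against distortion."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    levels = 2 ** bits - 1
    codes = np.round((points - lo) / (hi - lo) * levels)      # integer codes
    recon = codes / levels * (hi - lo) + lo                    # dequantized points
    return recon, float(np.abs(recon - points).mean())

cloud = np.random.uniform(-40, 40, size=(5000, 3))
for b in (2, 6, 10):
    _, err = quantize_point_cloud(cloud, bits=b)
    print(f"{b} bits/coordinate -> mean abs error {err:.3f} m")
```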
35. MMFG: Multimodal-based Mutual Feature Gating 3D Object Detection.
- Author
-
Xu, Wanpeng and Fu, Zhipeng
- Abstract
To address the problem that image and point cloud features are fused in a coarse way and cannot achieve deep fusion, this paper proposes a multimodal 3D object detection architecture based on a mutual feature gating mechanism. First, since the feature aggregation approach based on the set abstraction layer cannot obtain fine-grained features, a point-based self-attention mechanism module is designed. This module is added to the extraction branch of point cloud features to achieve fine-grained feature aggregation while maintaining accurate location information. Second, a new gating mechanism is designed for the deep fusion of image and point cloud. Deep fusion is achieved by mutual feature weighting between the image and the point cloud. The newly fused features are then fed into a feature refinement network to extract classification confidence and 3D target bounding boxes. Finally, a multi-scale detection architecture is proposed to obtain a more complete object shape. The location-based encoding feature algorithm is also designed to focus the interest points in the region of interest adaptively. The whole architecture shows outstanding performance on the KITTI3D and nuScenes datasets, especially at the difficult level. This shows that the framework solves the problem of low detection rates in LiDAR mode due to the low number of surface points obtained from distant objects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. PIDFusion: Fusing Dense LiDAR Points and Camera Images at Pixel-Instance Level for 3D Object Detection.
- Author
-
Zhang, Zheng, Xu, Ruyu, and Tian, Qing
- Subjects
OBJECT recognition (Computer vision), LASER based sensors, OPTICAL radar, LIDAR, CAMERAS, TRANSFORMER models - Abstract
In driverless systems (scenarios such as subways, buses, trucks, etc.), multi-modal data fusion, such as light detection and ranging (LiDAR) points and camera images, is essential for accurate 3D object detection. In the fusion process, the information interaction between the modes is challenging due to the different coordinate systems of various sensors and the significant difference in the density of the collected data. It is necessary to fully consider the consistency and complementarity of multi-modal information, make up for the gap between multi-source data density, and achieve the joint interactive processing of multi-source information. Therefore, this paper builds on the Transformer to develop a new multi-modal fusion model called PIDFusion for 3D object detection. Firstly, the method uses the results of 2D instance segmentation to generate dense 3D virtual points to enhance the original sparse 3D point clouds. This mitigates the issue that the nearest neighbor by Euclidean distance in 2D image space is not necessarily the nearest in 3D space. Secondly, a new cross-modal fusion architecture is designed to maintain individual per-modality features to take advantage of their unique characteristics during 3D object detection. Finally, an instance-level fusion module is proposed to enhance semantic consistency through cross-modal feature interaction. Experiments show that PIDFusion is far ahead of existing 3D object detection methods, especially for small and long-range objects, with 70.8 mAP and 73.5 NDS on the nuScenes test set. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. CAF-RCNN: multimodal 3D object detection with cross-attention.
- Author
-
Liu, Junting, Liu, Deer, and Zhu, Lei
- Subjects
OBJECT recognition (Computer vision), RIVER channels, MULTIMODAL user interfaces, LIDAR, DETECTORS, LASER based sensors - Abstract
LiDAR and camera are pivotal sensors for 3D (three-dimensional) object detection. As a result of their different characteristics, an increasing number of multimodal object detection methods have been proposed. Currently, popular methods hard-associate camera features with LiDAR features, while the features are repeatedly enhanced and aggregated, so effectively aligning the two kinds of features remains a major challenge. Therefore, we propose CAF-RCNN. On the basis of PointRCNN, a Feature Pyramid Network (FPN) is used to extract advanced semantic features at different scales, and these features are then fused with the LiDAR features output by the Set Abstraction (SA) module in PointRCNN and in subsequent steps. Regarding the feature fusion module, we design a module based on the cross-attention mechanism, CAFM (Cross-Attention Fusion Module). It combines two channel attention streams in a cross-over fashion to utilize rich details about significant objects in the Image Stream and Geometric Stream. We performed extensive experiments on the KITTI dataset, and the results show that our method is 6.43% higher than PointRCNN in 3D accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
38. A Two-Stage Lidar-Based Approach for Enhanced Pedestrian and Cyclist Detection.
- Author
-
Ma, Yue, Miao, Lei, Wang, Haosen, Li, Yan, Lu, Bo, and Wang, Shifeng
- Subjects
OBJECT recognition (Computer vision), PEDESTRIANS, CYCLISTS, LIDAR, PROBLEM solving - Abstract
In recent years, the application scope of LIDAR has been continuously expanding, especially in object detection. Yet existing LIDAR-based methods focus on detecting vehicles on regular roadways. Scenarios with a higher prevalence of pedestrians and cyclists, such as university campuses and leisure centers, have recently received limited attention. To solve this problem, in this paper we propose a novel detection algorithm named SecondRcnn, which is built upon the SECOND algorithm and introduces a two-stage detection method. In the first stage, it utilizes 3D sparse convolution on the voxel LIDAR points to learn feature representations. In the second stage, regression is employed to refine the detection bounding boxes generated by the Region Of Interest pooling network. The algorithm was evaluated on the widely used KITTI data set and demonstrated significant performance improvements in detecting pedestrians (4.61% improvement) and cyclists (6.5% improvement) compared to baseline networks. Our work highlights the potential for accurate object detection in scenarios characterized by a higher presence of pedestrians and cyclists, advancing the use of LIDAR in the field of 3D detection. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
39. 3D Vehicle Detection and Segmentation Based on EfficientNetB3 and CenterNet Residual Blocks.
- Author
-
Kashevnik, Alexey and Ali, Ammar
- Subjects
IMAGE recognition (Computer vision), OBJECT recognition (Computer vision), DEGREES of freedom, AUTOMOBILE license plates, VEHICLES - Abstract
In this paper, we present a two-stage solution to 3D vehicle detection and segmentation. The first stage depends on the combination of the EfficientNetB3 architecture with multiparallel residual blocks (inspired by the CenterNet architecture) for 3D localization and pose estimation of vehicles in the scene. The second stage takes the output of the first stage as input (cropped car images) to train EfficientNet B3 for the image recognition task. Using predefined 3D models, we substitute each vehicle in the scene with its match using the rotation matrix and translation vector from the first stage to get the 3D detection bounding boxes and segmentation masks. We trained our models on an open-source dataset (ApolloCar3D). Our method outperforms all published solutions in terms of 6 degrees of freedom error (6 DoF err). [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
40. Masked Autoencoder for Pre-Training on 3D Point Cloud Object Detection.
- Author
-
Xie, Guangda, Li, Yang, Qu, Hongquan, and Sun, Zaiming
- Subjects
OBJECT recognition (Computer vision), POINT cloud, OPTICAL radar, LIDAR - Abstract
In autonomous driving, the 3D LiDAR (Light Detection and Ranging) point cloud data of the target are missing due to long distance and occlusion. It makes object detection more difficult. This paper proposes Point Cloud Masked Autoencoder (PCMAE), which can provide pre-training for most voxel-based point cloud object detection algorithms. PCMAE improves the feature representation ability of the 3D backbone for long-distance and occluded objects through self-supervised learning. First, a point cloud masking strategy for autonomous driving scenes named PC-Mask is proposed. It is used to simulate the problem of missing point cloud data information due to occlusion and distance in autonomous driving scenarios. Then, a symmetrical encoder–decoder architecture is designed for pre-training. The encoder is used to extract the high-level features of the point cloud after PC-Mask, and the decoder is used to reconstruct the complete point cloud. Finally, the pre-training method proposed in this paper is applied to SECOND (Sparsely Embedded Convolutional Detection) and Part-A2-Net (Part-aware and Aggregate Neural Network) object detection algorithms. The experimental results show that our method can speed up the model convergence speed and improve the detection accuracy, especially the detection effect of long-distance and occluded objects. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
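The PC-Mask strategy in the PCMAE abstract above is described only as simulating occlusion and distance-related data loss. A generic way to approximate that, shown below, is to drop whole spatial blocks of points rather than individual random points, leaving the visible subset for the encoder and the removed points as reconstruction targets; block count, radius, and ratio are all invented for illustration.

```python
import numpy as np

def mask_point_cloud_blocks(points, num_blocks=8, block_radius=2.0, mask_ratio=0.5):
    """Drop whole spatial blocks of points (rather than random individual points) to
    mimic the occlusion / distance-sparsity patterns a masked autoencoder is meant to
    reconstruct. Generic illustration of a masking strategy; PC-Mask itself is only
    described at a high level in the abstract."""
    rng = np.random.default_rng(0)
    keep = np.ones(len(points), dtype=bool)
    centers = points[rng.choice(len(points), num_blocks, replace=False)]
    for c in centers:
        if rng.random() < mask_ratio:                     # mask only some blocks
            keep &= np.linalg.norm(points - c, axis=1) > block_radius
    return points[keep], points[~keep]                    # visible vs. masked targets

cloud = np.random.uniform(-20, 20, size=(4000, 3))
visible, masked = mask_point_cloud_blocks(cloud)
print(len(visible), len(masked))
```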
41. FusionVision: A Comprehensive Approach of 3D Object Reconstruction and Segmentation from RGB-D Cameras Using YOLO and Fast Segment Anything.
- Author
-
El Ghazouali, Safouane, Mhirit, Youssef, Oukhrid, Ali, Michelucci, Umberto, and Nouira, Hichem
- Subjects
OBJECT recognition (Computer vision), COMPUTER vision, CAMERAS, POSE estimation (Computer vision), COMPUTER systems - Abstract
In the realm of computer vision, the integration of advanced techniques into the pre-processing of RGB-D camera inputs poses a significant challenge, given the inherent complexities arising from diverse environmental conditions and varying object appearances. Therefore, this paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery. Traditional computer vision systems face limitations in simultaneously capturing precise object boundaries and achieving high-precision object detection on depth maps, as they are mainly proposed for RGB cameras. To address this challenge, FusionVision adopts an integrated approach by merging state-of-the-art object detection techniques with advanced instance segmentation methods. The integration of these components enables a holistic (unified analysis of information obtained from both color RGB and depth D channels) interpretation of RGB-D data, facilitating the extraction of comprehensive and accurate object information in order to improve post-processes such as object 6D pose estimation, Simultaneous Localization and Mapping (SLAM) operations, accurate 3D dataset extraction, etc. The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain. Subsequently, FastSAM, an innovative semantic segmentation model, is applied to delineate object boundaries, yielding refined segmentation masks. The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation, enhancing overall precision in 3D object segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
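A minimal, hypothetical sketch of the detect–segment–lift flow described in the FusionVision entry above: YOLO (via the public ultralytics API) produces 2D boxes on the RGB image, a placeholder run_fastsam() stands in for the FastSAM masking step, and masked depth pixels are back-projected into a 3D object point cloud. load_rgbd_frame(), run_fastsam(), and the camera intrinsics are assumptions for illustration, not the authors' code.

# Hypothetical RGB-D detect-segment-lift pipeline in the spirit of FusionVision.
import numpy as np
from ultralytics import YOLO

def lift_mask_to_3d(mask, depth, fx, fy, cx, cy):
    """Back-project masked depth pixels to camera-frame 3D points."""
    v, u = np.nonzero(mask)                 # pixel rows/cols inside the mask
    z = depth[v, u]                         # depth in metres
    valid = z > 0
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)      # (M, 3) object point cloud

detector = YOLO("yolov8n.pt")               # 2D detector on the RGB image
rgb, depth = load_rgbd_frame()              # placeholder: aligned RGB + depth pair
boxes = detector(rgb)[0].boxes.xyxy.cpu().numpy()
for box in boxes:
    mask = run_fastsam(rgb, box)            # placeholder: FastSAM mask inside the box
    obj_points = lift_mask_to_3d(mask, depth, fx=615.0, fy=615.0, cx=320.0, cy=240.0)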
42. DS-Trans: A 3D Object Detection Method Based on a Deformable Spatiotemporal Transformer for Autonomous Vehicles.
- Author
-
Zhu, Yuan, Xu, Ruidong, Tao, Chongben, An, Hao, Wang, Huaide, Sun, Zhipeng, and Lu, Ke
- Subjects
OBJECT recognition (Computer vision) ,TRANSFORMER models ,AUTONOMOUS vehicles ,FEATURE extraction ,POINT cloud ,WEATHER - Abstract
Facing the significant challenge of 3D object detection in complex weather conditions and road environments, existing algorithms based on single-frame point cloud data struggle to achieve desirable results. These methods typically focus on spatial relationships within a single frame, overlooking the semantic correlations and spatiotemporal continuity between consecutive frames, which leads to discontinuities and abrupt changes in the detection outcomes. To address this issue, this paper proposes a multi-frame 3D object detection algorithm based on a deformable spatiotemporal Transformer. Specifically, a deformable cross-scale Transformer module is devised, incorporating a multi-scale offset mechanism that non-uniformly samples features at different scales and enhances the spatial information aggregation capability of the output features. To address feature misalignment during multi-frame feature fusion, a deformable cross-frame Transformer module is proposed; it incorporates independently learnable offset parameters for the features of different frames, enabling the model to adaptively correlate dynamic features across multiple frames and improve its use of temporal information. A proposal-aware sampling algorithm is introduced to significantly increase foreground point recall, further optimizing the efficiency of feature extraction. The resulting multi-scale and multi-frame voxel features are passed through an adaptive fusion-weight extraction module, the proposed mixed voxel set extraction module, which allows the model to adaptively obtain mixed features containing both spatial and temporal information. The effectiveness of the proposed algorithm is validated on the KITTI, nuScenes, and self-collected urban datasets, where it achieves an average precision improvement of 2.1% over the latest multi-frame-based algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
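To make the "independently learnable offsets per frame" idea from the DS-Trans entry above concrete, here is a toy PyTorch sketch of deformable cross-frame sampling: each query location predicts per-frame, per-point offsets and fuses bilinearly sampled features from past frames. This is a simplified stand-in under assumed shapes and an arbitrary offset scale, not the paper's module.

# Toy sketch of deformable cross-frame feature sampling (hypothetical; not the DS-Trans code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableCrossFrame(nn.Module):
    def __init__(self, channels, num_frames, num_points=4):
        super().__init__()
        self.num_frames, self.num_points = num_frames, num_points
        # Independent offset predictors per frame, as the abstract suggests.
        self.offsets = nn.Conv2d(channels, num_frames * num_points * 2, 1)
        self.weights = nn.Conv2d(channels, num_frames * num_points, 1)

    def forward(self, query, frame_feats):
        # query: (B, C, H, W) current-frame BEV features
        # frame_feats: (B, T, C, H, W) features from T past frames
        B, C, H, W = query.shape
        off = self.offsets(query).view(B, self.num_frames, self.num_points, 2, H, W)
        w = self.weights(query).view(B, self.num_frames, self.num_points, H, W).softmax(dim=2)
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).to(query.device)          # (H, W, 2) in [-1, 1]
        out = 0.0
        for t in range(self.num_frames):
            for p in range(self.num_points):
                grid = base + off[:, t, p].permute(0, 2, 3, 1) * 0.1   # small learned shifts
                sampled = F.grid_sample(frame_feats[:, t], grid, align_corners=True)
                out = out + w[:, t, p].unsqueeze(1) * sampled
        return out                                                      # (B, C, H, W) fused feature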
43. Adaptive learning point cloud and image diversity feature fusion network for 3D object detection.
- Author
-
Yan, Weiqing, Liu, Shile, Liu, Hao, Yue, Guanghui, Wang, Xuan, Song, Yongchao, and Xu, Jindong
- Subjects
POINT cloud ,ARTIFICIAL neural networks ,FEATURE extraction ,OBJECT recognition (Computer vision) - Abstract
3D object detection is a critical task in the fields of virtual reality and autonomous driving. Given that each sensor has its own strengths and limitations, multi-sensor-based 3D object detection has gained popularity. However, most existing methods extract high-level image semantic features and fuse them with point cloud features, focusing solely on the consistent information from both sensors while ignoring their complementary information. In this paper, we present a novel two-stage multi-sensor deep neural network, called the adaptive learning point cloud and image diversity feature fusion network (APIDFF-Net), for 3D object detection. Our approach uses fine-grained image information to complement the point cloud information by combining low-level image features with high-level point cloud features. Specifically, we design a shallow image feature extraction module to learn fine-grained information from images, instead of relying on deep-layer features with coarse-grained information. Furthermore, we design a diversity feature fusion (DFF) module that transforms low-level image features into point-wise image features and explores their complementary features through an attention mechanism, ensuring an effective combination of fine-grained image features and point cloud features. Experiments on the KITTI benchmark show that the proposed method outperforms state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
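The following hypothetical sketch illustrates the general recipe behind the point-wise image features mentioned in the APIDFF-Net entry above: LiDAR points are projected into the image plane, shallow image features are sampled at those pixel locations, and an attention gate decides how much of the image feature to keep per point. The projection matrix, feature shapes, and the simple sigmoid gate are assumptions; the paper's DFF module is more elaborate.

# Hypothetical point-wise image feature gathering and attention fusion (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def gather_image_feats(points, img_feats, proj, img_hw):
    # points: (N, 3) LiDAR xyz; img_feats: (1, C, H, W) shallow image features
    # proj: (3, 4) camera projection matrix; img_hw: (H_img, W_img) of the raw image
    homo = torch.cat([points, torch.ones(points.shape[0], 1, device=points.device)], dim=1)
    uvw = homo @ proj.T                                                  # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                        # pixel coordinates
    # Normalize to [-1, 1] so grid_sample can bilinearly interpolate the feature map.
    grid = torch.stack([uv[:, 0] / img_hw[1] * 2 - 1,
                        uv[:, 1] / img_hw[0] * 2 - 1], dim=1).view(1, -1, 1, 2)
    sampled = F.grid_sample(img_feats, grid, align_corners=True)         # (1, C, N, 1)
    return sampled.squeeze(-1).squeeze(0).T                              # (N, C) per-point image feature

class AttentionFusion(nn.Module):
    def __init__(self, pc_dim, img_dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(pc_dim + img_dim, img_dim), nn.Sigmoid())

    def forward(self, pc_feats, img_feats):
        # Weight the per-point image features by how useful they are to each point.
        attn = self.gate(torch.cat([pc_feats, img_feats], dim=1))
        return torch.cat([pc_feats, attn * img_feats], dim=1)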
44. A survey on 3D object detection in real time for autonomous driving.
- Author
-
Contreras, Marcelo, Jain, Aayush, Bhatt, Neel P., Banerjee, Arunava, Hashemi, Ehsan, Pan, Weiguo, and Alecsandru, Ciprian
- Subjects
OBJECT recognition (Computer vision) ,MONOCULAR vision ,WEATHER ,AUTONOMOUS vehicles ,DETECTORS - Abstract
This survey reviews advances in 3D object detection approaches for autonomous driving. A brief introduction to 2D object detection is first given, and the drawbacks of existing methodologies in highly dynamic environments are identified. Subsequently, the paper reviews state-of-the-art 3D object detection techniques that utilize monocular and stereo vision for reliable detection in urban settings. Based on the depth inference basis, learning scheme, and internal representation, this work presents a taxonomy of three method classes: model-based and geometrically constrained approaches, end-to-end learning methodologies, and hybrid methods. A dedicated segment highlights the current trend toward multi-view detectors as end-to-end methods, owing to their improved robustness. Detectors of the latter two classes were specifically selected to exploit the autonomous driving context in terms of geometry, scene content, and instance distribution. To assess the effectiveness of each method, 3D object detection datasets for autonomous vehicles are described along with their distinctive features, e.g., varying weather conditions, multi-modality, and multi-camera perspectives, and their respective metrics associated with different difficulty categories. In addition, we include multi-modal visual datasets, i.e., V2X, that may tackle the problem of single-view occlusion. Finally, current research trends in object detection are summarized, followed by a discussion of possible directions for future research in this domain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Path aggregation one-stage anchor free 3D object detection.
- Author
-
Liu, Yanfei, Li, Chao, Ning, Kanglin, and Li, Yali
- Abstract
In recent years, autonomous driving has entered a rapid development phase and imposes more challenging requirements on perception technology. Unlike object detection methods for 2D images, 3D object detection, which uses Light Detection And Ranging (LiDAR) point clouds as input, can accurately provide the coordinates, physical size, and orientation of an object in 3D space. This paper constructs a deep learning neural network for 3D visual object recognition inspired by computational neuroscience. Considering that part of the visual recognition pathway of the human brain tends to serve multiple visual recognition tasks, we set up an auxiliary task branch when training the proposed 3D object detector; through this auxiliary branch, the backbone of our 3D object detector can learn more generalizable features from the point cloud input. As the human brain collects information from different visual areas, the proposed model incorporates a multi-stride residual 3D backbone network and a path-aggregation 2D neck network to achieve similar functions. Extensive experiments have been conducted on the KITTI dataset and the Waymo Open Dataset. The results show that our method achieves an outstanding balance between speed and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
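The auxiliary-branch training described in the entry above amounts to adding a second head on the shared backbone and mixing its loss into the detection loss. Below is a minimal, hypothetical sketch of that setup; the actual auxiliary task, heads, and loss weight used in the paper are not specified here.

# Minimal sketch of training with an auxiliary task branch (hypothetical).
import torch.nn as nn

class DetectorWithAux(nn.Module):
    def __init__(self, backbone, det_head, aux_head):
        super().__init__()
        self.backbone, self.det_head, self.aux_head = backbone, det_head, aux_head

    def forward(self, points):
        feats = self.backbone(points)                 # shared features serve both tasks
        return self.det_head(feats), self.aux_head(feats)

def training_step(model, batch, det_loss_fn, aux_loss_fn, aux_weight=0.5):
    det_out, aux_out = model(batch["points"])
    loss_det = det_loss_fn(det_out, batch["boxes"])          # main 3D detection loss
    loss_aux = aux_loss_fn(aux_out, batch["aux_labels"])     # e.g. per-point foreground labels
    return loss_det + aux_weight * loss_aux                  # the auxiliary branch regularizes the backbone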
46. Survey and systematization of 3D object detection models and methods.
- Author
-
Drobnitzky, Moritz, Friederich, Jonas, Egger, Bernhard, and Zschech, Patrick
- Subjects
OBJECT recognition (Computer vision) ,FEATURE extraction ,RESEARCH personnel - Abstract
Strong demand for autonomous vehicles and the wide availability of 3D sensors are continuously fueling the proposal of novel methods for 3D object detection. In this paper, we provide a comprehensive survey of developments in 3D object detection from 2012 to 2021, covering the full pipeline from input data through data representation and feature extraction to the actual detection modules. We introduce fundamental concepts, focus on the broad range of approaches that have emerged over the past decade, and propose a systematization that provides a practical framework for comparing these approaches, with the goal of guiding future development, evaluation, and application activities. Specifically, our survey and systematization of 3D object detection (3DOD) models and methods can help researchers and practitioners get a quick overview of the field by decomposing 3DOD solutions into more manageable pieces. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. MVTr: multi-feature voxel transformer for 3D object detection.
- Author
-
Ai, Lingmei, Xie, Zhuoyu, Yao, Ruoxia, and Yang, Mengyao
- Subjects
OBJECT recognition (Computer vision) ,TRANSFORMER models ,CONVOLUTIONAL neural networks ,IMAGE segmentation ,POINT cloud - Abstract
Convolutional neural networks have become a powerful tool for much of 3D object detection. However, their power has not been fully realized for capturing global information, which is crucial for object detection. In this paper, we address this problem with a multi-feature voxel transformer (MVTr), an architecture that extracts long-range relational features through self-attention between multi-feature voxels. In general, converting a point cloud to a voxel representation greatly reduces computation, but an attention network still needs a long time to focus on the car voxels in a large real-world 3D scene. To this end, we propose a semantic voxel module that takes semantic voxels as input and cooperates with a sparse voxel module and a non-empty voxel module to extract features. The semantic voxels are generated from image segmentation and point cloud projection, and they retain mainly car voxels. To further enlarge the attention range while maintaining a favorable computational cost, we propose two mechanisms for multi-head attention: local attention and stumpy attention. Finally, we propose a fusion attention module that adds channel attention and spatial attention to the 2D backbone network. MVTr combines the semantic information of the image with the 3D information of the point cloud and can be applied to most 3D object detection tasks. Experimental results on the KITTI dataset show that our method is effective and that its precision has significant advantages over other similar feature-fusion-based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
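The "semantic voxels from image segmentation and point cloud projection" step in the MVTr entry above can be pictured as projecting LiDAR points into a segmentation mask, keeping only the points labelled as car, and voxelizing those. The sketch below is a hypothetical illustration of that recipe; the projection matrix, class id, and voxelization are assumed placeholders.

# Hypothetical sketch of building "semantic voxels" from a segmentation mask and a point cloud.
import numpy as np

def semantic_voxels(points, seg_mask, proj, voxel_size, car_id=1):
    # points: (N, 3); seg_mask: (H, W) integer class ids; proj: (3, 4) projection matrix
    homo = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)
    uvw = homo @ proj.T
    u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
    h, w = seg_mask.shape
    in_img = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (uvw[:, 2] > 0)
    is_car = np.zeros(points.shape[0], dtype=bool)
    is_car[in_img] = seg_mask[v[in_img], u[in_img]] == car_id   # keep points labelled as car
    car_points = points[is_car]
    # Coarse voxelization: unique voxel indices containing at least one car point.
    voxel_idx = np.unique(np.floor(car_points / voxel_size).astype(int), axis=0)
    return car_points, voxel_idx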
48. Pre-Segmented Down-Sampling Accelerates Graph Neural Network-Based 3D Object Detection in Autonomous Driving.
- Author
-
Liang, Zhenming, Huang, Yingping, and Bai, Yanbiao
- Subjects
GRAPH neural networks ,OBJECT recognition (Computer vision) ,POINT cloud ,AUTONOMOUS vehicles ,POINT processes ,LIDAR - Abstract
Graph neural networks (GNNs) have been proven to be an ideal approach for dealing with irregular point clouds, but they involve massive computation for searching neighboring points in the graph, which limits their application to large-scale LiDAR point cloud processing. Down-sampling is a straightforward and indispensable step in current GNN-based 3D detectors to reduce the computational burden of the model, but the commonly used down-sampling methods cannot distinguish the categories of the LiDAR points, which prevents them from improving the computational efficiency of GNN models without affecting their detection accuracy. In this paper, we propose (1) a LiDAR point cloud pre-segmented down-sampling (PSD) method that selectively reduces background points while preserving foreground object points, greatly improving the computational efficiency of the model without affecting its 3D detection accuracy; and (2) a lightweight GNN-based 3D detector that extracts point features and detects objects directly from the raw down-sampled LiDAR point cloud without any pre-transformation. We test the proposed model on the KITTI 3D Object Detection Benchmark, and the results demonstrate its effectiveness and efficiency for 3D object detection in autonomous driving. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
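The core idea of the PSD entry above is class-aware down-sampling: keep points a cheap pre-segmentation marks as foreground and thin out the rest. A minimal sketch, assuming the pre-segmentation is already available as a boolean mask and using a fixed random keep ratio for the background (both assumptions, not the paper's exact procedure):

# Hypothetical class-aware down-sampling in the spirit of PSD.
import numpy as np

def presegmented_downsample(points, fg_mask, bg_keep_ratio=0.1, rng=None):
    # points: (N, 3) LiDAR points; fg_mask: (N,) bool from a cheap pre-segmentation
    rng = rng or np.random.default_rng(0)
    fg = points[fg_mask]                                    # keep all foreground points
    bg = points[~fg_mask]
    keep = rng.random(bg.shape[0]) < bg_keep_ratio          # randomly thin the background
    return np.concatenate([fg, bg[keep]], axis=0)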
49. FANet: Improving 3D Object Detection with Position Adaptation.
- Author
-
Ye, Jian, Zuo, Fushan, and Qian, Yuqing
- Subjects
OBJECT recognition (Computer vision) ,POINT cloud ,FEATURE extraction ,SPATIAL variation ,AUTONOMOUS vehicles - Abstract
Three-dimensional object detection plays a crucial role in achieving accurate and reliable autonomous driving systems. However, current state-of-the-art two-stage detectors lack flexibility and have limited feature extraction capabilities for handling the disorder and irregularity of point clouds. In this paper, we propose a novel network called FANet, which combines the strengths of PV-RCNN and PAConv (position-adaptive convolution) to address the irregularity and disorder present in point clouds. In our network, the convolution operation constructs convolutional kernels from a basic weight matrix, and the coefficients of these kernels are adaptively learned by LearnNet from the relative positions of points. This approach allows the flexible modeling of complex spatial variations and geometric structures in 3D point clouds, leading to improved extraction of point cloud features and the generation of high-quality 3D proposal boxes. Extensive experiments on the KITTI dataset demonstrate that FANet achieves superior 3D object detection accuracy compared with other methods, showing a significant improvement from our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
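The position-adaptive convolution described in the FANet entry above assembles a kernel per neighbor by scoring a bank of basis weight matrices from the neighbor's relative position. Below is a toy PyTorch sketch of that idea; the class name, bank size, MLP, and max-pooling aggregation are illustrative assumptions rather than the paper's exact LearnNet design.

# Toy sketch of a position-adaptive point convolution (hypothetical shapes, single weight bank).
import torch
import torch.nn as nn

class PositionAdaptiveConv(nn.Module):
    def __init__(self, in_dim, out_dim, num_kernels=8):
        super().__init__()
        self.weight_bank = nn.Parameter(torch.randn(num_kernels, in_dim, out_dim) * 0.02)
        # Small MLP that scores each basis kernel from the relative position.
        self.learn_net = nn.Sequential(nn.Linear(3, 32), nn.ReLU(),
                                       nn.Linear(32, num_kernels), nn.Softmax(dim=-1))

    def forward(self, neighbor_feats, rel_pos):
        # neighbor_feats: (N, K, C_in) features of K neighbors per point
        # rel_pos:        (N, K, 3) neighbor coordinates relative to the center point
        scores = self.learn_net(rel_pos)                         # (N, K, M) per-neighbor kernel scores
        kernels = torch.einsum("nkm,mio->nkio", scores, self.weight_bank)
        out = torch.einsum("nki,nkio->nko", neighbor_feats, kernels)
        return out.max(dim=1).values                             # (N, C_out) aggregated feature per point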
50. F-3DNet: Extracting inner order of point cloud for 3D object detection in autonomous driving
- Author
-
Xu, Fenglei, Zhao, Haokai, Wu, Yifei, and Tao, Chongben
- Published
- 2024
- Full Text
- View/download PDF