Author: "Yu, Hongkai" / Publisher: arxiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yu, Hongkai"' showing total 8 results

1. S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality

Author: Li, Jinlong, Xu, Runsheng, Liu, Xinyu, Li, Baolu, Zou, Qin, Ma, Jiaqi, and Yu, Hongkai
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Due to the lack of real multi-agent data and the time-consuming nature of labeling, existing multi-agent cooperative perception algorithms usually use simulated sensor data for training and validation. However, perception performance degrades when these simulation-trained models are deployed in the real world, due to the significant domain gap between simulated and real data. In this paper, we propose the first Simulation-to-Reality transfer learning framework for multi-agent cooperative perception using a novel Vision Transformer, named S2R-ViT, which considers both the Implementation Gap and the Feature Gap between simulated and real data. We investigate the effects of these two types of domain gaps and propose a novel uncertainty-aware vision transformer to effectively relieve the Implementation Gap and an agent-based feature adaptation module with inter-agent and ego-agent discriminators to reduce the Feature Gap. Our intensive experiments on the public multi-agent cooperative perception datasets OPV2V and V2V4Real demonstrate that the proposed S2R-ViT can effectively bridge the gap from simulation to reality and significantly outperform other methods for point cloud-based 3D object detection.
Comment: Correct the compile error in Fig. 5
Published: 2023
Full Text: View/download PDF

2. RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection

Author: Ma, Jin, Li, Jinlong, Guo, Qing, Zhang, Tianyun, Lin, Yuewei, and Yu, Hongkai
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy for the limited application scenarios of traditional RGB cameras. RGB-X tasks, which rely on RGB input and another type of data input to resolve specific problems, have become a popular research topic in multimedia. A crucial part of two-branch RGB-X deep neural networks is how to fuse information across modalities. Given the tremendous amount of information inside RGB-X networks, previous works typically apply naive fusion (e.g., average or max fusion) or only focus on feature fusion at the same scale(s). In this paper, we propose a novel method called RXFOOD for the fusion of features across different scales within the same modality branch and from different modality branches simultaneously in a unified attention mechanism. An Energy Exchange Module is designed for the interaction of each feature map's energy matrix, which reflects the inter-relationship of different positions and different channels inside a feature map. The RXFOOD method can be easily incorporated into any dual-branch encoder-decoder network as a plug-in module, and helps the original backbone network better focus on important positions and channels for object of interest detection. Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGB-Frequency image manipulation detection demonstrate the clear effectiveness of the proposed RXFOOD.
Comment: 10 pages
Published: 2023
Full Text: View/download PDF

3. V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception

Author: Xu, Runsheng, Xia, Xin, Li, Jinlong, Li, Hanzhao, Zhang, Shuo, Tu, Zhengzhong, Meng, Zonglin, Xiang, Hao, Dong, Xiaoyu, Song, Rui, Yu, Hongkai, Zhou, Bolei, and Ma, Jiaqi
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Modern perception systems of autonomous vehicles are known to be sensitive to occlusions and to lack long-range perception capability. This has been one of the key bottlenecks preventing Level 5 autonomy. Recent research has demonstrated that the Vehicle-to-Vehicle (V2V) cooperative perception system has great potential to revolutionize the autonomous driving industry. However, the lack of a real-world dataset hinders the progress of this field. To facilitate the development of cooperative perception, we present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception. The data is collected by two vehicles equipped with multi-modal sensors driving together through diverse scenarios. Our V2V4Real dataset covers a driving area of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps that cover all the driving routes. V2V4Real introduces three perception tasks: cooperative 3D object detection, cooperative 3D object tracking, and Sim2Real domain adaptation for cooperative perception. We provide comprehensive benchmarks of recent cooperative perception algorithms on the three tasks. The V2V4Real dataset can be found at https://research.seas.ucla.edu/mobility-lab/v2v4real/.
Comment: Accepted by CVPR2023. Website link: https://research.seas.ucla.edu/mobility-lab/v2v4real
Published: 2023
Full Text: View/download PDF

4. Deep Transfer Learning for Intelligent Vehicle Perception: a Survey

Author: Liu, Xinyu, Li, Jinlong, Ma, Jin, Sun, Huiming, Xu, Zhigang, Zhang, Tianyun, and Yu, Hongkai
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning-based intelligent vehicle perception has been developing prominently in recent years to provide a reliable source for motion planning and decision making in autonomous driving. A large number of powerful deep learning-based methods can achieve excellent performance in solving various perception problems of autonomous driving. However, these deep learning methods still have several limitations; for example, the assumption that lab-training (source domain) and real-testing (target domain) data follow the same feature distribution may not be practical in the real world. There is often a dramatic domain gap between them in many real-world cases. As a solution to this challenge, deep transfer learning can handle such situations well by transferring knowledge from one domain to another. Deep transfer learning aims to improve task performance in a new domain by leveraging the knowledge of similar tasks previously learned in another domain. Nevertheless, there are currently no survey papers on the topic of deep transfer learning for intelligent vehicle perception. To the best of our knowledge, this paper represents the first comprehensive survey on this topic. This paper discusses the domain gaps related to differences in sensors, data, and models for intelligent vehicle perception. Recent applications, challenges, and future research directions in intelligent vehicle perception are also explored.
Published: 2023
Full Text: View/download PDF

5. Deep Learning based Computer Vision Methods for Complex Traffic Environments Perception: A Review

Author: Azfar, Talha, Li, Jinlong, Yu, Hongkai, Cheu, Ruey Long, Lv, Yisheng, and Ke, Ruimin
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Computer vision applications in intelligent transportation systems (ITS) and autonomous driving (AD) have gravitated towards deep neural network architectures in recent years. While performance seems to be improving on benchmark datasets, many real-world challenges are yet to be adequately considered in research. This paper conducts an extensive literature review on the applications of computer vision in ITS and AD, and discusses challenges related to data, models, and complex urban environments. The data challenges are associated with the collection and labeling of training data and its relevance to real-world conditions, bias inherent in datasets, the high volume of data that needs to be processed, and privacy concerns. Deep learning (DL) models are commonly too complex for real-time processing on embedded hardware, lack explainability and generalizability, and are hard to test in real-world settings. Complex urban traffic environments have irregular lighting and occlusions, and surveillance cameras can be mounted at a variety of angles, gather dirt, and shake in the wind, while the traffic conditions are highly heterogeneous, with rule violations and complex interactions in crowded scenarios. Some representative applications that suffer from these problems are traffic flow estimation, congestion detection, autonomous driving perception, vehicle interaction, and edge computing for practical deployment. Possible ways of dealing with the challenges are also explored, with priority given to practical deployment.
Published: 2022
Full Text: View/download PDF

6. AdvHaze: Adversarial Haze Attack

Author: Gao, Ruijun, Guo, Qing, Juefei-Xu, Felix, Yu, Hongkai, and Feng, Wei
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent years, adversarial attacks have drawn more attention for their value in evaluating and improving the robustness of machine learning models, especially neural network models. However, previous attack methods have mainly focused on applying $\ell_p$ norm-bounded noise perturbations. In this paper, we instead introduce a novel adversarial attack method based on haze, which is a common phenomenon in real-world scenery. Our method can synthesize potentially adversarial haze into an image based on the atmospheric scattering model with high realism and mislead classifiers into predicting an incorrect class. We conduct experiments on two popular datasets, i.e., ImageNet and NIPS 2017. We demonstrate that the proposed method achieves a high success rate and better transferability across different classification models than the baselines. We also visualize the correlation matrices, which inspire us to jointly apply different perturbations to improve the success rate of the attack. We hope this work can boost the development of non-noise-based adversarial attacks and help evaluate and improve the robustness of DNNs.
Published: 2021
Full Text: View/download PDF

7. Benchmarking Shadow Removal for Facial Landmark Detection and Beyond

Author: Fu, Lan, Guo, Qing, Juefei-Xu, Felix, Yu, Hongkai, Feng, Wei, Liu, Yang, and Wang, Song
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Facial landmark detection is a fundamental and significant vision task with many important applications. In practice, facial landmark detection can be affected by many natural degradations. One of the most common and important degradations is the shadow caused by light source blocking. While many advanced shadow removal methods have been proposed to recover image quality in recent years, their effects on facial landmark detection are not well studied. For example, it remains unclear whether shadow removal could enhance the robustness of facial landmark detection to diverse shadow patterns. In this work, as a first attempt, we construct a novel benchmark to link two independent but related tasks (i.e., shadow removal and facial landmark detection). In particular, the proposed benchmark covers diverse face shadows with different intensities, sizes, shapes, and locations. Moreover, to mine hard shadow patterns against facial landmark detection, we propose a novel method (i.e., adversarial shadow attack), which allows us to construct a challenging subset of the benchmark for a comprehensive analysis. With the constructed benchmark, we conduct extensive analysis on three state-of-the-art shadow removal methods and three landmark detectors. The observations of this work motivate us to design a novel detection-aware shadow removal framework, which empowers shadow removal to achieve higher restoration quality and enhances the shadow robustness of deployed facial landmark detectors.
Published: 2021
Full Text: View/download PDF

8. DCDLearn: Multi-order Deep Cross-distance Learning for Vehicle Re-Identification

Author: Zhu, Rixing, Fang, Jianwu, Xu, Hongke, Yu, Hongkai, and Xue, Jianru
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Vehicle re-identification (Re-ID) has become a popular research topic owing to its practicability in intelligent transportation systems. Vehicle Re-ID suffers from numerous challenges caused by drastic variation in illumination, occlusions, background, resolutions, viewing angles, and so on. To address them, this paper formulates a multi-order deep cross-distance learning (DCDLearn) model for vehicle re-identification, where an efficient one-view CycleGAN model is developed to alleviate the exhaustive and enumerative cross-camera matching problem in previous works and smooth the domain discrepancy across cameras. Specifically, we treat the transferred images and the reconstructed images generated by one-view CycleGAN as multi-order augmented data for deep cross-distance learning, where the cross distances of the multi-order image set with distinct identities are learned by optimizing an objective function with multi-order augmented triplet loss and center loss to achieve camera invariance and identity consistency. Extensive experiments on three vehicle Re-ID datasets demonstrate that the proposed method achieves significant improvement over the state of the art, especially for the small-scale dataset.
Published: 2020
Full Text: View/download PDF
