Author: "Nie, Jiahao" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Nie, Jiahao" returned 44 results in total.

1. VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking

Author: Lu, Yuxuan, Nie, Jiahao, He, Zhiwei, Gu, Hongjie, and Lv, Xudong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Current LiDAR point cloud-based 3D single object tracking (SOT) methods typically rely on point-based representation networks. Despite their demonstrated success, such networks suffer from two fundamental problems: 1) they contain pooling operations to cope with inherently disordered point clouds, which hinders the capture of 3D spatial information that is useful for tracking, a regression task; 2) the adopted set abstraction operation struggles to handle density-inconsistent point clouds, which also prevents 3D spatial information from being modeled. To solve these problems, we introduce a novel tracking framework, termed VoxelTrack. By voxelizing inherently disordered point clouds into 3D voxels and extracting their features via sparse convolution blocks, VoxelTrack effectively models precise and robust 3D spatial information, thereby guiding accurate position prediction for tracked objects. Moreover, VoxelTrack incorporates a dual-stream encoder with a cross-iterative feature fusion module to further explore fine-grained 3D spatial information for tracking. Because accurate 3D spatial information is modeled, VoxelTrack simplifies the tracking pipeline to a single regression loss. Extensive experiments are conducted on three widely adopted datasets: KITTI, NuScenes and Waymo Open Dataset. The experimental results confirm that VoxelTrack achieves state-of-the-art performance (88.3%, 71.4% and 63.6% mean precision on the three datasets, respectively) and outperforms existing trackers while running in real time at 36 FPS on a single TITAN RTX GPU. The source code and model will be released.
Published: 2024
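VoxelTrack's first step, voxelizing a disordered point cloud into 3D voxels, can be illustrated with a minimal sketch. It assumes a plain list of (x, y, z) points, cubic voxels of side `voxel_size`, and the mean point coordinate as each voxel's feature; `voxelize` is a hypothetical helper, and the sketch does not reproduce VoxelTrack's actual voxel settings or its sparse convolution blocks.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxelize(points: List[Point], voxel_size: float) -> Dict[VoxelKey, Point]:
    """Hypothetical helper: bucket points into a sparse voxel grid.

    Assumption for illustration: each occupied voxel is described by the mean
    coordinate of its points. This is not VoxelTrack's published configuration.
    """
    buckets: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for x, y, z in points:
        # floor() gives a consistent integer index for negative coordinates too.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))
    # Only occupied voxels are kept, so the result is a sparse grid.
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in buckets.items()
    }


cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, -0.5)]
print(voxelize(cloud, voxel_size=1.0))
```

Once points are bucketed, their original order no longer matters, and because only occupied voxels are stored the output stays sparse, which is the kind of input sparse convolution layers are designed to consume.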

2. Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

Author: An, Wenbin, Tian, Feng, Nie, Jiahao, Shi, Wenkai, Lin, Haonan, Chen, Yan, Wang, QianYing, Wu, Yaqiang, Dai, Guang, and Chen, Ping
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Multimedia
Abstract: Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and an external knowledge base using the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acquiring different kinds of knowledge in a coupled manner may confuse models and hinder them from retrieving precise knowledge. Furthermore, the "forward-only" answering process fails to explicitly capture the knowledge needs of LLMs, which can further hurt answering quality. To cope with these limitations, we propose DKA: Disentangled Knowledge Acquisition from LLM feedback, a training-free framework that disentangles knowledge acquisition to avoid confusion and uses LLM feedback to specify the required knowledge. Specifically, DKA requires LLMs to specify what knowledge they need to answer the question and to decompose the original complex question into two simple sub-questions: an image-based sub-question and a knowledge-based sub-question. We then use the two sub-questions to retrieve knowledge from the image and the knowledge base, respectively. In this way, the two knowledge acquisition models can focus on the content that corresponds to them and avoid interference from irrelevant elements in the original complex question, which helps provide more precise knowledge and better align it with the knowledge needs of LLMs to yield correct answers. Experiments on benchmark datasets show that DKA significantly outperforms SOTA models. To facilitate future research, our data and code are available at https://github.com/Lackel/DKA.
Comment: Pre-print
Published: 2024

3. P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds

Author: Nie, Jiahao, Xie, Fei, Zhou, Sifan, Zhou, Xueyi, Chae, Dong-Kyu, and He, Zhiwei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D single object tracking (SOT) methods based on appearance matching have long suffered from insufficient appearance information caused by incomplete, textureless and semantically deficient LiDAR point clouds. While the motion paradigm exploits motion cues instead of appearance matching for tracking, it requires complex multi-stage processing and a segmentation module. In this paper, we first provide an in-depth exploration of the motion paradigm, which demonstrates that (i) it is feasible to directly infer the target's relative motion from point clouds across consecutive frames; and (ii) fine-grained information comparison between consecutive point clouds facilitates target motion modeling. We therefore propose to perform part-to-part motion modeling for consecutive point clouds and introduce a novel tracking framework, termed P2P. The framework fuses the information of each corresponding part between consecutive point clouds, effectively exploring detailed information changes and thus modeling accurate target-related motion cues. Following this framework, we present P2P-point and P2P-voxel models, which incorporate implicit and explicit part-to-part motion modeling with point- and voxel-based representations, respectively. Without bells and whistles, P2P-voxel sets a new state-of-the-art performance (~89%, 72% and 63% precision on KITTI, NuScenes and Waymo Open Dataset, respectively). Moreover, under the same point-based representation, P2P-point outperforms the previous motion tracker M²Track by 3.3% and 6.7% on KITTI and NuScenes, while running at a high speed of 107 FPS on a single RTX 3090 GPU. The source code and pre-trained models are available at https://github.com/haooozi/P2P.
Published: 2024

4. Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models

Author: Xu, Yicheng, Chen, Yuxin, Nie, Jiahao, Wang, Yusong, Zhuang, Huiping, and Okumura, Manabu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which only focuses on previously encountered classes. During the CL of VLMs, we need not only to prevent catastrophic forgetting of incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to maintain such zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which uses a recursive ridge regression-based adapter to learn from a sequence of domains in a non-forgetting manner and decouples cross-domain correlations by projecting features into a higher-dimensional space. Combined with a training-free fusion module, RAIL fully preserves the VLM's zero-shot ability on unseen domains without any reference data. Additionally, we introduce the Cross-domain Task-Agnostic Incremental Learning (X-TAIL) setting, in which a CL learner is required to incrementally learn from multiple domains and classify test images from both seen and unseen domains without any domain-identity hint. We theoretically prove RAIL's absolute memorization of incrementally learned domains. Experimental results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings. The code is released at https://github.com/linghan1997/Regression-based-Analytic-Incremental-Learning.
Comment: Accepted by NeurIPS 2024
Published: 2024
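RAIL's adapter is described as a recursive ridge regression. The generic technique behind that phrase, recursive least squares with a ridge prior, is sketched below as an assumption-laden illustration: it omits RAIL's higher-dimensional feature projection and fusion module, and the class name `RecursiveRidge` is hypothetical.

```python
from typing import List, Sequence


class RecursiveRidge:
    """Hypothetical sketch: ridge regression updated one sample at a time.

    Starting from w = 0 and P = I / lam, each update keeps
    P = (X^T X + lam * I)^-1 over every sample seen so far, so w always equals
    the batch ridge solution without revisiting earlier data.
    """

    def __init__(self, dim: int, lam: float) -> None:
        self.w: List[float] = [0.0] * dim
        self.p: List[List[float]] = [
            [1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)
        ]

    def update(self, x: Sequence[float], y: float) -> None:
        d = len(self.w)
        px = [sum(self.p[i][j] * x[j] for j in range(d)) for i in range(d)]  # P x
        denom = 1.0 + sum(x[i] * px[i] for i in range(d))  # 1 + x^T P x
        gain = [v / denom for v in px]  # equals the updated P times x
        residual = y - sum(self.w[i] * x[i] for i in range(d))
        self.w = [self.w[i] + gain[i] * residual for i in range(d)]
        # Sherman-Morrison rank-one update of the inverse (P is symmetric).
        self.p = [[self.p[i][j] - gain[i] * px[j] for j in range(d)] for i in range(d)]

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))
```

Feeding one domain's samples and then another's yields the same weights as a single batch fit on both, which is one way to read the "non-forgetting" property the abstract attributes to analytic learning.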

5. AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Author: An, Wenbin, Tian, Feng, Leng, Sicong, Nie, Jiahao, Lin, Haonan, Wang, QianYing, Dai, Guang, Chen, Ping, and Lu, Shijian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) face a prevalent problem of object hallucination, where the generated textual responses are inconsistent with the ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of object hallucinations. Specifically, LVLMs predominantly attend to prompt-independent global image features while failing to capture prompt-relevant local features, which undermines their visual grounding capacity and leads to hallucinations. To this end, we propose Assembly of Global and Local Attention (AGLA), a training-free, plug-and-play approach that mitigates object hallucinations by simultaneously exploiting an ensemble of global features for response generation and local features for visual discrimination. Our approach uses an image-prompt matching scheme that captures prompt-relevant local features from images, producing an augmented view of the input image in which prompt-relevant content is preserved while irrelevant distractions are masked. With the augmented view, a calibrated decoding distribution can be derived by integrating generative global features from the original image and discriminative local features from the augmented image. Extensive experiments show that AGLA consistently mitigates object hallucinations and enhances the general perception capability of LVLMs across various discriminative and generative benchmarks. Our code will be released at https://github.com/Lackel/AGLA.
Published: 2024
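The calibrated decoding step can be pictured as mixing two next-token score vectors, one computed from the original image and one from the prompt-masked augmented view. The sketch below assumes that mixing is a weighted sum of logits followed by a softmax, with a hypothetical weight `alpha`; the abstract does not give AGLA's exact formula, so treat this as an illustration of logit ensembling rather than the paper's method.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_distribution(
    orig_logits: Sequence[float], aug_logits: Sequence[float], alpha: float = 1.0
) -> List[float]:
    """Assumed fusion rule: add alpha-weighted augmented-view logits, then renormalize.

    Illustrative only; AGLA's published calibration formula may differ.
    """
    if len(orig_logits) != len(aug_logits):
        raise ValueError("both views must score the same vocabulary")
    return softmax([o + alpha * a for o, a in zip(orig_logits, aug_logits)])


# Token 0 narrowly wins on the original image; the augmented view favors token 1.
probs = fused_distribution([2.0, 1.9, 0.0], [0.0, 1.0, 0.0], alpha=1.0)
print(max(range(len(probs)), key=probs.__getitem__))  # prints 1
```

Setting `alpha` to 0 recovers ordinary decoding from the original image, which makes the contribution of the augmented view easy to ablate.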

6. MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

Author: Nie, Jiahao, Zhang, Gongjie, An, Wenbin, Tan, Yap-Peng, Kot, Alex C., and Lu, Shijian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite recent advancements in Multi-modal Large Language Models (MLLMs), understanding inter-object relations, i.e., interactions or associations between distinct objects, remains a major challenge for such models. This issue significantly hinders their advanced reasoning capabilities and is primarily due to the lack of large-scale, high-quality, and diverse multi-modal data essential for training and evaluating MLLMs. In this paper, we provide a taxonomy of inter-object relations and introduce Multi-Modal Relation Understanding (MMRel), a comprehensive dataset designed to bridge this gap by providing large-scale, high-quality and diverse data for studying inter-object relations with MLLMs. MMRel has three distinctive attributes: (i) it includes over 15K question-answer pairs sourced from three distinct domains, ensuring large scale and high diversity; (ii) it contains a subset featuring highly unusual relations, on which MLLMs often fail due to hallucinations, making it very challenging; (iii) it provides manually verified high-quality labels for inter-object relations. Thanks to these features, MMRel is ideal for evaluating MLLMs on relation understanding, and it can also be used to fine-tune MLLMs to enhance relation understanding and even benefit overall performance in various vision-language tasks. Extensive experiments on various popular MLLMs validate the effectiveness of MMRel. Both the MMRel dataset and the complete labeling scripts have been made publicly available.
Published: 2024

7. Color Space Learning for Cross-Color Person Re-Identification

Author: Nie, Jiahao, Lin, Shan, and Kot, Alex C.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In typical Person Re-identification (Person ReID) tasks, the primary color profile of the same identity is assumed to remain consistent. However, this assumption may not hold in real-world situations, where images of the same identity can have different color profiles because of cross-modality cameras or changes of clothing. To address this issue, we propose Color Space Learning (CSL) for such Cross-Color Person ReID problems. Specifically, CSL guides the model to be less color-sensitive with two modules: Image-level Color-Augmentation and Pixel-level Color-Transformation. The first module increases the color diversity of the inputs and guides the model to focus more on non-color information. The second module projects every pixel of the input images onto a new color space. In addition, we introduce NTU-Corridor, a new Person ReID benchmark across RGB and infrared modalities, which is the first with privacy agreements from all participants. To evaluate the effectiveness and robustness of the proposed CSL, we evaluate it on several Cross-Color Person ReID benchmarks, where our method consistently surpasses state-of-the-art methods. The code and benchmark are available at https://github.com/niejiahao1998/CSL.
Comment: Accepted by ICME 2024 (Oral)
Published: 2024
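The abstract says only that Image-level Color-Augmentation increases the color diversity of the inputs; it does not list the transforms used. One common transform with that effect is a random permutation of the RGB channels, sketched below under that assumption; `permute_channels` is a hypothetical name and is not CSL's module.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def permute_channels(image: Image, rng: random.Random) -> Image:
    """Hypothetical color augmentation: one random RGB channel order per image.

    The spatial layout and each pixel's set of channel values are preserved,
    but hues change, so color alone becomes a less reliable identity cue.
    """
    order = [0, 1, 2]
    rng.shuffle(order)  # the same permutation is applied to every pixel
    return [[tuple(px[c] for c in order) for px in row] for row in image]


img = [[(10, 20, 30), (0, 0, 255), (5, 5, 5)]]
print(permute_channels(img, random.Random(0)))
```

Applying one permutation per image, rather than per pixel, keeps the image internally consistent, so the model still sees a coherent person whose colors have shifted.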

8. Towards Category Unification of 3D Single Object Tracking on Point Clouds

Author: Nie, Jiahao, He, Zhiwei, Lv, Xudong, Zhou, Xueyi, Chae, Dong-Kyu, and Xie, Fei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Category-specific models have proven valuable in 3D single object tracking (SOT), regardless of whether the Siamese or the motion-centric paradigm is used. However, such over-specialized model designs incur redundant parameters, limiting the broader applicability of the 3D SOT task. This paper first introduces unified models that can simultaneously track objects across all categories using a single network with shared model parameters. Specifically, we propose to explicitly encode the distinct attributes associated with different object categories, enabling the model to adapt to cross-category data. We find that the attribute variances of point cloud objects arise primarily from varying size and shape (e.g., large and square vehicles vs. small and slender humans). Based on this observation, we design a novel point set representation learning network that inherits the transformer architecture, termed AdaFormer, which adaptively encodes the dynamically varying shape and size information from cross-category data in a unified manner. We further incorporate the size and shape priors derived from the known template targets into the model's inputs and learning objective, facilitating the learning of a unified representation. Equipped with these designs, we construct two category-unified models, SiamCUT and MoCUT. Extensive experiments demonstrate that SiamCUT and MoCUT exhibit strong generalization and training stability. Furthermore, our category-unified models outperform their category-specific counterparts by a significant margin (e.g., 12% and 3% performance gains on the KITTI dataset for the Siamese and motion paradigms, respectively). Our code will be made available.
Published: 2024

9. Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Author: Nie, Jiahao, Xing, Yun, Zhang, Gongjie, Yan, Pei, Xiao, Aoran, Tan, Yap-Peng, Kot, Alex C., and Lu, Shijian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Cross-Domain Few-Shot Segmentation (CD-FSS) poses the challenge of segmenting novel categories from a distinct domain using only limited exemplars. In this paper, we undertake a comprehensive study of CD-FSS and uncover two crucial insights: (i) a fine-tuning stage is necessary to effectively transfer the learned meta-knowledge across domains, and (ii) naïve fine-tuning risks overfitting due to the scarcity of novel category examples. With these insights, we propose a novel cross-domain fine-tuning strategy that addresses the challenging CD-FSS tasks. We first design Bi-directional Few-shot Prediction (BFP), which establishes support-query correspondence in a bi-directional manner, crafting augmented supervision to reduce the overfitting risk. We then extend BFP into the Iterative Few-shot Adaptor (IFA), a recursive framework that captures the support-query correspondence iteratively, aiming at maximal exploitation of supervisory signals from the sparse novel category samples. Extensive empirical evaluations show that our method significantly outperforms state-of-the-art methods (+7.8%), which verifies that IFA tackles the cross-domain challenges and mitigates overfitting simultaneously. The code is available at https://github.com/niejiahao1998/IFA.
Comment: Accepted by CVPR 2024
Published: 2024

10. Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

Author: Xing, Yun, Kang, Jian, Xiao, Aoran, Nie, Jiahao, Shao, Ling, and Lu, Shijian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-Language Pre-training has demonstrated remarkable zero-shot recognition ability and the potential to learn generalizable visual representations from language supervision. Going a step further, language-supervised semantic segmentation enables spatial localization of textual inputs by learning pixel grouping solely from image-text pairs. Nevertheless, the state of the art suffers from clear semantic gaps between the visual and textual modalities: many visual concepts that appear in images are missing from their paired captions. Such semantic misalignment propagates through pre-training, leading to inferior zero-shot performance in dense predictions due to insufficient visual concepts captured in textual representations. To close this semantic gap, we propose Concept Curation (CoCu), a pipeline that leverages CLIP to compensate for the missing semantics. For each image-text pair, we establish a concept archive that maintains potential visually matched concepts with our proposed vision-driven expansion and text-to-vision-guided ranking. Relevant concepts can thus be identified via cluster-guided sampling and fed into pre-training, thereby bridging the gap between visual and textual semantics. Extensive experiments over a broad suite of 8 segmentation benchmarks show that CoCu achieves superb zero-shot transfer performance and boosts the language-supervised segmentation baseline by a large margin, suggesting the value of bridging the semantic gap in pre-training data.
Comment: NeurIPS 2023. Code is available at https://github.com/xing0047/rewrite
Published: 2023

11. OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking

Author: Nie, Jiahao, He, Zhiwei, Yang, Yuxiang, Bao, Zhengyi, Gao, Mingyu, and Zhang, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The two-stage point-to-box network plays a critical role in the recently popular 3D Siamese tracking paradigm: it first generates proposals and then predicts corresponding proposal-wise scores. However, such a network suffers from tedious hyper-parameter tuning and task misalignment, limiting tracking performance. To address these concerns, we propose a simple yet effective one-stage point-to-box network for point cloud-based 3D single object tracking. It synchronizes 3D proposal generation and center-ness score prediction with a parallel predictor that requires no tedious hyper-parameters. To guide a task-aligned score ranking of proposals, a center-aware focal loss is proposed to supervise the training of the center-ness branch, which enhances the network's ability to distinguish proposals of different quality. Besides, we design a binary target classifier to identify target-relevant points. By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment. Finally, we present a novel one-stage Siamese tracker, OSP2B, equipped with the designed network. Extensive experiments on challenging benchmarks, including KITTI and the Waymo SOT Dataset, show that OSP2B achieves leading performance with a considerable real-time speed. Code will be available at https://github.com/haooozi/OSP2B.
Comment: Accepted to IJCAI'23.
Published: 2023

12. GLT-T++: Global-Local Transformer for 3D Siamese Tracking with Ranking Loss

Author: Nie, Jiahao, He, Zhiwei, Yang, Yuxiang, Lv, Xudong, Gao, Mingyu, and Zhang, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Siamese trackers based on a 3D region proposal network (RPN) have shown remarkable success with deep Hough voting. However, using a single seed point feature as the cue for voting fails to produce high-quality 3D proposals. Additionally, treating all seed points equally in the voting process, regardless of their significance, exacerbates this limitation. To address these challenges, we propose a novel transformer-based voting scheme to generate better proposals. Specifically, a global-local transformer (GLT) module is devised to integrate object- and patch-aware geometric priors into seed point features, resulting in robust and accurate cues for offset learning of seed points. To train the GLT module, we introduce an importance prediction branch that learns the potential importance weights of seed points as a training constraint. Incorporating this transformer-based voting scheme into a 3D RPN, we develop a novel Siamese method dubbed GLT-T for 3D single object tracking on point clouds. Moreover, we identify that the highest-scored proposal in the Siamese paradigm may not be the most accurate one, which limits tracking performance. To address this, we treat the binary score prediction task as a ranking problem and design a target-aware ranking loss and a localization-aware ranking loss to produce an accurate ranking of proposals. With the ranking losses, we further present GLT-T++, an enhanced version of GLT-T. Extensive experiments on multiple benchmarks demonstrate that GLT-T and GLT-T++ outperform state-of-the-art methods in tracking accuracy while maintaining real-time inference speed. The source code will be made available at https://github.com/haooozi/GLT-T.
Comment: Needs further revision
Published: 2023
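GLT-T++ recasts binary score prediction as ranking. The abstract does not define its target-aware and localization-aware ranking losses, so the sketch below shows only a generic pairwise margin ranking loss, assuming each proposal has a predicted score and an IoU with the ground-truth box; `pairwise_ranking_loss` and the `margin` value are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(
    scores: Sequence[float], ious: Sequence[float], margin: float = 0.1
) -> float:
    """Generic pairwise hinge ranking loss (illustrative, not GLT-T++'s exact loss).

    For each pair where one proposal has higher IoU with the ground truth, the
    better proposal's score must exceed the worse one's by at least `margin`.
    """
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if ious[i] == ious[j]:
            continue  # no preferred order between equally accurate proposals
        better, worse = (i, j) if ious[i] > ious[j] else (j, i)
        losses.append(max(0.0, margin - (scores[better] - scores[worse])))
    return sum(losses) / len(losses) if losses else 0.0


# Scores already ordered like the IoUs (with enough margin) incur zero loss.
print(pairwise_ranking_loss([0.9, 0.5, 0.1], [0.8, 0.5, 0.2]))  # prints 0.0
```

A proposal that is scored highly but localized poorly produces a positive loss here, which matches the failure mode the abstract describes: the top-scored proposal not being the most accurate one.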

13. GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

Author: Nie, Jiahao, He, Zhiwei, Yang, Yuxiang, Gao, Mingyu, and Zhang, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Current 3D single object tracking methods are typically based on VoteNet, a 3D region proposal network. Despite its success, using a single seed point feature as the cue for offset learning in VoteNet prevents high-quality 3D proposals from being generated. Moreover, seed points of different importance are treated equally in the voting process, aggravating this defect. To address these issues, we propose a novel global-local transformer voting scheme that provides more informative cues and guides the model to pay more attention to potential seed points, promoting the generation of high-quality 3D proposals. Technically, a global-local transformer (GLT) module is employed to integrate object- and patch-aware priors into seed point features, forming a strong feature representation of the geometric positions of the seed points and thus providing more robust and accurate cues for offset learning. Subsequently, a simple yet effective training strategy is designed to train the GLT module: we develop an importance prediction branch to learn the potential importance of the seed points and treat the output weight vector as a training constraint term. By combining these components, we obtain a superior tracking method, GLT-T. Extensive experiments on the challenging KITTI and NuScenes benchmarks demonstrate that GLT-T achieves state-of-the-art performance in the 3D single object tracking task. Besides, further ablation studies show the advantages of the proposed global-local transformer voting scheme over the original VoteNet. Code and models will be available at https://github.com/haooozi/GLT-T.
Comment: Accepted to AAAI 2023.
Published: 2022

14. Learning Localization-aware Target Confidence for Siamese Visual Tracking

Author: Nie, Jiahao, Wu, Han, He, Zhiwei, Yang, Yuxiang, Gao, Mingyu, and Dong, Zhekang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Siamese tracking paradigm has achieved great success, providing effective appearance discrimination and size estimation through classification and regression. However, such a paradigm typically optimizes classification and regression independently, leading to task misalignment (accurate prediction boxes do not necessarily receive high target confidence scores). In this paper, to alleviate this misalignment, we propose a novel tracking paradigm called SiamLA. Within this paradigm, a series of simple yet effective localization-aware components are introduced to generate localization-aware target confidence scores. Specifically, with the proposed localization-aware dynamic label (LADL) loss and localization-aware label smoothing (LALS) strategy, collaborative optimization between classification and regression is achieved, enabling classification scores to be aware of the location state, not just appearance similarity. Besides, we propose a separate localization branch, centered on a localization-aware feature aggregation (LAFA) module, to produce location quality scores that further modify the classification scores. Consequently, the resulting target confidence scores are more discriminative of the location state, so that accurate prediction boxes tend to receive high scores. Extensive experiments are conducted on six challenging benchmarks: GOT-10k, TrackingNet, LaSOT, TNL2K, OTB100 and VOT2018. SiamLA achieves state-of-the-art performance in terms of both accuracy and efficiency. Furthermore, a stability analysis reveals that the tracking paradigm is relatively stable, suggesting its potential for real-world applications.
Published: 2022
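The abstract describes making classification scores aware of localization rather than appearance alone. One widely used way to do that, assumed here purely for illustration, is to replace the hard positive label with the predicted box's IoU against the ground truth; `iou_target_bce` is a hypothetical helper and not SiamLA's LADL or LALS formulation.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_target_bce(logit: float, pred_box: Box, gt_box: Box) -> float:
    """BCE on a classification logit whose target is the box IoU, not a hard 1.

    Illustrative assumption only; SiamLA's LADL loss and LALS strategy are not
    specified in the abstract. Uses the numerically stable logits form
    max(z, 0) - z * t + log(1 + exp(-|z|)).
    """
    t = iou(pred_box, gt_box)
    return max(logit, 0.0) - logit * t + math.log1p(math.exp(-abs(logit)))
```

Because the loss is minimized when the predicted probability equals the IoU, a confidently scored but poorly aligned box is pushed toward a lower score, so classification confidence follows localization quality.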

15. SDANet: Sub-domain adaptive network for multi-fault diagnosis of lithium-ion battery packs

Author: Yang, Zhi, Nie, Jiahao, He, Zhiwei, Guan, Siwei, Zheng, Xiaorong, and Gao, Mingyu
Published: 2024

16. A fine-grained feature decoupling based multi-source domain adaptation network for rotating machinery fault diagnosis

Author: Zheng, Xiaorong, Nie, Jiahao, He, Zhiwei, Li, Ping, Dong, Zhekang, and Gao, Mingyu
Published: 2024

17. A progressive multi-source domain adaptation method for bearing fault diagnosis

Author: Zheng, Xiaorong, He, Zhiwei, Nie, Jiahao, Li, Ping, Dong, Zhekang, and Gao, Mingyu
Published: 2024

18. Learning task-specific discriminative representations for multiple object tracking

Author: Wu, Han, Nie, Jiahao, Zhu, Ziming, He, Zhiwei, and Gao, Mingyu
Published: 2023

19. A reconstruction-based model with transformer and long short-term memory for internal short circuit detection in battery packs

Author: Wang, Han, Nie, Jiahao, He, Zhiwei, Gao, Mingyu, Song, Wenlong, and Dong, Zhekang
Published: 2023

20. Deep Learning-based Visual Multiple Object Tracking: A Review

Author: Wu, Han, Nie, Jiahao, Zhang, Zhaowei, He, Zhiwei, and Gao, Mingyu
Subjects: multiple object tracking, computer vision, object detection, feature extraction, data association, Computer software, QA76.75-76.765, Technology (General), T1-995
Abstract: Multiple object tracking (MOT) aims to predict the trajectories of all targets and maintain their identities in a given video sequence. In recent years, MOT has gained significant attention and become a hot topic in computer vision due to its great potential in academic research and practical applications. Benefiting from advances in object detection and re-identification, current approaches mainly split the MOT task into three subtasks: object detection, re-identification feature extraction, and data association. This idea has achieved remarkable success. However, robust tracking remains challenging due to factors such as occlusion and interference from similar objects during tracking. To meet the requirements of accurate, robust and real-time tracking in complex scenarios, further research and improvement of MOT algorithms are needed. Some reviews of MOT algorithms have been published; however, the existing literature does not summarize tracking approaches comprehensively and lacks the latest research achievements. In this paper, the principle of MOT is first introduced, together with the challenges in the tracking process. Then, the latest research achievements are summarized and analyzed. According to the tracking paradigm used to complete the three subtasks, the algorithms are divided into separate detection and embedding, joint detection and embedding, and joint detection and tracking. The main characteristics of the various tracking approaches are described. Afterward, existing mainstream models are compared and analyzed on the MOTChallenge datasets. Finally, future research directions are discussed in light of the advantages and disadvantages of current algorithms and their development trends.
Published: 2023
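Of the three subtasks the review identifies, data association is the easiest to show in isolation. The sketch below assumes axis-aligned (x1, y1, x2, y2) boxes for existing tracks and new detections and matches them greedily by IoU above a hypothetical `min_iou` threshold; it is a generic baseline, not an algorithm taken from the review.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(
    tracks: List[Box], detections: List[Box], min_iou: float = 0.3
) -> Dict[int, int]:
    """Greedy one-to-one matching of track indices to detection indices by IoU.

    Illustrative sketch: many trackers use the Hungarian algorithm and
    appearance (re-ID) features instead of IoU alone.
    """
    candidates = sorted(
        ((iou(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used: Set[int] = set()
    for score, ti, di in candidates:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if ti in matches or di in used:
            continue  # each track and each detection is matched at most once
        matches[ti] = di
        used.add(di)
    return matches


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
print(sorted(greedy_associate(tracks, detections).items()))  # prints [(0, 1), (1, 0)]
```

Unmatched detections (the third box here) would typically start new tracks, and unmatched tracks would be aged out; those bookkeeping steps are left out of the sketch.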

21. A global–local context embedding learning based sequence-free framework for state of health estimation of lithium-ion battery

Author: Bao, Zhengyi, Nie, Jiahao, Lin, Huipin, Jiang, Jiahao, He, Zhiwei, and Gao, Mingyu
Published: 2023

22. Leveraging temporal-aware fine-grained features for robust multiple object tracking

Author: Wu, Han, Nie, Jiahao, Zhu, Ziming, He, Zhiwei, and Gao, Mingyu
Published: 2023

23. FAML-RT: Feature alignment-based multi-level similarity metric learning network for a two-stage robust tracker

Author: Nie, Jiahao, Dong, Zhekang, He, Zhiwei, Wu, Han, and Gao, Mingyu
Published: 2023

24. Improved Real-Time Models for Object Detection and Instance Segmentation for Agaricus bisporus Segmentation and Localization System Using RGB-D Panoramic Stitching Images

Author: Shi, Chenbo, Mo, Yuanzheng, Ren, Xiangqun, Nie, Jiahao, Zhang, Chun, Yuan, Jin, and Zhu, Changsheng
Published: 2024

25. DFSA-DAN: Dynamic Fusion of Statistical Metric and Adversarial Learning for Domain Adaptation Network Based Intelligent Fault Diagnosis

Author: Shao, Yining, Zheng, Xiaorong, He, Zhiwei, Gao, Mingyu, and Nie, Jiahao
Published: 2024

26. TTSNet: State-of-Charge Estimation of Li-ion Battery in Electrical Vehicles with Temporal Transformer-based Sequence Network

Author: Bao, Zhengyi, Nie, Jiahao, Lin, Huipin, Gao, Kejie, He, Zhiwei, and Gao, Mingyu
Published: 2024

27. Dual-task Learning for Joint State-of-Charge and State-of-Energy Estimation of Lithium-ion Battery in Electric Vehicle

Author: Bao, Zhengyi, Nie, Jiahao, Lin, Huipin, Li, Zhi, Gao, Kejie, He, Zhiwei, and Gao, Mingyu
Published: 2024
Full Text: View/download PDF

28. TPAD: Temporal-Pattern-Based Neural Network Model for Anomaly Detection in Multivariate Time Series

Author: Ma, Shenhui, Guan, Siwei, He, Zhiwei, Nie, Jiahao, and Gao, Mingyu
Published: 2023
Full Text: View/download PDF

29. Identifying function level complex modules based on complex coupled networks

Author: Wang, Hongzhi, Li, Wenlong, Liu, Wei, Fan, Ping, and Nie, Jiahao
Published: 2024
Full Text: View/download PDF

30. A Reliability Analysis Method for Common Cause Failure Based on Extended Dynamic Fault Tree

Author: Xiao, Peng, Deng, Ruiwen, and Nie, Jiahao
Published: 2023
Full Text: View/download PDF

31. OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking

Author: Nie, Jiahao, He, Zhiwei, Yang, Yuxiang, Bao, Zhengyi, Gao, Mingyu, and Zhang, Jing
Published: 2023
Full Text: View/download PDF

32. GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

Author: Nie, Jiahao, He, Zhiwei, Yang, Yuxiang, Gao, Mingyu, and Zhang, Jing
Published: 2023
Full Text: View/download PDF

33. BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

Author: Yang, Yuxiang, Deng, Yingqi, Zhang, Jing, Nie, Jiahao, and Zha, Zheng-Jun
Abstract: 3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving. It remains challenging to localize the target from surroundings due to appearance variations, distractors, and the high sparsity of point clouds. To address these issues, prior Siamese and motion-centric trackers both require elaborate designs and solving multiple subtasks. In this paper, we propose BEVTrack, a simple yet effective baseline method. By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity from various aspects, i.e., network designs, training objectives, and tracking pipeline, while achieving superior performance. Besides, to achieve accurate regression for targets with diverse attributes (e.g., sizes and motion patterns), BEVTrack constructs the likelihood function with the learned underlying distributions adapted to different targets, rather than making a fixed Laplacian or Gaussian assumption as in previous works. This provides valuable priors for tracking and thus further boosts performance. While only using a single regression loss with a plain convolutional architecture, BEVTrack achieves state-of-the-art performance on three large-scale datasets, KITTI, NuScenes, and Waymo Open Dataset while maintaining a high inference speed of about 200 FPS. The code will be released at https://github.com/xmm-prio/BEVTrack.
Published: 2023

34. Multivariate Time Series Anomaly Detection with Variational Autoencoder and Spatial-Temporal Graph Network in IoT

Author: Guan, Siwei, He, Zhiwei, Nie, Jiahao, and Gao, Mingyu
Published: 2023
Full Text: View/download PDF

35. Learning task-specific discriminative representations for multiple object tracking

Author: Wu, Han, Nie, Jiahao, Zhu, Ziming, He, Zhiwei, and Gao, Mingyu
Published: 2022
Full Text: View/download PDF

36. MSA-MOT: Multi-Stage Association for 3D Multimodality Multi-Object Tracking

Author: Zhu, Ziming, Nie, Jiahao, Wu, Han, He, Zhiwei, and Gao, Mingyu
Published: 2022
Full Text: View/download PDF

37. Hierarchical Feature Fusion based Reconstruction Network for Unsupervised Anomaly Detection

Author: Zhao, Binjie, Nie, Jiahao, Guan, Siwei, Wang, Han, He, Zhiwei, and Gao, Mingyu
Published: 2022
Full Text: View/download PDF

38. Spreading Fine-Grained Prior Knowledge for Accurate Tracking

Author: Nie, Jiahao, Wu, Han, He, Zhiwei, Gao, Mingyu, and Dong, Zhekang
Published: 2022
Full Text: View/download PDF

39. High precision scene stitching and recognition of Agaricus bisporus based on depth camera

Author: Shao, Xiaopeng, Shi, Chenbo, Nie, Jiahao, Mo, Yuanzheng, Zhang, Chun, Zhu, Changsheng, and Zang, Xiangteng
Published: 2023
Full Text: View/download PDF

40. Leveraging temporal-aware fine-grained features for robust multiple object tracking

Author: Wu, Han, Nie, Jiahao, Zhu, Ziming, He, Zhiwei, and Gao, Mingyu
Published: 2022
Full Text: View/download PDF

41. One-Shot Multiple Object Tracking in UAV Videos Using Task-Specific Fine-Grained Features

Author: Wu, Han, Nie, Jiahao, He, Zhiwei, Zhu, Ziming, and Gao, Mingyu
Published: 2022
Full Text: View/download PDF

42. Learning Localization-aware Target Confidence for Siamese Visual Tracking

Author: Nie, Jiahao, He, Zhiwei, Yang, Yuxiang, Gao, Mingyu, and Dong, Zhekang
Published: 2022
Full Text: View/download PDF

43. A Novel 3D Convolutional Neural Network for Action Recognition in Infrared Videos

Author: Nie, Jiahao, Yan, Longbin, Wang, Xiuheng, and Chen, Jie
Published: 2021
Full Text: View/download PDF

44. High precision scene stitching and recognition of Agaricus bisporus based on depth camera

Author: Shi, Chenbo, Nie, Jiahao, Mo, Yuanzheng, Zhang, Chun, Zhu, Changsheng, and Zang, Xiangteng
Published: 2024
Full Text: View/download PDF
