Author: "Cheng, Xuelian" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Cheng, Xuelian"' showing total 7 results

1. OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

Author: Hu, Ming, Xia, Peng, Wang, Lin, Yan, Siyuan, Tang, Feilong, Xu, Zhongxing, Luo, Yimin, Song, Kaimin, Leitner, Jurgen, Cheng, Xuelian, Cheng, Jun, Liu, Chi, Zhou, Kaijing, and Ge, Zongyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Surgical scene perception via videos is critical for advancing robotic surgery, telesurgery, and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and richly annotated video datasets has hindered the development of intelligent systems for surgical workflow analysis. Existing datasets face challenges such as small scale, lack of diversity in surgery and phase categories, and absence of time-localized annotations. These limitations impede action understanding and model generalization validation in complex and diverse real-world surgical scenarios. To address this gap, we introduce OphNet, a large-scale, expert-annotated video benchmark for ophthalmic surgical workflow understanding. OphNet features: 1) A diverse collection of 2,278 surgical videos spanning 66 types of cataract, glaucoma, and corneal surgeries, with detailed annotations for 102 unique surgical phases and 150 fine-grained operations. 2) Sequential and hierarchical annotations for each surgery, phase, and operation, enabling comprehensive understanding and improved interpretability. 3) Time-localized annotations, facilitating temporal localization and prediction tasks within surgical workflows. With approximately 285 hours of surgical videos, OphNet is about 20 times larger than the largest existing surgical workflow analysis benchmark. Code and dataset are available at: https://minghu0830.github.io/OphNet-benchmark/.
Comment: Accepted by ECCV 2024
Published: 2024

2. EndoSurf: Neural Surface Reconstruction of Deformable Tissues with Stereo Endoscope Videos

Author: Zha, Ruyi, Cheng, Xuelian, Li, Hongdong, Harandi, Mehrtash, and Ge, Zongyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Reconstructing soft tissues from stereo endoscope videos is an essential prerequisite for many medical applications. Previous methods struggle to produce high-quality geometry and appearance due to their inadequate representations of 3D scenes. To address this issue, we propose a novel neural-field-based method, called EndoSurf, which effectively learns to represent a deforming surface from an RGBD sequence. In EndoSurf, we model surface dynamics, shape, and texture with three neural fields. First, 3D points are transformed from the observed space to the canonical space using the deformation field. The signed distance function (SDF) field and radiance field then predict their SDFs and colors, respectively, with which RGBD images can be synthesized via differentiable volume rendering. We constrain the learned shape by tailoring multiple regularization strategies and disentangling geometry and appearance. Experiments on public endoscope datasets demonstrate that EndoSurf significantly outperforms existing solutions, particularly in reconstructing high-fidelity shapes. Code is available at https://github.com/Ruyi-Zha/endosurf.git.
Comment: MICCAI 2023 (Oral, Student Travel Award, Top 3%); Ruyi Zha and Xuelian Cheng made equal contributions. Corresponding author: Ruyi Zha (ruyi.zha@gmail.com)
Published: 2023

3. Deep Laparoscopic Stereo Matching with Transformers

Author: Cheng, Xuelian, Zhong, Yiran, Harandi, Mehrtash, Drummond, Tom, Wang, Zhiyong, and Ge, Zongyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The self-attention mechanism, successfully employed with the transformer structure, has shown promise in many computer vision tasks, including image recognition and object detection. Despite the surge, the use of the transformer for the problem of stereo matching remains relatively unexplored. In this paper, we comprehensively investigate the use of the transformer for the problem of stereo matching, especially for laparoscopic videos, and propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of the CNN and the transformer in a unified design. To be specific, we investigate several ways to introduce transformers to volumetric stereo matching pipelines by analyzing the loss landscape of the designs and in-domain/cross-domain accuracy. Our analysis suggests that employing transformers for feature representation learning, while using CNNs for cost aggregation, will lead to faster convergence, higher accuracy, and better generalization than other options. Our extensive experiments on Sceneflow, SCARED2019 and dVPN datasets demonstrate the superior performance of our HybridStereoNet.
Comment: Accepted to MICCAI 2022; Xuelian Cheng and Yiran Zhong made equal contributions. Code: https://github.com/XuelianCheng/HybridStereoNet-main.git
Published: 2022

4. Implicit Motion Handling for Video Camouflaged Object Detection

Author: Cheng, Xuelian, Xiong, Huan, Fan, Deng-Ping, Zhong, Yiran, Harandi, Mehrtash, Drummond, Tom, and Ge, Zongyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames. An essential property of camouflaged objects is that they usually exhibit patterns similar to the background and thus make them hard to identify from still images. Therefore, effectively handling temporal dynamics in videos becomes the key for the VCOD task as the camouflaged objects will be noticeable when they move. However, current VCOD methods often leverage homography or optical flows to represent motions, where the detection error may accumulate from both the motion estimation error and the segmentation error. On the other hand, our method unifies motion estimation and object segmentation within a single optimization framework. Specifically, we build a dense correlation volume to implicitly capture motions between neighbouring frames and utilize the final segmentation supervision to optimize the implicit motion estimation and segmentation jointly. Furthermore, to enforce temporal consistency within a video sequence, we jointly utilize a spatio-temporal transformer to refine the short-term predictions. Extensive experiments on VCOD benchmarks demonstrate the architectural effectiveness of our approach. We also provide a large-scale VCOD dataset named MoCA-Mask with pixel-level handcrafted ground-truth masks and construct a comprehensive VCOD benchmark with previous methods to facilitate research in this direction. Dataset Link: https://xueliancheng.github.io/SLT-Net-project.
Comment: Accepted to CVPR 2022; Xuelian Cheng and Huan Xiong made equal contributions; Corresponding author: Deng-Ping Fan (dengpfan@gmail.com). Dataset: https://xueliancheng.github.io/SLT-Net-project
Published: 2022

5. Hierarchical Neural Architecture Search for Deep Stereo Matching

Author: Cheng, Xuelian, Zhong, Yiran, Harandi, Mehrtash, Dai, Yuchao, Chang, Xiaojun, Drummond, Tom, Li, Hongdong, and Ge, Zongyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: To reduce the human effort in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation. The underlying idea of the NAS algorithm is straightforward: by enabling the network to choose among a set of operations (e.g., convolution with different filter sizes), one is able to find an optimal architecture that is better adapted to the problem at hand. However, so far the success of NAS has not been enjoyed by low-level geometric vision tasks such as stereo matching. This is partly due to the fact that state-of-the-art deep stereo matching networks, designed by humans, are already sheer in size. Directly applying NAS to such massive structures is computationally prohibitive given the currently available mainstream computing resources. In this paper, we propose the first end-to-end hierarchical NAS framework for deep stereo matching by incorporating task-specific human knowledge into the neural architecture search framework. Specifically, following the gold standard pipeline for deep stereo matching (i.e., feature extraction -- feature volume construction and dense matching), we optimize the architectures of the entire pipeline jointly. Extensive experiments show that our searched network outperforms all state-of-the-art deep stereo matching architectures and is ranked top 1 in accuracy on the KITTI stereo 2012, 2015 and Middlebury benchmarks, as well as top 1 on the SceneFlow dataset, with a substantial improvement in the size of the network and the speed of inference. The code is available at https://github.com/XuelianCheng/LEAStereo.
Comment: Accepted at NeurIPS 2020; Xuelian Cheng and Yiran Zhong made equal contributions
Published: 2020

6. Noise-Aware Unsupervised Deep Lidar-Stereo Fusion

Author: Cheng, Xuelian, Zhong, Yiran, Dai, Yuchao, Ji, Pan, and Li, Hongdong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we present LidarStereoNet, the first unsupervised Lidar-stereo fusion network, which can be trained in an end-to-end manner without the need for ground truth depth maps. By introducing a novel "Feedback Loop" to connect the network input with output, LidarStereoNet can tackle both noisy Lidar points and misalignment between sensors, which have been ignored in existing Lidar-stereo fusion studies. Besides, we propose to incorporate a piecewise planar model into network learning to further constrain depths to conform to the underlying 3D geometry. Extensive quantitative and qualitative evaluations on both real and synthetic datasets demonstrate the superiority of our method, which significantly outperforms state-of-the-art stereo matching, depth completion and Lidar-stereo fusion approaches.
Comment: Accepted at CVPR 2019
Published: 2019

7. Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn

Author: Li, Bo, He, Mingyi, Cheng, Xuelian, Chen, Yucheng, and Dai, Yuchao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper presents an image classification based approach to the skeleton-based video action recognition problem. First, a dataset-independent translation-scale invariant image mapping method is proposed, which transforms the skeleton videos into colour images, named skeleton-images. Second, a multi-scale deep convolutional neural network (CNN) architecture is proposed, which can be built and fine-tuned on powerful pre-trained CNNs, e.g., AlexNet, VGGNet, and ResNet. Even though the skeleton-images are very different from natural images, the fine-tuning strategy still works well. Finally, we show that our method also works well on 2D skeleton video data. We achieve state-of-the-art results on the popular benchmark datasets, e.g., NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the large and challenging NTU RGB+D, UTD-MHAD, and MSRC-12 datasets, our method outperforms other methods by a large margin, which demonstrates the efficacy of the proposed method.
Published: 2017
