Author: "Lai, Shang Hong" - Searchworks@Jio Institute Digital Library Search Results

652 results on '"Lai, Shang Hong"'

1. HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

Author: Faure, Gueter Josmy, Yeh, Jia-Fong, Chen, Min-Hung, Su, Hung-Ting, Lai, Shang-Hong, and Hsu, Winston H.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Existing research often treats long-form videos as extended short videos, leading to several limitations: inadequate capture of long-range dependencies, inefficient processing of redundant information, and failure to extract high-level semantic concepts. To address these issues, we propose a novel approach that more accurately reflects human cognition. This paper introduces HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics, a model that simulates episodic memory accumulation to capture action sequences and reinforces them with semantic knowledge dispersed throughout the video. Our work makes two key contributions: First, we develop an Episodic COmpressor (ECO) that efficiently aggregates crucial representations from micro to semi-macro levels, overcoming the challenge of long-range dependencies. Second, we propose a Semantics ReTRiever (SeTR) that enhances these aggregated representations with semantic information by focusing on the broader context, dramatically reducing feature dimensionality while preserving relevant macro-level information. This addresses the issues of redundancy and lack of high-level concept extraction. Extensive experiments demonstrate that HERMES achieves state-of-the-art performance across multiple long-video understanding benchmarks in both zero-shot and fully-supervised settings.
Comment: This is an improved and expanded version of our EVAL-FoMo Workshop at ECCV'24 (v1 of this paper). Project page: https://joslefaure.github.io/assets/html/hermes.html
Published: 2024

2. CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection

Author: Hsieh, Yu-Hsuan and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: To improve logical anomaly detection, some previous works have integrated segmentation techniques with conventional anomaly detection methods. Although these methods are effective, they frequently lead to unsatisfactory segmentation results and require manual annotations. To address these drawbacks, we develop an unsupervised component segmentation technique that leverages foundation models to autonomously generate training labels for a lightweight segmentation network without human labeling. Integrating this new segmentation technique with our proposed Patch Histogram module and the Local-Global Student-Teacher (LGST) module, we achieve a detection AUROC of 95.3% in the MVTec LOCO AD dataset, which surpasses previous SOTA methods. Furthermore, our proposed method provides lower latency and higher throughput than most existing approaches.
Published: 2024

3. Spatio-Temporal Context Prompting for Zero-Shot Action Detection

Author: Huang, Wei-Jhe, Chen, Min-Hung, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Spatio-temporal action detection encompasses the tasks of localizing and classifying individual actions within a video. Recent works aim to enhance this process by incorporating interaction modeling, which captures the relationship between people and their surrounding context. However, these approaches have primarily focused on fully-supervised learning, and the current limitation lies in the lack of generalization capability to recognize unseen action categories. In this paper, we aim to adapt the pretrained image-language models to detect unseen actions. To this end, we propose a method which can effectively leverage the rich knowledge of visual-language models to perform Person-Context Interaction. Meanwhile, our Context Prompting module will utilize contextual information to prompt labels, thereby enhancing the generation of more representative text features. Moreover, to address the challenge of recognizing distinct actions by multiple people at the same timestamp, we design the Interest Token Spotting mechanism which employs pretrained visual knowledge to find each person's interest context tokens, and then these tokens will be used for prompting to generate text features tailored to each individual. To evaluate the ability to detect unseen actions, we propose a comprehensive benchmark on J-HMDB, UCF101-24, and AVA datasets. The experiments show that our method achieves superior results compared to previous approaches and can be further extended to multi-action videos, bringing it closer to real-world applications. The code and data can be found in https://webber2933.github.io/ST-CLIP-project-page.
Comment: Accepted by WACV2025. Project page: https://webber2933.github.io/ST-CLIP-project-page
Published: 2024

4. Text in the Dark: Extremely Low-Light Text Image Enhancement

Author: Lin, Che-Tsung, Ng, Chun Chet, Tan, Zhi Qin, Nah, Wan Jun, Wang, Xinyu, Kew, Jie Long, Hsu, Pohao, Lai, Shang Hong, Chan, Chee Seng, and Zach, Christopher
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Extremely low-light text images are common in natural scenes, making scene text detection and recognition challenging. One solution is to enhance these images using low-light image enhancement methods before text extraction. However, previous methods often do not try to particularly address the significance of low-level features, which are crucial for optimal performance on downstream scene text tasks. Further research is also hindered by the lack of extremely low-light text datasets. To address these limitations, we propose a novel encoder-decoder framework with an edge-aware attention module to focus on scene text regions during enhancement. Our proposed method uses novel text detection and edge reconstruction losses to emphasize low-level scene text features, leading to successful text extraction. Additionally, we present a Supervised Deep Curve Estimation (Supervised-DCE) model to synthesize extremely low-light images based on publicly available scene text datasets such as ICDAR15 (IC15). We also labeled texts in the extremely low-light See In the Dark (SID) and ordinary LOw-Light (LOL) datasets to allow for objective assessment of extremely low-light image enhancement through scene text tasks. Extensive experiments show that our model outperforms state-of-the-art methods in terms of both image quality and scene text metrics on the widely-used LOL, SID, and synthetic IC15 datasets. Code and dataset will be released publicly at https://github.com/chunchet-ng/Text-in-the-Dark.
Comment: The first two authors contributed equally to this work
Published: 2024

5. Few-Shot Deep Structure-Based Camera Localization with Pose Augmentation

Author: Tsai, Cheng-Yu and Lai, Shang-Hong
Editors: Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
Published: 2025

6. Domain Adaptation for Machinery Fault Diagnosis Based on Critic Classifier GAN

Author: Hung, Tso-Sung and Lai, Shang-Hong
Editors: Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
Published: 2025

7. TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks

Author: Lee, Ho-Weng and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent years, the focus on anomaly detection and localization in industrial inspection tasks has intensified. While existing studies have demonstrated impressive outcomes, they often rely heavily on extensive training datasets or robust features extracted from pre-trained models trained on diverse datasets like ImageNet. In this work, we propose a novel framework leveraging the visual-linguistic CLIP model to adeptly train a backbone model tailored to the manufacturing domain. Our approach concurrently considers visual and text-aligned embedding spaces for normal and abnormal conditions. The resulting pre-trained backbone markedly enhances performance in industrial downstream tasks, particularly in anomaly detection and localization. Notably, this improvement is substantiated through experiments conducted on multiple datasets such as MVTecAD, BTAD, and KSDD2. Furthermore, using our pre-trained backbone weights allows previous works to achieve superior performance in few-shot scenarios with less training data. The proposed anomaly backbone provides a foundation model for more precise anomaly detection and localization.
Published: 2023

8. KFC: Kinship Verification with Fair Contrastive Loss and Multi-Task Learning

Author: Peng, Jia Luo, Chang, Keng Wei, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Kinship verification is an emerging task in computer vision with multiple potential applications. However, there's no large enough kinship dataset to train a representative and robust model, which is a limitation for achieving better performance. Moreover, face verification is known to exhibit bias, which has not been dealt with by previous kinship verification works and sometimes even results in serious issues. So we first combine existing kinship datasets and label each identity with the correct race in order to take race information into consideration and provide a larger and complete dataset, called KinRace dataset. Secondly, we propose a multi-task learning model structure with attention module to enhance accuracy, which surpasses state-of-the-art performance. Lastly, our fairness-aware contrastive loss function with adversarial learning greatly mitigates racial bias. We introduce a debias term into traditional contrastive loss and implement gradient reverse in race classification task, which is an innovative idea to mix two fairness methods to alleviate bias. Exhaustive experimental evaluation demonstrates the effectiveness and superior performance of the proposed KFC in both standard deviation and accuracy at the same time.
Comment: Accepted by BMVC 2023
Published: 2023

9. ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking

Author: Cheng, Cheng-Che, Qiu, Min-Xuan, Chiang, Chen-Kuo, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multi-Camera Multi-Object Tracking (MC-MOT) utilizes information from multiple views to better handle problems with occlusion and crowded scenes. Recently, the use of graph-based approaches to solve tracking problems has become very popular. However, many current graph-based methods do not effectively utilize information regarding spatial and temporal consistency. Instead, they rely on single-camera trackers as input, which are prone to fragmentation and ID switch errors. In this paper, we propose a novel reconfigurable graph model that first associates all detected objects across cameras spatially before reconfiguring it into a temporal graph for Temporal Association. This two-stage association approach enables us to extract robust spatial and temporal-aware features and address the problem with fragmented tracklets. Furthermore, our model is designed for online tracking, making it suitable for real-world applications. Experimental results show that the proposed graph model is able to extract more discriminating features for object tracking, and our model achieves state-of-the-art performance on several public datasets.
Comment: Accepted by ICCV2023
Published: 2023

10. A Closer Look at Geometric Temporal Dynamics for Face Anti-Spoofing

Author: Chang, Chih-Jung, Lee, Yaw-Chern, Yao, Shih-Hsuan, Chen, Min-Hung, Wang, Chien-Yi, Lai, Shang-Hong, and Chen, Trista Pei-Chun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Face anti-spoofing (FAS) is indispensable for a face recognition system. Many texture-driven countermeasures were developed against presentation attacks (PAs), but the performance against unseen domains or unseen spoofing types is still unsatisfactory. Instead of exhaustively collecting all the spoofing variations and making binary decisions of live/spoof, we offer a new perspective on the FAS task to distinguish between normal and abnormal movements of live and spoof presentations. We propose Geometry-Aware Interaction Network (GAIN), which exploits dense facial landmarks with spatio-temporal graph convolutional network (ST-GCN) to establish a more interpretable and modularized FAS model. Additionally, with our cross-attention feature interaction mechanism, GAIN can be easily integrated with other existing methods to significantly boost performance. Our approach achieves state-of-the-art performance in the standard intra- and cross-dataset evaluations. Moreover, our model outperforms state-of-the-art methods by a large margin in the cross-dataset cross-type protocol on CASIA-SURF 3DMask (+10.26% higher AUC score), exhibiting strong robustness against domain shifts and unseen spoofing types.
Comment: 2023 CVPR Biometrics Workshop, Best Paper Award
Published: 2023

11. Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection

Author: Huang, Wei-Jhe, Yeh, Jheng-Hsien, Chen, Min-Hung, Faure, Gueter Josmy, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The goal of spatial-temporal action detection is to determine the time and place where each person's action occurs in a video and classify the corresponding action category. Most of the existing methods adopt fully-supervised learning, which requires a large amount of training data, making it very difficult to achieve zero-shot learning. In this paper, we propose to utilize a pre-trained visual-language model to extract the representative image and text features, and model the relationship between these features through different interaction modules to obtain the interaction feature. In addition, we use this feature to prompt each label to obtain more appropriate text features. Finally, we calculate the similarity between the interaction feature and the text feature for each label to determine the action category. Our experiments on J-HMDB and UCF101-24 datasets demonstrate that the proposed interaction module and prompting make the visual-language features better aligned, thus achieving excellent accuracy for zero-shot spatio-temporal action detection. The code will be available at https://github.com/webber2933/iCLIP.
Comment: Accepted by ICCV Workshop 2023 (What is Next in Multimodal Foundation Models?)
Published: 2023

12. Kinship Representation Learning with Face Componential Relation

Author: Su, Weng-Tai, Chen, Min-Hung, Wang, Chien-Yi, Lai, Shang-Hong, and Chen, Trista Pei-Chun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Kinship recognition aims to determine whether the subjects in two facial images are kin or non-kin, which is an emerging and challenging problem. However, most previous methods focus on heuristic designs without considering the spatial correlation between face images. In this paper, we aim to learn discriminative kinship representations embedded with the relation information between face components (e.g., eyes, nose, etc.). To achieve this goal, we propose the Face Componential Relation Network, which learns the relationship between face components among images with a cross-attention mechanism, which automatically learns the important facial regions for kinship recognition. Moreover, we propose Face Componential Relation Network (FaCoRNet), which adapts the loss function by the guidance from cross-attention to learn more discriminative feature representations. The proposed FaCoRNet outperforms previous state-of-the-art methods by large margins for the largest public kinship recognition FIW benchmark.
Comment: ICCV 2023 Workshop (Analysis and Modeling of Faces and Gestures)
Published: 2023

13. Generalized Face Anti-Spoofing via Multi-Task Learning and One-Side Meta Triplet Loss

Author: Chuang, Chu-Chun, Wang, Chien-Yi, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With the increasing variations of face presentation attacks, model generalization becomes an essential challenge for a practical face anti-spoofing system. This paper presents a generalized face anti-spoofing framework that consists of three tasks: depth estimation, face parsing, and live/spoof classification. With the pixel-wise supervision from the face parsing and depth estimation tasks, the regularized features can better distinguish spoof faces. While simulating domain shift with meta-learning techniques, the proposed one-side triplet loss can further improve the generalization capability by a large margin. Extensive experiments on four public datasets demonstrate that the proposed framework and training strategies are more effective than previous works for model generalization to unseen domains.
Comment: 2023 IEEE International Conference on Automatic Face and Gesture Recognition (FG)
Published: 2022

14. MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition

Author: Wang, Fu-En, Wang, Chien-Yi, Sun, Min, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Although significant progress has been made in face recognition, demographic bias still exists in face recognition systems. For instance, it usually happens that the face recognition performance for a certain demographic group is lower than the others. In this paper, we propose MixFairFace framework to improve the fairness in face recognition models. First of all, we argue that the commonly used attribute-based fairness metric is not appropriate for face recognition. A face recognition system can only be considered fair while every person has a close performance. Hence, we propose a new evaluation protocol to fairly evaluate the fairness performance of different approaches. Different from previous approaches that require sensitive attribute labels such as race and gender for reducing the demographic bias, we aim at addressing the identity bias in face representation, i.e., the performance inconsistency between different identities, without the need for sensitive attribute labels. To this end, we propose MixFair Adapter to determine and reduce the identity bias of training samples. Our extensive experiments demonstrate that our MixFairFace approach achieves state-of-the-art fairness performance on all benchmark datasets.
Comment: Accepted in AAAI-23; Code: https://github.com/fuenwang/MixFairFace
Published: 2022

15. Holistic Interaction Transformer Network for Action Detection

Author: Faure, Gueter Josmy, Chen, Min-Hung, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Actions are about how we interact with the environment, including other people, objects, and ourselves. In this paper, we propose a novel multi-modal Holistic Interaction Transformer Network (HIT) that leverages the largely ignored, but critical hand and pose information essential to most human actions. The proposed "HIT" network is a comprehensive bi-modal framework that comprises an RGB stream and a pose stream. Each of them separately models person, object, and hand interactions. Within each sub-network, an Intra-Modality Aggregation module (IMA) is introduced that selectively merges individual interaction units. The resulting features from each modality are then glued using an Attentive Fusion Mechanism (AFM). Finally, we extract cues from the temporal context to better classify the occurring actions using cached memory. Our method significantly outperforms previous approaches on the J-HMDB, UCF101-24, and MultiSports datasets. We also achieve competitive results on AVA. The code will be available at https://github.com/joslefaure/HIT.
Comment: Accepted for WACV 2023. Code: https://github.com/joslefaure/HIT
Published: 2022

16. Extremely Low-light Image Enhancement with Scene Text Restoration

Author: Hsu, Pohao, Lin, Che-Tsung, Ng, Chun Chet, Kew, Jie-Long, Tan, Mei Yih, Lai, Shang-Hong, Chan, Chee Seng, and Zach, Christopher
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning-based methods have made impressive progress in enhancing extremely low-light images - the image quality of the reconstructed images has generally improved. However, we found out that most of these methods could not sufficiently recover the image details, for instance, the texts in the scene. In this paper, a novel image enhancement framework is proposed to precisely restore the scene texts, as well as the overall quality of the image simultaneously under extremely low-light image conditions. Mainly, we employed a self-regularised attention map, an edge map, and a novel text detection loss. In addition, leveraging synthetic low-light images is beneficial for image enhancement on the genuine ones in terms of text detection. The quantitative and qualitative experimental results have shown that the proposed model outperforms state-of-the-art methods in image restoration, text detection, and text spotting on See In the Dark and ICDAR15 datasets.
Published: 2022

17. Text in the dark: Extremely low-light text image enhancement

Author: Lin, Che-Tsung, Ng, Chun Chet, Tan, Zhi Qin, Nah, Wan Jun, Wang, Xinyu, Kew, Jie Long, Hsu, Pohao, Lai, Shang Hong, Chan, Chee Seng, and Zach, Christopher
Published: 2025

18. Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation

Author: Zhu, Wenbin, Wang, Chien-Yi, Tseng, Kuan-Lun, Lai, Shang-Hong, and Wang, Baoyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Due to the rising concern of data privacy, it's reasonable to assume the local client data can't be transferred to a centralized server, nor their associated identity label is provided. To support continuous learning and fill the last-mile quality gap, we introduce a new problem setup called Local-Adaptive Face Recognition (LaFR). Leveraging the environment-specific local data after the deployment of the initial global model, LaFR aims at getting optimal performance by training local-adapted models automatically and un-supervisely, as opposed to fixing their initial global model. We achieve this by a newly proposed embedding cluster model based on Graph Convolution Network (GCN), which is trained via meta-optimization procedure. Compared with previous works, our meta-clustering model can generalize well in unseen local environments. With the pseudo identity labels from the clustering results, we further introduce novel regularization techniques to improve the model adaptation performance. Extensive experiments on racial and internal sensor adaptation demonstrate that our proposed solution is more effective for adapting face recognition models in each specific environment. Meanwhile, we show that LaFR can further improve the global model by a simple federated aggregation over the updated local models.
Comment: CVPR 2022
Published: 2022

19. PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition

Author: Wang, Chien-Yi, Lu, Yu-Ding, Yang, Shang-Ta, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Face anti-spoofing (FAS) plays a critical role in securing face recognition systems from different presentation attacks. Previous works leverage auxiliary pixel-level supervision and domain generalization approaches to address unseen spoof types. However, the local characteristics of image captures, i.e., capturing devices and presenting materials, are ignored in existing works and we argue that such information is required for networks to discriminate between live and spoof images. In this work, we propose PatchNet which reformulates face anti-spoofing as a fine-grained patch-type recognition problem. To be specific, our framework recognizes the combination of capturing devices and presenting materials based on the patches cropped from non-distorted face images. This reformulation can largely improve the data variation and enforce the network to learn discriminative feature from local capture patterns. In addition, to further improve the generalization ability of the spoof feature, we propose the novel Asymmetric Margin-based Classification Loss and Self-supervised Similarity Loss to regularize the patch embedding space. Our experimental results verify our assumption and show that the model is capable of recognizing unseen spoof types robustly by only looking at local regions. Moreover, the fine-grained and patch-level reformulation of FAS outperforms the existing approaches on intra-dataset, cross-dataset, and domain generalization benchmarks. Furthermore, our PatchNet framework can enable practical applications like Few-Shot Reference-based FAS and facilitate future exploration of spoof-related intrinsic cues.
Comment: CVPR 2022
Published: 2022

20. FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition

Author: Liu, Chih-Ting, Wang, Chien-Yi, Chien, Shao-Yi, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Current state-of-the-art deep learning based face recognition (FR) models require a large number of face identities for central training. However, due to the growing privacy awareness, it is prohibited to access the face images on user devices to continually improve face recognition models. Federated Learning (FL) is a technique to address the privacy issue, which can collaboratively optimize the model without sharing the data between clients. In this work, we propose a FL based framework called FedFR to improve the generic face representation in a privacy-aware manner. Besides, the framework jointly optimizes personalized models for the corresponding clients via the proposed Decoupled Feature Customization module. The client-specific personalized model can serve the need of optimized face recognition experience for registered identities at the local device. To the best of our knowledge, we are the first to explore the personalized face recognition in FL setup. The proposed framework is validated to be superior to previous approaches on several generic and personalized face recognition benchmarks with diverse FL scenarios. The source codes and our proposed personalized FR benchmark under FL setup are available at https://github.com/jackie840129/FedFR.
Comment: This paper was accepted by AAAI 2022 Conference on Artificial Intelligence and selected as an oral paper
Published: 2021

21. High-Accuracy RGB-D Face Recognition via Segmentation-Aware Face Depth Estimation and Mask-Guided Attention Network

Author: Chiu, Meng-Tzu, Cheng, Hsun-Ying, Wang, Chien-Yi, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning approaches have achieved highly accurate face recognition by training the models with very large face image datasets. Unlike the availability of large 2D face image datasets, there is a lack of large 3D face datasets available to the public. Existing public 3D face datasets were usually collected with few subjects, leading to the over-fitting problem. This paper proposes two CNN models to improve the RGB-D face recognition task. The first is a segmentation-aware depth estimation network, called DepthNet, which estimates depth maps from RGB face images by including semantic segmentation information for more accurate face region localization. The other is a novel mask-guided RGB-D face recognition model that contains an RGB recognition branch, a depth map recognition branch, and an auxiliary segmentation mask branch with a spatial attention module. Our DepthNet is used to augment a large 2D face image dataset to a large RGB-D face dataset, which is used for training an accurate RGB-D face recognition model. Furthermore, the proposed mask-guided RGB-D face recognition model can fully exploit the depth map and segmentation mask information and is more robust against pose variation than previous methods. Our experimental results show that DepthNet can produce more reliable depth maps from face images with the segmentation mask. Our mask-guided face recognition model outperforms state-of-the-art methods on several public 3D face datasets.
Comment: IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2021
Published: 2021

22. Disentangled Representation with Dual-stage Feature Learning for Face Anti-spoofing

Author: Wang, Yu-Chun, Wang, Chien-Yi, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: As face recognition is widely used in diverse security-critical applications, the study of face anti-spoofing (FAS) has attracted more and more attention. Several FAS methods have achieved promising performances if the attack types in the testing data are the same as training data, while the performance significantly degrades for unseen attack types. It is essential to learn more generalized and discriminative features to prevent overfitting to pre-defined spoof attack types. This paper proposes a novel dual-stage disentangled representation learning method that can efficiently untangle spoof-related features from irrelevant ones. Unlike previous FAS disentanglement works with one-stage architecture, we found that the dual-stage training design can improve the training stability and effectively encode the features to detect unseen attack types. Our experiments show that the proposed method provides superior accuracy than the state-of-the-art methods on several cross-type FAS benchmarks.
Comment: WACV 2022
Published: 2021

23. ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images

Author: Lee, Yu-Hui and Lai, Shang-Hong
Subjects: Computer Science - Multimedia
Abstract: In this paper, we propose a novel image-to-image GAN framework for eyeglasses removal, called ByeGlassesGAN, which is used to automatically detect the position of eyeglasses and then remove them from face images. Our ByeGlassesGAN consists of an encoder, a face decoder, and a segmentation decoder. The encoder is responsible for extracting information from the source face image, and the face decoder utilizes this information to generate glasses-removed images. The segmentation decoder is included to predict the segmentation mask of eyeglasses and completed face region. The feature vectors generated by the segmentation decoder are shared with the face decoder, which facilitates better reconstruction results. Our experiments show that ByeGlassesGAN can provide visually appealing results in the eyeglasses-removed face images even for semi-transparent color eyeglasses or glasses with glare. Furthermore, we demonstrate significant improvement in face recognition accuracy for face images with glasses by applying our method as a pre-processing step in our face recognition experiment.
Published: 2020

24. Unified Representation Learning for Cross Model Compatibility

Author: Wang, Chien-Yi, Chang, Ya-Liang, Yang, Shang-Ta, Chen, Dong, and Lai, Shang-Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a unified representation learning framework to address the Cross Model Compatibility (CMC) problem in the context of visual search applications. Cross compatibility between different embedding models enables the visual search systems to correctly recognize and retrieve identities without re-encoding user images, which are usually not available due to privacy concerns. While there are existing approaches to address CMC in face identification, they fail to work in a more challenging setting where the distributions of embedding models shift drastically. The proposed solution improves CMC performance by introducing a light-weight Residual Bottleneck Transformation (RBT) module and a new training scheme to optimize the embedding spaces. Extensive experiments demonstrate that our proposed solution outperforms previous approaches by a large margin for various challenging visual search scenarios of face recognition and person re-identification. Comment: To appear in British Machine Vision Conference (BMVC) 2020
Published: 2020

25. Cycle-object consistency for image-to-image domain adaptation

Author: Lin, Che-Tsung, Kew, Jie-Long, Chan, Chee Seng, Lai, Shang-Hong, and Zach, Christopher
Published: 2023
Full Text: View/download PDF

26. Confidence-Aware Anomaly Detection in Human Actions

Author: Wu, Tsung-Hsuan, Yang, Chun-Lung, Chiu, Li-Ling, Wang, Ting-Wei, Faure, Gueter Josmy, Lai, Shang-Hong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Wallraven, Christian, editor, Liu, Qingshan, editor, and Nagahara, Hajime, editor
Published: 2022
Full Text: View/download PDF

27. 3D Object Detection from Consecutive Monocular Images

Author: Cheng, Chia-Chun, Lai, Shang-Hong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ishikawa, Hiroshi, editor, Liu, Cheng-Lin, editor, Pajdla, Tomas, editor, and Shi, Jianbo, editor
Published: 2021
Full Text: View/download PDF

28. DeepRoom: 3D Room Layout and Pose Estimation from a Single Image

Author: Lin, Hung Jin, Lai, Shang-Hong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Palaiahnakote, Shivakumara, editor, Sanniti di Baja, Gabriella, editor, Wang, Liang, editor, and Yan, Wei Qi, editor
Published: 2020
Full Text: View/download PDF

29. Group Activity Recognition via Computing Human Pose Motion History and Collective Map from Video

Author: Chen, Hsing-Yu, Lai, Shang-Hong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Palaiahnakote, Shivakumara, editor, Sanniti di Baja, Gabriella, editor, Wang, Liang, editor, and Yan, Wei Qi, editor
Published: 2020
Full Text: View/download PDF

30. Y-Net: Learning Domain Robust Feature Representation for ground camera image and large-scale image-based point cloud registration

Author: Liu, Weiquan, Wang, Cheng, Chen, Shuting, Bian, Xuesheng, Lai, Baiqi, Shen, Xuelun, Cheng, Ming, Lai, Shang-Hong, Weng, Dongdong, and Li, Jonathan
Published: 2021
Full Text: View/download PDF

31. Multi-Modal Pedestrian Crossing Intention Prediction with Transformer-Based Model.

Author: Wang, Ting-Wei, Lai, Shang-Hong, Wang, Jia-Ching, Wang, Hsin-Min, Peng, Wen-Hsiao, and Yeh, Chia-Hung
Subjects: DRIVER assistance systems, COMPUTER vision, TRAFFIC safety, PREDICTION models, INFORMATION resources
Abstract: Pedestrian crossing intention prediction based on computer vision plays a pivotal role in enhancing the safety of autonomous driving and advanced driver assistance systems. In this paper, we present a novel multi-modal pedestrian crossing intention prediction framework leveraging the transformer model. By integrating diverse sources of information and leveraging the transformer's sequential modeling and parallelization capabilities, our system accurately predicts pedestrian crossing intentions. We introduce a novel representation of traffic environment data and incorporate lifted 3D human pose and head orientation data to enhance the model's understanding of pedestrian behavior. Experimental results demonstrate the state-of-the-art accuracy of our proposed system on benchmark datasets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Human Action Recognition Based on Temporal Pose CNN and Multi-dimensional Fusion

Author: Huang, Yi, Lai, Shang-Hong, Tai, Shao-Heng, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Leal-Taixé, Laura, editor, and Roth, Stefan, editor
Published: 2019
Full Text: View/download PDF

33. Multi-task CNN for restoring corrupted fingerprint images

Author: Wong, Wei Jing and Lai, Shang-Hong
Published: 2020
Full Text: View/download PDF

34. AugGAN: Cross Domain Adaptation with GAN-Based Data Augmentation

Author: Huang, Sheng-Wei, Lin, Che-Tsung, Chen, Shu-Ping, Wu, Yen-Yi, Hsu, Po-Hao, Lai, Shang-Hong, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Ferrari, Vittorio, editor, Hebert, Martial, editor, Sminchisescu, Cristian, editor, and Weiss, Yair, editor
Published: 2018
Full Text: View/download PDF

35. 3D Object Detection from Consecutive Monocular Images

Author: Cheng, Chia-Chun, primary and Lai, Shang-Hong, additional
Published: 2021
Full Text: View/download PDF

36. Image captioning by incorporating affective concepts learned from both visual and textual components

Author: Yang, Jufeng, Sun, Yan, Liang, Jie, Ren, Bo, and Lai, Shang-Hong
Published: 2019
Full Text: View/download PDF

37. Rethinking Long-Tailed Visual Recognition with Dynamic Probability Smoothing and Frequency Weighted Focusing

Author: Nah, Wan Jun, primary, Chet Ng, Chun, additional, Lin, Che-Tsung, additional, Lee, Yeong Khang, additional, Long Kew, Jie, additional, Tan, Zhi Qin, additional, Seng Chan, Chee, additional, Zach, Christopher, additional, and Lai, Shang-Hong, additional
Published: 2023
Full Text: View/download PDF

38. Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection

Author: Huang, Wei-Jhe, primary, Yeh, Jheng-Hsien, additional, Chen, Min-Hung, additional, Faure, Gueter Josmy, additional, and Lai, Shang-Hong, additional
Published: 2023
Full Text: View/download PDF

39. Kinship Representation Learning with Face Componential Relation

Author: Su, Wen-Tai, primary, Chen, Min-Hung, additional, Wang, Chien-Yi, additional, Lai, Shang-Hong, additional, and Chen, Trista, additional
Published: 2023
Full Text: View/download PDF

40. ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images

Author: Lee, Yu-Hui, primary and Lai, Shang-Hong, additional
Published: 2020
Full Text: View/download PDF

41. Driver Drowsiness Detection via a Hierarchical Temporal Deep Belief Network

Author: Weng, Ching-Hua, Lai, Ying-Hsiu, Lai, Shang-Hong, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Chen, Chu-Song, editor, Lu, Jiwen, editor, and Ma, Kai-Kuang, editor
Published: 2017
Full Text: View/download PDF

42. Relationship between sonography of sternocleidomastoid muscle and cervical passive range of motion in infants with congenital muscular torticollis

Author: Lin, Chu-Hsu, Hsu, Hung-Chih, Hou, Yu-Jen, Chen, Kai-Hua, Lai, Shang-Hong, and Chang, Wen-Ming
Published: 2018
Full Text: View/download PDF

43. Boosting Unsupervised Domain Adaptation for 3D Object Detection in Point Clouds with 2D Image Semantic Information

Author: Ku, Chun-Chieh, primary, Chen, Tsung-Yu, additional, and Lai, Shang-Hong, additional
Published: 2023
Full Text: View/download PDF

44. Video synthesis from stereo videos with iterative depth refinement

Author: Wei, Chen-Hao, Lai, Shang-Hong, and Chiang, Chen-Kuo
Published: 2017
Full Text: View/download PDF

45. Hierarchical Interpolation-Based Disocclusion Region Recovery for Two-View to N-View Conversion System

Author: Lin, Wun-Ting, Yeh, Chen-Ting, Lai, Shang-Hong, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Ho, Yo-Sung, editor, Sang, Jitao, editor, Ro, Yong Man, editor, Kim, Junmo, editor, and Wu, Fei, editor
Published: 2015
Full Text: View/download PDF

46. Sparse Representation Based Approach for RGB-D Hand Gesture Recognition

Author: Su, Te-Feng, Fan, Chin-Yun, Lin, Meng-Hsuan, Lai, Shang-Hong, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Ho, Yo-Sung, editor, Sang, Jitao, editor, Ro, Yong Man, editor, Kim, Junmo, editor, and Wu, Fei, editor
Published: 2015
Full Text: View/download PDF

47. Integrated Vehicle and Lane Detection with Distance Estimation

Author: Chen, Yu-Chun, Su, Te-Feng, Lai, Shang-Hong, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Jawahar, C. V., editor, and Shan, Shiguang, editor
Published: 2015
Full Text: View/download PDF

48. 3D Reconstruction with Automatic Foreground Segmentation from Multi-view Images Acquired from a Mobile Device

Author: Kuo, Ping-Cheng, Chen, Chao-An, Chang, Hsing-Chun, Su, Te-Feng, Lai, Shang-Hong, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Jawahar, C. V., editor, and Shan, Shiguang, editor
Published: 2015
Full Text: View/download PDF

49. MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition

Author: Wang, Fu-En, primary, Wang, Chien-Yi, additional, Sun, Min, additional, and Lai, Shang-Hong, additional
Published: 2023
Full Text: View/download PDF

50. Robust Multi-Object Tracking With Spatial Uncertainty

Author: Liao, Pin-Jie, primary, Huang, Yu-Cheng, additional, Chiang, Chen-Kuo, additional, and Lai, Shang-Hong, additional
Published: 2023
Full Text: View/download PDF
