193 results on '"Kot, Alex C."'
Search Results
2. Towards Data-Centric Face Anti-spoofing: Improving Cross-Domain Generalization via Physics-Based Data Synthesis
- Author
-
Cai, Rizhao, Soh, Cecelia, Yu, Zitong, Li, Haoliang, Yang, Wenhan, and Kot, Alex C.
- Published
- 2024
- Full Text
- View/download PDF
3. Mining Generalized Multi-timescale Inconsistency for Detecting Deepfake Videos
- Author
-
Yu, Yang, Ni, Rongrong, Yang, Siyuan, Ni, Yu, Zhao, Yao, and Kot, Alex C.
- Published
- 2024
- Full Text
- View/download PDF
4. Beyond Learned Metadata-Based Raw Image Reconstruction
- Author
-
Wang, Yufei, Yu, Yi, Yang, Wenhan, Guo, Lanqing, Chau, Lap-Pui, Kot, Alex C., and Wen, Bihan
- Published
- 2024
- Full Text
- View/download PDF
5. GTADT: Gated tone-sensitive acne grading via augmented domain transfer
- Author
-
Tan, Min, Wang, Ruirui, Purwar, Ankur, Jin, Tao, Yu, Jun, and Kot, Alex C.
- Published
- 2024
- Full Text
- View/download PDF
6. A unified deep semantic expansion framework for domain-generalized person re-identification
- Author
-
Ang, Eugene P.W., Lin, Shan, and Kot, Alex C.
- Published
- 2024
- Full Text
- View/download PDF
7. Skeleton-based relational reasoning for group activity analysis
- Author
-
Perez, Mauricio, Liu, Jun, and Kot, Alex C.
- Published
- 2022
- Full Text
- View/download PDF
8. Detection of HEVC double compression with non-aligned GOP structures via inter-frame quality degradation analysis
- Author
-
Xu, Qiang, Jiang, Xinghao, Sun, Tanfeng, and Kot, Alex C.
- Published
- 2021
- Full Text
- View/download PDF
9. Detection of transcoded HEVC videos based on in-loop filtering and PU partitioning analyses
- Author
-
Xu, Qiang, Jiang, Xinghao, Sun, Tanfeng, and Kot, Alex C.
- Published
- 2021
- Full Text
- View/download PDF
10. Unsupervised Domain Adaptation in the Wild via Disentangling Representation Learning
- Author
-
Li, Haoliang, Wan, Renjie, Wang, Shiqi, and Kot, Alex C.
- Published
- 2021
- Full Text
- View/download PDF
11. Face Image Reflection Removal
- Author
-
Wan, Renjie, Shi, Boxin, Li, Haoliang, Duan, Ling-Yu, and Kot, Alex C.
- Published
- 2021
- Full Text
- View/download PDF
12. A scale adaptive network for crowd counting
- Author
-
Zhang, Youmei, Zhou, Chunluan, Chang, Faliang, and Kot, Alex C.
- Published
- 2019
- Full Text
- View/download PDF
13. DeepShoe: An improved Multi-Task View-invariant CNN for street-to-shop shoe retrieval
- Author
-
Zhan, Huijing, Shi, Boxin, Duan, Ling-Yu, and Kot, Alex C.
- Published
- 2019
- Full Text
- View/download PDF
14. One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching
- Author
-
Yang, Siyuan, Liu, Jun, Lu, Shijian, Er, Meng Hwa, and Kot, Alex C.
- Subjects
FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
One-shot skeleton action recognition, which aims to learn a skeleton action recognition model with a single training sample, has attracted increasing interest due to the challenge of collecting and annotating large-scale skeleton action data. However, most existing studies match skeleton sequences by comparing their feature vectors directly, which neglects the spatial structures and temporal orders of skeleton data. This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching. We represent skeleton data at multiple spatial and temporal scales and achieve optimal feature matching from two perspectives. The first is multi-scale matching, which captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales simultaneously. The second is cross-scale matching, which handles different motion magnitudes and speeds by capturing sample-wise relevance across multiple scales. Extensive experiments over three large-scale datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior one-shot skeleton action recognition, and it outperforms the state-of-the-art consistently by large margins. 8 pages, 4 figures, 6 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
- Published
- 2023
15. Self-Supervised 3D Action Representation Learning with Skeleton Cloud Colorization
- Author
-
Yang, Siyuan, Liu, Jun, Lu, Shijian, Er, Meng Hwa, Hu, Yongjian, and Kot, Alex C.
- Subjects
FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
3D Skeleton-based human action recognition has attracted increasing attention in recent years. Most of the existing work focuses on supervised learning which requires a large number of labeled action sequences that are often expensive and time-consuming to annotate. In this paper, we address self-supervised 3D action representation learning for skeleton-based action recognition. We investigate self-supervised representation learning and design a novel skeleton cloud colorization technique that is capable of learning spatial and temporal skeleton representations from unlabeled skeleton sequence data. We represent a skeleton action sequence as a 3D skeleton cloud and colorize each point in the cloud according to its temporal and spatial orders in the original (unannotated) skeleton sequence. Leveraging the colorized skeleton point cloud, we design an auto-encoder framework that can learn spatial-temporal features from the artificial color labels of skeleton joints effectively. Specifically, we design a two-stream pretraining network that leverages fine-grained and coarse-grained colorization to learn multi-scale spatial-temporal features. In addition, we design a Masked Skeleton Cloud Repainting task that can pretrain the designed auto-encoder framework to learn informative representations. We evaluate our skeleton cloud colorization approach with linear classifiers trained under different configurations, including unsupervised, semi-supervised, fully-supervised, and transfer learning settings. Extensive experiments on NTU RGB+D, NTU RGB+D 120, PKU-MMD, NW-UCLA, and UWA3D datasets show that the proposed method outperforms existing unsupervised and semi-supervised 3D action recognition methods by large margins and achieves competitive performance in supervised 3D action recognition as well. This work is an extension of our ICCV 2021 paper [arXiv:2108.01959] https://openaccess.thecvf.com/content/ICCV2021/html/Yang_Skeleton_Cloud_Colorization_for_Unsupervised_3D_Action_Representation_Learning_ICCV_2021_paper.html
- Published
- 2023
16. Temporal Coherent Test-Time Optimization for Robust Video Classification
- Author
-
Yi, Chenyu, Yang, Siyuan, Wang, Yufei, Li, Haoliang, Tan, Yap-Peng, and Kot, Alex C.
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep neural networks are likely to fail when the test data is corrupted in real-world deployment (e.g., blur, weather, etc.). Test-time optimization is an effective way that adapts models to generalize to corrupted data during testing, which has been shown in the image domain. However, the techniques for improving video classification corruption robustness remain few. In this work, we propose a Temporal Coherent Test-time Optimization framework (TeCo) to utilize spatio-temporal information in test-time optimization for robust video classification. To exploit information in video with self-supervised learning, TeCo uses global content from video clips and optimizes models for entropy minimization. TeCo minimizes the entropy of the prediction based on the global content from video clips. Meanwhile, it also feeds local content to regularize the temporal coherence at the feature level. TeCo retains the generalization ability of various video classification models and achieves significant improvements in corruption robustness across Mini Kinetics-C and Mini SSV2-C. Furthermore, TeCo sets a new baseline in video classification corruption robustness via test-time optimization.
- Published
- 2023
17. A two-stage quality measure for mobile phone captured 2D barcode images
- Author
-
Chen, Changsheng, Kot, Alex C., and Yang, Huijuan
- Published
- 2013
- Full Text
- View/download PDF
18. Variational Disentanglement for Domain Generalization
- Author
-
Wang, Yufei, Li, Haoliang, Cheng, Hao, Wen, Bihan, Chau, Lap-Pui, and Kot, Alex C.
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Domain generalization aims to learn an invariant model that can generalize well to the unseen target domain. In this paper, we propose to tackle the problem of domain generalization by delivering an effective framework named Variational Disentanglement Network (VDN), which is capable of disentangling the domain-specific features and task-specific features, where the task-specific features are expected to be better generalized to unseen but related test data. We further show the rationale of our proposed method by proving that our proposed framework is equivalent to minimizing the evidence upper bound of the divergence between the distribution of task-specific features and its invariant ground truth derived from variational inference. We conduct extensive experiments to verify our method on three benchmarks, and both quantitative and qualitative results illustrate the effectiveness of our method. Accepted to TMLR 2022
- Published
- 2021
19. Asymmetric Modality Translation for Face Presentation Attack Detection.
- Author
-
Li, Zhi, Li, Haoliang, Luo, Xin, Hu, Yongjian, Lam, Kwok-Yan, and Kot, Alex C.
- Abstract
Face presentation attack detection (PAD) is an essential measure to protect face recognition systems from being spoofed by malicious users and has attracted great attention from both academia and industry. Although most of the existing methods can achieve desired performance to some extent, the generalization issue of face presentation attack detection under cross-domain settings (e.g., the setting of unseen attacks and varying illumination) remains to be solved. In this paper, we propose a novel framework based on asymmetric modality translation for face presentation attack detection in bi-modality scenarios. Under the framework, we establish connections between two modality images of genuine faces. Specifically, a novel modality fusion scheme is presented in which the image of one modality is translated to the other one through an asymmetric modality translator, then fused with its corresponding paired image. The fusion result is fed as the input to a discriminator for inference. The training of the translator is supervised by an asymmetric modality translation loss. Besides, an illumination normalization module based on Pattern of Local Gravitational Force (PLGF) representation is used to reduce the impact of illumination variation. We conduct extensive experiments on three public datasets, which validate that our method is effective in detecting various types of attacks and achieves state-of-the-art performance under different evaluation protocols. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning
- Author
-
Yang, Siyuan, Liu, Jun, Lu, Shijian, Er, Meng Hwa, and Kot, Alex C.
- Subjects
FOS: Computer and information sciences, ComputingMethodologies_PATTERNRECOGNITION, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
Skeleton-based human action recognition has attracted increasing attention in recent years. However, most of the existing works focus on supervised learning, which requires a large number of annotated action sequences that are often expensive to collect. We investigate unsupervised representation learning for skeleton action recognition, and design a novel skeleton cloud colorization technique that is capable of learning skeleton representations from unlabeled skeleton sequence data. Specifically, we represent a skeleton action sequence as a 3D skeleton cloud and colorize each point in the cloud according to its temporal and spatial orders in the original (unannotated) skeleton sequence. Leveraging the colorized skeleton point cloud, we design an auto-encoder framework that can learn spatial-temporal features from the artificial color labels of skeleton joints effectively. We evaluate our skeleton cloud colorization approach with action classifiers trained under different configurations, including unsupervised, semi-supervised and fully-supervised settings. Extensive experiments on NTU RGB+D and NW-UCLA datasets show that the proposed method outperforms existing unsupervised and semi-supervised 3D action recognition methods by large margins, and it achieves competitive performance in supervised 3D action recognition as well. This paper is accepted by ICCV 2021
- Published
- 2021
21. Steganalysis of halftone image using inverse halftoning
- Author
-
Cheng, Jun and Kot, Alex C.
- Published
- 2009
- Full Text
- View/download PDF
22. Individuality of alphabet knowledge in online writer identification
- Author
-
Tan, Guo Xian, Viard-Gaudin, Christian, and Kot, Alex C.
- Published
- 2010
- Full Text
- View/download PDF
23. One-Class Knowledge Distillation for Face Presentation Attack Detection.
- Author
-
Li, Zhi, Cai, Rizhao, Li, Haoliang, Lam, Kwok-Yan, Hu, Yongjian, and Kot, Alex C.
- Abstract
Face presentation attack detection (PAD) has been extensively studied by research communities to enhance the security of face recognition systems. Although existing methods have achieved good performance on testing data with similar distribution as the training data, their performance degrades severely in application scenarios with data of unseen distributions. In situations where the training and testing data are drawn from different domains, a typical approach is to apply domain adaptation techniques to improve face PAD performance with the help of target domain data. However, it has always been a non-trivial challenge to collect sufficient data samples in the target domain, especially for attack samples. This paper introduces a teacher-student framework to improve the cross-domain performance of face PAD with one-class domain adaptation. In addition to the source domain data, the framework utilizes only a few genuine face samples of the target domain. Under this framework, a teacher network is trained with source domain samples to provide discriminative feature representations for face PAD. Student networks are trained to mimic the teacher network and learn similar representations for genuine face samples of the target domain. In the test phase, the similarity score between the representations of the teacher and student networks is used to distinguish attacks from genuine ones. To evaluate the proposed framework under one-class domain adaptation settings, we devised two new protocols and conducted extensive experiments. The experimental results show that our method outperforms baselines under one-class domain adaptation settings and even state-of-the-art methods with unsupervised domain adaptation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
24. Motion-Adaptive Detection of HEVC Double Compression With the Same Coding Parameters.
- Author
-
Xu, Qiang, Jiang, Xinghao, Sun, Tanfeng, and Kot, Alex C.
- Abstract
High Efficiency Video Coding (HEVC) double compression detection is of prime significance in video forensics. However, double compression with the same parameters and video content with high motion displacement intensity have become two main factors that limit the performance of existing algorithms. To address these issues, a novel motion-adaptive algorithm is proposed in this paper. Firstly, the analysis of GOP structure in HEVC standard and the coding process of HEVC double compression are provided. Next, sub-features composed of fluctuation intensities of intra prediction modes and unstable Prediction Units (PUs) in normal Intra-Frames (I-frames) and optical flow in adaptive I-frames are exploited in our algorithm. Each sub-feature is extracted during the process of multiple decompression. We further combine these sub-features into a 27-dimensional detection feature, which is finally fed to the Support Vector Machine (SVM) classifier. By following a separation-fusion detection strategy, the experimental result shows that the proposed algorithm outperforms the existing state-of-the-art methods and demonstrates superior robustness to various motion displacement intensities and a wide variety of coding parameter settings. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
25. Performance of space-time codes: Gallager bounds and weight enumeration
- Author
-
Ling, Cong, Li, Kwok H., and Kot, Alex C.
- Subjects
Coding theory -- Research, Communications circuits -- Design and construction, Information theory -- Research
- Abstract
Since the standard union bound for space-time codes may diverge in quasi-static fading channels, the limit-before-average (LBA) technique has been exploited to derive tight performance bounds. However, it suffers from the computational burden arising from a multidimensional integral. In this paper, efficient bounding techniques for space-time codes are developed in the framework of Gallager bounds. Two closed-form upper bounds, the ellipsoidal bound and the spherical bound, are proposed that come close to simulation results within a few tenths of a decibel. In addition, two novel methods of weight enumeration operating on a further reduced state diagram are presented, which, in conjunction with the bounding techniques, give a thorough treatment of performance bounds for space-time codes. Index Terms--Characteristic function, ellipsoidal bound, Gallager bounds, performance bounds, space-time coding, spherical bound, weight spectrum.
- Published
- 2008
26. Gallager bounds for noncoherent decoders in fading channels
- Author
-
Ling, Cong, Wu, Xiaofu, Li, Kwok Hung, and Kot, Alex C.
- Subjects
Decoders -- Design and construction, Communications circuits -- Design and construction, Information theory -- Research
- Abstract
Recently, Gallager's bounding techniques have been used to derive tight performance bounds for coded systems in fading channels. Most works in this field have thus far dealt with coherent decoding. This paper develops Gallager bounds for noncoherent systems in fading channels. Unlike coherent decoding, the exact error probability of a noncoherent decoder/detector conditioned on the fading coefficients does not admit a closed-form expression. This difficulty is overcome in this paper by employing the Chernoff technique. Although it weakens the bounds to some extent, the Chernoff technique enables the derivations of the limit-before-average (LBA) bound and Gallager bounds in closed form for noncoherent fading channels. Numerical examples show that the proposed bounds are convergent and are tighter than the conventional union bound. Index Terms--Characteristic function, Chernoff bound, fading channels, Gallager bounds, noncoherent decoding.
- Published
- 2007
27. Objective distortion measure for binary text image based on edge line segment similarity
- Author
-
Cheng, Jun and Kot, Alex C.
- Subjects
Image processing -- Methods, Electric distortion -- Analysis, Business, Computers, Electronics, Electronics and electrical industries
- Abstract
The article proposes a new approach to measuring the distortion introduced by changing individual edge pixels in binary text images. The approach considers not only how many pixels are changed but also where they are changed and how the flipping affects the overall shape formed by the edge line.
- Published
- 2007
28. New Phase Adjustment and Channel Estimation Methods for RF Combining Applicable in Mobile Terminals
- Author
-
Liang, Ying-Chang and Kot, Alex C.
- Published
- 2003
- Full Text
- View/download PDF
29. On decision-feedback detection of differential space-time modulation in continuous fading
- Author
-
Ling, Cong, Li, Kwok H., and Kot, Alex C.
- Abstract
We show that linear prediction (LP)-based decision-feedback detection (DFD) for nondiagonal differential space-time modulation (DSTM) may suffer from a severe performance degradation in continuously fading channels. DSTM constellations that incur no degradation in LP-DFD are identified as those with a diagonal generator. To cater to other constellations, we propose a low-complexity DFD scheme by inserting decision-feedback symbols into the metric of multiple-symbol differential detection. Index Terms--Continuous fading, decision-feedback detection (DFD), differential space-time modulation (DSTM), linear prediction (LP), multiple-symbol differential detection (MSDD).
- Published
- 2004
30. Noncoherent sequence detection of differential space-time modulation
- Author
-
Ling, Cong, Li, Kwok H., and Kot, Alex C.
- Subjects
Space and time -- Research
- Abstract
Approximate maximum-likelihood noncoherent sequence detection (NSD) for differential space-time modulation (DSTM) in time-selective fading channels is proposed. The starting point is the optimum multiple-symbol differential detection for DSTM that is characterized by exponential complexity. By truncating the memory of the incremental metric, a finite-state trellis is obtained so that a Viterbi algorithm can be implemented to perform sequence detection. Compared to existing linear predictive receivers, a distinguishing feature of NSD is that it can accommodate nondiagonal constellations in continuous fading. Error analysis demonstrates that significant improvement in performance is achievable over linear prediction receivers. By incorporating the reduced-state sequence detection techniques, performance and complexity tradeoffs can be controlled by the branch memory and trellis size. Numerical results show that most of the performance gain can be achieved by using an L-state trellis, where L is the size of the DSTM constellation. Index Terms--Differential space-time modulation (DSTM), multiple-symbol differential detection, noncoherent sequence detection (NSD), time-selective fading, Viterbi decoding.
- Published
- 2003
31. Multisampling decision-feedback linear prediction receivers for differential space-time modulation over Rayleigh fast-fading channels
- Author
-
Ling, Cong, Li, Kwok Hung, Kot, Alex C., and Zhang, Q.T.
- Subjects
Digital signal processor, Signal processing -- Research
- Abstract
Novel decision-feedback (DF) linear prediction (LP) receivers, which process multiple samples per symbol interval in conjunction with optimal sample combining, are proposed for differential space-time modulation (DSTM) over Rayleigh fast-fading channels. Performance analysis demonstrates that multisampling DF-LP receivers outperform their symbol-rate sampling counterpart in fast fading substantially. In addition, an asymptotically tight upper bound on the pairwise error probability is derived. In view of this bound, the design criterion of DSTM for fast fading is the same as that for block-wise static fading. To avoid the estimation of the second-order statistics of the channel, a polynomial-model-based DF-LP receiver is proposed. It can approach the performance of the optimum DF-LP receiver at high signal-to-noise ratios, provided fading is moderate. Index Terms--Differential detection, diversity combining, linear prediction (LP), space-time modulation, time-selective fading.
- Published
- 2003
32. Towards More Efficient Security Inspection via Deep Learning: A Task-Driven X-ray Image Cropping Scheme.
- Author
-
Nguyen, Hong Duc, Cai, Rizhao, Zhao, Heng, Kot, Alex C., and Wen, Bihan
- Subjects
DEEP learning, X-ray imaging, ASPECT ratio (Images), X-ray detection, OBJECT recognition (Computer vision), PUBLIC transit, X-ray scattering
- Abstract
X-ray imaging machines are widely used in border control checkpoints or public transportation, for luggage scanning and inspection. Recent advances in deep learning enabled automatic object detection of X-ray imaging results to largely reduce labor costs. Compared to tasks on natural images, object detection for X-ray inspection is typically more challenging, due to the varied sizes and aspect ratios of X-ray images, random locations of the small target objects within the redundant background region, etc. In practice, we show that directly applying off-the-shelf deep learning-based detection algorithms for X-ray imagery can be highly time-consuming and ineffective. To this end, we propose a Task-Driven Cropping scheme, dubbed TDC, for improving the deep image detection algorithms towards efficient and effective luggage inspection via X-ray images. Instead of processing the whole X-ray images for object detection, we propose a two-stage strategy, which first adaptively crops X-ray images and only preserves the task-related regions, i.e., the luggage regions for security inspection. A task-specific deep feature extractor is used to rapidly identify the importance of each X-ray image pixel. Only the regions that are useful and related to the detection tasks are kept and passed to the follow-up deep detector. The varied-scale X-ray images are thus reduced to the same size and aspect ratio, which enables a more efficient deep detection pipeline. Besides, to benchmark the effectiveness of X-ray image detection algorithms, we propose a novel dataset for X-ray image detection, dubbed SIXray-D, based on the popular SIXray dataset. In SIXray-D, we provide the complete and more accurate annotations of both object classes and bounding boxes, which enables model training for supervised X-ray detection methods. Our results show that our proposed TDC algorithm can effectively boost popular detection algorithms, by achieving better detection mAPs or reducing the run time. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. GMFAD: Towards Generalized Visual Recognition via Multilayer Feature Alignment and Disentanglement.
- Author
-
Li, Haoliang, Wang, Shiqi, Wan, Renjie, and Kot, Alex C.
- Subjects
OBJECT recognition (Computer vision), DEEP learning, DISTRIBUTION (Probability theory), MOLECULAR recognition
- Abstract
The deep learning based approaches which have been repeatedly proven to bring benefits to visual recognition tasks usually make a strong assumption that the training and test data are drawn from similar feature spaces and distributions. However, such an assumption may not always hold in various practical application scenarios on visual recognition tasks. Inspired by the hierarchical organization of deep feature representation that progressively leads to more abstract features at higher layers of representations, we propose to tackle this problem with a novel feature learning framework, which is called GMFAD, with better generalization capability in a multilayer perceptron manner. We first learn feature representations at the shallow layer where shareable underlying factors among domains (e.g., a subset of which could be relevant for each particular domain) can be explored. In particular, we propose to align the domain divergence between domain pair(s) by considering both inter-dimension and inter-sample correlations, which have been largely ignored by many cross-domain visual recognition methods. Subsequently, to learn more abstract information which could further benefit transferability, we propose to conduct feature disentanglement at the deep feature layer. Extensive experiments based on different visual recognition tasks demonstrate that our proposed framework can learn better transferable feature representation compared with state-of-the-art baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
34. A blind code timing estimator and its implementation for DS-CDMA signals in unknown colored noise
- Author
-
Ma, Yugang, Li, K.H., Kot, Alex C., and Ye, Getian
- Subjects
Business, Electronics, Electronics and electrical industries, Transportation industry
- Abstract
Commonly, the channel noise is assumed as temporally white when we estimate the PN code timing for direct-sequence code-division multiple-access (DS-CDMA) systems. However, it may be invalid in practice due to, for instance, the presence of some narrow-band interference or lumping the secondary users into the noise. In this paper, we introduce the matrix decomposition technique in a subspace-based code timing estimator. The new code timing estimator can robustly work in unknown colored noise. The Cramer-Rao bound for the code timing estimator is outlined. Furthermore, we propose a practical implementation method, which includes reducing the number of eigendecompositions and an adaptive eigendecomposition algorithm based on subspace tracking using the unconstrained gradient-descent technique. The performance of the new method is evaluated by computer simulation and compared with the MUSIC method. It is proved that the proposed code timing estimation algorithm outperforms the MUSIC method in colored noise or when the number of users is large. Index Terms--Adaptive algorithm, code-division multiple-access (CDMA), code timing estimation, colored noise, eigendecomposition, synchronization.
- Published
- 2002
35. Parameter estimation of a real single tone from short data records
- Author
-
Fung, H.W., Kot, Alex C., Li, K.H., and Teh, K.C.
- Published
- 2004
- Full Text
- View/download PDF
36. Deep Learning-Based Joint Detection for OFDM-NOMA Scheme.
- Author
-
Xie, Yihang, Teh, Kah Chan, and Kot, Alex C.
- Abstract
Non-orthogonal multiple access (NOMA) technique has drawn much attention in recent years. It has also been a promising technique for the fifth-generation (5G) wireless communication system and beyond. In this letter, we develop a novel deep learning (DL) aided receiver for NOMA joint signal detection. The DL-based receiver serves as an end-to-end mode, which simultaneously fulfills the function of channel estimation, equalization, and demodulation. Compared with the traditional signal detection method for the NOMA scheme, the proposed deep learning method shows feasible improvement in performance and robustness with the tapped-delay line (TDL) channel model, which is adopted for the 5G communication environment. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
37. DRL-FAS: A Novel Framework Based on Deep Reinforcement Learning for Face Anti-Spoofing.
- Author
-
Cai, Rizhao, Li, Haoliang, Wang, Shiqi, Chen, Changsheng, and Kot, Alex C.
- Abstract
Inspired by the philosophy employed by human beings to determine whether a presented face example is genuine or not, i.e., to glance at the example globally first and then carefully observe the local regions to gain more discriminative information, for the face anti-spoofing problem, we propose a novel framework based on the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN). In particular, we model the behavior of exploring face-spoofing-related information from image sub-patches by leveraging deep reinforcement learning. We further introduce a recurrent mechanism to learn representations of local information sequentially from the explored sub-patches with an RNN. Finally, for the classification purpose, we fuse the local information with the global one, which can be learned from the original input image through a CNN. Moreover, we conduct extensive experiments, including ablation study and visualization analysis, to evaluate our proposed framework on various public databases. The experiment results show that our method can generally achieve state-of-the-art performance among all scenarios, demonstrating its effectiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
38. Multi-Domain Adversarial Feature Generalization for Person Re-Identification.
- Author
-
Lin, Shan, Li, Chang-Tsun, and Kot, Alex C.
- Subjects
GENERALIZATION, CAMCORDERS, SUPERVISED learning, DIGITAL cameras, SECURE Sockets Layer (Computer network protocol), VIDEO surveillance, MULTICASTING (Computer networks)
- Abstract
With the assistance of sophisticated training methods applied to single labeled datasets, the performance of fully-supervised person re-identification (Person Re-ID) has been improved significantly in recent years. However, these models trained on a single dataset usually suffer from considerable performance degradation when applied to videos of a different camera network. To make Person Re-ID systems more practical and scalable, several cross-dataset domain adaptation methods have been proposed, which achieve high performance without the labeled data from the target domain. However, these approaches still require the unlabeled data of the target domain during the training process, making them impractical. A practical Person Re-ID system pre-trained on other datasets should start running immediately after deployment on a new site without having to wait until sufficient images or videos are collected and the pre-trained model is tuned. To serve this purpose, in this paper, we reformulate person re-identification as a multi-dataset domain generalization problem. We propose a multi-dataset feature generalization network (MMFA-AAE), which is capable of learning a universal domain-invariant feature representation from multiple labeled datasets and generalizing it to ‘unseen’ camera systems. The network is based on an adversarial auto-encoder to learn a generalized domain-invariant latent feature representation with the Maximum Mean Discrepancy (MMD) measure to align the distributions across multiple domains. Extensive experiments demonstrate the effectiveness of the proposed method. Our MMFA-AAE approach not only outperforms most of the domain generalization Person Re-ID methods, but also surpasses many state-of-the-art supervised methods and unsupervised domain adaptation methods by a large margin. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
39. Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing
- Author
-
Chen, Zhuo, Lin, Weisi, Wang, Shiqi, Duan, Lingyu, and Kot, Alex C.
- Subjects
FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Multimedia, Multimedia (cs.MM)
- Abstract
The recent advances of hardware technology have made the intelligent analysis equipped at the front-end with deep learning more prevailing and practical. To better enable the intelligent sensing at the front-end, instead of compressing and transmitting visual signals or the ultimately utilized top-layer deep learning features, we propose to compactly represent and convey the intermediate-layer deep learning features of high generalization capability, to facilitate the collaborating approach between front and cloud ends. This strategy enables a good balance among the computational load, transmission load and the generalization ability for cloud servers when deploying the deep neural networks for large scale cloud based visual analysis. Moreover, the presented strategy also makes the standardization of deep feature coding more feasible and promising, as a series of tasks can simultaneously benefit from the transmitted intermediate layers. We also present the results for evaluation of lossless deep feature compression with four benchmark data compression methods, which provides meaningful investigations and baselines for future research and standardization activities.
- Published
- 2018
40. Attention to Head Locations for Crowd Counting
- Author
-
Zhang, Youmei, Zhou, Chunluan, Chang, Faliang, and Kot, Alex C.
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Occlusions, complex backgrounds, scale variations and non-uniform distributions present great challenges for crowd counting in practical applications. In this paper, we propose a novel method using an attention model to exploit head locations which are the most important cue for crowd counting. The attention model estimates a probability map in which high probabilities indicate locations where heads are likely to be present. The estimated probability map is used to suppress non-head regions in feature maps from several multi-scale feature extraction branches of a convolution neural network for crowd density estimation, which makes our method robust to complex backgrounds, scale variations and non-uniform distributions. In addition, we introduce a relative deviation loss to compensate a commonly used training loss, Euclidean distance, to improve the accuracy of sparse crowd density estimation. Experiments on Shanghai-Tech, UCF_CC_50 and World-Expo'10 data sets demonstrate the effectiveness of our method.
- Published
- 2018
41. Detection of Spoofing Medium Contours for Face Anti-Spoofing.
- Author
-
Zhu, Xun, Li, Sheng, Zhang, Xinpeng, Li, Haoliang, and Kot, Alex C.
- Subjects
HUMAN facial recognition software, FEATURE extraction
- Abstract
Face anti-spoofing is an important step for secure face recognition. In this paper, we aim to build a general classifier to detect face images with spoofing medium contours (SMCs). To this end, we treat face anti-spoofing as the detection of SMCs in the image. We propose and train a Contour Enhanced Mask R-CNN (CEM-RCNN) model for the detection. This model detects the existence of SMCs by incorporating contour objectness, which measures how likely an object is to contain SMCs. The experimental results demonstrate the generality of the CEM-RCNN for identifying face images with SMCs; it performs significantly better than the state of the art in the cross-database scenario. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
42. CoRRN: Cooperative Reflection Removal Network.
- Author
-
Wan, Renjie, Shi, Boxin, Li, Haoliang, Duan, Ling-Yu, Tan, Ah-Hwee, and Kot, Alex C.
- Subjects
REFLECTIONS, APPLICATION software, FEATURE extraction, DEEP learning
- Abstract
Removing undesired reflections from images taken through glass is broadly applicable to various computer vision tasks. Non-learning-based methods rely on handcrafted priors, such as the separable sparse gradients caused by different levels of blur, which often fail because they describe the properties of real-world reflections poorly. In this paper, we propose a network with a feature-sharing strategy to tackle this problem in a cooperative and unified framework by integrating image context information and multi-scale gradient information. To remove the strong reflections present in some local regions, we propose a statistic loss that considers the gradient-level statistics of the background and reflections. Our network is trained on a new dataset of 3250 reflection images taken in diverse real-world scenes. Experiments on a public benchmark dataset show that the proposed method performs favorably against state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
43. An ANN-based smart capacitive pressure sensor in dynamic environment
- Author
-
Patra, Jagdish C., van den Bos, Adriaan, and Kot, Alex C.
- Published
- 2000
- Full Text
- View/download PDF
44. Skeleton-Based Online Action Prediction Using Scale Selection Network.
- Author
-
Liu, Jun, Shahroudy, Amir, Wang, Gang, Duan, Ling-Yu, and Kot, Alex C.
- Subjects
FORECASTING, MATHEMATICAL convolutions, SKELETON
- Abstract
Action prediction is to recognize the class label of an ongoing activity when only a part of it is observed. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in temporal dimension via a sliding window over the temporal axis. Since there are significant temporal scale variations in the observed part of the ongoing action at different time steps, a novel window scale selection method is proposed to make our network focus on the performed part of the ongoing action and try to suppress the possible incoming interference from the previous actions at each step. An activation sharing scheme is also proposed to handle the overlapping computations among the adjacent time steps, which enables our framework to run more efficiently. Moreover, to enhance the performance of our framework for action prediction with the skeletal input data, a hierarchy of dilated tree convolutions are also designed to learn the multi-level structured semantic representations over the skeleton joints at each frame. Our proposed approach is evaluated on four challenging datasets. The extensive experiments demonstrate the effectiveness of our method for skeleton-based online action prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
45. Feature Boosting Network For 3D Pose Estimation.
- Author
-
Liu, Jun, Ding, Henghui, Shahroudy, Amir, Duan, Ling-Yu, Jiang, Xudong, Wang, Gang, and Kot, Alex C.
- Subjects
GRAPHICAL modeling (Statistics), LOGIC circuits, TASK performance, TASK analysis
- Abstract
In this paper, a feature boosting network is proposed for estimating 3D hand pose and 3D body pose from a single RGB image. In this method, the features learned by the convolutional layers are boosted with a new long short-term dependence-aware (LSTD) module, which enables the intermediate convolutional feature maps to perceive the graphical long short-term dependency among different hand (or body) parts using the designed Graphical ConvLSTM. Learning a set of features that are reliable and discriminatively representative of the pose of a hand (or body) part is difficult due to the ambiguities, texture and illumination variation, and self-occlusion in the real application of 3D pose estimation. To improve the reliability of the features for representing each body part and enhance the LSTD module, we further introduce a context consistency gate (CCG) in this paper, with which the convolutional feature maps are modulated according to their consistency with the context representations. We evaluate the proposed method on challenging benchmark datasets for 3D hand pose estimation and 3D full body pose estimation. Experimental results show the effectiveness of our method that achieves state-of-the-art performance on both of the tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
46. Heterogeneous Domain Adaptation via Nonlinear Matrix Factorization.
- Author
-
Li, Haoliang, Pan, Sinno Jialin, Wang, Shiqi, and Kot, Alex C.
- Subjects
MATRIX decomposition, HILBERT space, OBJECT recognition (Computer vision), PHYSIOLOGICAL adaptation, LEARNING problems
- Abstract
Heterogeneous domain adaptation (HDA) aims to solve the learning problems where the source- and the target-domain data are represented by heterogeneous types of features. The existing HDA approaches based on matrix completion or matrix factorization have proven to be effective to capture shareable information between heterogeneous domains. However, there are two limitations in the existing methods. First, a large number of corresponding data instances between the source domain and the target domain are required to bridge the gap between different domains for performing matrix completion. These corresponding data instances may be difficult to collect in real-world applications due to the limited size of data in the target domain. Second, most existing methods can only capture linear correlations between features and data instances while performing matrix completion for HDA. In this paper, we address these two issues by proposing a new matrix-factorization-based HDA method in a semisupervised manner, where only a few labeled data are required in the target domain without requiring any corresponding data instances between domains. Such labeled data are more practical to obtain compared with cross-domain corresponding data instances. Our proposed algorithm is based on matrix factorization in an approximated reproducing kernel Hilbert space (RKHS), where nonlinear correlations between features and data instances can be exploited to learn heterogeneous features for both the source and the target domains. Extensive experiments are conducted on cross-domain text classification and object recognition, and experimental results demonstrate the superiority of our proposed method compared with the state-of-the-art HDA approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
47. Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation.
- Author
-
Zhang, Tianyi, Lin, Guosheng, Cai, Jianfei, Shen, Tong, Shen, Chunhua, and Kot, Alex C.
- Abstract
Weakly supervised semantic segmentation receives much research attention since it alleviates the need to obtain a large amount of dense pixel-wise ground-truth annotations for the training images. Compared with other forms of weak supervision, image labels are quite efficient to obtain. In this paper, we focus on the weakly supervised semantic segmentation with image label annotations. Recent progress for this task has been largely dependent on the quality of generated pseudo-annotations. In this paper, inspired by spatial neural-attention for image captioning, we propose a decoupled spatial neural attention network for generating pseudo-annotations. Our decoupled attention structure could simultaneously identify the object regions and localize the discriminative parts, which generates high-quality pseudo-annotations in one forward path. The generated pseudo-annotations lead to the segmentation results that achieve the state of the art in weakly supervised semantic segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
48. Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates.
- Author
-
Liu, Jun, Shahroudy, Amir, Xu, Dong, Kot, Alex C., and Wang, Gang
- Subjects
MOTION detectors, SPATIOTEMPORAL processes, RECURRENT neural networks, HUMAN body, HUMAN activity recognition
- Abstract
Skeleton-based human action recognition has attracted a lot of research attention during the past few years. Recent works attempted to utilize recurrent neural networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in the skeletal data. The proposed work extends this idea to spatial domain as well as temporal domain to better analyze the hidden sources of action-related information within the human skeleton sequences in both of these domains simultaneously. Based on the pictorial structure of Kinect's skeletal data, an effective tree-structure based traversal framework is also proposed. In order to deal with the noise in the skeletal data, a new gating mechanism within LSTM module is introduced, with which the network can learn the reliability of the sequential data and accordingly adjust the effect of the input data on the updating procedure of the long-term context representation stored in the unit's memory cell. Moreover, we introduce a novel multi-modal feature fusion strategy within the LSTM unit in this paper. The comprehensive experimental results on seven challenging benchmark datasets for human action recognition demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
49. Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing.
- Author
-
Duan, Ling-Yu, Sun, Wei, Zhang, Xinfeng, Wang, Shiqi, Chen, Jie, Yin, Jianxiong, See, Simon, Huang, Tiejun, Kot, Alex C., and Gao, Wen
- Subjects
MOTHERBOARDS, GRAPHICS processing units, SYNTAX in programming languages, COMPUTER storage devices, DATABASE management
- Abstract
The compact descriptors for visual search (CDVS) standard from the ISO/IEC Moving Picture Experts Group has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of a CDVS encoder hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low-complexity design of CDVS core techniques and present a very fast CDVS encoder by leveraging the massively parallel execution resources of the graphics processing unit (GPU). We shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, in which the thread block allocation and the memory access mechanism are jointly optimized to eliminate performance loss. In addition, operations with heavy data dependence are allocated to the CPU to relieve the GPU of extra, unnecessary computation. Furthermore, we demonstrate that the proposed fast CDVS encoder works well with convolutional neural network approaches, which makes it possible to leverage the advantages of GPU platforms and yields significant performance improvements. Comprehensive experimental results on benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
50. Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks.
- Author
-
Liu, Jun, Wang, Gang, Duan, Ling-Yu, Abdiyeva, Kamila, and Kot, Alex C.
- Subjects
HUMAN behavior, PATTERN recognition systems, THREE-dimensional imaging, SHORT-term memory, PERFORMANCE evaluation
- Abstract
Human action recognition in 3D skeleton sequences has attracted a lot of research attention. Recently, long short-term memory (LSTM) networks have shown promising performance in this task due to their strengths in modeling the dependencies and dynamics in sequential data. As not all skeletal joints are informative for action recognition, and the irrelevant joints often bring noise which can degrade the performance, we need to pay more attention to the informative ones. However, the original LSTM network does not have explicit attention ability. In this paper, we propose a new class of LSTM network, global context-aware attention LSTM, for skeleton-based action recognition, which is capable of selectively focusing on the informative joints in each frame by using a global context memory cell. To further improve the attention capability, we also introduce a recurrent attention mechanism, with which the attention performance of our network can be enhanced progressively. Besides, a two-stream framework, which leverages coarse-grained attention and fine-grained attention, is also introduced. The proposed method achieves state-of-the-art performance on five challenging datasets for skeleton-based action recognition. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF