646 results for "Quantization (signal)"
Search Results
2. Content-Aware Tunable Selective Encryption for HEVC Using Sine-Modular Chaotification Model.
- Author
- Sheng, Qingxin, Fu, Chong, Lin, Zhaonan, Chen, Junxin, Wang, Xingwei, and Sham, Chiu-Wing
- Published
- 2025
- Full Text
- View/download PDF
3. DNP-AUT: Image Compression Using Double-Layer Non-Uniform Partition and Adaptive U Transform.
- Author
- Zhang, Yumo and Cai, Zhanchuan
- Published
- 2025
- Full Text
- View/download PDF
4. ViTSen: Bridging Vision Transformers and Edge Computing With Advanced In/Near-Sensor Processing.
- Author
- Tabrizchi, Sepehr, Reidy, Brendan C., Najafi, Deniz, Angizi, Shaahin, Zand, Ramtin, and Roohi, Arman
- Abstract
This letter introduces ViTSen, optimizing vision transformers (ViTs) for resource-constrained edge devices. It features an in-sensor image compression technique to reduce data conversion and transmission power costs effectively. Further, ViTSen incorporates a ReRAM array, allowing efficient near-sensor analog convolution. This integration, novel pixel reading, and peripheral circuitry decrease the reliance on analog buffers and converters, significantly lowering power consumption. To make ViTSen compatible, several established ViT algorithms have undergone quantization and channel reduction. Circuit-to-application co-simulation results show that ViTSen maintains accuracy comparable to a full-precision baseline across various data precisions, achieving an efficiency of ~3.1 TOp/s/W. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Toward Precision-Aware Safe Neural-Controlled Cyber–Physical Systems.
- Author
- Thevendhriya, Harikishan, Ghosh, Sumana, and Lohar, Debasmita
- Abstract
The safety of neural network (NN) controllers is crucial, specifically in the context of safety-critical Cyber-Physical System (CPS) applications. Current safety verification focuses on reachability analysis, considering bounded errors from noisy environments or inaccurate implementations. However, it assumes real-valued arithmetic and does not account for the fixed-point quantization often used in embedded systems. Some recent efforts have focused on generating sound quantized NN implementations in fixed-point, ensuring specific target error bounds, but they assume the safety of the NNs is already proven. To bridge this gap, we introduce Nexus, a novel two-phase framework combining reachability analysis with sound NN quantization. Nexus provides an end-to-end solution that ensures CPS safety within bounded errors while generating mixed-precision fixed-point implementations for the NN controllers. Additionally, we optimize these implementations for automated parallelization on FPGAs using a commercial HLS compiler, reducing the machine cycles significantly. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Reducing ADC Front-End Costs During Training of On-Sensor Printed Multilayer Perceptrons.
- Author
- Afentaki, Florentia, Duarte, Paula Carolina Lozano, Zervakis, Georgios, and Tahoori, Mehdi B.
- Abstract
Printed electronics (PEs) technology offers a cost-effective and fully-customizable solution to computational needs beyond the capabilities of traditional silicon technologies, offering advantages such as on-demand manufacturing and conformal, low-cost hardware. However, the low-resolution fabrication of PEs, which results in large feature sizes, poses a challenge for integrating complex designs like those of machine learning (ML) classification systems. Current literature optimizes only the multilayer perceptron (MLP) circuit within the classification system, while the cost of analog-to-digital converters (ADCs) is overlooked. Printed applications frequently require on-sensor processing, yet while the digital classifier has been extensively optimized, the analog-to-digital interfacing, specifically the ADCs, dominates the total area and energy consumption. In this letter, we target digital printed MLP classifiers and propose the design of customized ADCs per MLP input, which involves minimizing the distinct represented numbers for each input, thus simplifying the ADC’s circuitry. Incorporating this ADC optimization in the MLP training enables eliminating ADC levels and the respective comparators, while still maintaining high classification accuracy. Our approach achieves 11.2× lower ADC area for less than 5% accuracy drop across varying MLPs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. SoC-Based Implementation of 1-D Convolutional Neural Network for 3-Channel ECG Arrhythmia Classification via HLS4ML.
- Author
- Ahmad, Feroz and Zafar, Saima
- Abstract
Real-time monitoring of 1-D biopotentials, such as electrocardiograms (ECG), necessitates effective feature extraction and classification, a strength of deep learning (DL) algorithms. Designing 1-D convolutional neural network (1-D CNN) accelerators for biopotential classification via open-source codesign workflows, particularly high-level synthesis for machine learning (HLS4ML), offers advantages over GPU-based or cloud-based solutions, including high performance, low latency, low power consumption, swift time-to-market, and cost-effectiveness. We present an implementation of a quantized-pruned (QP) 1-D CNN model on the PYNQ Z2 SoC using HLS4ML by seamlessly deploying its soft IP core generated via Vivado Accelerator backend, showcasing the efficacy of quantization-aware training (QAT) in reducing power consumption to 1.655 W from 1.823 W. Our approach demonstrates improved area consumption, resource utilization, and inferences per second compared to the baseline (B) 1-D CNN model, with a controlled 4% or less reduction in weighted Accuracy, Precision, Recall, and F1-score, revealing the nuanced tradeoffs between performance metrics and system efficiency for real-time 3-channel ECG Arrhythmia classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
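The quantization-aware training (QAT) credited in entry 7 with the power reduction is commonly implemented by inserting "fake quantization" into the forward pass so the network trains against its own rounding error. A minimal NumPy sketch of that forward-path operation, assuming a simple 8-bit affine quantizer (names and details are illustrative, not from the paper):

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Quantize to an integer grid and immediately dequantize, so the
    training forward pass sees the same rounding error as deployment."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = max((x.max() - x.min()) / (qmax - qmin), 1e-8)
    zero_point = np.round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
    return scale * (q - zero_point)

w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
print("max rounding error:", np.abs(w - fake_quantize(w)).max())
```

In a full QAT loop the backward pass typically routes gradients through the rounding with a straight-through estimator; pruning then removes near-zero weights before synthesis.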
8. Optimizing Artificial Neural Networks to Minimize Arithmetic Errors in Stochastic Computing Implementations.
- Author
- Frasser, Christiam F., Morán, Alejandro, Canals, Vincent, Font, Joan, Isern, Eugeni, Roca, Miquel, and Rosselló, Josep L.
- Subjects
- CONVOLUTIONAL neural networks, ARTIFICIAL neural networks, ADDITION (Mathematics), ENERGY consumption, STOCHASTIC systems
- Abstract
Deploying modern neural networks on resource-constrained edge devices necessitates a series of optimizations to ready them for production. These optimizations typically involve pruning, quantization, and fixed-point conversion to compress the model size and enhance energy efficiency. While these optimizations are generally adequate for most edge devices, there exists potential for further improving the energy efficiency by leveraging special-purpose hardware and unconventional computing paradigms. In this study, we explore stochastic computing neural networks and their impact on quantization and overall performance concerning weight distributions. When arithmetic operations such as addition and multiplication are executed by stochastic computing hardware, the arithmetic error may significantly increase, leading to a diminished overall accuracy. To bridge the accuracy gap between a fixed-point model and its stochastic computing implementation, we propose a novel approximate arithmetic-aware training method. We validate the efficacy of our approach by implementing the LeNet-5 convolutional neural network on an FPGA. Our experimental results reveal a negligible accuracy degradation of merely 0.01% compared with the floating-point counterpart, while achieving a substantial 27× speedup and 33× enhancement in energy efficiency compared with other FPGA implementations. Additionally, the proposed method enhances the likelihood of selecting optimal LFSR seeds for stochastic computing systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
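Background for the paradigm in entry 8: in bipolar stochastic computing a value v ∈ [−1, 1] is encoded as a bitstream with P(1) = (v + 1)/2, and multiplication reduces to an XNOR gate. A toy simulation of the arithmetic error that the paper's approximate arithmetic-aware training compensates for (a pseudo-random generator stands in for the LFSR seeds mentioned in the abstract):

```python
import numpy as np

rng = np.random.default_rng(42)  # stand-in for an LFSR seed

def to_stream(v, n):
    """Encode v in [-1, 1] as a bipolar stochastic bitstream: P(1) = (v+1)/2."""
    return rng.random(n) < (v + 1) / 2

def from_stream(bits):
    """Decode a bipolar bitstream back to a value in [-1, 1]."""
    return 2 * bits.mean() - 1

a, b = 0.6, -0.5
for n in (64, 1024, 65536):
    prod = from_stream(~(to_stream(a, n) ^ to_stream(b, n)))  # XNOR multiplies
    print(f"stream length {n:6d}: product {prod:+.4f} (exact {a * b:+.4f})")
```

Short streams give large variance in the decoded product, which is exactly why seed selection and error-aware training matter in such systems.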
9. FullPack: Full Vector Utilization for Sub-Byte Quantized Matrix-Vector Multiplication on General Purpose CPUs.
- Author
- Katebi, Hossein, Asadi, Navidreza, and Goudarzi, Maziar
- Abstract
Sub-byte quantization on popular vector ISAs suffers from heavy waste of vector as well as memory bandwidth. The latest methods pack a number of quantized data in one vector, but have to pad them with empty bits to avoid overflow to neighbours. We remove even these empty bits and provide full utilization of the vector and memory bandwidth by our data-layout/compute co-design scheme. We implemented FullPack on TFLite for Vector-Matrix multiplication and showed up to 6.7× speedup, 2.75× on average on single layers, which translated to 1.56–2.11× end-to-end speedup on DeepSpeech. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
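The padding waste that entry 9 removes is easiest to see in scalar Python: 3-bit values padded to 4 bits cost a full extra bit each, while back-to-back packing crosses byte boundaries. A sketch of dense sub-byte packing under that reading (helper names are mine, not the library's):

```python
import numpy as np

def pack_dense(vals, bits):
    """Pack unsigned `bits`-wide integers back-to-back, with no padding bits."""
    buf, acc, nacc = bytearray(), 0, 0
    for v in vals:
        acc |= int(v) << nacc
        nacc += bits
        while nacc >= 8:
            buf.append(acc & 0xFF)
            acc >>= 8
            nacc -= 8
    if nacc:
        buf.append(acc)
    return bytes(buf)

def unpack_dense(buf, bits, count):
    acc = int.from_bytes(buf, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]

vals = np.random.default_rng(0).integers(0, 8, 16)        # 16 x 3-bit values
packed = pack_dense(vals, 3)
assert unpack_dense(packed, 3, len(vals)) == list(vals)
print(len(packed), "bytes dense vs", len(vals) // 2, "bytes padded to 4 bits")
```

The paper's contribution is doing the equivalent unpacking inside vectorized matrix-vector kernels rather than scalar loops.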
10. Design and Implementation of a Universal Shift Convolutional Neural Network Accelerator.
- Author
- Song, Qingzeng, Cui, Weizhi, Sun, Liankun, and Jin, Guanghao
- Abstract
Currently, many applications implement convolutional neural networks (CNNs) on CPUs or GPUs, where performance is limited by the computational complexity of these networks. Compared with implementations on CPUs or GPUs, deploying convolutional neural accelerators on FPGAs can achieve superior performance. On the other hand, the multiplication operations of CNNs have constrained FPGAs from achieving better performance. In this letter, we propose a shift CNN accelerator, which converts the multiplication operations into shift operations. Based on the shift operation, our accelerator can break the computational bottleneck of FPGAs. On Virtex UltraScale+ VU9P, our accelerator saves DSP resources and reduces memory consumption while achieving a performance of 1.18 Tera Operations Per Second (TOPS), a substantial improvement over other convolutional neural accelerators. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
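The core trick of entry 10 fits in a few lines: if weights are rounded to signed powers of two, every multiply becomes a sign flip plus a bit shift. A minimal sketch of that conversion (illustrative, not the letter's hardware design; real accelerators keep activations in fixed-point):

```python
import numpy as np

def quantize_pow2(w):
    """Round each weight to the nearest signed power of two in log space."""
    exps = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return np.sign(w), exps

w = np.array([0.37, -1.90, 12.30])
signs, exps = quantize_pow2(w)
x = 7  # integer activation
for s, e, wi in zip(signs, exps, w):
    shifted = (x << e) if e >= 0 else (x >> -e)   # shift replaces multiply
    print(f"w={wi:+.2f} ~ {s:+.0f}*2^{e}: exact {wi * x:+.2f}, shifted {s * shifted:+.1f}")
```

The printout also shows the approximation error that power-of-two weight quantization trades for DSP-free arithmetic.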
11. High-Flexibility Designs of Quantized Runtime Reconfigurable Multi-Precision Multipliers.
- Author
- Liu, Yuhao, Rai, Shubham, Ullah, Salim, and Kumar, Akash
- Abstract
Recent research has widely explored quantization schemes on hardware. However, for recent accelerators that support only 8-bit quantization, such as the Google TPU, lower-precision inputs, such as 1/2-bit quantized neural network models in FINN, need their data width extended to meet the hardware interface requirements. This conversion hurts communication and computing efficiency. To improve the flexibility and throughput of quantized multipliers, our work explores two novel reconfigurable multiplier designs that can repartition the number of input channels at runtime based on input precision and reconfigure the signed/unsigned multiplication modes. In this letter, we explored two novel runtime-reconfigurable multi-precision multipliers based on the multiplier-tree and bit-serial multiplier architectures. We evaluated our designs by implementing a systolic array and a single-layer neural network accelerator on the Ultra96 FPGA platform. The results show the flexibility of our implementation and the high speedup for low-precision quantized multiplication working with a fixed data width of the hardware interface. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
12. Spherical Centralized Quantization for Fast Image Retrieval.
- Author
- Song, Jingkuan, Zhang, Zhibin, Zhu, Xiaosu, Zhao, Qike, Wang, Meng, and Shen, Heng Tao
- Subjects
- IMAGE retrieval, FEATURE extraction, SEMANTICS, PRIOR learning, ALGORITHMS
- Abstract
Existing supervised quantization methods usually learn the quantizers from pair-wise, triplet, or anchor-based losses, which only capture their relationship locally without aligning them globally. This may cause an inadequate use of the entire space and a severe intersection among different semantics, leading to inferior retrieval performance. Furthermore, to enable quantizers to learn in an end-to-end way, current practices usually relax the non-differentiable quantization operation by substituting it with softmax, which unfortunately is biased, leading to an unsatisfying suboptimal solution. To address the above issues, we present Spherical Centralized Quantization (SCQ), which contains a Priori Knowledge based Feature (PKFA) module for the global alignment of feature vectors, and an Annealing Regulation Semantic Quantization (ARSQ) module for low-biased optimization. Specifically, the PKFA module first applies Semantic Center Allocation (SCA) to obtain semantic centers based on prior knowledge, and then adopts Centralized Feature Alignment (CFA) to gather feature vectors based on corresponding semantic centers. The SCA and CFA globally optimize the inter-class separability and intra-class compactness, respectively. After that, the ARSQ module performs a partial-soft relaxation to tackle biases, and an Annealing Regulation Quantization loss for further addressing the local optimal solution. Experimental results show that our SCQ outperforms state-of-the-art algorithms by a large margin (2.1%, 3.6%, 5.5% mAP respectively) on CIFAR-10, NUS-WIDE, and ImageNet with a code length of 8 bits. Codes are publicly available: https://github.com/zzb111/Spherical-Centralized-Quantization. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
13. FABNet: Frequency-Aware Binarized Network for Single Image Super-Resolution.
- Author
- Jiang, Xinrui, Wang, Nannan, Xin, Jingwei, Li, Keyu, Yang, Xi, Li, Jie, Wang, Xiaoyu, and Gao, Xinbo
- Subjects
- DISCRETE wavelet transforms, HIGH resolution imaging, TASK analysis, NEURAL codes, LINEAR network coding
- Abstract
Remarkable achievements have been obtained with binary neural networks (BNN) in real-time and energy-efficient single-image super-resolution (SISR) methods. However, existing approaches often adopt the Sign function to quantize image features while ignoring the influence of image spatial frequency. We argue that we can minimize the quantization error by considering different spatial frequency components. To achieve this, we propose a frequency-aware binarized network (FABNet) for single image super-resolution. First, we leverage the wavelet transformation to decompose the features into low-frequency and high-frequency components and then employ a “divide-and-conquer” strategy to separately process them with well-designed binary network structures. Additionally, we introduce a dynamic binarization process that incorporates learned-threshold binarization during forward propagation and dynamic approximation during backward propagation, effectively addressing the diverse spatial frequency information. Compared to existing methods, our approach is effective in reducing quantization error and recovering image textures. Extensive experiments conducted on four benchmark datasets demonstrate that the proposed methods could surpass state-of-the-art approaches in terms of PSNR and visual quality with significantly reduced computational costs. Our codes are available at https://github.com/xrjiang527/FABNet-PyTorch. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Improving Lightweight AdderNet via Distillation From ℓ2 to ℓ1-norm.
- Author
- Dong, Minjing, Chen, Xinghao, Wang, Yunhe, and Xu, Chang
- Subjects
- CONVOLUTIONAL neural networks, CROSS correlation, ENERGY consumption, DISTILLATION, VANILLA
- Abstract
To achieve efficient inference with a hardware-friendly design, Adder Neural Networks (ANNs) are proposed to replace expensive multiplication operations in Convolutional Neural Networks (CNNs) with cheap additions through utilizing ℓ1-norm for similarity measurement instead of cosine distance. However, we observe that there exists an increasing gap between CNNs and ANNs with reducing parameters, which cannot be eliminated by existing algorithms. In this paper, we present a simple yet effective Norm-Guided Distillation (NGD) method for ℓ1-norm ANNs to learn superior performance from ℓ2-norm ANNs. Although CNNs achieve similar accuracy with ℓ2-norm ANNs, the clustering performance based on ℓ2-distance can be easily learned by ℓ1-norm ANNs compared with cross correlation in CNNs. The features in ℓ2-norm ANNs are encouraged to achieve intra-class centralization and inter-class decentralization to amplify this advantage. Furthermore, the roughly estimated gradients in vanilla ANNs are modified to a progressive approximation from ℓ2-norm to ℓ1-norm so that a more accurate optimization can be achieved. Extensive evaluations on several benchmarks demonstrate the effectiveness of NGD on lightweight networks. For example, our method improves ANN by 10.43% with 0.25× GhostNet on CIFAR-100 and 3.1% with 1.0× GhostNet on ImageNet. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
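The ℓ1-norm similarity that entry 14's ANNs substitute for cross-correlation can be stated in two lines; the rest of the paper (norm-guided distillation, progressive gradient approximation) is about closing the accuracy gap this substitution opens. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
patch = rng.standard_normal((3, 3))           # one input patch
filters = rng.standard_normal((4, 3, 3))      # a small filter bank

cnn = np.array([np.sum(patch * f) for f in filters])            # multiplies
ann = np.array([-np.sum(np.abs(patch - f)) for f in filters])   # additions only
print("CNN picks filter", cnn.argmax(), "| ANN picks filter", ann.argmax())
```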
15. E2FIF: Push the Limit of Binarized Deep Imagery Super-Resolution Using End-to-End Full-Precision Information Flow.
- Author
- Song, Chongxing, Lang, Zhiqiang, Wei, Wei, and Zhang, Lei
- Subjects
- FEATURE extraction, HIGH resolution imaging, GENERALIZATION, SPINE, SIGNALS & signaling
- Abstract
Binary neural network (BNN) provides a promising solution to deploy parameter-intensive deep single image super-resolution (SISR) models onto real devices with limited storage and computational resources. To achieve comparable performance with the full-precision counterpart, most existing BNNs for SISR mainly focus on compensating for the information loss incurred by binarizing weights and activations in the network through better approximations to the binarized convolution. In this study, we revisit the difference between BNNs and their full-precision counterparts and argue that the key to good generalization performance of BNNs lies on preserving a complete full-precision information flow along with an accurate gradient flow passing through each binarized convolution layer. Inspired by this, we propose to introduce a full-precision skip connection, or a variant thereof, over each binarized convolution layer across the entire network, which can increase the forward expressive capability and the accuracy of back-propagated gradient, thus enhancing the generalization performance. More importantly, such a scheme can be applied to any existing BNN backbones for SISR without introducing any additional computation cost. To validate the efficacy of the proposed approach, we evaluate it using four different backbones for SISR on four benchmark datasets and report obviously superior performance over existing BNNs and even some 4-bit competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Learning Cross-Scale Weighted Prediction for Efficient Neural Video Compression.
- Author
- Guo, Zongyu, Feng, Runsen, Zhang, Zhizheng, Jin, Xin, and Chen, Zhibo
- Subjects
- VIDEO codecs, OPTICAL flow, PREDICTION models, VIDEO coding, PYRAMIDS, FORECASTING, VIDEO compression
- Abstract
Neural video codecs have demonstrated great potential in video transmission and storage applications. Existing neural hybrid video coding approaches rely on optical flow or Gaussian-scale flow for prediction, which cannot support fine-grained adaptation to diverse motion content. Towards more content-adaptive prediction, we propose a novel cross-scale prediction module that achieves more effective motion compensation. Specifically, on the one hand, we produce a reference feature pyramid as prediction sources and then transmit cross-scale flows that leverage the feature scale to control the precision of prediction. On the other hand, for the first time, a weighted prediction mechanism is introduced even if only a single reference frame is available, which can help synthesize a fine prediction result by transmitting cross-scale weight maps. In addition to the cross-scale prediction module, we further propose a multi-stage quantization strategy, which improves the rate-distortion performance with no extra computational penalty during inference. We show the encouraging performance of our efficient neural video codec (ENVC) on several benchmark datasets. In particular, the proposed ENVC can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on UVG dataset for the low-latency mode. We also analyze in detail the effectiveness of the cross-scale prediction module in handling various video content, and provide a comprehensive ablation study to analyze those important components. Test code is available at https://github.com/USTC-IMCL/ENVC. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. An Algorithm for Learning Orthonormal Matrix Codebooks for Adaptive Transform Coding.
- Author
- Boragolla, Rashmi and Yahampath, Pradeepa
- Subjects
- MACHINE learning, APPROXIMATION algorithms, COVARIANCE matrices, CONSTRAINED optimization, TWO-dimensional bar codes
- Abstract
This paper proposes a novel data-driven approach to designing orthonormal transform matrix codebooks for adaptive transform coding of any non-stationary vector processes which can be considered locally stationary. Our algorithm, which belongs to the class of block-coordinate descent algorithms, relies on simple probability models such as Gaussian or Laplacian for transform coefficients to directly minimize with respect to the orthonormal transform matrix the mean square error (MSE) of scalar quantization and entropy coding of transform coefficients. A difficulty commonly encountered in such minimization problems is imposing the orthonormality constraint on the matrix solution. We get around this difficulty by mapping the constrained problem in Euclidean space to an unconstrained problem on the Stiefel manifold and leveraging known algorithms for unconstrained optimization on manifolds. While the basic design algorithm directly applies to non-separable transforms, an extension to separable transforms is also proposed. We present experimental results for adaptive transform coding of still images and video inter-frame prediction residuals, comparing the transforms designed using the proposed method and a number of other content-adaptive transforms recently reported in the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
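A standard ingredient behind entry 17's manifold reformulation: points on the Stiefel manifold are matrices with orthonormal columns, and the nearest such matrix (in Frobenius norm) to an arbitrary matrix comes from its polar decomposition. A sketch of that projection step, a generic building block rather than the paper's full block-coordinate descent algorithm:

```python
import numpy as np

def project_stiefel(m):
    """Nearest matrix with orthonormal columns: M = U S V^T  ->  U V^T."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
q = project_stiefel(rng.standard_normal((8, 4)))
print(np.allclose(q.T @ q, np.eye(4)))   # True: columns are orthonormal
```

Optimizing directly on the manifold avoids having to re-impose the orthonormality constraint after every unconstrained gradient step.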
18. Learned Video Compression With Efficient Temporal Context Learning.
- Author
- Jin, Dengchao, Lei, Jianjun, Peng, Bo, Pan, Zhaoqing, Li, Li, and Ling, Nam
- Subjects
- IMAGE compression, VIDEO coding, CODECS, VIDEO compression, SIGNALS & signaling
- Abstract
In contrast to image compression, the key to video compression is to efficiently exploit the temporal context to reduce inter-frame redundancy. Existing learned video compression methods generally rely on utilizing short-term temporal correlations or image-oriented codecs, which prevents further improvement of the coding performance. This paper proposes a novel temporal context-based video compression network (TCVC-Net) for improving the performance of learned video compression. Specifically, a global temporal reference aggregation (GTRA) module is proposed to obtain an accurate temporal reference for motion-compensated prediction by aggregating long-term temporal context. Furthermore, in order to efficiently compress the motion vector and residue, a temporal conditional codec (TCC) is proposed to preserve structural and detailed information by exploiting the multi-frequency components in temporal context. Experimental results show that the proposed TCVC-Net outperforms public state-of-the-art methods in terms of both PSNR and MS-SSIM metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
19. MBFQuant: A Multiplier-Bitwidth-Fixed, Mixed-Precision Quantization Method for Mobile CNN-Based Applications.
- Author
- Peng, Peng, You, Mingyu, Jiang, Kai, Lian, Youzao, and Xu, Weisheng
- Subjects
- CONVOLUTIONAL neural networks, IMAGE recognition (Computer vision), COMPUTER architecture, MOBILE operating systems, MOBILE apps
- Abstract
Deploying Convolutional Neural Network (CNN)-based applications to mobile platforms can be challenging due to the conflict between the restricted computing capacity of mobile devices and the heavy computational overhead of running a CNN. Network quantization is a promising way of alleviating this problem. However, network quantization can result in accuracy degradation and this is especially the case with the compact CNN architectures that are designed for mobile applications. This paper presents a novel and efficient mixed-precision quantization pipeline, called MBFQuant. It redefines the design space for mixed-precision quantization by keeping the bitwidth of the multiplier fixed, unlike other existing methods, because we have found that the quantized model can maintain almost the same running efficiency, so long as the sum of the quantization bitwidth of the weight and the input activation of a layer is a constant. To maximize the accuracy of a quantized CNN model, we have developed a Simulated Annealing (SA)-based optimizer that can automatically explore the design space, and rapidly find the optimal bitwidth assignment. Comprehensive evaluations applying ten CNN architectures to four datasets have served to demonstrate that MBFQuant can achieve improvements in accuracy of up to 19.34% for image classification and 1.12% for object detection, with respect to a corresponding uniform bitwidth quantized model. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. Revisiting Multi-Codebook Quantization.
- Author
- Zhu, Xiaosu, Song, Jingkuan, Gao, Lianli, Gu, Xiaoyan, and Shen, Heng Tao
- Subjects
- CLUSTERING algorithms, APPROXIMATION algorithms, HEURISTIC algorithms, BINARY codes, RESEARCH personnel
- Abstract
Multi-Codebook Quantization (MCQ) is a generalized version of existing codebook-based quantizations for Approximate Nearest Neighbor (ANN) search. Specifically, MCQ picks one codeword for each sub-codebook independently and takes the sum of picked codewords to approximate the original vector. The objective function involves no constraints; therefore, MCQ theoretically has the potential to achieve the best performance, because solutions of other codebook-based quantization methods are all covered by MCQ’s solution space under the same codebook size setting. However, finding the optimal solution to MCQ is proved to be NP-hard due to its encoding process, i.e., converting an input vector to a binary code. To tackle this, researchers apply constraints to it to find near-optimal solutions or employ heuristic algorithms that are still time-consuming for encoding. Different from previous approaches, this paper makes the first attempt to find a deep solution to MCQ. The encoding network is designed to be as simple as possible, so the very complex encoding problem becomes simply a feed-forward. Compared with other methods on three datasets, our method shows state-of-the-art performance. Notably, our method is 11×–38× faster than heuristic algorithms for encoding, which makes it more practical for real scenarios of large-scale retrieval. Our code is publicly available: https://github.com/DeepMCQ/DeepQ. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
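Entry 20's setup in code: MCQ approximates a vector by the sum of one codeword per sub-codebook, and the hard part is choosing those codewords. The cheap greedy baseline that deep encoders compete against looks like this (a sketch; the paper's contribution is replacing this search with a simple feed-forward network):

```python
import numpy as np

def mcq_encode_greedy(x, codebooks):
    """Pick one codeword per codebook, greedily shrinking the residual.
    Exact MCQ encoding is NP-hard; this is the fast heuristic baseline."""
    residual, codes = x.copy(), []
    for cb in codebooks:                                  # cb: (K, d)
        idx = int(np.argmin(np.sum((residual - cb) ** 2, axis=1)))
        codes.append(idx)
        residual -= cb[idx]
    return codes, residual

rng = np.random.default_rng(0)
codebooks = rng.standard_normal((4, 256, 16))   # M=4 sub-codebooks, K=256, d=16
x = rng.standard_normal(16)
codes, r = mcq_encode_greedy(x, codebooks)
print("codes:", codes, "| residual energy:", round(float(np.sum(r ** 2)), 3))
```

Each vector is then stored as M small indices (here 4 bytes) instead of d floats.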
21. Multi-Label Hashing for Dependency Relations Among Multiple Objectives.
- Author
- Peng, Liangkang, Qian, Jiangbo, Xu, Zhengtao, Xin, Yu, and Guo, Lijun
- Subjects
- FEATURE extraction, HAMMING distance, IMAGE retrieval, CONVOLUTIONAL neural networks, PROBLEM solving
- Abstract
Learning hash functions have been widely applied for large-scale image retrieval. Existing methods usually use CNNs to process an entire image at once, which is efficient for single-label images but not for multi-label images. First, these methods cannot fully exploit independent features of different objects in one image, resulting in some small object features with important information being ignored. Second, the methods cannot capture different semantic information from dependency relations among objects. Third, the existing methods ignore the impacts of imbalance between hard and easy training pairs, resulting in suboptimal hash codes. To address these issues, we propose a novel deep hashing method, termed multi-label hashing for dependency relations among multiple objectives (DRMH). We first utilize an object detection network to extract object feature representations to avoid ignoring small object features and then fuse object visual features with position features and further capture dependency relations among objects using a self-attention mechanism. In addition, we design a weighted pairwise hash loss to solve the imbalance problem between hard and easy training pairs. Extensive experiments are conducted on multi-label datasets and zero-shot datasets, and the proposed DRMH outperforms many state-of-the-art hashing methods with respect to different evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
22. PolarPose: Single-Stage Multi-Person Pose Estimation in Polar Coordinates.
- Author
- Li, Jianing, Wang, Yaowei, and Zhang, Shiliang
- Subjects
- CARTESIAN coordinates, TASK analysis, HEATING, DETECTORS, CLASSIFICATION
- Abstract
Regression-based multi-person pose estimation receives increasing attention because of its promising potential in achieving real-time inference. However, the challenges in long-range 2D offset regression have restricted the regression accuracy, leading to a considerable performance gap compared with heatmap based methods. This paper tackles the challenge of long-range regression through simplifying the 2D offset regression to a classification task. We present a simple yet effective method, named PolarPose, to perform 2D regression in Polar coordinates. Through transforming the 2D offset regression in Cartesian coordinates to quantized orientation classification and 1D length estimation in Polar coordinates, PolarPose effectively simplifies the regression task, making the framework easier to optimize. Moreover, to further boost the keypoint localization accuracy in PolarPose, we propose a multi-center regression to relieve the quantization error during orientation quantization. The resulting PolarPose framework is able to regress the keypoint offsets in a more reliable way, and achieves more accurate keypoint localization. Tested with the single-model and single-scale setting, PolarPose achieves an AP of 70.2% on the COCO test-dev dataset, outperforming the state-of-the-art regression-based methods. PolarPose also achieves promising efficiency, e.g., 71.5% AP at 21.5 FPS, 68.5% AP at 24.2 FPS, and 65.5% AP at 27.2 FPS on the COCO val2017 dataset, faster than the current state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
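Entry 22's reformulation, concretely: a 2-D offset becomes an orientation class plus a 1-D length, and the orientation quantization introduces exactly the error that the paper's multi-center regression relieves. A sketch with 36 bins (parameters illustrative):

```python
import numpy as np

def polar_encode(dx, dy, n_bins=36):
    """2-D offset -> quantized orientation class + 1-D length."""
    theta = np.arctan2(dy, dx) % (2 * np.pi)
    return int(theta // (2 * np.pi / n_bins)), float(np.hypot(dx, dy))

def polar_decode(bin_idx, r, n_bins=36):
    theta = (bin_idx + 0.5) * 2 * np.pi / n_bins   # decode at the bin centre
    return r * np.cos(theta), r * np.sin(theta)

dx, dy = 37.0, -12.0
b, r = polar_encode(dx, dy)
rx, ry = polar_decode(b, r)
print(f"bin {b}, length {r:.1f} -> offset error {np.hypot(rx - dx, ry - dy):.2f} px")
```

The residual error grows with the offset length, which is why long-range keypoints benefit most from finer (or multi-center) angular handling.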
23. Fingerprinting Classifiers With Benign Inputs.
- Author
- Maho, Thibault, Furon, Teddy, and Merrer, Erwan Le
- Abstract
Recent advances in the fingerprinting of deep neural networks are able to detect specific instances of models, placed in a black-box interaction scheme. Inputs used by the fingerprinting protocols are specifically crafted for each precise model to be checked for. While efficient in such a scenario, this nevertheless results in a lack of guarantee after a mere modification of a model (e.g., finetuning, quantization of the parameters). This article generalizes fingerprinting to the notion of model families and their variants and extends the task to encompass scenarios where one wants to fingerprint not only a precise model (previously referred to as a detection task) but also to identify which model or family is in the black-box (identification task). The main contribution is the proposal of fingerprinting schemes that are resilient to significant modifications of the models. We achieve these goals by demonstrating that benign inputs, that are unmodified images, are sufficient material for both tasks. We leverage an information-theoretic scheme for the identification task. We devise a greedy discrimination algorithm for the detection task. Both approaches are experimentally validated over an unprecedented set of more than 1,000 networks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
24. GradQuant: Low-Loss Quantization for Remote-Sensing Object Detection.
- Author
- Deng, Chenwei, Deng, Zhiyuan, Han, Yuqi, Jing, Donglin, and Zhang, Hong
- Abstract
Convolutional neural network-based methods have shown remarkable performance in remote-sensing object detection. However, their deployment on resource-limited embedded devices is hindered by their high computational complexity. Neural network quantization methods have been proven effective in compressing and accelerating CNN models by clipping outlier activations and utilizing low-precision values to represent weights and clipped activations. Nonetheless, the clipping of outlier activations leads to distortion of object local features. Furthermore, the lack of enhanced overall feature mining exacerbates the degradation of detection accuracy. To address the limitations above, we propose an innovative clipping-free quantization method called GradQuant, which mitigates the model’s quantization accuracy loss caused by clipping outlier activations and the lack of overall feature mining. Specifically, a bounded activation function (sigmoid-weighted tanh, SiTanh) is carefully designed to ensure that object features are represented within a limited range without clipping. On the basis of this, an activation substitution training (AST) method is codesigned to prompt models to focus more on nonoutlier object features instead of outlier-like local ones. Extensive experiments on public remote-sensing datasets demonstrate the effectiveness of the GradQuant method compared with other state-of-the-art quantization methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. Characterization Method for Digital Correlator of Interferometric Radiometers Based on Correlated Noise Source.
- Author
- Liu, Jiayi, Han, Donghao, Guo, Xi, Niu, Lijie, and Liu, Hao
- Abstract
The digital correlator is the central signal processing unit of an interferometric radiometer system. Its core function is to quantitatively measure the correlation coefficients of all the receiver output combination pairs, and then generate the visibility functions together with the system noise temperatures obtained by single-channel total-power measurements. In this letter, a comprehensive quantitative characterization method is proposed for a 30-channel, three-level quantized digital correlator unit (DCU), which will be used for the microwave imager combined active and passive (MICAP) C- and K-band radiometers on board the Chinese ocean salinity mission (COSM). First, several key performance characteristics of the DCU, including correlation offset and correlation efficiency, are defined and discussed based on a segmented error model. Second, a test platform based on an arbitrary waveform generator (AWG) is established and applied for the test and evaluation of the DCU engineering model (EM). Orthogonal in-phase and quadrature bandlimited noise signals are generated via a newly proposed approach based on the Hilbert transform, facilitating the joint tests of correlation efficiency and phase error. Moreover, the influences of the correlator’s input power level on correlation offset and correlation efficiency are also investigated during the test of the DCU. DCU test results show that the correlation offset is smaller than −40 dB, the correlation efficiency is better than 0.996, and the phase error is less than 5°. The results can provide an important reference for the digital correlator test in COSM. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
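Background for entry 25: a three-level correlator quantizes each receiver output to −1/0/+1 and estimates the correlation coefficient from the quantized streams, which is why correlation offset and efficiency must be characterized so carefully. A toy numerical version (the threshold and signal model are my illustrative choices, not the DCU test setup):

```python
import numpy as np

def three_level(x, t=0.61):
    """Three-level quantizer: +1 above t, -1 below -t, else 0 (t in sigmas)."""
    return np.where(x > t, 1.0, np.where(x < -t, -1.0, 0.0))

rng = np.random.default_rng(0)
n, rho = 2_000_000, 0.05                       # true correlation coefficient
u, v = rng.standard_normal((2, n))
x, y = u, rho * u + np.sqrt(1 - rho ** 2) * v  # jointly Gaussian pair
qx, qy = three_level(x), three_level(y)
r_q = np.mean(qx * qy) / np.sqrt(np.mean(qx ** 2) * np.mean(qy ** 2))
print(f"true rho = {rho}, estimate from 3-level streams = {r_q:.4f}")
```

The three-level estimate understates the true coefficient by a deterministic, invertible factor; characterizing that relation, plus any residual offset, is the kind of calibration the letter's method quantifies.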
26. Central Cohesion Gradual Hashing for Remote Sensing Image Retrieval.
- Author
- Han, Lirong, Paoletti, Mercedes E., Tao, Xuanwen, Wu, Zhaoyue, Haut, Juan M., Plaza, Javier, and Plaza, Antonio
- Abstract
With the recent development of remote sensing technology, large image repositories have been collected. In order to retrieve the desired images from massive remote sensing data sets effectively and efficiently, we propose a novel central cohesion gradual hashing (CCGH) mechanism for remote sensing image retrieval. First, we design a deep hashing model based on ResNet-18, which has a shallow architecture and extracts features of remote sensing imagery effectively and efficiently. Then, we propose a new training model that minimizes a central cohesion loss, which guarantees that remote-sensing hash codes are as close to their hash code centers as possible. We also adopt a quantization loss which encourages the outputs to be binary values. The combination of both loss functions produces highly discriminative hash codes. Finally, a gradual sign-like function is used to reduce quantization errors. By means of the aforementioned developments, our CCGH achieves state-of-the-art accuracy in the task of remote sensing image retrieval. Extensive experiments are conducted on two public remote sensing image data sets. The obtained results support the fact that our newly developed CCGH is competitive with other existing deep hashing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. Towards Efficient In-Memory Computing Hardware for Quantized Neural Networks: State-of-the-Art, Open Challenges and Perspectives.
- Author
- Krestinskaya, Olga, Zhang, Li, and Salama, Khaled Nabil
- Abstract
The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data privacy concerns force the transition from cloud-based to edge-based processing. Limited energy and computational resources on edge push the transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement a neural network on limited hardware resources. Quantization is one of the most efficient network compression techniques allowing to reduce the memory footprint, latency, and energy consumption. This article provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation. Moreover, open challenges, QNN design requirements, recommendations, and perspectives along with an IMC-based QNN hardware roadmap are provided. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. Remote State Estimation
- Author
- Mahmoud, MagdiSadek and Karaki, Bilal J. (Kacprzyk, Janusz, Series Editor)
- Published
- 2022
- Full Text
- View/download PDF
29. Color Classification Under Complex Background via Genetic Algorithm-Based Color Difference Histogram.
- Author
- Chen, Haiyong, Zhang, Yaxiu, Cui, Yuejiao, and Liu, Kun
- Subjects
- SILICON solar cells, POLYCRYSTALLINE silicon, DESCRIPTOR systems, CLASSIFICATION algorithms, SOLAR cells, IMAGE color analysis, HISTOGRAMS, GRAPH coloring
- Abstract
Color classification of polycrystalline silicon solar cells is challenging for production quality control during manufacturing due to the non-Gaussian color distribution and random texture background. The motivation of this work is to present a robust color classification technique by designing a novel tiny color difference feature descriptor. Thus, a genetic algorithm based color difference histogram (GACDH) is proposed. First, the optimal color space for the color difference histogram (CDH) to represent tiny color changes is designed. It counts the perceptually uniform color differences in a small local neighborhood in the L*a*b* color space, which reduces false classification due to small color variations and illumination variation. Second, genetic algorithm based color quantization for the CDH is proposed to select the optimal quantization bins in the L* component; comparative experiments in the a* and b* color components are then conducted to select their optimal quantization bins. The optimization of feature dimension not only reduces the large dimensionality of histogram bins in the computation but also improves the subsequent classification performance. Finally, the proposed algorithm is validated on a color dataset of solar cells with a distance measure method. Experimental results and analysis show that the overall performance of the proposed method achieves 98.6% and outperforms other techniques available in the literature in terms of weak discriminative color difference classification. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
30. Lossless Recompression of JPEG Images Using Transform Domain Intra Prediction.
- Author
- Sun, Chentian, Fan, Xiaopeng, and Zhao, Debin
- Subjects
- IMAGE compression, JPEG (Image coding standard), LAPTOP computers, PERSONAL computers, DISCRETE cosine transforms
- Abstract
JPEG, which was developed 30 years ago, is the most widely used image coding format, especially favored by resource-deficient devices due to its simplicity and efficiency. With the evolution of the Internet and the popularity of mobile devices, a huge amount of user-generated JPEG images are uploaded to social media sites like Facebook and Flickr or stored on personal computers or notebooks, which leads to an increase in storage cost. However, the performance of JPEG is far from that of state-of-the-art coding methods. Therefore, the lossless recompression of JPEG images urgently needs to be studied, as it will further reduce the storage cost while maintaining the image fidelity. In this paper, a hybrid coding framework for the lossless recompression of JPEG images (LLJPEG) using transform domain intra prediction is proposed, including block partition and intra prediction, transform and quantization, and entropy coding. Specifically, in LLJPEG, intra prediction is first used to obtain a predicted block. Then the predicted block is transformed by DCT and quantized to obtain the predicted coefficients. After that, the predicted coefficients are subtracted from the original coefficients to get the DCT coefficient residuals. Finally, the DCT residuals are entropy coded. In LLJPEG, some new coding tools are proposed for intra prediction and the entropy coding is redesigned. The experiments show that LLJPEG can reduce the storage space by 29.43% and 26.40% on the Kodak and DIV2K datasets respectively without any loss for JPEG images, while maintaining low decoding complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
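Entry 30's transform-domain prediction, schematically: instead of coding the stored DCT coefficients directly, LLJPEG codes their difference from the quantized DCT of an intra prediction. A NumPy sketch of that residual formation on one 8×8 block (the flat quantizer and the noise-based stand-in "prediction" are illustrative, not LLJPEG's actual tools):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis, as used for JPEG's 8x8 blocks."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2)
    return m

rng = np.random.default_rng(0)
D = dct_matrix()
block = rng.integers(0, 256, (8, 8)).astype(float)  # decoded JPEG block
pred = block + rng.normal(0, 4, (8, 8))             # stand-in intra prediction
q = 16                                              # one flat quantization step

coeff = np.round(D @ block @ D.T / q)       # coefficients stored in the JPEG
coeff_pred = np.round(D @ pred @ D.T / q)   # quantized coefficients of prediction
residual = coeff - coeff_pred               # what gets entropy-coded losslessly
print("nonzero coefficients:", int((coeff != 0).sum()),
      "-> nonzero residuals:", int((residual != 0).sum()))
```

A good prediction leaves mostly zero residuals, which is where the ~26–29% storage saving comes from.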
31. Residual Quantization for Low Bit-Width Neural Networks.
- Author
- Li, Zefan, Ni, Bingbing, Yang, Xiaokang, Zhang, Wenjun, and Gao, Wen
- Abstract
Neural network quantization has shown to be an effective way for network compression and acceleration. However, existing binary or ternary quantization methods suffer from two major issues. First, low bit-width input/activation quantization easily results in severe prediction accuracy degradation. Second, network training and quantization are always treated as two non-related tasks, leading to accumulated parameter training error and quantization error. In this work, we introduce a novel scheme, named Residual Quantization, to train a neural network with both weights and inputs constrained to low bit-width, e.g., binary or ternary values. On one hand, by recursively performing residual quantization, the resulting binary/ternary network is guaranteed to approximate the full-precision network with much smaller errors. On the other hand, we mathematically re-formulate the network training scheme in an EM-like manner, which iteratively performs network quantization and parameter optimization. During expectation, the low bit-width network is encouraged to approximate the full-precision network. During maximization, the low bit-width network is further tuned to gain better representation capability. Extensive experiments well demonstrate that the proposed quantization scheme outperforms previous low bit-width methods and achieves much closer performance to the full-precision counterpart. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
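The recursion at the heart of entry 31 is compact: binarize, measure what was lost, binarize the loss again. For a sign code, each stage's error-minimizing scale is the mean absolute residual. A sketch of the quantization step only (the paper's EM-like training loop is not shown):

```python
import numpy as np

def residual_binarize(w, stages=3):
    """Approximate w by a sum of scaled sign tensors, one per stage."""
    residual, terms = w.copy(), []
    for _ in range(stages):
        b = np.sign(residual)
        alpha = np.abs(residual).mean()   # optimal scale for the code sign(r)
        terms.append((alpha, b))
        residual -= alpha * b
    return terms, residual

w = np.random.default_rng(0).standard_normal(10_000)
for k in (1, 2, 3):
    _, r = residual_binarize(w, k)
    print(f"{k} stage(s): relative error {np.linalg.norm(r) / np.linalg.norm(w):.3f}")
```

Each added stage shrinks the approximation error geometrically, which is why a few binary terms can track a full-precision network closely.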
32. Reduced-Order Fault Detection Filter Design for Fuzzy Semi-Markov Jump Systems With Partly Unknown Transition Rates.
- Author
- Zhang, Linchuang, Sun, Yonghui, Pan, Yingnan, and Lam, Hak-Keung
- Subjects
- MARKOVIAN jump linear systems, TUNNEL diodes, COMMUNICATION barriers, SYMMETRIC matrices, MARKOV processes
- Abstract
This article deals with the fault detection problem for a class of Takagi–Sugeno (T–S) fuzzy semi-Markov jump systems (FSMJSs) with partly unknown transition rates (PUTRs) subject to output quantization by designing a reduced-order filter. First, a more general PUTRs model is constructed to describe the situation that the information of some elements is completely unknown, where this model is affected simultaneously by PU information and time-varying parameter compared with the traditional PUTRs model. Second, we take full advantage of the reduced-order filter to address the fault detection problem for FSMJSs, in which the stochastic failure phenomenon is injected into the reduced-order filter. Besides, the logarithmic quantizer is employed to tackle the limited bandwidth problem in a communication channel. Consequently, the new sufficient conditions are developed based on the Lyapunov theory to obtain the desired reduced-order filter. Simulation results with respect to the tunnel diode circuit are provided to demonstrate the usefulness and availability of the established theoretical results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
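The logarithmic quantizer employed in entry 32 is a standard device in networked control: quantization levels form a geometric sequence ±u0·ρ^j, which bounds the relative error by δ = (1−ρ)/(1+ρ) and lets the error be absorbed as a sector-bounded uncertainty in the stability analysis. A sketch of the textbook definition (parameters illustrative):

```python
import numpy as np

def log_quantize(v, rho=0.8, u0=1.0):
    """Logarithmic quantizer with levels ±u0*rho^j; |q(v)-v| <= delta*|v|."""
    delta = (1 - rho) / (1 + rho)
    if v == 0.0:
        return 0.0
    a = abs(v)
    # choose j so that  u_j/(1+delta) < a <= u_j/(1-delta),  u_j = u0*rho^j
    j = int(np.floor(np.log(a * (1 - delta) / u0) / np.log(rho)))
    return np.sign(v) * u0 * rho ** j

vs = np.logspace(-3, 3, 10_000)
worst = max(abs(log_quantize(v) - v) / v for v in vs)
print(f"max relative error {worst:.4f} vs delta = {(1 - 0.8) / (1 + 0.8):.4f}")
```

Coarser quantization (smaller ρ) saves bandwidth but enlarges the sector bound the filter design must tolerate.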
33. A Little Bit More: Bitplane-Wise Bit-Depth Recovery.
- Author
- Punnappurath, Abhijith and Brown, Michael S.
- Subjects
- ARTIFICIAL neural networks, IMAGE sensors, PHOTOGRAPHIC editing, IMAGE reconstruction, DEEP learning, HIGH dynamic range imaging, ECHO-planar imaging
- Abstract
Imaging sensors digitize incoming scene light at a dynamic range of 10–12 bits (i.e., 1024–4096 tonal values). The sensor image is then processed onboard the camera and finally quantized to only 8 bits (i.e., 256 tonal values) to conform to prevailing encoding standards. There are a number of important applications, such as high-bit-depth displays and photo editing, where it is beneficial to recover the lost bit depth. Deep neural networks are effective at this bit-depth reconstruction task. Given the quantized low-bit-depth image as input, existing deep learning methods employ a single-shot approach that attempts to either (1) directly estimate the high-bit-depth image, or (2) directly estimate the residual between the high- and low-bit-depth images. In contrast, we propose a training and inference strategy that recovers the residual image bitplane-by-bitplane. Our bitplane-wise learning framework has the advantage of allowing for multiple levels of supervision during training and is able to obtain state-of-the-art results using a simple network architecture. We test our proposed method extensively on several image datasets and demonstrate an improvement from 0.5 dB to 2.3 dB PSNR over prior methods depending on the quantization level. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
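The decomposition behind entry 33's training strategy: an integer image is a stack of binary bitplanes, quantization zeroes the least significant planes, and recovery predicts them back one plane at a time. A sketch of the decomposition itself (the prediction networks are not shown):

```python
import numpy as np

def bitplanes(img, bits=8):
    """Split an integer image into binary bitplanes, MSB first."""
    return [(img >> b) & 1 for b in range(bits - 1, -1, -1)]

def from_bitplanes(planes):
    bits = len(planes)
    return sum(p << (bits - 1 - i) for i, p in enumerate(planes))

img = np.random.default_rng(0).integers(0, 256, (4, 4), dtype=np.uint16)
planes = bitplanes(img)
assert np.array_equal(from_bitplanes(planes), img)

# 8 -> 6 bit quantization == zeroing the two least significant planes;
# bitplane-wise recovery predicts those planes back, one per step.
img6 = from_bitplanes(planes[:6] + [np.zeros_like(img)] * 2)
print("max error after dropping 2 planes:", int(np.abs(img - img6).max()))
```

Predicting one binary plane per step is what gives the framework its multiple levels of supervision.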
34. Delay-Dependent Distributed Kalman Fusion Estimation With Dimensionality Reduction in Cyber-Physical Systems.
- Author
- Chen, Bo, Ho, Daniel W. C., Hu, Guoqiang, and Yu, Li
- Abstract
This article studies the distributed dimensionality reduction fusion estimation problem with communication delays for a class of cyber-physical systems (CPSs). The raw measurements are preprocessed in each sink node to obtain the local optimal estimate (LOE) of a CPS, and the compressed LOE under dimensionality reduction encounters communication delays during the transmission. For this case, a mathematical model with a compensation strategy is proposed to characterize the dimensionality reduction and communication delays. This model also has the property of reducing the information loss caused by the dimensionality reduction and delays. Based on this model, a recursive distributed Kalman fusion estimator (DKFE) is derived by an optimal weighted fusion criterion in the linear minimum variance sense. A stability condition for the DKFE, which can be easily verified by existing software, is derived. In addition, this condition can guarantee that the estimation error covariance matrix of the DKFE converges to the unique steady-state matrix for any initial values and, thus, the steady-state DKFE (SDKFE) is given. Note that the computational complexity of the SDKFE is much lower than that of the DKFE. Moreover, a probability selection criterion for determining the dimensionality reduction strategy is also presented to guarantee the stability of the DKFE. Two illustrative examples are given to show the advantage and effectiveness of the proposed methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
35. LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update.
- Author
- Zhao, Jiawei, Dai, Steve, Venkatesan, Rangharajan, Zimmer, Brian, Ali, Mustafa, Liu, Ming-Yu, Khailany, Brucek, Dally, William J., and Anandkumar, Anima
- Subjects
- ARTIFICIAL neural networks, NUMBER systems, MACHINE learning, WEIGHT training, ENERGY consumption, COMPUTER vision
- Abstract
Representing deep neural networks (DNNs) in low-precision is a promising approach to enable efficient acceleration and memory reduction. Previous methods that train DNNs in low-precision typically keep a copy of weights in high-precision during the weight updates. Directly training with low-precision weights leads to accuracy degradation due to complex interactions between the low-precision number systems and the learning algorithms. To address this issue, we develop a co-designed low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update algorithm (Madam). We prove that LNS-Madam results in low quantization error during weight updates, leading to stable performance even if the precision is limited. We further propose a hardware design of LNS-Madam that resolves practical challenges in implementing an efficient datapath for LNS computations. Our implementation effectively reduces energy overhead incurred by LNS-to-integer conversion and partial sum accumulation. Experimental results show that LNS-Madam achieves comparable accuracy to full-precision counterparts with only 8 bits on popular computer vision and natural language tasks. Compared to FP32 and FP8, LNS-Madam reduces the energy consumption by over 90% and 55%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. IVQ: In-Memory Acceleration of DNN Inference Exploiting Varied Quantization.
- Author
- Liu, Fangxin, Zhao, Wenbo, Wang, Zongwu, Zhao, Yilong, Yang, Tao, Chen, Yiran, and Jiang, Li
- Subjects
- FLOW control (Data transmission systems), ENERGY consumption
- Abstract
Weight quantization is well adapted to cope with the ever-growing complexity of the deep neural network (DNN) model. Diversified quantization schemes lead to diverse quantized bit widths and formats of the weights, thereby subject to different hardware implementations. Such variety prevents a general NPU from leveraging different quantization schemes to gain performance and energy efficiency. More importantly, a trend of quantization diversity emerges that applies multiple quantization schemes to different fine-grained structures (e.g., a layer or a channel of weight) of a DNN. Therefore, a general architecture is desired to exploit varied quantization schemes. The crossbar-based processing-in-memory (PIM) architecture, a promising DNN accelerator, is well known for its highly efficient matrix-vector multiplication. However, PIM suffers from an inflexible intracrossbar data path because the weight is stationary on the crossbar and binds to the “add” operation along the bitline. Therefore, many nonuniform quantization methods must roll back the quantization before mapping the weights onto the crossbar. Counterintuitively, this article discovers a unique opportunity of the PIM architecture to exploit varied quantization schemes. We first transform the quantization diversity problem into a consistency problem by aligning the bits with the same magnitude along the same bitline of the crossbar. Consequently, such naive weight mapping causes many square hollows of idle PIM cells. We then propose a novel spatial mapping to exempt these “hollow” crossbars from the intercrossbar data path. To further squeeze the weights onto fewer crossbars, we decouple the intracrossbar data path from the hardware bitline by a novel temporal scheduling, so that bits with different magnitudes can be placed on cells along the same bitline. Finally, the proposed IVQ includes a temporal pipeline to avoid the introduced stalling cycles, and a data flow with delicate control mechanisms for the new intra- and intercrossbar data paths. Putting all together, IVQ achieves 19.7×, 10.7×, 4.7×–63.4×, and 91.7× speedup, and 17.7×, 5.1×, 5.7×–68.1×, and 541× energy savings over two PIM accelerators (ISAAC and CASCADE), two customized quantization accelerators (based on ASIC and FPGA), and an NVIDIA RTX 2080 GPU, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
37. Exploring the Potential of Low-Bit Training of Convolutional Neural Networks.
- Author
- Zhong, Kai, Ning, Xuefei, Dai, Guohao, Zhu, Zhenhua, Zhao, Tianchen, Zeng, Shulin, Wang, Yu, and Yang, Huazhong
- Subjects
- CONVOLUTIONAL neural networks, ENERGY consumption, GEOMETRIC quantization, HUNGER
- Abstract
Convolutional neural networks (CNNs) have been widely used in many tasks, but training CNNs is time consuming and energy hungry. Using the low-bit integer format has been proved promising for speeding up and improving the energy efficiency of CNN inference, while CNN training can hardly benefit from such a technique because of the following challenges: 1) the integer data format cannot meet the requirements of the data dynamic range in training, resulting in the accuracy drop; 2) the floating-point data format keeps sizeable dynamic range with much more exponent bits, thus using it results in higher accumulation power than using the integer data format; and 3) there are some specially designed data formats (e.g., with group-wise scaling) that have the potential to deal with the former two problems but common hardware platforms cannot support them efficiently. To tackle all these challenges and make the training phase of CNNs benefit from the low-bit format, we propose a low-bit training framework for CNNs to pursue a better tradeoff between accuracy and energy efficiency: 1) we adopt element-wise scaling to increase the dynamic range of data representation, which significantly reduces the quantization error; 2) group-wise scaling with hardware friendly factor format is designed to reduce the element-wise exponent bits without degrading the accuracy; and 3) we design the customized hardware unit that implements the low-bit tensor convolution arithmetic with our multilevel scaling data format. Experiments show that our framework achieves a superior tradeoff between the accuracy and the bit-width than previous low-bit training studies. For training various models on CIFAR-10, using 1-bit mantissa and 2-bit exponent is adequate to keep the accuracy loss within 1%. On larger datasets like ImageNet, using 4-bit mantissa and 2-bit exponent is adequate. Through the energy consumption simulation of the whole network, we can see that training a variety of models with our framework could achieve 4.9×–10.2× higher energy efficiency than full-precision arithmetic. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. MLogNet: A Logarithmic Quantization-Based Accelerator for Depthwise Separable Convolution.
- Author
-
Choi, Jooyeon, Sim, Hyeonuk, Oh, Sangyun, Lee, Sugil, and Lee, Jongeun
- Subjects
- *
MATHEMATICAL optimization , *COMPUTER architecture - Abstract
In this article, we propose a novel logarithmic quantization-based deep neural network (DNN) architecture for depthwise separable convolution (DSC) networks. Our architecture is based on selective two-word logarithmic quantization (STLQ), which greatly improves accuracy over logarithmic-scale quantization while retaining the speed and area advantages of logarithmic quantization. However, STLQ introduces a synchronization problem due to its variable-latency processing elements (PEs), which we address through a novel architecture and a compile-time optimization technique. Our architecture is dynamically reconfigurable to support various combinations of depthwise and pointwise convolution layers efficiently. Our experimental results using layers from MobileNetV2 and ShuffleNetV2 demonstrate that our architecture is significantly faster and more area efficient than previous DSC accelerator architectures as well as previous accelerators utilizing logarithmic quantization. [ABSTRACT FROM AUTHOR]
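A minimal sketch of the selective two-word idea as we read it from the abstract (illustrative only; `stlq` and `second_word_ratio` are our own names, and the paper's selection rule may differ): every weight gets one signed power of two, and the weights with the largest residuals get a second power-of-two term:

```python
import numpy as np

def stlq(w, second_word_ratio=0.25):
    """One signed power of two per weight; a second power-of-two term
    refines the weights with the largest residual error."""
    sign = np.sign(w)
    mag = np.abs(w) + 1e-12
    first = sign * 2.0 ** np.round(np.log2(mag))     # one-word log quantization
    resid = w - first
    k = int(len(w) * second_word_ratio)              # select worst residuals
    idx = np.argsort(-np.abs(resid))[:k]
    second = np.zeros_like(w)
    second[idx] = np.sign(resid[idx]) * 2.0 ** np.round(
        np.log2(np.abs(resid[idx]) + 1e-12))
    return first + second

w = np.random.randn(1000)
print("one-word MSE :", np.mean((w - stlq(w, 0.0)) ** 2))
print("two-word MSE :", np.mean((w - stlq(w, 0.25)) ** 2))
```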
- Published
- 2022
- Full Text
- View/download PDF
39. Low Latency Implementations of CNN for Resource-Constrained IoT Devices.
- Author
-
Mujtaba, Ahmed, Lee, Wai-Kong, and Hwang, Seong Oun
- Abstract
Convolutional Neural Network (CNN) inference on a resource-constrained Internet-of-Things (IoT) device (e.g., an ARM Cortex-M microcontroller) requires careful optimization to reduce the timing overhead. We propose two novel techniques to improve the computational efficiency of CNNs on low-cost microcontrollers. Our techniques utilize on-chip memory and minimize redundant operations, yielding low-latency inference on complex quantized models such as MobileNetV1. On the ImageNet dataset with per-layer quantization, we reduce inference latency and improve Multiply-and-Accumulate (MAC) throughput per cycle by 22.4% and 22.9%, respectively, compared to the state-of-the-art mixed-precision CMix-NN library. On the CIFAR-10 dataset with per-channel quantization, we reduce inference latency and improve MAC throughput per cycle by 31.7% and 31.3%, respectively. The achieved low-latency inference can improve the user experience and save power budget in resource-constrained IoT devices. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
40. Diffusion Quantized Recursive Mixture Minimum Error Entropy Algorithm.
- Author
-
Cai, Peng and Wang, Shiyuan
- Abstract
The minimum error entropy (MEE) criterion is widely used in distributed estimation because it is insensitive to many types of non-Gaussian noise. However, the default Gaussian kernel is not always the most suitable kernel function. To solve this problem and further improve the performance of the diffusion recursive MEE (DRMEE) algorithm, a diffusion recursive mixture MEE (DRMMEE) algorithm is proposed by combining the mixture MEE criterion with the diffusion strategy. In addition, a quantized version of DRMMEE, called the diffusion quantized recursive mixture MEE (DQRMMEE) algorithm, is proposed to reduce the computational burden of DRMMEE. Simulation results show that DRMMEE achieves higher filtering accuracy than other recursive least-squares-based algorithms, and that DQRMMEE achieves filtering accuracy similar to that of DRMMEE in various non-Gaussian noise environments. [ABSTRACT FROM AUTHOR]
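For intuition, the following sketch (assumed form, not the paper's exact cost; `mixture_mee_cost` and the bandwidths are our own illustration) estimates the information potential under a mixture-of-Gaussians kernel; maximizing this quantity corresponds to minimizing the quadratic error entropy:

```python
import numpy as np

def mixture_mee_cost(errors, sigma1=0.5, sigma2=2.0, alpha=0.5):
    """Information potential over all error pairs with a two-component
    Gaussian mixture kernel (bandwidths sigma1, sigma2)."""
    diff = errors[:, None] - errors[None, :]          # pairwise error differences
    def gauss(d, s):
        return np.exp(-d**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
    kernel = alpha * gauss(diff, sigma1) + (1 - alpha) * gauss(diff, sigma2)
    return kernel.mean()                              # information potential

e = np.random.standard_t(df=2, size=200)              # heavy-tailed errors
print("mixture information potential:", mixture_mee_cost(e))
```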
- Published
- 2022
- Full Text
- View/download PDF
41. 1-ADM-CNN: A Lightweight In-field Compression Method for Seismic Data.
- Author
-
Iqbal, Naveed
- Abstract
Large-scale seismic acquisition, versatility, flexibility, automation, and scalability are the objectives of future oil and gas exploration technology. An example of emerging technology for seismic monitoring is distributed acoustic sensing (DAS). The significant amount of data produced by DAS is a challenge that necessitates new technologies for its efficient handling and processing. On the one hand, a typical seismic survey can generate hundreds of terabytes of raw seismic data per day; on the other hand, the demand for wireless seismic data transmission remains enormous. Transmitting this massive amount of data from geophones to the on-site data collection center, and storing it there, poses significant challenges. A lightweight compression procedure is required to reduce the data traffic and the storage size at the data center without placing an extra burden on the geophones. In this brief, an efficient implementation of a 1D convolutional neural network (CNN) together with 1-bit adaptive delta modulation is presented for in-field seismic data compression. Notably, the CNN is trained offline on a synthetic data set, so the proposed approach has potential for real-time implementation. No assumption is imposed on the underlying statistics of the noise or the seismic signal, making the method suitable for a wide range of seismic data. Moreover, the method works in the time domain, unlike existing transform-domain methods, making it suitable for quick diagnosis of bad traces at the data center. Simulation results with a real data set reveal that the proposed approach achieves a signal-to-noise ratio (SNR) of approximately 30 dB with a compression gain of 35:1. Finally, significant superiority in compression gain and reconstruction quality is demonstrated over existing methods. [ABSTRACT FROM AUTHOR]
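The delta-modulation half of the pipeline can be sketched as follows (a textbook 1-bit adaptive delta modulator; the brief's exact adaptation rule and the CNN reconstruction stage are not reproduced here, and `step0`/`k` are illustrative). Because the decoder replays the same step-adaptation rule from the same bit stream, it reconstructs the encoder's tracked estimate exactly:

```python
import numpy as np

def adm_encode(x, step0=0.1, k=1.5):
    """1-bit adaptive delta modulation: grow the step after consecutive
    equal bits, shrink it after a bit flip."""
    bits, est, step, prev = [], 0.0, step0, 0
    for s in x:
        b = 1 if s >= est else 0
        step = step * k if b == prev else step / k   # adapt the step size
        est += step if b else -step                  # tracked estimate
        bits.append(b)
        prev = b
    return np.array(bits, dtype=np.uint8)

def adm_decode(bits, step0=0.1, k=1.5):
    est, step, out, prev = 0.0, step0, [], 0
    for b in bits:
        step = step * k if b == prev else step / k
        est += step if b else -step
        out.append(est)
        prev = b
    return np.array(out)

t = np.linspace(0, 1, 500)
x = np.sin(2 * np.pi * 5 * t)
x_hat = adm_decode(adm_encode(x))
print("SNR (dB):", 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2)))
```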
- Published
- 2022
- Full Text
- View/download PDF
42. A Novel 12-Bit 0.6-mW Two-Step Coarse-Fine Time-to-Digital Converter.
- Author
-
Wang, Zhaoyuan, Jin, Yeran, and Zhou, Bo
- Abstract
A novel two-step coarse-fine time-to-digital converter (TDC) is fabricated in 65-nm CMOS, with a relaxation-oscillator-based peak counter (ROC) for the coarse stage and a successive approximation analog-to-digital converter (SAR-ADC) for the fine stage. A reconfigurable 3-bit digital counter expands the dynamic range, and a high-precision 9-bit SAR-ADC ensures the resolution. The proposed ROC-ADC scheme handles the time residue well and maintains good transfer linearity for the two-step quantization. Experimental results show that the presented 12-bit TDC achieves a fine resolution of less than 8 ps and a wide dynamic range of up to 30 ns, with differential nonlinearity (DNL) and integral nonlinearity (INL) of 0.92 LSB and 1.07 LSB, respectively. The TDC consumes only 0.6 mW from a 1-V supply, with an active area of 0.14 mm². [ABSTRACT FROM AUTHOR]
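Behaviorally, two-step coarse-fine time quantization reduces to a coarse count plus a finely quantized residue, as in this idealized sketch (all parameter values are illustrative, not the chip's):

```python
def two_step_tdc(t_in, coarse_lsb=250e-12, fine_bits=9):
    """Coarse counter plus fine quantization of the residue (idealized)."""
    coarse = int(t_in // coarse_lsb)               # coarse count (ROC stage)
    residue = t_in - coarse * coarse_lsb           # time residue
    fine_lsb = coarse_lsb / 2 ** fine_bits         # fine stage spans one coarse LSB
    fine = round(residue / fine_lsb)               # fine code (SAR-ADC stage)
    return coarse, fine, coarse * coarse_lsb + fine * fine_lsb

c, f, t_hat = two_step_tdc(13.37e-9)
print(f"coarse={c}, fine={f}, reconstructed={t_hat:.4e} s")
```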
- Published
- 2022
- Full Text
- View/download PDF
43. A 158-mW 360-MHz BW 68-dB DR Continuous-Time 1-1-1 Filtering MASH ADC in 40-nm CMOS.
- Author
-
Liu, Qilong, Breems, Lucien J., Bajoria, Shagun, Bolatkale, Muhammed, Rutten, Robert, and Radulov, Georgi
- Subjects
CONTINUOUS-time filters ,ANALOG-to-digital converters ,SUCCESSIVE approximation analog-to-digital converters ,SIGNAL-to-noise ratio ,FILTERS & filtration ,DIGITIZATION ,BROADBAND communication systems ,DIGITAL-to-analog converters - Abstract
This article presents a 5-GS/s continuous-time (CT) multi-stage noise-shaping (MASH) analog-to-digital converter (ADC). The ADC consists of three first-order modulators with a 3-bit quantizer/digital-to-analog converter (DAC) per stage. An RC-hybrid stabilization DAC is used to compensate for the excess loop delay and excess phase shift. A delay-matching all-pass input filter with a low-pass feedforward filter is employed to suppress input signal leakage. As a result, inter-stage DACs are waived in residue generation, and low-power, area-saving Gm-C integrators are enabled in the back-end stages. The MASH ADC is implemented in 40-nm CMOS and occupies 0.21 mm². The ADC achieves 68-dB dynamic range (DR) and 65-dB signal-to-noise and distortion ratio (SNDR) over a 360-MHz bandwidth (BW). The ADC consumes 158 mW from 1/1.1/1.8-V supplies, yielding a 159-dB Schreier figure-of-merit (FOM) and a 151-fJ/conv.-step Walden FOM. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. A Pseudo-Virtual Ground Feedforwarding Technique Enabling Linearization and Higher Order Noise Shaping in VCO-Based ΔΣ Modulators.
- Author
-
Pochet, Corentin and Hall, Drew A.
- Subjects
VOLTAGE-controlled oscillators ,ANALOG-to-digital converters ,DIGITAL-to-analog converters ,NOISE ,COMPUTER architecture - Abstract
This article presents a third-order voltage-controlled oscillator (VCO)-based analog-to-digital converter (ADC) that leverages pseudo-virtual ground (PVG) feedforwarding (FF), linearizing the VCOs and enabling higher order noise shaping with a single feedback digital-to-analog converter. This technique leads to a power-efficient ADC implementation with a wide dynamic range. The ADC is fabricated in a 65-nm process and achieves a 92.1-dB SNDR in a 2.5-kHz bandwidth. This results in a state-of-the-art 179.6-dB figure-of-merit (FoM) among previously published VCO-based ADCs. The PVG FF technique allows the ADC to attain extremely high linearity, 123-dB peak SFDR, with a wide 1.8-Vpp differential input range. The ADC maintains performance with up to 200-mV variation on the 0.8-V supply and across temperatures from 0 to 70 °C. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
45. A 10-GS/s 8-bit 2850-μm² Two-Step Time-Domain ADC With Speed and Efficiency Enhanced by the Delay-Tracking Pipelined-SAR TDC.
- Author
-
Liu, Juzheng, Hassanpourghadi, Mohsen, and Chen, Mike Shuo-Wei
- Subjects
SUCCESSIVE approximation analog-to-digital converters ,TIME-digital conversion ,ANALOG-to-digital converters ,NYQUIST frequency ,SIGNAL-to-noise ratio ,COMPUTER architecture - Abstract
This article presents an 8-bit time-domain analog-to-digital converter (ADC) achieving a 10-GS/s conversion speed with only two time-interleaved (TI) channels. A successive approximation register (SAR) time-to-digital converter (TDC) is implemented for subpicosecond-resolution time quantization with high power/area efficiency and low jitter. The throughput of the SAR TDC is enhanced by a unique delay-tracking pipelining technique to enable 5-GS/s single-channel conversion. At the circuit level, the reference time generation for the SAR TDC is realized by the proposed selective delay tuning (SDT) cell for high efficiency and small reference time variation. Fabricated in 14-nm FinFET CMOS technology, this ADC achieves a 37.2-dB signal-to-noise and distortion ratio (SNDR) and a 50.6-dB spurious-free dynamic range (SFDR) at the Nyquist input frequency, leading to a 24.8-fJ/conv-step Walden figure of merit with an active area of only 2850 $\mu \text{m}^{2}$. [ABSTRACT FROM AUTHOR]
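The successive-approximation principle behind such a TDC can be sketched numerically (idealized; a real SAR TDC compares delayed signal edges rather than numbers, and `full_scale` here is illustrative): binary-search the reference delay that best matches the input time interval:

```python
def sar_tdc(t_in, full_scale, bits=8):
    """Binary-search time quantization, one bit decided per comparison."""
    code, ref = 0, 0.0
    for i in reversed(range(bits)):
        trial = ref + full_scale / (2 ** (bits - i))  # add the next binary weight
        if t_in >= trial:                             # "late/early" comparison
            ref = trial
            code |= 1 << i
    return code

print(bin(sar_tdc(0.61, full_scale=1.0, bits=8)))     # ~0.61 * 256 = 156
```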
- Published
- 2022
- Full Text
- View/download PDF
46. Adaptive Control of Uncertain Nonlinear Systems With Discontinuous Input and Time-Varying Input Delay.
- Author
-
Xia, Xiaonan, Zhang, Tianping, Kang, Guanpeng, and Fang, Yu
- Subjects
- *
ADAPTIVE control systems , *NONLINEAR systems , *UNCERTAIN systems , *TIME-varying systems , *FUNCTIONALS - Abstract
In this note, we investigate adaptive quantized and event-triggered control designs for strict-feedback nonlinear systems with time-varying input delay. Because the control signal is discontinuous in both quantized and event-triggered control, many existing methods for handling input delay are no longer applicable. By constructing an auxiliary tracking error and an auxiliary system, input-quantized control is implemented for nonlinear systems with unknown control gain and unknown input delay. Stability analysis is carried out using well-designed Lyapunov–Krasovskii functionals and a linear growth condition on the input delay, and it shows that all the signals are semiglobally uniformly ultimately bounded (SGUUB). The method also applies to event-triggered control of systems with unknown input delay. The use of dynamic surface control (DSC) effectively simplifies the controller structure. Simulations of both quantized and event-triggered control illustrate the effectiveness of the proposed schemes. [ABSTRACT FROM AUTHOR]
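For reference, quantized-control designs in this literature commonly use a logarithmic quantizer of the following form (a standard construction; the note's exact quantizer is not specified in the abstract, and `delta`/`u_min` are illustrative). Levels are $u_i = \rho^{i} u_{\min}$ with $\rho = (1-\delta)/(1+\delta)$, which yields the sector bound $|q(u)-u|\le \delta |u|$ used in stability proofs:

```python
import math

def log_quantizer(u, delta=0.2, u_min=1e-3):
    """Logarithmic quantizer: snap u to the nearest level rho^i * u_min;
    a dead zone below u_min / (1 + delta) maps to zero."""
    if abs(u) <= u_min / (1 + delta):
        return 0.0
    rho = (1 - delta) / (1 + delta)
    sign = 1.0 if u > 0 else -1.0
    i = round(math.log(abs(u) / u_min, rho))        # nearest quantization level
    return sign * (rho ** i) * u_min

for u in (0.0005, 0.01, 0.5, -2.0):
    print(u, "->", log_quantizer(u))
```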
- Published
- 2022
- Full Text
- View/download PDF
47. Two-Stage Supervised Discrete Hashing for Cross-Modal Retrieval.
- Author
-
Zhang, Donglin, Wu, Xiao-Jun, Xu, Tianyang, and Kittler, Josef
- Subjects
- *
BINARY codes , *MULTIMODAL user interfaces , *COMPUTER programming education , *INSTRUCTIONAL systems - Abstract
Recently, hashing-based multimodal learning systems have received increasing attention due to their query efficiency and parsimonious storage costs. However, hampered by the quantization loss incurred during numerical optimization, existing cross-media hashing approaches are unable to capture all the discriminative information present in the original multimodal data. Moreover, most cross-modal methods follow the one-step paradigm, which learns the binary codes and the hash functions simultaneously, increasing the complexity of optimization. To address these issues, we propose a novel two-stage approach, named the two-stage supervised discrete hashing (TSDH) method. In the first phase, TSDH generates a latent representation for each modality. These representations are then mapped to a common Hamming space to generate the binary codes. In addition, TSDH directly endows the hash codes with the semantic labels, enhancing the discriminatory power of the learned binary codes. A discrete hash optimization approach is developed to learn the binary codes without relaxation, avoiding the large quantization loss. The proposed hash function learning scheme reuses the semantic information contained in the embeddings, endowing the hash functions with enhanced discriminability. Extensive experiments on several databases demonstrate the effectiveness of the developed TSDH, which outperforms several recent competitive cross-media algorithms. [ABSTRACT FROM AUTHOR]
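A heavily simplified sketch of the two-stage recipe the abstract outlines (illustrative only: `two_stage_hashing`, the random-projection latent features, and the ridge-regression hash functions are our own stand-ins for TSDH's actual objective and solver):

```python
import numpy as np

def two_stage_hashing(X_img, X_txt, Y, n_bits=32):
    """Stage 1: per-modality latent features. Stage 2: label-driven binary
    codes kept discrete via sign (no relaxation), then modality-specific
    hash functions fit by ridge regression onto those codes."""
    rng = np.random.default_rng(0)
    # Stage 1: random-projection "latent representations" per modality.
    Z_img = np.tanh(X_img @ rng.standard_normal((X_img.shape[1], n_bits)))
    Z_txt = np.tanh(X_txt @ rng.standard_normal((X_txt.shape[1], n_bits)))
    # Stage 2: semantic binary codes shared across modalities.
    B = np.sign(Y @ rng.standard_normal((Y.shape[1], n_bits)) + 1e-9)
    def hash_fn(Z):  # ridge regression from latent features to codes
        W = np.linalg.solve(Z.T @ Z + 1e-2 * np.eye(n_bits), Z.T @ B)
        return lambda q: np.sign(q @ W)
    return B, hash_fn(Z_img), hash_fn(Z_txt)

X_img, X_txt = np.random.randn(100, 512), np.random.randn(100, 300)
Y = np.eye(10)[np.random.randint(0, 10, 100)]          # one-hot labels
B, h_img, h_txt = two_stage_hashing(X_img, X_txt, Y)
```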
- Published
- 2022
- Full Text
- View/download PDF
48. Exploring Model Stability of Deep Neural Networks for Reliable RRAM-Based In-Memory Acceleration.
- Author
-
Krishnan, Gokul, Yang, Li, Sun, Jingbo, Hazra, Jubin, Du, Xiaocong, Liehr, Maximilian, Li, Zheng, Beckmann, Karsten, Joshi, Rajiv V., Cady, Nathaniel C., Fan, Deliang, and Cao, Yu
- Subjects
- *
ARTIFICIAL neural networks , *IMAGE compression , *STATISTICS , *HOPFIELD networks - Abstract
RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs). Model compression techniques, such as quantization and pruning, are furthermore necessary to improve algorithm mapping and hardware performance. However, in the presence of RRAM device variations, low-precision and sparse DNNs suffer from severe post-mapping accuracy loss. To address this, we investigate a new metric, model stability, derived from the loss landscape, to shed light on accuracy loss under device variations and model compression; this metric guides an algorithmic solution that maximizes model stability and mitigates accuracy loss. Based on statistical data from a CMOS/RRAM 1T1R test chip at 65 nm, we characterize wafer-level RRAM variations and develop a cross-layer benchmark tool that incorporates quantization, pruning, device variations, model stability, and IMC architecture parameters to assess post-mapping accuracy and hardware performance. Leveraging this tool, we show that loss-landscape-based DNN model selection for stability effectively tolerates device variations, achieving post-mapping accuracy higher than that obtained with 50% lower RRAM variations. Moreover, we quantitatively interpret why model pruning increases sensitivity to variations, while a lower-precision model has better tolerance to variations. Finally, we propose a novel variation-aware training method to improve model stability, which identifies the most stable model for the best post-mapping accuracy of compressed DNNs. Experimental evaluation of the method shows up to 19%, 21%, and 11% post-mapping accuracy improvements for our 65-nm RRAM device, across various precisions and sparsities, on the CIFAR-10, CIFAR-100, and SVHN datasets, respectively. [ABSTRACT FROM AUTHOR]
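Variation-aware training is often realized by injecting device-like noise into the weights during the forward pass; the sketch below shows one generic form (our own illustration with a multiplicative lognormal conductance-variation model; the paper additionally ties training to its model-stability metric):

```python
import torch
import torch.nn as nn

class VariationAwareLinear(nn.Linear):
    """Linear layer whose weights are perturbed with multiplicative
    lognormal noise during training, mimicking RRAM conductance
    variation so the model settles in a flatter, more tolerant
    region of the loss landscape."""
    def __init__(self, in_f, out_f, sigma=0.1):
        super().__init__(in_f, out_f)
        self.sigma = sigma            # relative device-variation strength

    def forward(self, x):
        if self.training:
            noise = torch.exp(self.sigma * torch.randn_like(self.weight))
            return nn.functional.linear(x, self.weight * noise, self.bias)
        return super().forward(x)

layer = VariationAwareLinear(128, 10, sigma=0.1)
out = layer(torch.randn(4, 128))      # noisy weights during training only
```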
- Published
- 2022
- Full Text
- View/download PDF
49. Performance Analysis of IOS-Assisted NOMA System With Channel Correlation and Phase Errors.
- Author
-
Wang, Tianxiong, Badiu, Mihai-Alin, Chen, Gaojie, and Coon, Justin P.
- Subjects
- *
CHANNEL estimation , *ARRAY processing , *RANDOM variables , *SIGNAL processing - Abstract
In this paper, we investigate the performance of an intelligent omni-surface (IOS) assisted downlink non-orthogonal multiple access (NOMA) network with phase quantization errors and channel estimation errors, where the channels related to the IOS are spatially correlated. First, upper bounds on the average achievable rates of the two users are derived. Then, channel hardening is shown to occur in the proposed system, based on which we derive approximations of the average achievable rates of the two users. The analytical results illustrate that the proposed upper bound and approximation on the average achievable rates are asymptotically equivalent in the number of elements. Furthermore, it is proved that this asymptotic equivalence also holds between the average achievable rates with correlated and uncorrelated channels. Additionally, we extend the analysis by evaluating the average achievable rates for IOS-assisted orthogonal multiple access (OMA) and IOS-assisted multi-user NOMA scenarios. Simulation results corroborate the theoretical analysis and demonstrate that: i) low-precision elements with only two-bit phase adjustment can achieve performance close to that of the ideal continuous phase-shifting scheme; ii) the average achievable rates with correlated and uncorrelated channels are asymptotically equivalent in the number of elements; and iii) IOS-assisted NOMA does not always outperform OMA, owing to the reconfigurability of the IOS in different time slots. [ABSTRACT FROM AUTHOR]
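The phase-quantization model underlying such an analysis is standard: each element's phase is snapped to the nearest of $2^{b}$ uniform levels on $[0, 2\pi)$, as in this short sketch (illustrative):

```python
import numpy as np

def quantize_phase(theta, bits=2):
    """Uniform b-bit phase quantization for reconfigurable surface
    elements: snap each phase to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    return np.round(np.mod(theta, 2 * np.pi) / step) % levels * step

theta = np.random.uniform(0, 2 * np.pi, 5)
print(np.round(theta, 3), "->", np.round(quantize_phase(theta, bits=2), 3))
```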
- Published
- 2022
- Full Text
- View/download PDF
50. Secrecy Throughput Optimization and Precoding Design in Adaptive Transmit Antenna Selection Systems With Limited Feedback.
- Author
-
Wu, Tong, Zou, Yulong, and Jiang, Yuhan
- Subjects
- *
ADAPTIVE antennas , *TRANSMITTING antennas , *ANTENNA feeds , *PSYCHOLOGICAL feedback - Abstract
We analyze the secrecy performance of a wireless system with limited feedback, where the channel state information (CSI) is imperfectly known at a multi-antenna source. To mitigate the adverse effects of limited feedback and eavesdropper attacks on secrecy transmission, we propose a quantized secrecy precoding (QSP) oriented adaptive transmit antenna selection (ATAS) scheme, denoted QSP-ATAS, in which transmit antennas with poor channel quality are deactivated by adjusting the channel gain threshold. The conventional single transmit antenna selection (STAS) scheme without secrecy precoding (NSP-STAS), where only the optimal transmit antenna can be selected for transmission, serves as a baseline. Since both the number of active transmit antennas and the number of quantization bits affect the quantization error, we define the effective secrecy throughput as the difference between the secrecy data rate and the feedback rate to evaluate the system performance. We conduct an effective secrecy throughput analysis for both the QSP-ATAS and NSP-STAS schemes and show that the proposed QSP-ATAS scheme attains a maximal effective secrecy throughput. Furthermore, an optimization analysis of the QSP-ATAS scheme is carried out to further improve the effective secrecy throughput with respect to the channel gain threshold. Numerical simulation results demonstrate that our proposed QSP-ATAS scheme outperforms the conventional NSP-STAS scheme in terms of effective secrecy throughput. [ABSTRACT FROM AUTHOR]
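The effective-throughput trade-off the abstract defines can be explored numerically; the sketch below uses deliberately simplistic placeholder models for the secrecy rate, quantization-error penalty, and feedback cost (none of these expressions come from the paper):

```python
import numpy as np

def effective_secrecy_throughput(n_active, q_bits, snr_main=15.0,
                                 snr_eve=5.0, block_len=1000):
    """Toy model: secrecy rate grows with active antennas, but each
    antenna's quantized CSI costs q_bits of feedback per block, and a
    crude 1 - 2**(-q_bits) factor penalizes quantization error."""
    gamma_m = 10 ** (snr_main / 10) * n_active * (1 - 2.0 ** (-q_bits))
    gamma_e = 10 ** (snr_eve / 10)
    secrecy_rate = max(np.log2(1 + gamma_m) - np.log2(1 + gamma_e), 0.0)
    feedback_rate = n_active * q_bits / block_len
    return secrecy_rate - feedback_rate

# Sweep (antennas, bits) to locate the maximal effective throughput.
best = max(((n, b, effective_secrecy_throughput(n, b))
            for n in range(1, 9) for b in range(1, 13)), key=lambda t: t[2])
print("best (antennas, bits, throughput):", best)
```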
- Published
- 2022
- Full Text
- View/download PDF