Search Results
2. Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Nanyang Technological University.
- Author
- Li, Shenggui and Lee, Bu-Sung
- Subjects
X-rays, GRAPHICS processing units, COMPUTED tomography, MICROSOFT Azure (Computing platform), SCHOOL contests
- Abstract
In this technical report, we focus on reproducing the results reported in the paper “MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization” [1]. MemXCT is a scalable approach to X-ray Computed Tomography reconstruction that removes redundant computation. We reproduced the single CPU/GPU performance as well as the strong scaling experiments. We set up two clusters on Microsoft Azure CycleCloud: one has 4 nodes with 60 CPUs each, and the other has 4 nodes with 4 NVIDIA V100 GPUs each. Both clusters use InfiniBand. The original author conducted his experiments on the Theta and Blue Waters supercomputers. We were able to reproduce some of the results in the original paper but failed to achieve similar performance in other experiments. This report was submitted as part of the reproducibility challenge in the SC20 Student Cluster Competition. Digital artifacts from these experiments are available at: 10.5281/zenodo.5598108. [ABSTRACT FROM AUTHOR]
- Published
- 2022
3. Introduction to the January Special Issue on the 2016 IEEE International Solid-State Circuits Conference.
- Author
- Sylvester, Dennis, Markovic, Dejan, Genov, Roman, Kawasumi, Atsushi, and Mitra, Subhasish
- Subjects
SOLID state electronics, CONFERENCES & conventions
- Abstract
The IEEE International Solid-State Circuits Conference (ISSCC) is the premier global forum for presenting advances in solid-state circuits and system-on-a-chip. Every year since its first issue, the IEEE Journal of Solid-State Circuits has highlighted some well-received papers from the most recent ISSCC in special issues. This Special Issue covers the ISSCC Conference held in San Francisco, CA, USA, on February 5–9, 2016. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
4. Source Coding When the Side Information May Be Delayed.
- Author
- Simeone, Osvaldo and Permuter, Haim Henri
- Subjects
MARKOV processes, MEMORY, PAPER arts, CHANNEL coding, INFORMATION theory
- Abstract
For memoryless sources, delayed side information at the decoder does not improve the rate-distortion function. However, this is not the case for sources with memory, as demonstrated by a number of works focusing on the special case of (delayed) feedforward. In this paper, a setting is studied in which the encoder is potentially uncertain about the delay with which measurements of the side information, which is available at the encoder, are acquired at the decoder. Assuming a hidden Markov model for the source sequences, at first, a single-letter characterization is given for the setup where the side information delay is arbitrary and known at the encoder, and the reconstruction at the destination is required to be asymptotically lossless. Then, with delay equal to zero or one source symbol, a single-letter characterization of the rate-distortion region is given for the case where, unbeknownst to the encoder, the side information may be delayed or not. Finally, examples for binary and Gaussian sources are provided. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
5. Single Event Recording of Temperature and Tilt Using Liquid Metal With RFID Tags.
- Author
- Wang, Wei, Owyeung, Rachel, Sadeqi, Aydin, and Sonkusale, Sameer
- Abstract
There is a need for economical monitoring of fragile goods that are sensitive to temperature, tilt, or vibrations during transit. Mishandling these goods could permanently alter or damage them. Existing methods rely on electronic devices that need a continuous power source for sensing and data logging. This article introduces a simple, battery-free, economical solution for recording a single event of temperature crossing or tilt activity using liquid metal, namely eutectic gallium indium (EGaIn), coupled to a Radio-Frequency Identification (RFID) tag. Crossing a threshold of either temperature or tilt angle causes the EGaIn to flow, which permanently activates or deactivates the RFID tag. This change is irreversible; thus, the sensor memorizes the specific temperature or tilt event. The built-in memory function and wireless communication of the RFID sensor enable economical monitoring of temperature- and motion-sensitive goods during transport without the need for a battery. In this paper, we further reduce cost by fabricating these temperature and tilt sensors on paper substrates. [ABSTRACT FROM AUTHOR]
- Published
- 2020
6. New Constructions of MDS Euclidean Self-Dual Codes From GRS Codes and Extended GRS Codes.
- Author
- Fang, Weijun and Fu, Fang-Wei
- Subjects
REED-Solomon codes, CIPHERS
- Abstract
In this paper, we consider the problem of determining for which lengths a maximum distance separable (MDS) Euclidean self-dual code over $\mathbb{F}_{q}$ exists. This problem is completely solved for the case where $q$ is even. For odd $q$, some $q$-ary MDS Euclidean self-dual codes have been obtained in the literature. In this paper, we construct six new classes of $q$-ary MDS Euclidean self-dual codes by using generalized Reed–Solomon (GRS for short) codes and extended GRS codes. [ABSTRACT FROM AUTHOR]
- Published
- 2019
7. Memory AMP.
- Author
- Liu, Lei, Huang, Shunqi, and Kurkoski, Brian M.
- Subjects
MEAN square algorithms, MESSAGE passing (Computer science), INTERFERENCE suppression, MATCHED filters, MEMORY
- Abstract
Approximate message passing (AMP) is a low-cost iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions. AMP applies only to independent identically distributed (IID) transform matrices and may become unreliable (e.g., perform poorly or even diverge) for other matrix ensembles, especially ill-conditioned ones. To solve this issue, orthogonal/vector AMP (OAMP/VAMP) was proposed for general right-unitarily-invariant matrices. However, the Bayes-optimal OAMP/VAMP (BO-OAMP/VAMP) requires a high-complexity linear minimum mean square error (MMSE) estimator, which prevents OAMP/VAMP from being used in large-scale systems. To address the drawbacks of AMP and BO-OAMP/VAMP, this paper offers a memory AMP (MAMP) framework based on the orthogonality principle, which ensures that estimation errors in MAMP are asymptotically IID Gaussian. To realize the required orthogonality for MAMP, we provide an orthogonalization procedure for the local memory estimators. In addition, we propose a Bayes-optimal MAMP (BO-MAMP), in which a long-memory matched filter is used for interference suppression. The complexity of BO-MAMP is comparable to that of AMP. To asymptotically characterize the performance of BO-MAMP, a state evolution is derived. The relaxation parameters and damping vector in BO-MAMP are optimized based on the state evolution. Most crucially, the state evolution of the optimized BO-MAMP converges to the same fixed point as that of the high-complexity BO-OAMP/VAMP for all right-unitarily-invariant matrices, and achieves the Bayes-optimal MSE predicted by the replica method if its state evolution has a unique fixed point. Finally, simulations are provided to verify the validity and accuracy of the theoretical results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
8. Transmission of a Bit Over a Discrete Poisson Channel With Memory.
- Author
- Ahmadypour, Niloufar and Gohari, Amin
- Subjects
ERROR probability, MEMORY, CHANNEL coding, RANDOM variables, OPTICAL transmitters, GAUSSIAN channels
- Abstract
A coding scheme for transmission of a bit maps a given bit to a sequence of channel inputs (called the codeword associated with the transmitted bit). In this paper, we study the problem of designing the best code for a discrete Poisson channel with memory (under peak-power and total-power constraints). The outputs of a discrete Poisson channel with memory are Poisson-distributed random variables whose mean comprises a fixed additive noise and a linear combination of past input symbols. Assuming a maximum-likelihood (ML) decoder, we search for a codebook that has the smallest possible error probability. This problem is challenging because the error probability of a code does not have a closed-form analytical expression. For the case of only a total-power constraint, the optimal code structure is obtained, provided that the blocklength is greater than the memory length of the channel. For the case of only a peak-power constraint, the optimal code is derived for arbitrary memory and blocklength in the high-power regime. For the case of both peak-power and total-power constraints, the optimal code is derived for memoryless Poisson channels when both the total-power and the peak-power bounds are large. [ABSTRACT FROM AUTHOR]
- Published
- 2021
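A minimal Python sketch of the channel model and ML decoding described in this abstract, assuming the output mean at time t is the noise level plus a weighted sum of the current and previous inputs (`taps[0]` weights the current symbol). The tap weights, noise level, and two-codeword codebook are hypothetical illustration values, not the optimal codes derived in the paper; the decoder simply picks the codeword with the highest Poisson log-likelihood.

```python
import math

def output_means(x, taps, noise):
    """Poisson mean at each time t: noise + sum_k taps[k] * x[t - k].

    Assumes taps[0] weights the current input and taps[k] the input k steps back;
    noise must be > 0 so every mean is strictly positive.
    """
    means = []
    for t in range(len(x)):
        mean = noise
        for k, a in enumerate(taps):
            if t - k >= 0:
                mean += a * x[t - k]
        means.append(mean)
    return means

def poisson_log_likelihood(y, means):
    """log P(Y = y) for independent Poisson outputs with the given means."""
    return sum(yt * math.log(m) - m - math.lgamma(yt + 1) for yt, m in zip(y, means))

def ml_decode_bit(y, codebook, taps, noise):
    """Return the index (bit) of the codeword under which y is most likely."""
    scores = [poisson_log_likelihood(y, output_means(c, taps, noise)) for c in codebook]
    return max(range(len(codebook)), key=lambda b: scores[b])

# Hypothetical example: bit 0 -> all-zero codeword, bit 1 -> one pulse of power 2.
codebook = [[0, 0, 0], [2, 0, 0]]
taps = [1.0, 0.5]
noise = 0.1
```

With these values, `ml_decode_bit([3, 1, 0], codebook, taps, noise)` returns 1 and `ml_decode_bit([0, 0, 0], codebook, taps, noise)` returns 0. Note how the pulse sent at t = 0 still raises the mean at t = 1 through `taps[1]`; this is the channel memory.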
9. An Approximate Memory Architecture for Energy Saving in Deep Learning Applications.
- Author
- Nguyen, Duy Thanh, Hung, Nguyen Huy, Kim, Hyun, and Lee, Hyuk-Jae
- Subjects
DYNAMIC random access memory, DEEP learning, ENERGY consumption, DATA warehousing, MEMORY, DATA integrity
- Abstract
DRAM devices require periodic refresh operations to preserve data integrity. Slowing down the refresh rate can reduce energy consumption; however, it may cause loss of the data stored in DRAM cells. This paper proposes a new soft-approximation memory architecture for deep learning applications, which reduces refresh energy consumption while maintaining accuracy and high performance. Exploiting the error tolerance of deep learning applications, the proposed memory architecture avoids the accuracy drop caused by data loss by flexibly controlling the refresh operation for different bits, depending on their criticality. For data storage, the approximate DRAM architecture reorganizes the data so that they are sorted according to their bit significance. Critical bits are stored in more frequently refreshed devices, while non-critical bits are stored in less frequently refreshed devices. In addition, to further reduce DRAM energy consumption, this paper combines hard approximation, which reduces the number of DRAM accesses, with soft approximation. Simulation results show that, for the hybrid memory, refresh energy consumption is reduced by 69.71% and total energy consumption by 26.0%, with a negligible accuracy drop in both the training and testing phases on state-of-the-art deep networks. [ABSTRACT FROM AUTHOR]
- Published
- 2020
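As a rough illustration of reorganizing data by bit significance, the sketch below assumes 8-bit unsigned quantized values whose top 4 bits are treated as critical. The split point and the fault model (lost non-critical bits read back as 0) are assumptions for illustration, not the layout used in the paper. It shows why losing the entire non-critical bank bounds the per-value error at 2^4 - 1 = 15.

```python
def split_by_significance(values, critical_bits=4, width=8):
    """Split each unsigned `width`-bit value into (critical MSBs, non-critical LSBs).

    The two lists model two banks: critical bits go to the frequently refreshed
    bank, the remaining low-order bits to the rarely refreshed one.
    """
    low_bits = width - critical_bits
    low_mask = (1 << low_bits) - 1
    critical = [v >> low_bits for v in values]
    non_critical = [v & low_mask for v in values]
    return critical, non_critical

def merge(critical, non_critical, critical_bits=4, width=8):
    """Reassemble full values from the two banks."""
    low_bits = width - critical_bits
    return [(c << low_bits) | n for c, n in zip(critical, non_critical)]

values = [200, 17, 255]  # hypothetical 8-bit quantized weights
crit, noncrit = split_by_significance(values)
# Worst case under the assumed fault model: every non-critical bit is lost (reads as 0).
degraded = merge(crit, [0] * len(noncrit))
max_error = max(v - d for v, d in zip(values, degraded))
```

Here `degraded` is `[192, 16, 240]` and `max_error` is 15, whereas losing a single high-order bit could change a value by up to 128.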
10. Influence of Design Parameters on On-Load Demagnetization Characteristics of Switched Flux Hybrid Magnet Memory Machine.
- Author
- Lyu, Shukang, Yang, Hui, Lin, Heyun, Zhu, Z. Q., Zheng, Hao, and Pan, Zhenbao
- Subjects
DEMAGNETIZATION, PERMANENT magnets, MAGNETS, FLUX (Energy), MACHINING, MEMORY
- Abstract
In this paper, the influences of design parameters on the on-load demagnetization behavior of a switched flux hybrid magnet memory machine (SF-HMMM) are evaluated, with the aim of providing a useful design guideline to avoid this performance degradation. The machine topology and the on-load demagnetization phenomenon are first described. Furthermore, some key design parameters are selected and optimized to elevate the operating point of the low coercive force (LCF) permanent magnets (PMs). Then, some geometric modifications are suggested to alleviate the on-load demagnetization effects. Finally, an SF-HMMM prototype is fabricated and tested to experimentally verify the finite-element analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2019
11. An 8T SRAM With On-Chip Dynamic Reliability Management and Two-Phase Write Operation in 28-nm FDSOI.
- Author
- Lee, Zhao Chuan, Siddiqui, M. Sultan M., Kong, Zhi-Hui, and Kim, Tony Tae-Hyoung
- Subjects
STATIC random access memory, FAULT-tolerant computing, RELIABILITY in engineering
- Abstract
Bias temperature instability (BTI) degradation poses increasingly critical lifetime-reliability design challenges in static random access memory (SRAM) as fabrication technology moves deeper into the nanometer regime. This paper presents circuit techniques that enable on-chip dynamic reliability management, which monitors and mitigates half-selected cell stability failures caused by BTI degradation in SRAM. The dynamic reliability management is achieved through automated BTI-aware write word-line (WWL) control, which detects BTI degradation in SRAM cells through a replica-row-based BTI-aware stability monitor and adjusts the WWL voltage level with a two-phase write operation (TPWO). The WWL voltage level is divided into two phases to maintain half-selected cell stability under BTI without compromising other circuit parameters. Silicon validation of a 16-kb SRAM in a 28-nm fully depleted silicon on insulator (FDSOI) technology demonstrates that the proposed approach reduces the half-selected cell stability failure rate from 57.13% to 0% at a marginal 3.42% power overhead and 10% area overhead. [ABSTRACT FROM AUTHOR]
- Published
- 2019
12. Detection and Coding Schemes for Sneak-Path Interference in Resistive Memory Arrays.
- Author
- Ben-Hur, Yuval and Cassuto, Yuval
- Subjects
ERROR rates, MEMORY, MAGNITUDE (Mathematics), ERROR detection (Information theory), VIDEO coding, RANDOM access memory
- Abstract
Resistive memory is a promising technology for achieving unprecedented storage densities and new in-memory computing features. However, to fulfill their promise, resistive memories require array architectures that suffer from a severe interference effect called “sneak paths.” In this paper, we address the sneak-path problem through a communication-theory framework. Starting from the fundamental problem of readout with parallel-resistance interference, we develop several tools for detection and coding that significantly improve memory reliability. For the detection problem, we formulate and derive the optimal detector for a realistic array model, and then propose simplifications that offer similarly good performance with a simpler implementation. To complement detection and further lower error rates, we propose a new coding scheme that shapes the stored bits to reduce sneak-path incidence. For the same storage rates, the new coding scheme achieves error rates an order of magnitude lower than known shaping techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2019
13. Demonstration and Understanding of Nano-RAM Novel One-Time Programmable Memory Application.
- Author
- Ning, Sheyang and Luo, Jia
- Subjects
RECORDS management, ELECTRON tunneling, RF values (Chromatography), MEMORY, DATA libraries, DYNAMIC random access memory
- Abstract
In prior research, carbon nanotube (CNT)-based nano-random-access memory (NRAM) used reset initialization, which achieves a high write endurance of $>10^{11}$ cycles for large-time programmable (LTP) applications. In this paper, an NRAM one-time programmable (OTP) application is proposed for the first time, using set initialization for data archiving. Specifically, virgin NRAM cells are all in the high resistance state (HRS), storing bit “0.” In contrast, set initialization uses a reversed-polarity voltage to obtain the low resistance state (LRS) for bit “1.” Furthermore, physical models of set initialization and retention current degradation are proposed for the first time. The current increment during set initialization can be attributed to electron tunneling and CNT deformation. Retention current degradation may be due to variation of paralleled CNT contacts in the bottleneck zone. As for OTP performance, the median NRAM bit and the 1% tail bit demonstrate data retention of more than 1 billion years and 15 years, respectively, at 150 °C. The tail bit activation energy is 2.41 eV. Finally, no LRS read disturb is found after a 10-s, 0.5-V stress in both polarities. The virgin NRAM cells in HRS should be more stable than set-initialized NRAM cells in LRS in terms of both retention time and read disturb. [ABSTRACT FROM AUTHOR]
- Published
- 2019
14. Binary LCD Codes and Self-Orthogonal Codes From a Generic Construction.
- Author
- Zhou, Zhengchun, Li, Xia, Tang, Chunming, and Ding, Cunsheng
- Subjects
LINEAR codes, INJECTIONS, POLYNOMIALS, BINARY codes, GENERIC drugs
- Abstract
Linear codes with certain special properties have received renewed attention in recent years due to their practical applications. Among them, binary linear complementary dual (LCD) codes play an important role in implementations against side-channel attacks and fault injection attacks. Self-orthogonal codes can be used to construct quantum codes. In this paper, four classes of binary linear codes are constructed via a generic construction that has been intensively investigated in the past decade. Simple characterizations of when these linear codes are LCD or self-orthogonal are presented. As a result, infinite families of binary LCD codes and self-orthogonal codes are obtained. Infinite families of binary LCD codes are also produced from the duals of these four classes of linear codes. Many LCD codes and self-orthogonal codes obtained in this paper are optimal or almost optimal in the sense that they meet certain bounds on general linear codes. In addition, the weight distributions of two sub-families of the proposed linear codes are established in terms of Krawtchouk polynomials. [ABSTRACT FROM AUTHOR]
- Published
- 2019
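Both properties can be checked numerically with a standard criterion that is general background, not the characterization derived in this paper: a binary linear code with a full-rank generator matrix G is LCD exactly when G·Gᵀ is nonsingular over GF(2) (Massey's characterization), and self-orthogonal exactly when G·Gᵀ = 0. A minimal sketch, with hypothetical example matrices:

```python
def gram_mod2(G):
    """G * G^T over GF(2); G is a list of 0/1 rows (assumed full row rank)."""
    k = len(G)
    return [[sum(a & b for a, b in zip(G[i], G[j])) % 2 for j in range(k)] for i in range(k)]

def rank_mod2(M):
    """Rank over GF(2) by Gauss-Jordan elimination with XOR row operations."""
    rows = [row[:] for row in M]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def is_lcd(G):
    """LCD iff G G^T is nonsingular over GF(2)."""
    return rank_mod2(gram_mod2(G)) == len(G)

def is_self_orthogonal(G):
    """Self-orthogonal iff every pair of rows (including a row with itself) is orthogonal."""
    return all(x == 0 for row in gram_mod2(G) for x in row)
```

For example, the length-3 repetition code `[[1, 1, 1]]` is LCD, while the length-2 repetition code `[[1, 1]]` and the code generated by `[[1, 1, 0, 0], [0, 0, 1, 1]]` are self-orthogonal and therefore not LCD.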
15. Clock Sequences for Increasing the Fault Coverage of Functional Test Sequences.
- Author
- Pomeranz, Irith
- Subjects
LOGIC, BOREL subsets, USB flash drives, BIT error rate, MEMORY
- Abstract
A functional test sequence for a design may not be effective as a manufacturing test for a logic block in the design because it achieves a low gate-level fault coverage. This paper describes a procedure for selecting a clock sequence that increases the gate-level fault coverage of a functional test sequence when it is used for testing a subset of logic blocks. The procedure deactivates the clocks to the logic blocks in the subset when a primary input vector has a negative effect on their fault coverage. The procedure is different from earlier test generation and test compaction procedures in that it increases the fault coverage without modifying the functional test sequence. It thus preserves some of the functional characteristics and the test application process for the sequence. Experimental results for benchmark circuits are presented to demonstrate the effectiveness of the procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2017
16. Reconfigurable Bit-Serial Operation Using Toggle SOT-MRAM for High-Performance Computing in Memory Architecture.
- Author
- Wang, Jinkai, Bai, Yining, Wang, Hongyu, Hao, Zuolei, Wang, Guanda, Zhang, Kun, Zhang, Youguang, Lv, Weifeng, and Zhang, Yue
- Subjects
RANDOM access memory, MAGNETIC torque, MEMORY, ENERGY consumption, COLUMNS, OPTICAL character recognition
- Abstract
Computing in memory (CIM) is a promising candidate for high-throughput and energy-efficient data-driven applications, as it mitigates the well-known memory bottleneck of the von Neumann architecture. In this paper, we present a reconfigurable bit-serial operation using toggle spin-orbit torque magnetic random access memory (TSOT-MRAM) that performs the computation entirely in the bit-cell array instead of in a peripheral circuit. This bit-serial CIM (BSCIM) scheme achieves higher throughput and energy efficiency in CIM. First, basic Boolean logic operations are realized by exploiting the features of the TSOT device. A bit-cell array that implements the bit-serial operation is then built to provide the communication between columns and rows necessary for arithmetic operations, such as the carry propagation of addition and multiplication. Finally, we analyze the reliability of the BSCIM scheme and demonstrate its performance advantage by performing convolution operations on $28\times 28$ handwritten digit images in a BSCIM architecture. The results show that the delay and energy of the BSCIM architecture are reduced by factors of 1.16–5.49 and 1.12–1.43, respectively, compared with existing digital CIM architectures. In addition, its throughput and energy efficiency reach 51.2 GOPS and 9.9 TOPS/W, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
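A software model of generic bit-serial addition, in which one bit pair is processed per step and the carry is held between steps, may help picture the carry propagation mentioned in this abstract. It is a minimal sketch assuming LSB-first unsigned operands of equal length; it says nothing about how the TSOT-MRAM array physically realizes the logic.

```python
def bit_serial_add(a_bits, b_bits):
    """Add two equal-length LSB-first bit lists, one bit pair per step.

    The carry produced at step i is consumed at step i + 1; the final carry
    becomes the extra most significant bit of the result.
    """
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                 # sum bit of a full adder
        carry = (a & b) | (carry & (a ^ b))       # carry-out of a full adder
    out.append(carry)
    return out

def to_bits(x, width):
    """Unsigned integer -> LSB-first list of `width` bits."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):
    """LSB-first bit list -> unsigned integer."""
    return sum(b << i for i, b in enumerate(bits))
```

For instance, `from_bits(bit_serial_add(to_bits(13, 4), to_bits(11, 4)))` gives 24. An n-bit addition takes n steps, trading latency for a datapath only one bit wide.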
17. Nonfragile Consensus of Multiagent Systems Based on Memory Sampled-Data Control.
- Author
- Ge, Chao, Park, Ju H., Hua, Changchun, and Guan, Xinping
- Subjects
MULTIAGENT systems, LINEAR matrix inequalities, MEMORY, SYMMETRIC matrices, MAXIMAL functions
- Abstract
In this paper, we address the consensus tracking problem for the multiagent system (MAS) based on a nonfragile memory sampled-data controller. Considering the effects of controller gain fluctuation and communication delay, a novel sampled-data control scheme with a variable sampling interval is designed for each agent. By developing some new terms, an improved piecewise Lyapunov–Krasovskii functional (LKF) is constructed to take full advantage of the characteristics of the actual sampling pattern. Furthermore, some relaxed matrices constructed in the LKF are not required to be positive definite. Making full use of the LKF and a free-matrix-based integral inequality, some sufficient criteria are developed to ensure the consistency of the MAS. Then, by solving a group of linear matrix inequalities with the maximal sampling interval, the desired sampled-data control gain matrix is obtained. Finally, a numerical example of a 5-agent system is given to illustrate the effectiveness of the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2021
18. Multi-Bank On-Chip Memory Management Techniques for CNN Accelerators.
- Author
- Kang, Duseok, Kang, Donghyun, and Ha, Soonhoi
- Subjects
CONVOLUTIONAL neural networks, MEMORY, DATABASES
- Abstract
Since off-chip DRAM access significantly affects both performance and power consumption, convolutional neural network (CNN) accelerators commonly aim to maximize data reuse in on-chip memory. By organizing the on-chip memory into multiple banks, we may hide off-chip DRAM access delay by prefetching data to unused banks during computation. When and where to prefetch data, and how to reuse feature map data between layers, define the multi-bank on-chip memory management (MOMM) problem. In this paper, we propose compiler techniques to solve the MOMM problem with two different objectives: one is to minimize the off-chip memory access volume, and the other is to minimize the processing delay caused by unhidden DRAM accesses. By running CNN benchmarks on a cycle-level NPU simulator, we demonstrate the trade-off between the two objectives. Compared with a baseline approach that does not reuse feature maps between layers, we reduce the DRAM access volume and the processing delay by up to 55.0 and 79.4 percent, respectively. Moreover, we extend the proposed techniques to consider layer fusion, which aims to reuse feature maps between layers. Experimental results confirm the superiority of the proposed hybrid fusion technique over the per-layer processing technique and the pure fusion technique. [ABSTRACT FROM AUTHOR]
- Published
- 2022
19. Spectral–Spatial Unified Networks for Hyperspectral Image Classification.
- Author
- Xu, Yonghao, Zhang, Liangpei, Du, Bo, and Zhang, Fan
- Subjects
HYPERSPECTRAL imaging systems, IMAGING systems, SYSTEM administrators, NEURAL circuitry, MEMORY
- Abstract
In this paper, we propose a spectral–spatial unified network (SSUN) with an end-to-end architecture for hyperspectral image (HSI) classification. Unlike traditional spectral–spatial classification frameworks, in which spectral feature extraction (FE), spatial FE, and classifier training are separated, our model integrates these processes into a unified network. In this way, FE and classifier training share a uniform objective function, and all the parameters in the network can be optimized at the same time. In the implementation of the SSUN, we propose a band grouping-based long short-term memory model and a multiscale convolutional neural network as the spectral and spatial feature extractors, respectively. In the experiments, three benchmark HSIs are used to evaluate the performance of the proposed method. The experimental results demonstrate that the SSUN yields competitive performance compared with existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
20. Bipolar SRAM Memory Architecture in 4H-SiC for Harsh Environment Applications.
- Author
- Elgabra, Hazem, Siddiqui, Amna, and Singh, Shakti
- Subjects
SILICON carbide, COMPLEMENTARY metal oxide semiconductors, RANDOM access memory, ELECTRICAL engineering, ELECTRIC potential
- Abstract
4H-silicon carbide (SiC) is a suitable candidate for high-temperature and radiation-prone applications due to its superior electrical and material properties. Several researchers have demonstrated small-scale logic circuits entirely in 4H-SiC; however, to build a complete electronic module in 4H-SiC, a memory component has yet to be developed. This paper presents, for the first time, the design, optimization, and performance analysis of a 4H-SiC-based bipolar memory column, including a static random access memory cell and peripherals, designed for voltages as low as 5 V. The memory column has average noise margins of 2 V and delays of a few nanoseconds at room temperature. The proposed memory architecture also demonstrates robust operation across a wide range of temperatures (27 °C–500 °C) with stable noise margins and speeds. This paper validates the potential of developing memory architectures in 4H-SiC that operate reliably under varying conditions, paving the way to build complete electronic systems entirely based on 4H-SiC. [ABSTRACT FROM AUTHOR]
- Published
- 2018
21. Compact CA-Based Single Byte Error Correcting Codec.
- Author
- Samanta, Jagannath, Bhaumik, Jaydeb, and Barman, Soma
- Subjects
ERROR correction (Information theory), COMPUTER storage devices, SOFT errors, COMPUTER simulation, FIELD programmable gate arrays, APPLICATION-specific integrated circuits
- Abstract
Memory contents are often corrupted by soft errors caused by external radiation, which reduces the reliability of memory systems. To enhance reliability, error correcting codes (ECC) are widely used to detect and correct errors. Single-error-correcting, double-error-detecting codes are generally used in memory systems, but they cannot detect and correct multiple-cell errors. Recently, single-byte-error-correcting Reed–Solomon (SEC-RS) codes have been used to detect and correct single-byte errors in memory systems. In this paper, a new single-byte-error-correcting (SEC) code based on the concept of cellular automata (termed CASEC) is proposed. The main aim of this work is to reduce the area and power of the SEC encoder and decoder circuits without affecting delay. In this paper, CASEC(10,8,8), CASEC(18,16,8), 2xCASEC(10,8,4) and 2xCASEC(19,6,4) codecs are designed and implemented. The CASEC(18,16,8) codec has 67.79 percent lower hardware complexity than the existing design. The proposed codecs are simulated and synthesized for both FPGA and ASIC platforms. The speed of the proposed design is found to be almost equal to that of the existing design, while requiring less area and power. The area-delay product (ADP) of the proposed CASEC(10,8,8), CASEC(18,16,8), and 2xCASEC(10,8,4) codecs is better than that of the existing designs. [ABSTRACT FROM AUTHOR]
- Published
- 2018
22. A Blockchain-Powered Data Market for Multi-User Cooperative Search.
- Author
- Jiang, Suhan and Wu, Jie
- Abstract
Cloud computing provides a feasible solution to data outsourcing, thereby forming a cloud-based data market in which data users buy data from owners by querying cloud servers. However, it also introduces new privacy and security problems, as the data is controlled by a centralized third party rather than directly by the data owner. Existing data markets are also criticized for their inflexible and opaque pricing, in which the value of data ownership and the cost of query searches are mixed. In this paper, we consider blockchain-based storage a better choice for safe data outsourcing, since data is spread out across many data points. We propose an Ethereum-based data market that provides distributed storage and correct remote data search. We design a new pricing model in which each query is charged by two parties: the owner (paid for providing his data) and the miner (rewarded for performing query searches). We study a new cooperative search scheme through a proxy to reduce cost on the user side. Since each user query is charged based on its number of keywords, a cooperative search can reduce user-side cost by combining multiple queries into a group so that overlapping keywords are charged only once. To ensure user QoE, a combined query should not be significantly larger than any of its original queries in terms of the number of keywords. The total price is based on the total number of keywords in all groups. Since it is a cooperative model with shared resources, we also study various incentive properties on the user side, yielding a cost-sharing mechanism that splits the joint cost in a truth-revealing and fair manner. We further extend our market with a set of substitute data owners and propose a double auction mechanism to match users and owners based on their requirements. Experiments on real query traces demonstrate the effectiveness of our proposed scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2022
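The keyword-based pricing of grouped queries can be sketched in a few lines. This is a minimal illustration under stated assumptions: a single flat `price_per_keyword` stands in for the paper's split between owner and miner, and the QoE rule `max_growth` (a combined query may be at most 1.5 times as long as each member query) is a hypothetical threshold, not the paper's constraint. The queries are made-up keyword sets.

```python
def combined_keywords(group):
    """Keywords of a merged query: overlapping keywords appear only once."""
    return set().union(*group)

def total_cost(groups, price_per_keyword=1.0):
    """Total user-side price when each group is submitted as one combined query."""
    return sum(price_per_keyword * len(combined_keywords(g)) for g in groups)

def satisfies_qoe(group, max_growth=1.5):
    """Hypothetical QoE rule: the combined query may be at most `max_growth`
    times as long (in keywords) as every original query in the group."""
    size = len(combined_keywords(group))
    return all(size <= max_growth * len(q) for q in group)

# Hypothetical user queries (sets of keywords).
q1 = {"gpu", "memory", "cache"}
q2 = {"gpu", "memory", "dram"}
q3 = {"rfid", "sensor"}
```

Submitting the three queries separately costs 8 keyword units, while grouping `q1` with `q2` costs 6, because "gpu" and "memory" are charged once. Grouping `q1` with `q3` instead would violate the assumed growth bound (5 combined keywords exceed 1.5 × 2).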
23. Iterative Programming of Noisy Memory Cells.
- Author
- Horovitz, Michal, Yaakobi, Eitan, Gad, Eyal En, and Bruck, Jehoshua
- Subjects
BIT error rate, DNA synthesis, ERROR probability, MEMORY, ION channels
- Abstract
In this paper, we study a model that mimics the programming operation of memory cells. This model was first introduced by Lastras-Montano et al. for continuous-alphabet channels, and later by Bunte and Lapidoth for discrete memoryless channels (DMC). Under this paradigm, we assume that cells are programmed sequentially and individually. The programming process is modeled as transmission over a channel, such that it is possible to read the cell state to determine whether programming succeeded and, in case of failure, to reprogram the cell. Reprogramming a cell can reduce the bit error rate; however, this comes at the price of increasing the overall programming time and thereby affecting the writing speed of the memory. An iterative programming scheme is an algorithm that specifies the number of attempts to program each cell. Given the programming channel and constraints on the average and maximum number of attempts to program a cell, we study programming schemes that maximize the number of bits that can be reliably stored in the memory. We extend the results of Bunte and Lapidoth and study this problem when the programming channel is either a discrete-input memoryless symmetric channel (including the BSC, BEC, and BI-AWGN) or the $Z$ channel. For the BSC and the BEC, our analysis is also extended to the case where the error probabilities on consecutive writes are not necessarily the same. Lastly, we also study a related model that is motivated by the synthesis process of DNA molecules. [ABSTRACT FROM AUTHOR]
- Published
- 2022
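To make the trade-off between reprogramming and writing speed concrete, consider the simplest case covered by the abstract: a BEC with erasure probability $\varepsilon$ in which the cell is read back after every attempt. The sketch below assumes a uniform retry-until-success policy with at most `t` attempts per cell, which is not the optimized scheme from the paper. Attempt $k+1$ is made only if the first $k$ attempts were all erased, so the expected number of attempts is $\sum_{k=0}^{t-1}\varepsilon^k = (1-\varepsilon^t)/(1-\varepsilon)$, and each cell then behaves like a BEC with erasure probability $\varepsilon^t$, whose capacity is $1-\varepsilon^t$ bits per cell.

```python
import random


def bec_retry_stats(eps: float, t: int):
    """Retry-until-success over a BEC(eps), at most t attempts per cell.

    Returns (expected attempts per cell, residual erasure probability, bits per cell).
    """
    # Attempt k+1 happens only if the first k attempts were all erased: probability eps**k.
    expected_attempts = sum(eps ** k for k in range(t))
    residual_erasure = eps ** t
    # The cell now acts like a BEC(eps**t), whose capacity is 1 - eps**t bits.
    bits_per_cell = 1.0 - residual_erasure
    return expected_attempts, residual_erasure, bits_per_cell


def simulate_attempts(eps: float, t: int, cells: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the average number of programming attempts per cell."""
    rng = random.Random(seed)
    total = 0
    for _cell in range(cells):
        for _attempt in range(t):
            total += 1
            if rng.random() >= eps:  # read-back shows the write was not erased: stop
                break
    return total / cells


print(bec_retry_stats(0.5, 3))             # (1.75, 0.125, 0.875)
print(simulate_attempts(0.5, 3, 100_000))  # close to 1.75
```

Raising `t` pushes the stored bits per cell toward 1 while the expected attempts approach $1/(1-\varepsilon)$, which is the cost in writing speed the abstract refers to.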
24. Label Independent Memory for Semi-Supervised Few-Shot Video Classification.
- Author
Zhu, Linchao and Yang, Yi
- Subjects
VIDEOS, VISUAL perception, MEMORY, CLASSIFICATION
- Abstract
In this paper, we propose to leverage freely available unlabeled video data to facilitate few-shot video classification. In this semi-supervised few-shot video classification task, millions of unlabeled videos are available for each episode during training. These videos can be extremely imbalanced, yet they have rich visual and motion dynamics. To tackle the semi-supervised few-shot video classification problem, we make the following contributions. First, we propose a label independent memory (LIM) to cache label-related features, which enables a similarity search over a large set of videos. LIM produces a class prototype for few-shot training. This prototype is an aggregated embedding for each class, which is more robust to noisy video features. Second, we integrate a multi-modality compound memory network to capture both RGB and flow information. We propose to store the RGB and flow representations in two separate memory networks that are jointly optimized via a unified loss. In this way, mutual communication between the two modalities is leveraged to achieve better classification performance. Third, we conduct extensive experiments on the few-shot Kinetics-100 and Something-Something-100 datasets, which validate the effectiveness of leveraging the accessible unlabeled data for few-shot classification. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. Infrared Pedestrian Tracking With Graph Memory Features.
- Author
Jin, Lei, Cheng, Jingchun, and Zhang, Chunxi
- Subjects
VIDEO surveillance, PEDESTRIANS, NIGHT vision, MEMORY
- Abstract
Although it is challenging due to the absence of color information and low resolution, infrared pedestrian tracking is still of great importance in video surveillance because it can distinguish and track persons at night. Recent research shows that correlation filter (CF) based methods perform well on the infrared pedestrian tracking task when combined with deep features. However, most of these algorithms rely only on general convolutional features and online model updating to track object variation, which leads to lost tracks for pedestrians with large deformations and infrequent poses. In this paper, we propose a long-term inter-frame graph memory feature as a stronger descriptor for infrared pedestrians. We show that this cross-frame memory graph can enrich deep features with time-variant pedestrian appearance information, boosting the robustness and accuracy of the baseline CF tracker. Extensive experiments and analyses on the widely used PTB-TIR dataset demonstrate the effectiveness and universality of the proposed graph memory features for infrared pedestrians. [ABSTRACT FROM AUTHOR]
- Published
- 2021
26. ShortcutFusion: From Tensorflow to FPGA-Based Accelerator With a Reuse-Aware Memory Allocation for Shortcut Data.
- Author
Nguyen, Duy Thanh, Je, Hyeonseung, Nguyen, Tuan Nghia, Ryu, Soojung, Lee, Kyujoong, and Lee, Hyuk-Jae
- Subjects
CONVOLUTIONAL neural networks, DYNAMIC random access memory, FIELD programmable gate arrays, MEMORY
- Abstract
The residual block is a very common component in recent state-of-the-art CNNs such as EfficientNet/EfficientDet. Shortcut data accounts for nearly 40% of feature-map accesses in ResNet152. Most previous DNN compilers/accelerators ignore shortcut data optimization. This paper presents ShortcutFusion, an optimization tool for FPGA-based accelerators with a reuse-aware static memory allocation for shortcut data, which maximizes on-chip data reuse under given resource constraints. From TensorFlow DNN models, the proposed design generates instruction sets for a group of nodes, using optimized data reuse for each residual block. The accelerator design, implemented on the Xilinx KCU1500 FPGA card, is $2.8\times$ faster and $9.9\times$ more power efficient than the NVIDIA RTX 2080 Ti for a $256\times 256$ input size. Compared with the baseline, in which the weights/inputs/outputs are accessed from off-chip memory exactly once per layer, ShortcutFusion reduces DRAM access by 47.8-84.8% for RetinaNet, Yolov3, ResNet152, and EfficientNet. Given a buffer size similar to that of ShortcutMining, which also “mines” the shortcut data in hardware, the proposed work reduces off-chip access for feature maps by $5.27\times$ while accessing weights from off-chip memory exactly once. [ABSTRACT FROM AUTHOR]
- Published
- 2022
27. Multi-Mode QC-LDPC Decoding Architecture With Novel Memory Access Scheduling for 5G New-Radio Standard.
- Author
Lee, Seongjin, Park, Sangsoo, Jang, Boseon, and Park, In-Cheol
- Subjects
5G networks, LOW density parity check codes, CHANNEL coding, MEMORY, ITERATIVE decoding
- Abstract
Because the low-density parity-check (LDPC) code has powerful error-correcting performance and can achieve high throughput, it is used in many application areas and has recently been adopted as a channel coding method in the 5G New-Radio communication standard. Unlike other LDPC codes, the 5G LDPC code has various irregular lifting sizes to support diverse message lengths. To meet the demanding requirements of the 5G standard, many solutions have been presented, but all of them are either impractical or fail to satisfy all the requirements. This paper, for the first time, proposes an area-efficient QC-LDPC decoder that satisfies the peak throughput requirements of the 5G standard and supports all the lifting sizes specified in the standard. Instead of relying on full parallelism as in previous works, this work adopts partial parallelism to mitigate hardware complexity, which leads to high hardware efficiency. In addition, a novel memory access scheduling method is proposed to solve the data access and alignment problems caused by the partially parallel structure, which is effective in supporting all the lifting sizes. An LDPC decoder realized in 65-nm CMOS technology demonstrates a decoding throughput greater than 20 Gbps and an area smaller than those of existing decoders. [ABSTRACT FROM AUTHOR]
- Published
- 2022
28. Identifying Biases of a Defect Diagnosis Procedure.
- Author
Pomeranz, Irith
- Subjects
DEVIATION (Statistics), DISCRIMINATION (Sociology), MOBILE communication systems -- Design & construction, MEMORY, TORQUE control
- Abstract
A defect diagnosis procedure is an important part of the yield improvement process. As defects become more complex, the output responses they produce differ to larger extents from the output responses of modeled faults, and they become more difficult to diagnose. Biases in the defect diagnosis procedure can also cause defects to be more difficult to diagnose. It is important to study and remove such biases in order to ensure that they do not affect the accuracy of the procedure. This paper undertakes a study of the biases of a defect diagnosis procedure and suggests a way to improve it. The study illustrates an approach by which biases in defect diagnosis procedures can be analyzed in general. [ABSTRACT FROM AUTHOR]
- Published
- 2017
29. The VLSI Architecture of a Highly Efficient Deblocking Filter for HEVC Systems.
- Author
Hsu, Po-Kai and Shen, Chung-An
- Subjects
VERY large scale circuit integration, VIDEO coding, PIXELS, ALGORITHMS, BANDWIDTHS
- Abstract
This paper presents the VLSI architecture and hardware implementation of a highly efficient deblocking filter (DBF) for High Efficiency Video Coding systems. To reduce the number of data accesses and thus enhance timing efficiency, novel data structures and memory access schemes for image pixels are proposed. Furthermore, a novel edge-fetching order is presented to strike a balance between processing throughput and complexity. Based on the proposed structure and access pattern, a six-stage pipelined two-line DBF engine with a low-latency data access sequence is designed to achieve high processing throughput while maintaining low complexity. The detailed storage structure and data access scheme are illustrated, and the VLSI architecture of the DBF engine is described in this paper. In addition, the proposed DBF is implemented using the TSMC 90-nm standard cell library. Experimental results based on post-layout estimations show that the proposed design can achieve 60 frames/s for a frame resolution of $4096\times 2048$ pixels (ultra high definition resolution), assuming an operating frequency of 100 MHz. Moreover, this design has an area complexity of 466.5 kGE and a power consumption of 26.26 mW. In comparison with prior designs targeting similar system specifications and throughput, the proposed design has a significantly reduced area complexity. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
30. HD-Code: End-to-End High Density Code for DNA Storage.
- Author
Wu, Jianjun, Zhang, Shufang, Zhang, Tao, and Liu, Yuhong
- Abstract
With the rapid development of digital information techniques, the use of DNA media for information storage is considered the future direction of data storage. Existing DNA storage schemes simply map compressed binary multimedia data into DNA bases, which has the disadvantages of data loss, low logical storage density, and high synthesis cost. This paper presents an end-to-end high density DNA encoding algorithm (referred to as HD-code, where HD stands for high density). The contributions of this work are threefold. First, by taking full advantage of the statistical characteristics of the original multimedia data and considering the biological constraints on the DNA bases, the proposed scheme achieves higher logical storage density and improves the flexibility and consistency of data storage. Second, by performing data conversion, the proposed scheme can effectively encode extreme images with a large proportion of a single color. Third, the proposed method can reconstruct high-quality images and reduce synthesis costs by yielding better rate-PSNR (peak signal-to-noise ratio) performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
31. New LCD MDS Codes of Non-Reed-Solomon Type.
- Author
Wu, Yansheng, Hyun, Jong Yoon, and Lee, Yoonjin
- Subjects
REED-Solomon codes, DATA warehousing, LINEAR codes, LIQUID crystal displays, TELECOMMUNICATION systems, CLOUD storage
- Abstract
Both linear complementary dual (LCD) codes and maximum distance separable (MDS) codes have good algebraic structures, and they have interesting practical applications such as communication systems, data storage, and quantum codes. So far, most LCD MDS codes have been constructed by employing generalized Reed-Solomon codes. In this paper, we construct some classes of new Euclidean LCD MDS codes and Hermitian LCD MDS codes that are not monomially equivalent to Reed-Solomon codes, called LCD MDS codes of non-Reed-Solomon type. Our method is based on the constructions of Beelen et al. (2017) and Roth and Lempel (1989). To the best of our knowledge, this is the first paper on the construction of LCD MDS codes of non-Reed-Solomon type; any LCD MDS code of non-Reed-Solomon type constructed by our method is not monomially equivalent to any LCD code constructed by the method of Carlet et al. (2018). [ABSTRACT FROM AUTHOR]
- Published
- 2021
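A standard fact behind Euclidean LCD codes is Massey's criterion: a linear code with a full-rank generator matrix $G$ is LCD if and only if $GG^{T}$ is nonsingular. The following minimal sketch checks this criterion over a prime field GF(p). It does not implement the paper's non-Reed-Solomon construction, and the Hermitian case (which uses a conjugate transpose over GF(q²)) is not covered.

```python
def rank_mod_p(matrix, p):
    """Rank of an integer matrix over the prime field GF(p), by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in matrix]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], p - 2, p)  # inverse via Fermat's little theorem (p prime)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c] != 0:
                f = m[r][c]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_euclidean_lcd(G, p):
    """Massey's criterion: a code with full-row-rank generator G is LCD iff G G^T is nonsingular."""
    k = len(G)
    if rank_mod_p(G, p) != k:
        raise ValueError("G must have full row rank")
    ggt = [[sum(a * b for a, b in zip(gi, gj)) % p for gj in G] for gi in G]
    return rank_mod_p(ggt, p) == k


print(is_euclidean_lcd([[1, 1, 1]], 5))  # True:  G G^T = 3, nonzero mod 5
print(is_euclidean_lcd([[1, 1, 1]], 3))  # False: G G^T = 3 = 0 mod 3
```

The length-3 repetition code in the example is MDS ($d = 3 = n - k + 1$), so the two calls show that whether an MDS code is also LCD depends on the field.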
32. Continuous-Flow Matrix Transposition Using Memories.
- Author
Garrido, Mario and Pirsch, Peter
- Subjects
MATRICES (Mathematics), MEMORY
- Abstract
In this paper, we analyze how to calculate the matrix transposition in continuous flow by using a memory or group of memories. The proposed approach studies this problem for specific conditions such as square and non-square matrices, use of limited access memories and use of several memories in parallel. Contrary to previous approaches, which are based on specific cases or examples, the proposed approach derives the fundamental theory involved in the problem of matrix transposition in a continuous flow. This allows for obtaining the exact equations for the read and write addresses of the memories and other control signals in the circuits. Furthermore, the cases that involve non-square matrices, which have not been studied in detail in the literature, are analyzed in depth in this paper. Experimental results show that the proposed approach is capable of transposing matrices of $8192 \times 8192$ 32-bit data received in series at a rate of 200 mega samples per second, which doubles the throughput of previous approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2020
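One common way to transpose square matrices in continuous flow with a single memory is to read each word and immediately overwrite it with the incoming sample, alternating the address sequence between row-major and column-major order from one matrix to the next. The behavioral Python model below sketches that idea. It assumes square $n \times n$ matrices arriving one sample per cycle in row-major order; it is not the set of address equations derived in the paper, and it does not cover the non-square or multi-memory cases.

```python
def transpose_stream(frames, n):
    """Continuous-flow transposition of n x n matrices with a single n*n-word memory.

    Samples arrive one per cycle in row-major order. Each cycle reads the word at the
    current address and then writes the incoming sample to that same address. Even
    frames use row-major addresses, odd frames column-major ones. Output lags the
    input by one frame; reads during the first frame return None.
    """
    memory = [None] * (n * n)
    out = []
    for f, frame in enumerate(frames):
        assert len(frame) == n * n
        for s, sample in enumerate(frame):
            if f % 2 == 0:
                addr = s                     # row-major address pattern
            else:
                addr = (s % n) * n + s // n  # column-major (transposed) address pattern
            out.append(memory[addr])         # read the old sample first...
            memory[addr] = sample            # ...then overwrite it with the new one
    return out


# Two 2x2 matrices followed by a flush frame to push out the second result
out = transpose_stream([[1, 2, 3, 4], [5, 6, 7, 8], [0, 0, 0, 0]], n=2)
print(out[4:8], out[8:12])  # [1, 3, 2, 4] [5, 7, 6, 8]
```

Because every address is read before it is written in the same cycle, one $n^2$-word memory suffices instead of two ping-pong buffers; for square matrices the transposition permutation is its own inverse, which is why two address patterns are enough.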
33. Codes for Limited Magnitude Error Correction in Multilevel Cell Memories.
- Author
Liu, Shanshan, Reviriego, Pedro, and Lombardi, Fabrizio
- Subjects
ERROR correction (Information theory), CIPHERS, STATIC random access memory, MEMORY, CELLS
- Abstract
Multilevel cell (MLC) memories have been advocated for increasing density at low cost in next-generation memories. However, storing several bits in a cell reduces the distance between levels; this reduced margin makes such memories more vulnerable to defects and parameter variations, leading to errors in the stored data. These errors are typically of limited magnitude, because the induced change causes the stored value to cross only a few of the level boundaries. To protect these memories from such errors and ensure that the stored data is not corrupted, Error Correction Codes (ECCs) are commonly used. However, most existing codes have been designed to protect memories in which each cell stores one bit, so they are not efficient for protecting MLC memories. In this paper, an efficient scheme that can correct errors of magnitude up to 3 is presented and evaluated. The scheme is based on combining ECCs that are commonly used to protect traditional memories. In particular, Interleaved Parity (IP) bits and Single Error Correction and Double Adjacent Error Correction (SEC-DAEC) codes are combined in the proposed IP-DAEC scheme to efficiently provide a strong correction capability, exceeding the capabilities of most existing coding schemes for limited magnitude errors. The SEC-DAEC code is used to detect the cell in error and correct some bits, while the IP bits identify the remaining erroneous bits in the memory cell. The use of these simple codes results in an efficient decoder implementation compared with existing techniques, as shown by the evaluation results presented in this paper. The proposed scheme is also competitive in terms of the number of parity check bits and memory redundancy. Therefore, the proposed IP-DAEC scheme is a very efficient alternative for protecting MLC memories from limited magnitude errors. [ABSTRACT FROM AUTHOR]
- Published
- 2020
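The role of the interleaved parity (IP) bits in IP-DAEC can be sketched in isolation. Assume each cell stores `bits_per_cell` bits and one IP bit is kept per bit position (the XOR of that bit across all cells of the word). Then the IP syndrome gives the exact bit-flip pattern of a single erroneous cell once that cell's index is known. In the paper the index comes from the SEC-DAEC code, which this sketch does not implement: `bad_cell` is simply passed in, so this is a partial illustration rather than the full decoder.

```python
from functools import reduce


def interleaved_parity(cells, bits_per_cell):
    """One parity bit per bit position: bit i is the XOR of bit i across all cells."""
    mask = (1 << bits_per_cell) - 1
    return reduce(lambda acc, c: acc ^ (c & mask), cells, 0)


def correct_cell(read_cells, stored_ip, bad_cell, bits_per_cell):
    """Fix one erroneous cell whose index is already known.

    In IP-DAEC the index would come from the SEC-DAEC code, which is not modeled here.
    """
    syndrome = stored_ip ^ interleaved_parity(read_cells, bits_per_cell)
    fixed = list(read_cells)
    fixed[bad_cell] ^= syndrome  # flip exactly the bit positions whose parity disagrees
    return fixed


data = [5, 2, 7, 0]               # four 3-bit cells (levels 0..7)
ip = interleaved_parity(data, 3)  # stored alongside the data
read = [5, 2, 4, 0]               # cell 2 drifted from 7 to 4 (magnitude 3)
print(correct_cell(read, ip, bad_cell=2, bits_per_cell=3))  # [5, 2, 7, 0]
```

Here the magnitude-3 drift from 7 (`111`) to 4 (`100`) flips two bits at once, and the syndrome `011` captures both, which a per-bit SEC code alone would not.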
34. A Novel Universal Interface for Constructing Memory Elements for Circuit Applications.
- Author
Zheng, Ciyan, Yu, Dongsheng, Iu, Herbert Ho Ching, Fernando, Tyrone, Sun, Tingting, Eshraghian, Jason K., and Guo, Hengdao
- Subjects
CIRCUIT elements, EMULATION software, MOORE'S law, INTERFACE circuits, MEMRISTORS, MEMORY
- Abstract
The rapid expansion of analog and neuromorphic memristive applications has shown that their reconfigurable and reprogrammable characteristics will be major drivers for pushing beyond Moore’s Law. The lack of easily accessible and reliable solid-state memory elements (mem-elements) results in an ever-increasing body of research lacking physical verification, and an associated high barrier to entry for researchers. This paper addresses this deficiency by introducing a novel universal interface circuit which, when connected to different peripheral circuits, can be used to build fundamental mem-elements. There is an abundance of mem-element emulators; we adopt their advantages into our design to foster practical and broadly applicable mem-element circuits. In comparison with other similar state-of-the-art emulators, our circuit uses up to 42.9% fewer active components and consumes up to 31.9% less power, with an associated size reduction of 41.7%. Our proposed emulator continues to operate with hysteresis at over 180 kHz, which is two orders of magnitude higher than other similar emulators and commercially available solid-state memristors, while maintaining floating terminal connections. Rigorous theoretical, simulation, and experimental results show good agreement, and the applications given demonstrate the ability of the universal interface to discretely build mem-elements. [ABSTRACT FROM AUTHOR]
- Published
- 2019
35. Layered Division Multiplexing for ATSC 3.0: Implementation and Memory Use Aspects.
- Author
Lee, Jae-Young, Park, Sung-Ik, Kwon, Sunhyoung, Lim, Bo-Mi, Ahn, Sungjun, Hur, Namho, Kim, Heung Mook, and Kim, Jeongchang
- Subjects
FREQUENCY division multiple access, TRANSMITTERS (Communication), DIVISION, MEMORY, SIGNAL-to-noise ratio, MULTIPLEXING
- Abstract
This paper presents implementation and memory use aspects of layered division multiplexing (LDM) technology, which is defined in the next-generation terrestrial broadcast standard called Advanced Television Systems Committee (ATSC) 3.0. Because LDM is a new method that combines multiple broadcast contents, practical considerations for transmitter and receiver implementations, as well as memory usage, are described in this paper. For the case where multiple physical layer pipes are used, the feasibility of the implementation and the memory use aspects are discussed, and a performance analysis in comparison with the other multiplexing techniques that ATSC 3.0 offers is shown. [ABSTRACT FROM AUTHOR]
- Published
- 2019
36. Harnessing Correlations in Distributed Erasure-Coded Key-Value Stores.
- Author
Ali, Ramy E. and Cadambe, Viveck R.
- Subjects
REED-Solomon codes, RETAIL stores, PROTHROMBIN
- Abstract
Motivated by applications of distributed storage systems to key-value stores, the multi-version coding problem has been formulated to efficiently store frequently updated data in asynchronous decentralized storage systems. Inspired by consistency requirements in distributed systems, the main goal in the multi-version coding problem is to ensure that the latest possible version of the data is decodable even if the data updates have not reached all the servers in the system. In this paper, we study the storage cost of ensuring consistency for the case where the data versions are correlated, in contrast to previous work where the data versions were treated as being independent. We provide multi-version code constructions that show that the storage cost can be significantly smaller than the previous constructions depending on the degree of correlation, despite the asynchrony and the decentralized nature. Our achievability results are based on Reed–Solomon codes and random binning. Through an information-theoretic converse, we show that our multi-version codes are asymptotically nearly optimal, within a factor of 2, in certain interesting regimes. [ABSTRACT FROM AUTHOR]
- Published
- 2019
37. Constructions of Coded Caching Schemes With Flexible Memory Size.
- Author
Cheng, Minquan, Jiang, Jing, Yan, Qifa, and Tang, Xiaohu
- Subjects
LINEAR network coding, INFORMATION theory, POCKET computers, CACHE memory, MEMORY
- Abstract
Coded caching has recently become popular in wireless networks, since it effectively reduces the maximum transmission amount $R$ during peak-traffic times. To realize a coded caching scheme, each file must be divided into $F$ packets, which usually increases the computational complexity of the scheme. Hence, in practice, we prefer to design a scheme with $R$ and $F$ as small as possible. However, there exists a tradeoff between $R$ and $F$. In this paper, we generalize the schemes constructed by Shangguan et al. (IEEE Transactions on Information Theory, 64, 5755–5766, 2018) and Yan et al. (IEEE Transactions on Information Theory, 63, 5821–5833, 2017), respectively. These two classes of schemes have a wider range of applications than the original ones because their memory sizes are more flexible. Compared with previously known deterministic schemes, our new schemes have advantages in $R$ or $F$. [ABSTRACT FROM AUTHOR]
- Published
- 2019
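The $R$ versus $F$ tradeoff is easiest to see in the original centralized scheme of Maddah-Ali and Niesen, the usual reference point for subpacketization. For $K$ users, $N$ files, and cache size $M$ with $t = KM/N$ an integer, that scheme has rate $R = (K-t)/(t+1)$ and subpacketization $F = \binom{K}{t}$. The sketch below only evaluates these baseline formulas; it is not one of the paper's constructions.

```python
from fractions import Fraction
from math import comb


def mn_rate_and_subpacketization(K: int, N: int, M: int):
    """Baseline Maddah-Ali--Niesen centralized scheme (not this paper's construction).

    Requires t = K*M/N to be an integer; returns (R, F) with R = (K - t)/(t + 1), F = C(K, t).
    """
    t = Fraction(K * M, N)
    if t.denominator != 1:
        raise ValueError("K*M/N must be an integer")
    t = int(t)
    return Fraction(K - t, t + 1), comb(K, t)


# Cache a quarter of the library (M/N = 1/4) and grow the number of users
for K in (4, 10, 20):
    R, F = mn_rate_and_subpacketization(K, N=K, M=K // 4)
    print(K, R, F)  # 4 3/2 4, then 10 8/3 45, then 20 5/2 15504
```

The rate stays bounded as $K$ grows, but $F$ grows combinatorially (15504 packets per file at $K = 20$), which is the motivation for constructions with smaller subpacketization.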
38. Two Bit Overlap: A Class of Double Error Correction One Step Majority Logic Decodable Codes.
- Author
Reviriego, Pedro, Liu, Shanshan, Rottenstreich, Ori, and Lombardi, Fabrizio
- Subjects
THRESHOLD logic, ERROR correction (Information theory), PARITY-check matrix, MAGIC squares, ORTHOGONAL codes, SOFT errors
- Abstract
Error Correction Codes (ECCs) are commonly used to protect memories against soft errors, at a cost in memory area and delay. For large memories, the area overhead is mostly due to the additional cells needed to store the parity check bits. In terms of delay, the overhead mostly comes from detecting and correcting errors when the data is read from the memory. Most ECCs that can correct more than one error have a complex decoding process and so are of limited use in high-speed memory applications. One exception is One Step Majority Logic Decodable (OS-MLD) codes, for which decoding can be done in parallel at high speed. Unfortunately, there are only a few OS-MLD codes, and they provide a limited choice in terms of block sizes, error correction capabilities, and code rate. Therefore, there is considerable interest in novel constructions of OS-MLD codes that provide additional choices for protecting memories. In this paper, a new method to construct Double Error Correction (DEC) OS-MLD codes is presented. This method is based on the use of parity check matrices in which any two bits have at most two parity check equations in common; the proposed method provides codes that require fewer parity check bits than existing codes such as Orthogonal Latin Square (OLS) codes. The drawback of the proposed Two Bit Overlap (TBO) codes is that they require slightly more complex decoding than OLS codes. Therefore, they provide an intermediate solution between OLS and non-OS-MLD codes in terms of decoding delay and number of parity check bits. The proposed TBO codes have been implemented for some block sizes and compared with both OLS and BCH codes to illustrate the trade-off between delay and memory overhead. Finally, this paper discusses the generalization of the proposed scheme to codes with larger error correction capabilities. [ABSTRACT FROM AUTHOR]
- Published
- 2019
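The defining property of the TBO codes, that any two bits share at most two parity check equations, can be checked directly on a parity check matrix $H$ whose columns are bits and whose rows are check equations. OLS codes satisfy the stricter orthogonality condition of at most one shared equation. The helper below is a minimal sketch for testing that property; the two small matrices are hypothetical toy examples, not codes from the paper.

```python
from itertools import combinations


def max_column_overlap(H):
    """Largest number of parity check equations (rows of H) shared by any two bits (columns)."""
    columns = list(zip(*H))
    return max(
        sum(1 for a, b in zip(c1, c2) if a and b)
        for c1, c2 in combinations(columns, 2)
    )


# Hypothetical toy matrices (rows = check equations, columns = bits)
H_ols_like = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
H_tbo_like = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
]
print(max_column_overlap(H_ols_like))  # 1
print(max_column_overlap(H_tbo_like))  # 2
```

Only the first matrix is orthogonal in the OLS sense; the second satisfies the relaxed two-bit-overlap condition, which permits fewer parity check rows at the cost of a more involved majority vote.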
39. A 2M1M Crossbar Architecture: Memory.
- Author
Teimoori, Mehri, Amirsoleimani, Amirali, Ahmadi, Arash, and Ahmadi, Majid
- Subjects
MEMRISTORS, LOGIC design, CROSSBAR switches (Electronics)
- Abstract
Memristor crossbar architectures are considered one of the most promising platforms for future memory, logic, and in-memory computing applications. This paper presents a 2M1M crossbar architecture, capable of memory and logic applications, based on a transistor-less memory cell that behaves as a switching circuit. The proposed memory cell consists of two access memristors and one target memristor, using a gating structure formed by the access devices to reduce the sneak path effect. The proposed architecture has considerably lower wiring density and fewer memristors per bit than its peers. Therefore, it can be a suitable structure for high-density memory and logic applications. In addition to its in-memory computing capabilities, the 2M1M structure used as a memory offers higher density and lower energy consumption than conventional CMOS-based static random access memory. In comparison with previous works, simulation results show significant improvements in the basic implementation costs of the memory cell in terms of write time (1.11 ns), read time (200 ps), density (80 Gb/cm$^2$), energy consumption ($23.2\times 10^{-3}\,\mathrm{fJ/bit}$), and wiring complexity. Also, it has a sneak path current of 90 nA per memory cell operation, which is considerably lower than that of its peers. [ABSTRACT FROM AUTHOR]
- Published
- 2018
40. Bounding DRAM Interference in COTS Heterogeneous MPSoCs for Mixed Criticality Systems.
- Author
Hassan, Mohamed and Pellizzoni, Rodolfo
- Subjects
DYNAMIC random access memory, MULTIPROCESSORS, PERFORMANCE of multiprocessors, EMBEDDED computer systems, REAL-time computing
- Abstract
Commercial off-the-shelf (COTS) heterogeneous multiprocessor systems-on-chip (MPSoCs) are appealing platforms for emerging mixed criticality systems (MCSs). To satisfy MCS requirements, the platform must guarantee predictable timing bounds for critical applications without degrading average performance for noncritical applications. In particular, this paper studies the main memory subsystem, which in modern MPSoCs is typically based on double data rate synchronous dynamic random access memory. While there is previous work on worst-case DRAM latency analysis, such work covers only a small subset of possible COTS configurations, which are not targeted at MCSs. Therefore, we derive a generalized interference delay analysis for DRAM main memory that accounts for a breadth of features deployed in COTS platforms. We then explore the design space by studying the effects of each feature on both the worst-case delay for critical applications and the bandwidth for noncritical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2018
41. C-AND: Mixed Writing Scheme for Disturb Reduction in 1T Ferroelectric FET Memory.
- Author
Dahan, Mor M., Breyer, Evelyn T., Slesazeck, Stefan, Mikolajick, Thomas, and Kvatinsky, Shahar
- Subjects
FIELD-effect transistors, FERROELECTRICITY, MEMORY
- Abstract
Ferroelectric field effect transistor (FeFET) memory has shown the potential to meet the growing need for fast, dense, low-power, and non-volatile memories. In this paper, we propose a memory architecture named crossed-AND (C-AND), in which each storage cell consists of a single ferroelectric transistor. The write operation is performed using different write schemes and different absolute voltages to account for the asymmetric switching voltages of the FeFET. It enables writing an entire wordline in two consecutive cycles and prevents current flow, and hence power dissipation, through the channel of the transistor. During the read operation, current and power are mostly sensed at a single selected device in each column. The read scheme additionally enables reading an entire word without read errors, even along long bitlines. Our simulations demonstrate that, compared with the previously proposed AND architecture, the C-AND architecture diminishes read errors, reduces write disturbs, enables the use of longer bitlines, and saves up to 2.92× in memory cell area. [ABSTRACT FROM AUTHOR]
- Published
- 2022
42. EGCN: An Efficient GCN Accelerator for Minimizing Off-Chip Memory Access.
- Author
Han, Yunki, Park, Kangkyu, Jung, Youngbeom, and Kim, Lee-Sup
- Subjects
DYNAMIC random access memory, REPRESENTATIONS of graphs, MATRIX multiplications, MEMORY, RANDOM access memory, ENERGY consumption
- Abstract
As Graph Convolutional Networks (GCNs) have emerged as a promising solution for graph representation learning, designing specialized GCN accelerators has become an important challenge. An analysis of GCN workloads shows that the main bottleneck of GCN processing is not computation but the memory latency of intensive off-chip data transfer. Therefore, minimizing off-chip data transfer is the primary challenge in designing an efficient GCN accelerator. To address this challenge, we begin by treating GCN processing as tiled matrix multiplication. In this paper, we optimize off-chip memory access from both the in-tile and out-of-tile perspectives. From the out-of-tile perspective, we find the optimal tile configurations for given datasets and on-chip buffer capacity, and then examine the dataflow across phases and layers. An inter-layer phase fusion dataflow with the optimal tile configuration reduces data transfer of intermediate outputs. From the in-tile perspective, because tiles are sparse, they contain redundant data that does not participate in computation; this redundant data load is eliminated with hardware support. Finally, we introduce an efficient GCN inference accelerator, EGCN, specialized for minimizing off-chip memory access. EGCN achieves, on average, a 41.9% reduction in off-chip DRAM access, a 1.49× speedup, and a 1.95× energy efficiency improvement over state-of-the-art accelerators. [ABSTRACT FROM AUTHOR]
- Published
- 2022
43. Future Scaling of Memory Hierarchy for Tensor Cores and Eliminating Redundant Shared Memory Traffic Using Inter-Warp Multicasting.
- Author
Lee, Sunjung, Hwang, Seunghwan, Kim, Michael Jaemin, Choi, Jaewan, and Ahn, Jung Ho
- Subjects
MULTICASTING (Computer networks), ARTIFICIAL neural networks, MEMORY, PARALLEL programming
- Abstract
The CUDA core of NVIDIA GPUs has been one of the most efficient computation units for parallel computing. However, recent rapid developments in deep neural networks demand an even higher level of computational performance. To meet this requirement, NVIDIA has introduced the Tensor core in recent generations. However, its impressive gains in computational performance have put new, high pressure on the memory hierarchy. In this paper, we first identify the memory bandwidth required in the memory hierarchy as computational performance increases in actual GPU hardware. Through a comparison of the CUDA core and the Tensor core in V100, we find that the tremendous performance increase of the Tensor core requires much higher memory bandwidth than the CUDA core does. Moreover, we thoroughly investigate the memory bandwidth requirements across the Tensor core generations of V100, RTX TITAN, and A100. Lastly, using GPU simulation, we analyze a hypothetical next-generation Tensor core introduced by NVIDIA and propose an inter-warp multicasting microarchitecture that reduces redundant shared memory (SMEM) traffic during the GEMM process. Our evaluation shows that inter-warp multicasting reduces SMEM bandwidth pressure by 33% and improves performance by 19% on average across all layers of ResNet-152 and BERT-Large. [ABSTRACT FROM AUTHOR]
- Published
- 2022
44. CloudChain: A Cloud Blockchain Using Shared Memory Consensus and RDMA.
- Author
Xu, Minghui, Liu, Shuo, Yu, Dongxiao, Cheng, Xiuzhen, Guo, Shaoyong, and Yu, Jiguo
- Subjects
MEMORY, BLOCKCHAINS, CLOUD computing, GOVERNMENT agencies, ALGORITHMS
- Abstract
Blockchain technologies can enable secure computing environments among mistrusting parties. Permissioned blockchains are particularly favored by companies, enterprises, and government agencies due to their efficiency, customizability, and governance-friendly features. Seamlessly fusing blockchain and cloud computing can significantly benefit permissioned blockchains; nevertheless, most blockchains implemented on clouds were originally designed for loosely coupled networks where nodes communicate asynchronously, and they fail to take advantage of the closely coupled nature of cloud servers. In this paper, we propose an innovative cloud-oriented blockchain, CloudChain, which is a modularized three-layer system composed of a network layer, a consensus layer, and a blockchain layer. CloudChain is based on a shared-memory model in which nodes communicate synchronously through direct memory accesses. We realize the shared-memory model with Remote Direct Memory Access (RDMA) technology, on which we build a shared-memory consensus algorithm that ensures persistence and liveness, the two crucial blockchain security properties for countering Byzantine nodes. We also implement a CloudChain prototype on a RoCEv2-based testbed to experimentally validate our design, and the results verify the feasibility and efficiency of CloudChain. [ABSTRACT FROM AUTHOR]
- Published
- 2022
45. Design of Placement Delivery Arrays for Coded Caching With Small Subpacketizations and Flexible Memory Sizes.
- Author
-
Wu, Xianzhang, Cheng, Minquan, Li, Congduan, and Chen, Li
- Subjects
ORTHOGONAL arrays ,DATA libraries ,SERVER farms (Computer network management) ,MEMORY ,DATA transmission systems ,POCKET computers ,MULTICASTING (Computer networks) - Abstract
Coded caching is an emerging technique to reduce the data transmission load during peak-traffic times. In such a scheme, each file in the data center or library is divided into a number of packets, and a low broadcasting rate is pursued based on the designed placement of packets in each user’s cache. However, the implementation complexity of this scheme increases with the number of packets. It is therefore crucial to design a scheme with a small subpacketization level while maintaining a relatively low transmission rate. Recently, a combinatorial structure called the placement delivery array (PDA) was proposed as an effective tool to design coded caching schemes with a relatively low subpacketization level. This paper proposes a novel PDA construction by selecting proper orthogonal arrays (POAs), which generalizes the existing construction but allows a more flexible memory size. Based on the proposed PDA construction, an effective transform is further proposed to enable a coded caching scheme to achieve a smaller subpacketization level. Moreover, two new coded caching schemes with coded placement are derived. It is shown that the proposed schemes can yield a lower subpacketization level or transmission rate than the benchmark schemes. [ABSTRACT FROM AUTHOR]
- Published
- 2022
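To make the PDA concept concrete, the sketch below checks the standard $(K, F, Z, S)$ PDA conditions from the coded-caching literature on an $F \times K$ array: rows are packets, columns are users, a star means the user caches that packet, and each integer names one multicast transmission. A valid PDA yields a scheme with memory ratio $Z/F$ and rate $S/F$. This is only a checker for the general definition, not the POA-based construction proposed in the paper; the 3-user example array is illustrative.

```python
from itertools import combinations

STAR = "*"  # marks a packet already cached by that user


def is_pda(P, Z):
    """Check the (K, F, Z, S) PDA conditions for an F x K array P.

    Rows index packets (subpacketization F), columns index users (K).
    A star means the user caches that packet; an integer s means the packet
    is delivered in multicast transmission s.
    """
    F, K = len(P), len(P[0])

    # C1: each column (user) contains exactly Z stars.
    for k in range(K):
        if sum(P[j][k] == STAR for j in range(F)) != Z:
            return False

    # Group the cells holding each integer.
    cells = {}
    for j in range(F):
        for k in range(K):
            if P[j][k] != STAR:
                cells.setdefault(P[j][k], []).append((j, k))

    # C2: the integers used are exactly 0, 1, ..., S-1.
    S = len(cells)
    if sorted(cells) != list(range(S)):
        return False

    # C3: cells sharing an integer lie in distinct rows and columns, and the
    # two "crossing" cells are stars, so each user in the multicast already
    # caches every packet in that transmission except the one it wants.
    for positions in cells.values():
        for (j1, k1), (j2, k2) in combinations(positions, 2):
            if j1 == j2 or k1 == k2:
                return False
            if P[j1][k2] != STAR or P[j2][k1] != STAR:
                return False
    return True


# Illustrative 3-user array (F = 3 packets per file, Z = 1 cached per user).
P = [[STAR, 0, 1],
     [0, STAR, 2],
     [1, 2, STAR]]
F, Z, S = 3, 1, 3
print(is_pda(P, Z), "memory ratio", Z / F, "rate", S / F)
```

Smaller $F$ (fewer packets per file) is exactly the subpacketization the abstract seeks to reduce, at the cost of a typically higher $S/F$.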
46. MAMA Net: Multi-Scale Attention Memory Autoencoder Network for Anomaly Detection.
- Author
-
Chen, Yurong, Zhang, Hui, Wang, Yaonan, Yang, Yimin, Zhou, Xianen, and Wu, Q. M. Jonathan
- Subjects
SCALE-free network (Statistical physics) ,COVID-19 ,MEMORY ,MACHINE learning ,DATA distribution - Abstract
Anomaly detection refers to the identification of cases that do not conform to the expected pattern, and it plays a key role in diverse research areas and application domains. Most existing methods can be categorized as anomaly object detection-based or reconstruction error-based techniques. However, because the scope of highly diverse real-world outliers is hard to define and the inference process is inaccessible, most of them have not achieved groundbreaking progress. To address these shortcomings, and motivated by memory-based decision-making and by visual attention acting as a filter that selects environmental information in the human visual perception system, in this paper we propose a Multi-scale Attention Memory with hash addressing Autoencoder network (MAMA Net) for anomaly detection. First, to overcome a host of problems resulting from the restricted, stationary receptive field of the convolution operator, we devise a multi-scale global spatial attention block that can be plugged directly into any network as a sampling, upsampling, or downsampling function. Owing to its efficient feature representation ability, networks can achieve competitive results with only a few levels of these blocks. Second, we observe that a traditional autoencoder can only learn an ambiguous model that also reconstructs anomalies “well” due to a lack of constraints in the training and inference process. To mitigate this challenge, we design a hash addressing memory module that forces anomalies to produce higher reconstruction errors for classification. In addition, we couple the mean square error (MSE) with the Wasserstein loss to improve the distribution of the encoded data. Experiments on various datasets, including two different COVID-19 datasets and one brain MRI (RIDER) dataset, demonstrate the robustness and excellent generalization of the proposed MAMA Net. [ABSTRACT FROM AUTHOR]
- Published
- 2021
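Why a memory module raises the reconstruction error of anomalies can be illustrated with a toy example: if the decoder may only rebuild an input from prototypes memorized from normal data, an outlier is mapped to the nearest normal prototype and its error grows. A minimal plain-Python sketch under loud assumptions: an exhaustive nearest-prototype lookup stands in for the paper's hash addressing, there is no attention block or Wasserstein loss, and the vectors and threshold are made up. It is not MAMA Net.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def memory_reconstruct(z, memory):
    """Reconstruct z using only stored prototypes of normal data.

    Hypothetical stand-in for a memory module: the nearest prototype is
    returned (an exhaustive search, not hash addressing).
    """
    return min(memory, key=lambda m: mse(z, m))


def anomaly_score(z, memory):
    """Reconstruction error of z after it is forced through the memory."""
    return mse(z, memory_reconstruct(z, memory))


# Toy latent codes; values and threshold are made up for illustration.
memory = [[0.0, 0.0], [1.0, 1.0]]   # prototypes learned from normal data
normal, outlier = [0.9, 1.1], [5.0, -3.0]
threshold = 0.5

for name, z in [("normal", normal), ("outlier", outlier)]:
    score = anomaly_score(z, memory)
    print(name, round(score, 3), "anomaly" if score > threshold else "normal")
```

The design choice being illustrated is the constraint itself: without the memory, an expressive autoencoder could reconstruct the outlier closely and its score would no longer separate it from normal data.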
47. Coflow-Like Online Data Acquisition from Low-Earth-Orbit Datacenters.
- Author
-
Huang, Huawei, Guo, Song, Liang, Weifa, Wang, Kun, and Okabe, Yasuo
- Subjects
ACQUISITION of data ,DATABASES ,TELECOMMUNICATION satellites ,LOW earth orbit satellites ,TELECOMMUNICATION systems ,DATA warehousing - Abstract
Satellite-based communication technology has gained much attention in the past few years, with satellites mainly playing supplementary roles as relay devices for terrestrial communication networks. Unlike previous work, we treat low-earth-orbit (LEO) satellites as secure data storage media. We focus on data acquisition from a LEO satellite-based data storage system (also referred to as LEO-based datacenters), which has been considered a promising and secure paradigm for data storage. Under the LEO-based datacenter architecture, one fundamental challenge is to achieve energy-efficient downloading from space to ground while maintaining system stability. In this paper, we aim to maximize the amount of admitted data while minimizing energy consumption when downloading files from LEO-based datacenters to meet user demands. To this end, we first formulate a novel optimization problem and develop an online scheduling framework. We then devise a novel coflow-like “Join the first $K$-shortest Queues (JKQ)” job-dispatch strategy, which can significantly lower the backlogs of queues residing in LEO satellites, thereby improving system stability. We also analyze the optimality of the proposed approach and the system stability. Finally, we evaluate the performance of the proposed algorithm through emulator-based simulations using real-world LEO constellation and user demand traces. The simulation results show that the proposed algorithm can dramatically lower queue backlogs and achieve high energy efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2020
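The dispatch idea behind JKQ can be sketched as follows, reading it as "send the parts of a job in parallel, coflow-style, to the $K$ satellite queues with the smallest backlogs." A minimal sketch under stated simplifications: the even split of the job, the absence of energy costs, and the omission of satellite visibility windows are assumptions, not the paper's formulation; `jkq_dispatch` is a hypothetical name.

```python
import heapq


def jkq_dispatch(backlogs, job_size, k):
    """Join-the-k-shortest-queues dispatch (simplified sketch).

    The job is split into k equal parts (an assumption) that are sent in
    parallel to the k satellites with the smallest backlogs.
    Returns the chosen satellite indices; backlogs are updated in place.
    """
    if not 1 <= k <= len(backlogs):
        raise ValueError("k must be between 1 and the number of queues")
    # Indices of the k least-loaded queues, shortest first.
    chosen = heapq.nsmallest(k, range(len(backlogs)), key=lambda i: backlogs[i])
    share = job_size / k
    for i in chosen:
        backlogs[i] += share
    return chosen


backlogs = [5.0, 1.0, 3.0, 0.0]   # hypothetical per-satellite queue backlogs
print(jkq_dispatch(backlogs, job_size=4.0, k=2), backlogs)
```

Spreading each job over several short queues, rather than one, is what keeps the largest backlog low and thereby supports the stability goal described in the abstract.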
48. Decoder Partitioning: Towards Practical List Decoding of Polar Codes.
- Author
-
Hashemi, Seyyed Ali, Mondelli, Marco, Hassani, S. Hamed, Condo, Carlo, Urbanke, Rudiger L., and Gross, Warren J.
- Subjects
DECODING algorithms ,5G networks ,CYCLIC redundancy check codes ,ERROR correction (Information theory) ,MEMORY ,CHARTS, diagrams, etc. - Abstract
Polar codes represent one of the major recent breakthroughs in coding theory and, because of their attractive features, they have been selected for the upcoming 5G standard. As such, much attention has been devoted to the development of decoding algorithms with good error performance and efficient hardware implementation. One of the leading candidates in this regard is successive-cancellation list (SCL) decoding. However, its hardware implementation requires a large amount of memory. Recently, a partitioned SCL (PSCL) decoder has been proposed to significantly reduce memory consumption. In this paper, we consider the paradigm of PSCL decoding from a practical standpoint, and we provide several improvements. First, by changing the target signal-to-noise ratio and consequently modifying the construction of the code, we are able to improve the performance at no additional computational, latency, or memory cost. Second, we bridge the performance gap between SCL and PSCL decoding by introducing a generalized PSCL decoder and a layered PSCL decoder. In this way, we obtain almost the same performance as the SCL decoder with a significantly lower memory requirement, as confirmed by hardware implementation results. Third, we present an optimal scheme to allocate cyclic redundancy checks. Finally, we provide a lower bound on the list size that guarantees optimal maximum a posteriori performance for the binary erasure channel. [ABSTRACT FROM AUTHOR]
- Published
- 2018
49. Abnormal Volatile Memory Characteristic in Normal Nonvolatile ZnSnO Resistive Switching Memory.
- Author
-
Hsu, Chih-Chieh, Chen, Yu-Ting, Chuang, Po-Yang, and Lin, Yu-Sheng
- Subjects
NONVOLATILE random-access memory ,ZINC oxide ,X-ray scattering ,LOGIC circuits ,SEMICONDUCTORS - Abstract
Resistive random access memory is known as a type of nonvolatile memory. An abnormal volatile memory characteristic of a normal ZnSnO resistive memory is demonstrated for the first time in this paper. Although the $I$–$V$ curves exhibit normal and stable resistive switching behavior, the resistance state is found to be determined only by the initial applied voltage and the voltage sweep direction. It is free of set/reset processes; that is, it is volatile. The resistance states are found to be dominated by trap-assisted tunneling, trap-controlled space-charge-limited conduction, and hopping transport. Different resistance states are associated with different carrier transport mechanisms. Each resistance state can independently and repeatably appear for over 1000 voltage sweeps. The ratio of the high-resistance state to the low-resistance state is $\sim 3\times 10^{2}$. [ABSTRACT FROM AUTHOR]
- Published
- 2018
50. Viewer-Aware Intelligent Efficient Mobile Video Embedded Memory.
- Author
-
Chen, Dongliang, Edstrom, Jonathon, Gong, Yifu, Gao, Peng, Yang, Lei, McCourt, Mark E., Wang, Jinhui, and Gong, Na
- Subjects
COMPUTER storage devices ,DIGITAL video ,ELECTRIC power consumption - Abstract
Embedded memory is a critical component in today’s mobile video processing systems, increasingly dominating power consumption and shortening the battery life of mobile devices. Traditional hardware-level power optimization techniques usually come with significant implementation overhead to solve the memory failure problem during low-voltage operation. This paper presents a novel mobile video memory that exploits the power-saving opportunities offered by viewer experience under environmental visual interference. The viewing context, in particular the ambient luminance, significantly influences the quality of the viewer experience; in contexts with higher luminance levels, mobile users have a higher tolerance for video degradation. Accordingly, memory failures can be introduced adaptively to achieve power savings without affecting the viewer experience. To meet the silicon area constraints of mobile devices, a simple but efficient hardware implementation scheme is developed to minimize area overhead. Experimental results based on a 45-nm CMOS technology show that, compared with the conventional memory design, the proposed technique can achieve up to 48% power savings with good perceived quality and negligible implementation overhead. [ABSTRACT FROM AUTHOR]
- Published
- 2018