21 results for "Yuan Xie"
Search Results
2. SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration
- Author
-
Fengbin Tu, Yiqi Wang, Ling Liang, Yufei Ding, Leibo Liu, Shaojun Wei, Shouyi Yin, and Yuan Xie
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2023
3. STPAcc: Structural TI-Based Pruning for Accelerating Distance-Related Algorithms on CPU-FPGA Platforms
- Author
-
Yuan Xie, Boyuan Feng, Yuke Wang, Yufei Ding, Lei Deng, and Gushu Li
- Subjects
Acceleration, Speedup, Exploit, Computer science, Computation, Pruning (decision trees), Electrical and Electronic Engineering, Field-programmable gate array, Computer Graphics and Computer-Aided Design, Algorithm, Implementation, Software, Efficient energy use
- Abstract
As a promising solution to boost the performance of distance-related algorithms (e.g., K-means and KNN), FPGA-based acceleration attracts lots of attention, but also comes with numerous challenges. In this work, we propose STPAcc, an optimization framework based on structural triangle-inequality (TI) based pruning (STP) for accelerating distance-related algorithms on CPU-FPGA platforms. STPAcc provides a domain-specific language to unify distance-related algorithms effectively, a structural TI-based pruning strategy to remove unnecessary distance computations, a coarse-grained workload partitioning and mapping strategy to fully exploit the potential of the CPU-FPGA platform, and fine-grained hardware optimizations to further improve performance on the FPGA. Extensive experiments show that STPAcc designs achieve 31.42× speedup and 99.63× better energy efficiency on average over standard CPU-based implementations.
- Published
- 2022
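The triangle-inequality pruning at the heart of STPAcc can be illustrated with a minimal software sketch of nearest-center search (as in K-means assignment): if d(x, c_best) ≤ d(c_best, c_j)/2, then c_j can be skipped without computing d(x, c_j). The function and variable names below are ours, and the sketch models only the basic TI filter, not the structural variant or the CPU-FPGA mapping described in the paper.

```python
import numpy as np

def assign_with_ti_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations ruled out by the triangle inequality:
    if d(x, c_best) <= d(c_best, c_j) / 2, then
    d(x, c_j) >= d(c_best, c_j) - d(x, c_best) >= d(x, c_best),
    so c_j cannot be strictly closer and need not be evaluated."""
    # Center-to-center distances, computed once and amortized over all points.
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.empty(len(points), dtype=int)
    skipped = 0
    for i, x in enumerate(points):
        best, d_best = 0, np.linalg.norm(x - centers[0])
        for j in range(1, len(centers)):
            if d_best <= cc[best, j] / 2:   # TI bound: j is provably no closer
                skipped += 1
                continue
            d = np.linalg.norm(x - centers[j])
            if d < d_best:
                best, d_best = j, d
        labels[i] = best
    return labels, skipped
```

The pruned assignment is exact: it returns the same labels as a brute-force nearest-center search, while `skipped` counts the distance evaluations avoided.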
4. Rubik: A Hierarchical Architecture for Efficient Graph Neural Network Training
- Author
-
Mingyu Yan, Xinfeng Xie, Yufei Ding, Xiaobing Chen, Xing Hu, Lei Deng, Abanti Basak, Ling Liang, Yuan Xie, Zidong Du, and Yuke Wang
- Subjects
Theoretical computer science, Artificial neural network, Computer science, Computation, Computer Graphics and Computer-Aided Design, Software, Parallelism, Cache, Electrical and Electronic Engineering, Architecture, Representation, Efficient energy use
- Abstract
The graph convolutional network (GCN) emerges as a promising direction to learn the inductive representation in graph data commonly used in widespread applications, such as e-commerce, social networks, and knowledge graphs. However, learning from graphs is nontrivial because of its mixed computation model involving both graph analytics and neural network computing. To this end, we decompose GCN learning into two hierarchical paradigms: graph-level and node-level computing. Such a hierarchical paradigm facilitates software and hardware accelerations for GCN learning. We propose a lightweight graph reordering methodology, incorporated with a GCN accelerator architecture that equips a customized cache design to fully utilize the graph-level data reuse. We also propose a mapping methodology aware of data reuse and task-level parallelism to handle various graph inputs effectively. Results show that the Rubik accelerator design improves energy efficiency by 26.3× to 1375.2× over GPU platforms across different datasets and GCN models.
- Published
- 2022
5. Hardware-Enabled Efficient Data Processing With Tensor-Train Decomposition
- Author
-
Zheng Qu, Bangyan Wang, Jilan Lin, Lei Deng, Yuan Xie, Ling Liang, Guoqi Li, Hengnu Chen, and Zheng Zhang
- Subjects
Data processing, Speedup, Computer science, Deep learning, Big data, Computer Graphics and Computer-Aided Design, Convolution, Software, Singular value decomposition, Artificial intelligence, Electrical and Electronic Engineering, Computer hardware, Curse of dimensionality
- Abstract
In recent years, tensor computation has become a promising tool for solving big data analysis, machine learning, medical imaging, and EDA problems. To ease the memory and computation intensity of tensor processing, decomposition techniques, especially tensor-train decomposition (TTD), are widely adopted to compress extremely high-dimensional tensor data. Despite TTD's potential to break the curse of dimensionality, researchers have not yet leveraged its full computational potential, mainly for two reasons: (1) executing TTD itself is time- and energy-consuming due to the singular value decomposition (SVD) operation within each TTD iteration; (2) additional software/hardware optimizations are often required to process the obtained TT-format data in certain applications such as deep learning inference. In this paper, we address these challenges with two approaches. First, we propose an algorithm-hardware co-design with a customized architecture, the TTD Engine, to accelerate TTD. We use MRI image compression as a demo application to illustrate the efficacy of the proposed accelerator. Second, we present a case study demonstrating the benefit of TT-format data processing and the efficacy of using the TTD Engine. In the case study, we use the TT approach to realize the convolution operation, which is difficult and nontrivial for TT-format data. Experimental results show that the TTD Engine achieves, on average, 14.9×–36.9× speedup over CPU implementations and 4.1×–9.9× speedup compared to the GPU baseline. The energy efficiency is also improved by at least 14.4× and 5.4× over CPU and GPU, respectively. Moreover, our hardware-enabled TT-format data processing further leads to more efficient implementations of complicated operations and applications.
- Published
- 2022
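The TTD that the TTD Engine accelerates is typically computed with the TT-SVD algorithm: repeatedly reshape the remaining tensor into a matrix, take a truncated SVD (the step the abstract identifies as dominating time and energy), keep the left factor as a three-way core, and carry the remainder forward. A minimal NumPy sketch under a single uniform `max_rank` truncation (our simplification; the hardware pipeline and the MRI application are not modeled):

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a dense d-way tensor into d TT cores of shape
    (r_prev, n_k, r_next) via successive truncated SVDs (TT-SVD)."""
    shape = tensor.shape
    cores, r, c = [], 1, tensor
    for k in range(len(shape) - 1):
        c = c.reshape(r * shape[k], -1)          # unfold: current mode vs. the rest
        u, s, vt = np.linalg.svd(c, full_matrices=False)
        rk = min(max_rank, len(s))               # truncate to the TT rank
        cores.append(u[:, :rk].reshape(r, shape[k], rk))
        c = s[:rk, None] * vt[:rk]               # carry the remainder forward
        r = rk
    cores.append(c.reshape(r, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a dense tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(out.shape[1:-1])
```

With `max_rank` at least as large as the exact TT ranks, the reconstruction is exact up to floating-point error; smaller ranks trade accuracy for compression.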
6. Rescuing RRAM-Based Computing From Static and Dynamic Faults
- Author
-
Tianqi Tang, Yuan Xie, Xing Hu, Yu Wang, Cheng-Da Wen, Jilan Lin, and Ing-Chao Lin
- Subjects
Kernel (linear algebra), Nonlinear system, Artificial neural network, Computer engineering, Computer science, Reliability (computer networking), Quantization (signal processing), Overhead (computing), Fault tolerance, Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software, Resistive random-access memory
- Abstract
Emerging resistive random access memory (RRAM) has shown great potential for in-memory processing, and thus attracts considerable research interest in accelerating memory-intensive applications, such as neural networks (NNs). However, the accuracy of RRAM-based NN computing can degrade significantly, due to the intrinsic statistical variations of the resistance of RRAM cells. In this article, we propose SIGHT, a synergistic algorithm-architecture fault-tolerant framework, to holistically address this issue. Specifically, we consider three major types of faults for RRAM computing: 1) nonlinear resistance distribution; 2) static variation; and 3) dynamic variation. At the algorithm level, we propose a resistance-aware quantization to compel the NN parameters to follow the exact nonlinear resistance distribution of RRAM, and introduce an input regulation technique to compensate for RRAM variations. We also propose a selective weight refreshing scheme to address the dynamic variation issue that occurs at runtime. At the architecture level, we propose a general and low-cost architecture for supporting our fault-tolerant scheme. Our evaluation demonstrates almost no accuracy loss for our three fault-tolerant algorithms, and the proposed SIGHT architecture incurs a performance overhead of as little as 7.14%.
- Published
- 2021
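The algorithm-level idea of resistance-aware quantization, constraining parameters to the nonuniformly spaced levels an RRAM cell can actually hold, can be sketched as a nearest-level mapping. The level set and function name below are illustrative assumptions; SIGHT additionally shapes training so that parameters follow the device distribution, which this snippet does not capture.

```python
import numpy as np

def snap_to_levels(weights, levels):
    """Map each weight to the nearest attainable conductance level.
    `levels` need not be uniformly spaced, mirroring the nonlinear
    resistance distribution of RRAM cells (levels here are illustrative)."""
    weights = np.asarray(weights, dtype=float)
    levels = np.sort(np.asarray(levels, dtype=float))
    # For each weight, pick the index of the closest level.
    idx = np.abs(weights[..., None] - levels).argmin(axis=-1)
    return levels[idx]
```

For example, with illustrative levels [0.1, 0.5, 1.0], the weights [0.12, 0.4, 0.8] snap to [0.1, 0.5, 1.0].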
7. DLUX: A LUT-Based Near-Bank Accelerator for Data Center Deep Learning Training Workloads
- Author
-
Dimin Niu, Yuan Xie, Peng Gu, Shuangchen Li, Hongzhong Zheng, Xinfeng Xie, and Krishna T. Malladi
- Subjects
Speedup, Computer science, Concurrency, Locality, Memory bandwidth, Parallel computing, Loop tiling, Computer Graphics and Computer-Aided Design, Bottleneck, Memory bank, Lookup table, Cache, Electrical and Electronic Engineering, Software
- Abstract
The frequent data movement between the processor and the memory has become a severe performance bottleneck for deep neural network (DNN) training workloads in data centers. To solve this off-chip memory access challenge, the 3-D stacking processing-in-memory (3D-PIM) architecture provides a viable solution. However, existing 3D-PIM designs for DNN training suffer from the limited memory bandwidth in the base logic die. To overcome this obstacle, integrating the DNN-related logic near each memory bank becomes a promising yet challenging solution, since naively implementing the floating-point (FP) unit and the cache in the memory die incurs a large area overhead. To address these problems, we propose DLUX, a high-performance and energy-efficient 3D-PIM accelerator for DNN training using the near-bank architecture. From the hardware perspective, to support the FP multiplier with low area overhead, we devise an in-DRAM lookup table (LUT) mechanism. Then, we propose to use a small scratchpad buffer together with a lightweight transformation engine to exploit the locality and enable flexible data layout without the expensive cache. From the software aspect, we split the mapping/scheduling tasks during DNN training into intralayer and interlayer phases. During the intralayer phase, to maximize data reuse in the LUT buffer and the scratchpad buffer, achieve high concurrency, and reduce data movement among banks, a 3D-PIM customized loop tiling technique is adopted. During the interlayer phase, efficient techniques are devised to ensure input–output data layout consistency and realize the forward–backward layout transposition. Experiment results show that DLUX can reduce FP32 multiplier area overhead by 60% against the direct implementation. Compared with a Tesla V100 GPU, end-to-end evaluations show that DLUX provides on average 6.3× speedup and 42× energy efficiency improvement.
- Published
- 2021
8. Practical Attacks on Deep Neural Networks by Memory Trojaning
- Author
-
Pengfei Zuo, Lei Deng, Xing Hu, Ling Liang, Yingyan Lin, Yuan Xie, Jing Ye, and Yang Zhao
- Subjects
Hardware security module, Artificial neural network, Computer science, Computer Graphics and Computer-Aided Design, Toolchain, Trojan, Hardware Trojan, Embedded system, Threat model, Preprocessor, Electrical and Electronic Engineering, Software
- Abstract
Deep neural network (DNN) accelerators are widely deployed in computer vision, speech recognition, and machine translation applications, in which attacks on DNNs have become a growing concern. This article focuses on exploring the implications of hardware Trojan attacks on DNNs. Trojans are one of the most challenging threat models in hardware security, where adversaries insert malicious modifications into the original integrated circuits (ICs), leading to malfunction once triggered. Such attacks can be conducted by adversaries because modern ICs commonly include third-party intellectual property (IP) blocks. Previous studies design hardware Trojans to attack DNNs under the assumption that adversaries have full knowledge or manipulation of the DNN system's victim model and toolchain in addition to the hardware platform, yet such a threat model is strict, limiting their practical adoption. In this article, we propose a memory Trojan methodology that implants the malicious logic merely into the memory controllers of DNN systems, without the necessity of toolchain manipulation or access to the victim model, and thus is feasible for practical use. Specifically, we locate the input image data among the massive volume of memory traffic based on memory access patterns, and propose a Trojan trigger mechanism based on detecting geometric features in input images. Extensive experiments show that the proposed trigger mechanism is effective even in the presence of environmental noise and preprocessing operations. Furthermore, we design and implement the payload and verify that the proposed Trojan technique can effectively conduct both untargeted and targeted attacks on DNNs.
- Published
- 2021
9. SemiMap: A Semi-Folded Convolution Mapping for Speed-Overhead Balance on Crossbars
- Author
-
Jing Pei, Xing Hu, Xin Ma, Guanrui Wang, Lei Deng, Ling Liang, Liang Chang, Yuan Xie, Guoqi Li, and Liu Liu
- Subjects
Dataflow, Computer science, Pipeline (computing), Network mapping, Parallel computing, Computer Graphics and Computer-Aided Design, Column (database), Convolution, Reduction (complexity), Overhead (computing), Compiler, Electrical and Electronic Engineering, Software
- Abstract
Crossbar architecture has been widely used in neural network (NN) accelerators, involving conventional and emerging devices. It performs well on the fully connected layer through efficient vector–matrix multiplication. However, the advantages degrade on the convolutional layer with huge data reuse, since the execution speed and resource overhead are imbalanced when using the existing fully unfolded or fully folded mapping strategies. To address this issue, we propose a novel semi-folded mapping (SemiMap) framework for implementing the convolution on crossbars. It simultaneously folds the physical resources along the row dimension of feature maps (FMs) and unfolds them along the column dimension. The former reduces the resource overhead, and the latter maintains the parallelism. An FM slicing scheme is further proposed to enable the processing of large-size images. Via our mapping framework, a row-by-row streaming pipeline for intraimage dataflow and a periodical pipeline for interimage dataflow are easily obtained. To validate the idea, we build a many-crossbar architecture with several designs to guarantee the overall functionality and performance. Based on the measurement data of a fabricated chip, a mapping compiler and a cycle-accurate simulator are developed for the hardware simulation of large-scale networks. We evaluate the proposed SemiMap on various convolutional NNs across different network scales. Over 35× resource saving and several-hundred-times cycle reduction are demonstrated compared to the existing fully unfolded and fully folded strategies, respectively. This work moves beyond the current extreme mapping schemes and provides a balanced solution for efficiently deploying computational graphs with data reuse on many-crossbar architectures.
- Published
- 2020
10. Efficient Super-Resolution System with Block-wise Hybridization and Quantized Winograd on FPGA
- Author
-
Bizhao Shi, Jiaxi Zhang, Zhuolun He, Xuechao Wei, Sicheng Li, Guojie Luo, Hongzhong Zheng, and Yuan Xie
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2023
11. MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures
- Author
-
Zhenhua Zhu, Hanbo Sun, Tongxin Xie, Yu Zhu, Guohao Dai, Lixue Xia, Dimin Niu, Xiaoming Chen, X. Sharon Hu, Yu Cao, Yuan Xie, Huazhong Yang, and Yu Wang
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2023
12. TIME: A Training-in-Memory Architecture for RRAM-Based Deep Neural Networks
- Author
-
Ming Cheng, Lixue Xia, Zhenhua Zhu, Yuan Xie, Yi Cai, Yu Wang, and Huazhong Yang
- Subjects
Artificial neural network, Computer science, Supervised learning, Energy consumption, Computer Graphics and Computer-Aided Design, Backpropagation, Resistive random-access memory, Application-specific integrated circuit, Memory architecture, Electronic engineering, Electrical and Electronic Engineering, Crossbar switch, Software, Efficient energy use
- Abstract
The training of neural networks (NNs) is usually time-consuming and resource intensive. The emerging metal-oxide resistive random-access memory (RRAM) device has shown potential for NN computation. The RRAM crossbar structure and multibit characteristics can perform the matrix-vector product, the most common operation in NNs, with high energy efficiency. Two challenges exist for realizing NN training based on RRAM. First, current RRAM-based architectures support only the inference phase of training and cannot perform the backpropagation (BP) and weight-update steps. Second, training an NN requires enormous numbers of iterations to constantly update the weights until convergence, and this weight updating leads to large energy consumption because of the nonideal factors of RRAM. In this paper, we propose a training-in-memory based on RRAM (TIME) architecture and the peripheral circuit design to enable training NNs on RRAM. TIME supports BP and weight updates while maximizing the reuse of the peripheral circuits of the inference operation on RRAM. Meanwhile, a set of optimization strategies focusing on the nonideal factors is designed to reduce the cost of tuning RRAM. We explore the performance of both supervised learning (SL) and deep reinforcement learning (DRL) on TIME. A specific mapping method for DRL is also introduced to further improve energy efficiency. Simulation results show that in SL, TIME can achieve 5.3× higher energy efficiency on average compared with DaDianNao, an application-specific integrated circuit (ASIC) in CMOS technology. In DRL, TIME achieves on average 126× higher energy efficiency than a GPU. If the cost of tuning RRAM can be further reduced, TIME has the potential to boost energy efficiency by two orders of magnitude compared with ASICs.
- Published
- 2019
13. GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing
- Author
-
Guohao Dai, Yongpan Liu, Yuze Chi, Huazhong Yang, Guangyu Sun, Tianhao Huang, Jishen Zhao, Yuan Xie, and Yu Wang
- Subjects
Speedup, Hybrid Memory Cube, Computer science, Distributed computing, Locality, Computer Graphics and Computer-Aided Design, Scheduling (computing), Data access, Memory architecture, System on a chip, Electrical and Electronic Engineering, Software, Random access
- Abstract
Large-scale graph processing requires high bandwidth of data access. However, as graph computing continues to scale, it becomes increasingly challenging to achieve such bandwidth on generic computing architectures, for four primary reasons: the random access pattern causing local bandwidth degradation, the poor locality leading to unpredictable global data access, heavy conflicts on updating the same vertex, and unbalanced workloads across processing units. Processing-in-memory (PIM) has been explored as a promising solution to providing high bandwidth, yet open questions of graph processing on PIM devices remain: 1) how to design hardware specializations and the interconnection scheme to fully utilize the bandwidth of PIM devices and ensure locality and 2) how to allocate data and schedule the processing flow to avoid conflicts and balance workloads. In this paper, we propose GraphH, a PIM architecture for graph processing on the hybrid memory cube array, to tackle all four problems mentioned above. From the architecture perspective, we integrate SRAM-based on-chip vertex buffers to eliminate local bandwidth degradation. We also introduce a reconfigurable double-mesh connection to provide high global bandwidth. From the algorithm perspective, partitioning and scheduling methods such as index mapping interval-block and round interval pair are introduced to GraphH, so that workloads are balanced and conflicts are avoided. Two optimization methods are further introduced to reduce synchronization overhead and reuse on-chip data. Experimental results on graphs with billions of edges demonstrate that GraphH outperforms DDR-based graph processing systems by up to two orders of magnitude and achieves a 5.12× speedup over the previous PIM design.
- Published
- 2019
14. Fabrication cost analysis and cost-aware design space exploration for 3-D ICs
- Author
-
Xiangyu Dong, Jishen Zhao, and Yuan Xie
- Subjects
Standard IC, Integrated circuits -- Design and construction, Integrated circuits -- Economic aspects, Semiconductor chips -- Design and construction, Semiconductor chips -- Economic aspects, Interconnected electric utility systems -- Design and construction, International interconnected electric utility systems -- Design and construction, Silicon -- Electric properties
- Published
- 2010
15. Adapting B+-Tree for Emerging Nonvolatile Memory-Based Main Memory
- Author
-
Yuan Xie, Ping Chi, and Wang-Chien Lee
- Subjects
Random access memory, Dynamic random-access memory, Computer science, Semiconductor memory, Parallel computing, Memory systems, Computer Graphics and Computer-Aided Design, Resistive random-access memory, Non-volatile memory, Phase-change memory, Memory management, Interleaved memory, Electrical and Electronic Engineering, Software, Computer memory, DRAM
- Abstract
Among the emerging nonvolatile memory (NVM) technologies, some resistive memories, including phase change memory (PCM), spin-transfer torque magnetic random access memory (STT-RAM), and metal-oxide resistive RAM (ReRAM), have been considered as promising replacements of conventional dynamic RAM (DRAM) to build future main memory systems. Main memory databases can benefit from their nice features, such as their low leakage power and nonvolatility, the high density of PCM, the good read performance and low read energy consumption of STT-RAM, and the low cost of ReRAM's crossbar architecture. However, they also have some disadvantages, such as their long write latency, high write energy, and limited lifetime, which bring challenges to database algorithm design for NVM-based memory systems. In this paper, we focus on the design of the ubiquitous B+-tree, aiming to make it NVM-friendly. We present a basic cost model for NVM-based memory systems which distinguishes writes from reads, and propose detailed CPU cost and memory access models for search, insert, and delete operations on a B+-tree. Based on the proposed models, we analyze the CPU costs and memory behaviors of the existing NVM-friendly B+-tree schemes, and find that they suffer from three issues. To address these issues, we propose three different schemes. Experimental results show that our schemes can efficiently improve the performance, reduce the memory energy consumption, and extend the lifetime for NVM-based memory systems.
- Published
- 2016
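The abstract's central modeling decision, charging NVM writes more than reads, can be illustrated with a toy cost model that compares inserting into a sorted B+-tree leaf (which shifts entries, each shift an NVM write) against an unsorted, append-only leaf (one entry write plus a metadata update). All constants and formulas here are illustrative assumptions for exposition, not the paper's calibrated model.

```python
import math

C_READ, C_WRITE = 1.0, 5.0   # relative NVM read/write costs (assumed)

def insert_cost_sorted(n_entries):
    """Sorted leaf: binary search reads ~log2(n) entries, then on
    average half the entries shift right, each shift an NVM write."""
    reads = math.ceil(math.log2(max(n_entries, 2)))
    writes = n_entries / 2 + 1          # shifted entries + the new entry
    return reads * C_READ + writes * C_WRITE

def insert_cost_unsorted(n_entries):
    """Unsorted (append-only) leaf: one entry write plus one
    bitmap/counter update; no shifting regardless of leaf occupancy."""
    reads = 1                            # read the slot bitmap/counter
    writes = 2                           # appended entry + metadata word
    return reads * C_READ + writes * C_WRITE
```

Under these assumed costs, appending wins for any reasonably full leaf: `insert_cost_sorted(32)` is 90 cost units versus 11 for the unsorted leaf, and the gap widens with leaf size, which is why write-aware B+-tree variants tend to keep leaf entries unsorted.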
16. Optimizing the NoC Slack Through Voltage and Frequency Scaling in Hard Real-Time Embedded Systems
- Author
-
Yuan Xie, Jia Zhan, Vijaykrishnan Narayanan, Nikolay Stoimenov, Jin Ouyang, and Lothar Thiele
- Subjects
Interconnection, Network packet, Computer science, Energy consumption, Chip, Computer Graphics and Computer-Aided Design, Embedded system, Electrical and Electronic Engineering, Network calculus, Frequency scaling, Software, Computer network
- Abstract
Hard real-time embedded systems impose a strict latency requirement on interconnection subsystems. In the case of network-on-chip (NoC), this means each packet of a traffic stream has to be delivered within a time interval. In addition, with the increasing complexity of NoC, it consumes a significant portion of total chip power, which boosts the power footprint of such chips. In this paper, we propose a methodology to minimize the energy consumption of NoC without violating the prespecified latency deadlines of real-time applications. First, we develop a formal approach based on network calculus to obtain the worst-case delay bound of all packets, from which we derive a safe estimate of the number of cycles that a packet can be further delayed in the network without violating its deadline—the worst-case slack. With this information, we then develop an optimization algorithm that trades the slacks for lower NoC energy. Our algorithm recognizes the distribution of slacks for different traffic streams, and assigns different voltages and frequencies to different routers to achieve NoC energy-efficiency, while meeting the deadlines for all packets. Furthermore, we design a feedback-control strategy to enable dynamic frequency and voltage scaling on the network routers in conjunction with the energy optimization algorithm. It can flexibly improve the energy-efficiency of the overall network in response to sporadic traffic patterns at runtime.
- Published
- 2014
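The network-calculus step described above, deriving a worst-case delay bound from arrival and service curves, has a standard closed form for the common token-bucket/rate-latency pair: for an arrival curve α(t) = b + r·t and a service curve β(t) = R·max(t − T, 0) with r ≤ R, the maximal horizontal deviation between the curves is T + b/R. The sketch below (our names; the paper's NoC-specific curves and slack allocation are not modeled) checks that closed form against a brute-force search:

```python
def delay_bound(b, r, R, T):
    """Closed-form worst-case delay for token-bucket arrivals (burst b,
    rate r) over rate-latency service (rate R, latency T), with r <= R."""
    assert r <= R, "stability requires arrival rate <= service rate"
    return T + b / R

def horizontal_deviation(alpha, beta, t_max, dt=0.01):
    """Brute-force horizontal distance between curves:
    sup over t of the smallest d with beta(t + d) >= alpha(t)."""
    worst = 0.0
    for i in range(int(t_max / dt)):
        t = i * dt
        a = alpha(t)
        k = 0
        while beta(t + k * dt) < a:     # advance until service catches up
            k += 1
        worst = max(worst, k * dt)
    return worst
```

For b = 5, r = 1, R = 2, T = 0.5, the closed form gives 3.0, and the brute-force search agrees to within the step size; cycles of slack beyond this bound are what the paper's DVFS algorithm trades for energy.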
17. PS3-RAM: A Fast Portable and Scalable Statistical STT-RAM Reliability/Energy Analysis Method
- Author
-
Wujie Wen, Yu Wang, Yi Chen, Yuan Xie, and Yaojun Zhang
- Subjects
Random access memory, Speedup, Computer science, Monte Carlo method, Computer Graphics and Computer-Aided Design, CMOS, Robustness (computer science), Scalability, Electronic engineering, Electrical and Electronic Engineering, Software, Simulation
- Abstract
The development of emerging spin-transfer torque random access memory (STT-RAM) is facing two major technical challenges, poor write reliability and high write energy, both of which are severely impacted by process variations and thermal fluctuations. Evaluations of STT-RAM design metrics and robustness often require a hybrid simulation flow, i.e., modeling the CMOS and magnetic devices with SPICE and macro-magnetic models, respectively. Very often, such a hybrid simulation flow involves expensive Monte Carlo simulations when the design and behavioral variabilities of STT-RAM are taken into account. In this paper, we propose a fast and scalable semi-analytical method, PS3-RAM, enabling efficient statistical simulations in STT-RAM designs. By eliminating the costly macro-magnetic and SPICE simulations, PS3-RAM achieves more than 100,000× runtime speedup with excellent agreement with the results of the conventional simulation method. PS3-RAM can also accurately estimate the STT-RAM write error rate and write energy distributions for both magnetic tunneling junction switching directions under different temperatures, demonstrating great potential in the analysis of STT-RAM reliability and write energy at the early design stage of memory or microarchitecture.
- Published
- 2014
18. Through Silicon Via Aware Design Planning for Thermally Efficient 3-D Integrated Circuits
- Author
-
Yuan Xie, Yibo Chen, Charles Luther Johnson, Dave Motschman, and Eren Kursun
- Subjects
Materials science ,Through-silicon via ,business.industry ,Bandwidth (signal processing) ,Design flow ,Electrical engineering ,Insulator (electricity) ,Integrated circuit design ,Integrated circuit ,Computer Graphics and Computer-Aided Design ,law.invention ,Thermal conductivity ,law ,Vertical direction ,Electronic engineering ,Electrical and Electronic Engineering ,business ,Software - Abstract
3-D integrated circuits (3-D ICs) offer performance advantages due to their increased bandwidth and reduced wire length enabled by through-silicon via (TSV) structures. Traditionally, TSVs have been considered to improve the thermal conductivity in the vertical direction. However, the lateral thermal blockage effect becomes increasingly important for TSV via farms (clusters of TSVs used for signal bus connections between layers), because the TSV size and pitch continue to scale in the μm range and the metal-to-insulator ratio becomes smaller. Consequently, dense TSV farms can create lateral thermal blockages in the thinned silicon substrate and exacerbate local hotspots. In this paper, we propose a thermal-aware via farm placement technique for 3-D ICs to minimize the lateral heat blockages caused by dense signal bus TSV structures. By incorporating the thermal conductivity profile of via farm blocks in the design flow and enabling placement/aspect-ratio optimization, the corresponding hotspots can be minimized within the wire-length and area constraints.
- Published
- 2013
19. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory
- Author
-
Yuan Xie, Cong Xu, Xiangyu Dong, and Norman P. Jouppi
- Subjects
Magnetoresistive random-access memory ,Hardware_MEMORYSTRUCTURES ,Memory hierarchy ,business.industry ,CPU cache ,Computer science ,NAND gate ,Computer Graphics and Computer-Aided Design ,Resistive random-access memory ,Non-volatile memory ,Embedded system ,Static random-access memory ,Electrical and Electronic Engineering ,business ,Software ,Computer hardware ,Dram ,Auxiliary memory - Abstract
Various new nonvolatile memory (NVM) technologies have emerged recently. Among all the investigated new NVM candidate technologies, spin-torque-transfer memory (STT-RAM, or MRAM), phase-change random-access memory (PCRAM), and resistive random-access memory (ReRAM) are regarded as the most promising candidates. As the ultimate goal of this NVM research is to deploy them into multiple levels in the memory hierarchy, it is necessary to explore the wide NVM design space and find the proper implementation at different memory hierarchy levels, from highly latency-optimized caches to highly density-optimized secondary storage. While abundant tools are available as SRAM/DRAM design assistants, similar tools for NVM designs are currently missing. Thus, in this paper, we develop NVSim, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies.
- Published
- 2012
20. MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System
- Author
-
Pai-Yu Chen, Yu Cao, Yu Wang, Peng Gu, Lixue Xia, Tianqi Tang, Boxun Li, Yuan Xie, Shimeng Yu, and Huazhong Yang
- Subjects
Computer simulation, Artificial neural network, Computer science, Design space exploration, Memristor, Computer Graphics and Computer-Aided Design, Neuromorphic engineering, Memistor, Computer engineering, Electrical and Electronic Engineering, Software
- Abstract
Memristor-based computation provides a promising solution to boost the power efficiency of neuromorphic computing systems. However, a behavior-level memristor-based neuromorphic computing simulator, which can model the performance and enable early-stage design space exploration, is still missing. In this paper, we propose a simulation platform for memristor-based neuromorphic systems, called MNSIM. A hierarchical structure for the memristor-based neuromorphic computing accelerator is proposed to provide flexible interfaces for customization. A detailed reference design is provided for large-scale applications. A behavior-level computing accuracy model is incorporated to evaluate the computing error rate affected by interconnect lines and nonideal device factors. Experimental results show that MNSIM achieves over 7000× speedup compared with SPICE simulation. MNSIM can optimize the design and estimate the tradeoff relationships among different performance metrics for users.
- Published
- 2017
21. Guest Editorial
- Author
-
Yuan Xie and Gabriel Loh
- Subjects
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
- Published
- 2013