8 results on '"Je-Syu Liu"'
Search Results
2. A Neuromorphic Computing System for Bitwise Neural Networks Based on ReRAM Synaptic Array.
- Author
-
Pin-Yi Li, Cheng-Han Yang, Wei-Hao Chen, Jian-Hao Huang, Wei-Chen Wei, Je-Syu Liu, Wei-Yu Lin, Tzu-Hsiang Hsu, Chih-Cheng Hsieh, Ren-Shuo Liu, Meng-Fan Chang, and Kea-Tiong Tang
- Published
- 2018
3. A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN Based AI Edge Processors.
- Author
-
Cheng-Xin Xue, Wei-Hao Chen, Je-Syu Liu, Jia-Fang Li, Wei-Yu Lin, Wei-En Lin, Jing-Hong Wang, Wei-Chen Wei, Ting-Wei Chang, Tung-Cheng Chang, Tsung-Yuan Huang, Hui-Yao Kao, Shih-Ying Wei, Yen-Cheng Chiu, Chun-Ying Lee, Chung-Chuan Lo, Ya-Chin King, Chorng-Jung Lin, Ren-Shuo Liu, Chih-Cheng Hsieh, Kea-Tiong Tang, and Meng-Fan Chang
- Published
- 2019
4. A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices
- Author
-
Ta-Wei Liu, Meng-Fan Chang, Yu-Der Chih, Je-Syu Liu, Chin-Yi Su, Ting-Wei Chang, Shih-Ying Wei, Tsung-Yuan Huang, Cheng-Xin Xue, Wei-Chen Wei, Je-Min Hung, Chun-Ying Lee, Tai-Hsing Wen, Mon-Shu Ho, Yi-Ren Chen, Yen-Kai Chen, Kea-Tiong Tang, Yun-Chen Lo, Jing-Hong Wang, Sheng-Po Huang, Chou Chung-Cheng, Shih-Hsih Teng, Chung-Chuan Lo, Chih-Cheng Hsieh, Ren-Shuo Liu, Hui-Yao Kao, Yen-Cheng Chiu, and Tzu-Hsiang Hsu
- Subjects
Resistive touchscreen ,Edge device ,Computer science ,business.industry ,Process (computing) ,Energy consumption ,Electronic, Optical and Magnetic Materials ,Resistive random-access memory ,CMOS ,Electrical and Electronic Engineering ,Macro ,business ,Instrumentation ,Computer hardware ,Electronic circuit
- Abstract
The development of small, energy-efficient artificial intelligence edge devices is limited in conventional computing architectures by the need to transfer data between the processor and memory. Non-volatile compute-in-memory (nvCIM) architectures have the potential to overcome such issues, but the development of high-bit-precision configurations required for dot-product operations remains challenging. In particular, input–output parallelism and cell-area limitations, as well as signal margin degradation, computing latency in multibit analogue readout operations and manufacturing challenges, still need to be addressed. Here we report a 2 Mb nvCIM macro (which combines memory cells and related peripheral circuitry) that is based on single-level cell resistive random-access memory devices and is fabricated in a 22 nm complementary metal–oxide–semiconductor foundry process. Compared with previous nvCIM schemes, our macro can perform multibit dot-product operations with increased input–output parallelism, reduced cell-array area, improved accuracy, and reduced computing latency and energy consumption. The macro can, in particular, achieve latencies between 9.2 and 18.3 ns, and energy efficiencies between 146.21 and 36.61 tera-operations per second per watt, for binary and multibit input–weight–output configurations, respectively. Commercial complementary metal–oxide–semiconductor and resistive random-access memory technologies can be used to create multibit compute-in-memory circuits capable of fast and energy-efficient inference for use in small artificial intelligence edge devices.
- Published
- 2020
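The energy-efficiency figures quoted in the abstract above can be read as energy per operation. The following is a minimal sketch of that unit conversion only; the function name is hypothetical and it assumes "operation" is counted the same way the abstract counts it.

```python
def energy_per_op_fj(tops_per_watt: float) -> float:
    """Convert an efficiency in TOPS/W to femtojoules per operation."""
    # 1 TOPS/W = 1e12 operations per joule, so one operation costs
    # 1 / (tops_per_watt * 1e12) J, which equals 1000 / tops_per_watt fJ.
    return 1000.0 / tops_per_watt


# Figures taken from the abstract: binary and multibit configurations.
binary_fj = energy_per_op_fj(146.21)
multibit_fj = energy_per_op_fj(36.61)
print(f"binary: {binary_fj:.2f} fJ/op, multibit: {multibit_fj:.2f} fJ/op")
```

Read this way, the multibit configuration costs roughly four times as much energy per operation as the binary one.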
5. Embedded 1-Mb ReRAM-Based Computing-in-Memory Macro With Multibit Input and Weight for CNN-Based AI Edge Processors
- Author
-
Tung-Cheng Chang, Jing-Hong Wang, Je-Syu Liu, Chrong Jung Lin, Wei-En Lin, Ya-Chin King, Cheng-Xin Xue, Tsung-Yuan Huang, Chun-Ying Lee, Ren-Shuo Liu, Meng-Fan Chang, Wei-Hao Chen, Kea-Tiong Tang, Hui-Yao Kao, Wei-Yu Lin, Yen-Cheng Chiu, Jiafang Li, Ting-Wei Chang, Chih-Cheng Hsieh, and Wei-Chen Wei
- Subjects
Non-volatile memory ,business.industry ,Computer science ,Clamper ,Sense amplifier ,Circuit design ,Electrical and Electronic Engineering ,Macro ,business ,Computer hardware ,Resistive random-access memory
- Abstract
Computing-in-memory (CIM) based on embedded nonvolatile memory is a promising candidate for energy-efficient multiply-and-accumulate (MAC) operations in artificial intelligence (AI) edge devices. However, circuit design for NVM-based CIM (nvCIM) imposes a number of challenges, including an area-latency-energy tradeoff for multibit MAC operations, pattern-dependent degradation in signal margin, and small read margin. To overcome these challenges, this article proposes the following: 1) a serial-input non-weighted product (SINWP) structure; 2) a down-scaling weighted current translator (DSWCT) and positive–negative current-subtractor (PN-ISUB); 3) a current-aware bitline clamper (CABLC) scheme; and 4) a triple-margin small-offset current-mode sense amplifier (TMCSA). A 55-nm 1-Mb ReRAM-CIM macro was fabricated to demonstrate the MAC operation of 2-b-input, 3-b-weight with 4-b-out. This nvCIM macro achieved $T_{\text{MAC}} = 14.6$ ns at 4-b-out with peak energy efficiency of 53.17 TOPS/W.
- Published
- 2020
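As a rough illustration of what a serial-input multibit MAC computes, the sketch below feeds the input one bit plane at a time and applies the binary place value only after each plane's partial sum is formed. It is a generic bit-serial arithmetic model, not the SINWP circuit described in the abstract; the function name, the unsigned 2-bit inputs, and the signed integer weights are assumptions made for the example.

```python
def bit_serial_mac(inputs, weights, input_bits=2):
    """Dot product of unsigned inputs and integer weights, one input bit per step."""
    total = 0
    for bit in range(input_bits):
        # Partial sum for this bit plane: each input contributes 0 or 1 times
        # its weight, so no place value is applied inside the accumulation.
        partial = sum(((x >> bit) & 1) * w for x, w in zip(inputs, weights))
        # The place value 2**bit is applied once per bit plane afterwards.
        total += partial * (1 << bit)
    return total


inputs = [3, 0, 2, 1]      # 2-bit unsigned inputs (assumed encoding)
weights = [2, -3, 1, -1]   # small signed weights (assumed encoding)
result = bit_serial_mac(inputs, weights)
direct = sum(x * w for x, w in zip(inputs, weights))
print(result, direct)
```

Both values agree, which is the property any bit-serial scheme has to preserve regardless of how the partial sums are produced in hardware.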
6. 15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices
- Author
-
Jing-Hong Wang, Meng-Fan Chang, Yen-Kai Chen, Ta-Wei Liu, Cheng-Xin Xue, Tsung-Yuan Huang, Yi-Ren Chen, Hui-Yao Kao, Sheng-Po Huang, Yun-Chen Lo, Tzu-Hsiang Hsu, Chih-Cheng Hsieh, Chung-Chuan Lo, Je-Syu Liu, Tai-Hsing Wen, Wei-Chen Wei, Ren-Shuo Liu, Ting-Wei Chang, Shih-Ying Wei, and Kea-Tiong Tang
- Subjects
Physics ,Artificial neural network ,Edge device ,Sense amplifier ,Binary number ,Biasing ,Macro ,Topology ,Resistive random-access memory ,Voltage
- Abstract
Nonvolatile computing-in-memory (nvCIM) can improve the latency ($t_{\text{AC}}$) and energy efficiency ($EF_{\text{MAC}}$) of tiny AI edge devices performing multiply-and-accumulate (MAC) computing after system wake-up. Prior nvCIMs have proven effective for binary input (IN) and weight (W) with 3b output (OUT) [1], 1-8-1b IN-W-OUT [2], and 2-3-4b IN-W-OUT [3] neural networks; however, higher precision (4-4b IN-W) for MAC operations is needed for multibit CNNs to achieve high inference accuracy [4]. As Fig. 15.4.1 shows, improving the precision of nvCIM macros involves various challenges. (1) A large number of activated WLs produces a wide range of BL current ($I_{\text{BL}}$), resulting in an inaccurate BL-clamping voltage ($V_{\text{BLC}}$), while a large $I_{\text{BL}}$ requires a large array area due to the need for wide metal lines to support high current density. (2) Previous “WL = input” approaches suffer from (a) few parallel inputs (IN#) due to (1), and (b) long $t_{\text{AC}}$ from multiple cycles of binary WL inputs on 1T1R cells for multibit inputs. (3) Previous positive-negative-split weight mapping consumes high total $I_{\text{BL}}$ and area overhead (needing $2\times(m-1)$ cells for a signed $m$-bit weight) for cell arrays with high weight precision. (4) High-precision outputs require long $t_{\text{AC}}$ and a large number of reference currents (IREF#). To overcome these challenges, this work proposes: (1) a BL-IN-OUT multibit computing (BLIOMC) scheme using a single WL-on and input-aware multibit BL clamping (IA-MBC) to shorten $I_{\text{BL}}$ for multibit inputs, increase IN#, and reduce the $I_{\text{BL}}$ range/size for accurate $V_{\text{BLC}}$ and a compact array area; (2) scrambled 2's complement (S2C) weight mapping (S2CWM), input-aware source-line (SL) voltage biasing (IA-SLVB), and an S2C value combiner (S2CVC) to reduce area overhead and $I_{\text{BL}}$ in the cell array; and (3) a dual-bit small-offset current-mode sense amplifier (DbSO-CSA) to reduce IREF# and $t_{\text{AC}}$. A fabricated 22nm 2Mb ReRAM-CIM macro presents the first 4b-input nvCIM macro, featuring a 9.8-18.3 ns $t_{\text{AC}}$ and an $EF_{\text{MAC}}$ of 121.3-28.9 TOPS/W from binary to 4bIN-4bW-11bOUT compute precisions.
- Published
- 2020
7. 24.1 A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN Based AI Edge Processors
- Author
-
Je-Syu Liu, Wei-En Lin, Chorng-Jung Lin, Jing-Hong Wang, Wei-Chen Wei, Wei-Hao Chen, Meng-Fan Chang, Cheng-Xin Xue, Chun-Ying Lee, Tung-Cheng Chang, Ren-Shuo Liu, Wei-Yu Lin, Hui-Yao Kao, Ting-Wei Chang, Shih-Ying Wei, Jiafang Li, Chih-Cheng Hsieh, Kea-Tiong Tang, Ya-Chin King, Tsung-Yuan Huang, Yen-Cheng Chiu, and Chung-Chuan Lo
- Subjects
Computer science ,business.industry ,Sense amplifier ,020208 electrical & electronic engineering ,02 engineering and technology ,020202 computer hardware & architecture ,Resistive random-access memory ,Non-volatile memory ,Subtractor ,0202 electrical engineering, electronic engineering, information engineering ,Latency (engineering) ,Macro ,business ,Computer hardware
- Abstract
Embedded nonvolatile memory (NVM) and computing-in-memory (CIM) are significantly reducing the latency ($t_{\text{MAC}}$) and energy consumption ($E_{\text{MAC}}$) of multiply-and-accumulate (MAC) operations in artificial intelligence (AI) edge devices [1, 2]. Previous ReRAM CIM macros demonstrated MAC operations for 1b-input, ternary-weighted, 3b-output CNNs [1] or 1b-input, 8b-weighted, 1b-output fully-connected networks with limited accuracy [2]. To support higher-accuracy convolutional-neural-network-heavy applications, NVM-CIM should support multibit inputs/weights and multibit output (MAC-OUT) for CNN operations. One way to achieve multibit weights is to use a multi-level ReRAM cell to store the weight. However, as shown in Fig. 24.1.1, multibit ReRAM CIM faces several challenges: (1) a tradeoff between area and speed for multibit input/weight/MAC-OUT MAC operations; (2) the sense amplifier's high input offset, large area, and high parasitic load on the read path due to large BL currents ($I_{\text{BL}}$) from multibit MAC; and (3) limited accuracy due to a small read/sensing margin ($I_{\text{SM}}$) across MAC-OUT or variation in cell resistance (particularly in MLC cells). To overcome these challenges, this work proposes: (1) a serial-input non-weighted product (SINWP) structure to optimize the tradeoff between area, $t_{\text{MAC}}$ and $E_{\text{MAC}}$; (2) a down-scaling weighted current translator (DSWCT) and positive-negative current-subtractor (PN-ISUB) for short delay, a small offset, and a compact read-path area; and (3) a triple-margin small-offset current-mode sense amplifier (TMCSA) to tolerate a small $I_{\text{SM}}$. A fabricated 55nm 1Mb ReRAM-CIM macro is the first ReRAM CIM macro to support CNN operations using multibit input/weight MAC-OUT. This device achieves the shortest CIM-MAC-access time ($t_{\text{AC}}$) among existing ReRAM-CIMs ($t_{\text{MAC}} = 14.6$ ns with 2b-input, 3b-weight with 4b-MAC-OUT) and the best peak $E_{\text{MAC}}$ of 53.17 TOPS/W (in binary mode).
- Published
- 2019
8. A Neuromorphic Computing System for Bitwise Neural Networks Based on ReRAM Synaptic Array
- Author
-
Meng-Fan Chang, Jian-Hao Huang, Wei-Chen Wei, Wei-Yu Lin, Je-Syu Liu, Wei-Hao Chen, Chih-Cheng Hsieh, Ren-Shuo Liu, Kea-Tiong Tang, Cheng-Han Yang, Pin-Yi Li, and Tzu-Hsiang Hsu
- Subjects
Artificial neural network ,Computer science ,Sense amplifier ,business.industry ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Resistive random-access memory ,Neuromorphic engineering ,Gate array ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,Field-programmable gate array ,Bitwise operation ,Computer hardware ,MNIST database ,0105 earth and related environmental sciences
- Abstract
Recent advances in neuromorphic computing systems have shown that resistive random-access memory (ReRAM) can be used to efficiently implement compact parallel computing arrays, which are inherently suitable for neural networks that require large numbers of matrix-vector multiplications (MVMs). In this work, we propose a neuromorphic computing system based on a ReRAM synaptic array to implement bitwise neural networks. The system contains a ReRAM synaptic array for parallel computation of bitwise MVMs, and a field-programmable gate array for data buffering and processing. To deploy the network on the system, a customized training scheme was required to adapt the trained network to the characteristics of the ReRAM synaptic array with bitwise weights and inputs. We also managed the resolution of the partial sums to reduce the bit-width requirement of the sense amplifier, thereby reducing power consumption. The measurement results show that the ReRAM synaptic array consumed only 0.27 mW at a 1 V supply using a 1-bit sense amplifier, while the system still maintained 97.52% accuracy on the MNIST dataset.
- Published
- 2018
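To make the idea of a bitwise matrix-vector multiplication with a 1-bit readout concrete, here is a minimal software sketch. It assumes weights and inputs are encoded as -1/+1 and models the 1-bit sense amplifier as a sign threshold at zero; the encoding, the threshold, and the function name are illustrative assumptions and are not taken from the paper.

```python
def bitwise_mvm_1bit(weights, inputs):
    """Bitwise MVM where each row's partial sum is reduced to a single bit."""
    outputs = []
    for row in weights:
        # Accumulate the products of -1/+1 weights and inputs for this row.
        partial_sum = sum(w * x for w, x in zip(row, inputs))
        # Model of a 1-bit readout: keep only the sign of the partial sum.
        outputs.append(1 if partial_sum >= 0 else -1)
    return outputs


weights = [
    [1, -1, 1],
    [-1, -1, 1],
    [1, 1, 1],
]
inputs = [1, 1, -1]
outputs = bitwise_mvm_1bit(weights, inputs)
print(outputs)
```

Reducing each partial sum to one bit discards magnitude information, which is consistent with the abstract's remark that the network had to be trained specifically for the array's characteristics.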
Catalog
Discovery Service for Jio Institute Digital Library