Author: "Zhe Yuan" / Topic: computer science - Searchworks@Jio Institute Digital Library Search Results

1. A 65-nm Energy-Efficient Interframe Data Reuse Neural Network Accelerator for Video Applications

Author: Wenyu Sun, Yixiong Yang, Yue Jinshan, Zhe Yuan, Yongpan Liu, Ruoyang Liu, Jingyu Wang, Zhuqing Yuan, Huazhong Yang, Xiaoyu Feng, and Xueqing Li
Subjects: Artificial neural network, Computer engineering, Computer science, Frame (networking), Inter frame, Energy consumption, Electrical and Electronic Engineering, Convolutional neural network, Efficient energy use, Convolution, Sparse matrix
Abstract: An energy-efficient convolutional neural network (CNN) accelerator is proposed for the video application. Previous works exploited the sparsity of differential (Diff) frame activation, but the improvement is limited as many Diff-frame data is small but non-zero. Processing of irregular sparse data also leads to low hardware utilization. To solve these problems, two key innovations are proposed in this article. First, we implement a hybrid-precision inter-frame-reuse architecture which takes advantage of both low bit-width and high sparsity of Diff-frame data. This technology can accelerate 3.2x inference speed with no accuracy loss. Second, we design a conv-pattern-aware processing array that achieves the 2.48x-14.2x PE utilization rate to process sparse data for different convolution kernels. The accelerator chip was implemented in 65-nm CMOS technology. To the best of our knowledge, it is the first silicon-proven CNN accelerator that supports inter-frame data reuse. Attributed to the inter-frame similarity, this video CNN accelerator reaches the minimum energy consumption of 24.7 μJ/frame in the MobileNet-slim model, which is 76.3% less than the baseline.
Published: 2022
Full Text: View/download PDF

2. On-Chip Cascaded Bandpass Filter and Wavelength Router Using an Intelligent Algorithm

Author: Jing Li, Hongyi Yuan, Xiaoyong Hu, Lifeng Ma, Shuai Feng, Zhe Yuan, and Cuicui Lu
Subjects: Router, Computer science, on-chip integration, Bandwidth (signal processing), Optical communication, QC350-467, Optics. Light, Chip, Atomic and Molecular Physics, and Optics, TA1501-1820, Transmission (telecommunications), Band-pass filter, Simulated annealing, genetic algorithm, Applied optics. Photonics, System on a chip, Bandpass filter, wavelength router, Electrical and Electronic Engineering, Algorithm
Abstract: Cascaded nanophotonic devices play a vital role in all-optical connection, all-optical computation and all-optical network. However, there is almost no effective method for the direct design of on-chip cascaded nanophotonic devices, since current study of nanophotonic devices mostly focuses on single device. Here, on-chip cascaded nanophotonic devices are designed based on an intelligent algorithm by combining genetic algorithm, simulated annealing algorithm and finite element method for the first time, and verified experimentally by using silicon-based planar structures. The cascaded devices consist of a bandpass filter and a wavelength router operating in optical communication range. The operation bandwidth of the bandpass filter is 408 nm with transmission more than 80%, within which the communication wavelengths of 1,300 nm and 1,550 nm are routed into different output ports through the wavelength router component. The footprint is only 3.62 μm2 for the bandpass filter and only 2.56 μm2 for the wavelength router, which are easy for integration with planar structures and ultrasmall size. This work provides a highly-efficient scheme for the realization of on-chip cascaded nanophotonic devices on the same chip, and lays a foundation for the realization of photonic chip based on intelligent algorithm.
Published: 2021
Full Text: View/download PDF

3. STICKER-T: An Energy-Efficient Neural Network Processor Using Block-Circulant Algorithm and Unified Frequency-Domain Acceleration

Author: Jinshan Yue, Huazhong Yang, Zhe Yuan, Yung-Ning Tu, Xueqing Li, Ao Ren, Wenyu Sun, Meng-Fan Chang, Yi-Ju Chen, Ruoyang Liu, Yanzhi Wang, and Yongpan Liu
Subjects: Artificial neural network, CMOS, Computer science, Frequency domain, Fast Fourier transform, Overhead (computing), Enhanced Data Rates for GSM Evolution, Electrical and Electronic Engineering, Algorithm, Circulant matrix, Efficient energy use, Block (data storage)
Abstract: The emerging edge intelligence requires low-cost energy-efficient neural network (NN) processors. Supporting various types of edge NN models leads to extra circuit overhead. Designing a unified NN processor with high energy/area efficiency is challenging. This work presents a frequency-domain-accelerated unified NN processor, named STICKER-T. It combines algorithm, architecture, and circuit-level optimization to achieve high energy/area efficiency. By utilizing the block-circulant NN (CirCNN) algorithm, this work supports frequency-domain acceleration and a unified workflow for convolutional, fully connected, and recurrent NN (CNN/FC/RNN). Three key innovations are proposed. First, a block-circulant-accelerated chip architecture is implemented to support unified CNN/FC/RNN workflow. Second, a multi-bit 8-128-point global-parallel local-bit-serial fast Fourier transform (FFT) module is designed for efficient high-throughput FFT/inverse FFT (IFFT) operation. Third, by utilizing a 6T hierarchical-bitline-switching transpose-SRAM (HBST-TRAM), 2-D data reuse is enabled in the proposed multi-bit frequency-domain multiply–accumulate (MAC) array. STICKER-T was fabricated in a 65-nm CMOS technology. It can operate at 0.54–1.15 V and 25–200 MHz with 13.3–339-mW power consumption. The peak energy efficiency achieves 140.3 TOPS/W. It shows 8.1 $\times $ area efficiency and 4.2 $\times $ energy efficiency at 4-bit precision compared with the state-of-the-art reconfigurable NN processor.
Published: 2021
Full Text: View/download PDF

4. State Recognition of Surface Discharges by Visible Images and Machine Learning

Author: Zhe Yuan, Ziqing Guo, Qizheng Ye, and Yuwei Wang
Subjects: Surface (mathematics), Artificial neural network, business.industry, Computer science, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Division (mathematics), Fault (power engineering), Machine learning, computer.software_genre, Image (mathematics), Artificial intelligence, Electrical and Electronic Engineering, Cluster analysis, business, Instrumentation, computer
Abstract: To solve the problems of low resolution, weak antinoise performance, and imprecise fault location in discharge recognition of ultraviolet imaging, an intelligent state recognition method of surface discharge based on visible images and machine learning (ML) is proposed in this article. A visible image library of surface discharge (ac) under different experimental conditions was established, and the chromatic, gray-scale, and morphological features of images were extracted. The image library was divided into four stages by clustering. Combined with spectrum correlation experiments, the physical meaning of the division results was explained. Supervised learning algorithms, including classic algorithms and deep learning, were used to train intelligent recognition models. The results showed that models based on chromatic features had a higher accuracy for recognition; the accuracy of artificial neural networks reached 0.982, which was obviously higher than that of classical learning algorithms (0.886). The results show that visible images can be used for the state recognition of surface discharges effectively. The application of ML ensures high efficiency and accuracy. This method will enable state recognition and fault location to be completed simultaneously.
Published: 2021
Full Text: View/download PDF

5. STICKER: An Energy-Efficient Multi-Sparsity Compatible Accelerator for Convolutional Neural Networks in 65-nm CMOS

Author: Huazhong Yang, Jinshan Yue, Zhe Yuan, Yixiong Yang, Xiaoyu Feng, Xueqing Li, Yongpan Liu, Jian Zhao, and Jingyu Wang
Subjects: Computer science, Computation, 020208 electrical & electronic engineering, 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology, Electrical and Electronic Engineering, Algorithm, Convolutional neural network, Storage efficiency, Convolution, Efficient energy use
Abstract: STICKER is an energy-efficient convolutional neural network (NN) processor. It mainly improves energy efficiency by making full use of sparsity. The network sparsity can potentially lower storage and computation requirements. However, the sparsity distribution of both activations and weights ranges from 2% to 99% in different layers or models. Therefore, good support for the sparsity distribution is the key to improve the energy efficiency. Three new features are proposed in this article to support wide sparsity distribution efficiently. First, multi-sparsity control and data flow are implemented for finer sparsity granularity support. It can automatically switch the processor among nine sparsity modes for higher energy efficiency. Second, a multi-mode hierarchical data memory which can be reconfigured for networks with different sparsity modes is designed for higher storage efficiency. Third, a multi-sparsity-compatible set-associative convolution processing element (PE) array is designed to efficiently carry out convolution operations under different sparsity modes, especially when both activations and weights are sparse. STICKER was implemented in a 65-nm CMOS technology. With its wide-range sparsity-supported capacity, the peak energy efficiency reaches 62.1 TOPS/W when sparsity ratios of both activations and weights are 5%. In a completely pruned Alexnet model, STICKER achieves 2.82 TOPS/W energy efficiency 1.8 $\times $ higher than that of the state-of-the-art processors.
Published: 2020
Full Text: View/download PDF

6. Green content sharing mode: D2D Coordination Multiple Points Transmission

Author: Zhe Yuan, Wenqin Zhuang, Xuguang Zhang, and Jianxin Chen
Subjects: Matching (statistics), Transmission delay, Computer Networks and Communications, business.industry, Computer science, Network packet, 020206 networking & telecommunications, 02 engineering and technology, Energy consumption, Hardware and Architecture, Distributed algorithm, 0202 electrical engineering, electronic engineering, information engineering, Cellular network, 020201 artificial intelligence & image processing, business, Greedy algorithm, Mobile device, Software, Computer network
Abstract: In next-generation 5G cellular networks, device-to-device (D2D) communication has emerged as an effective solution to support the growing popularity of multimedia contents for local service. Conventionally, the D2D content sharing mode is “one-to-one” matching, i.e, one demander will select one provider to request files from it. Under this mode, it is hard to cope with the growing demand for multimedia services for mobile users due to limited battery capacity for mobile devices. In this work, we propose an energy-efficient content sharing system via a novel D2D Coordination Multiple Points Transmission (D2D-CoMP), which shares content among multiple users to reduce the power consumption per user device. The highlights of this work lie in three parts. Firstly, the strategy for matching providers to demanders subject to self-interference constraints is formulated as a classical maximum weighted matching problem, in which the optimal solution can be derived when network-wide information is known, and also an effective distributed algorithm. Secondly, we design an optimal packet split algorithm for D2D-CoMP under comprehensive consideration of two aspects of communication efficiency and energy consumption to solve the problem how many data packets each provider transfers. Thirdly, we model the file reconstruction problem of collaboration demanders as the shortest Hamiltonian path problem and illustrate the file reconstruction process. Moreover, we develop a best-effort distributed greedy algorithm framework to find the shortest file reconstruction path. Importantly, numerical results demonstrate that the proposed mechanism can greatly reduce the energy consumption of each device with little or no increase in transmission delay.
Published: 2019
Full Text: View/download PDF

7. 15.2 A 2.75-to-75.9TOPS/W Computing-in-Memory NN Processor Supporting Set-Associate Block-Wise Zero Skipping and Ping-Pong CIM with Simultaneous Computation and Weight Updating

Author: Yen-Lin Chung, Meng-Fan Chang, Jiaxin Liu, Zhe Yuan, Jinshan Yue, Yongpan Liu, Jian-Wei Su, Yifan He, Nan Sun, Mingtao Zhan, Yipeng Wang, Ping-Chun Wu, Yuxuan Huang, Huazhong Yang, Li-Yang Hung, Xiaoyu Feng, and Xueqing Li
Subjects: Electronic system-level design and verification, Power gating, Artificial neural network, Computer engineering, Edge device, Computer science, System on a chip, Static random-access memory, Macro, Block (data storage)
Abstract: Computing-in-memory (CIM) is an attractive approach for energy-efficient neural network (NN) processors, especially for low-power edge devices. Previous CIM chips [1] –[5] have demonstrated macro and system-level design enabling multi-bit operations and sparsity support. However, several challenges exist, as shown in Fig. 15.2.1. First, though a previously proposed block-wise sparsity strategy [5] can power off ADCs, zeros still contributed to storage requirements, and power gating was not applied to computing resources. Second, on-chip SRAM CIM macros are not large enough to hold all weights. Updating weights between computing operations leads to significant performance loss. Finally, the limited sensing margin incurs poor accuracy for large NN models on practical datasets, such as ImageNet. The precision and power of the ADCs should be optimized and adjusted.
Published: 2021
Full Text: View/download PDF

8. Rolling bearing fault diagnosis based on adaptive smooth ITD and MF-DFA method

Author: Zhe Yuan, Mihai Alin Pop, Tingting Peng, Daniel Cristea, and Dong An
Subjects: Acoustics and Ultrasonics, Computer science, lcsh:Control engineering systems. Automatic machinery (General), lcsh:QC221-246, 02 engineering and technology, Fault (power engineering), Signal, law.invention, lcsh:TJ212-225, law, 0202 electrical engineering, electronic engineering, information engineering, Feature set, Civil and Structural Engineering, Bearing (mechanical), business.industry, Mechanical Engineering, Bearing vibration, 020208 electrical & electronic engineering, Pattern recognition, Building and Construction, Geophysics, Mechanics of Materials, lcsh:Acoustics. Sound, Detrended fluctuation analysis, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: To effectively utilize a feature set to further improve fault diagnosis of a rolling bearing vibration signal, a method based on multi-fractal detrended fluctuation analysis (MF-DFA) and smooth intrinsic time-scale decomposition (SITD) was proposed. The vibration signal was decomposed into several proper rotation components by applying this new SITD method to overcome noise effects, preserve the effective signal, and improve the signal-to-noise ratio. Wavelet analysis was embedded in iteration procedures of intrinsic time-scale decomposition (ITD). For better results, an adaptive threshold function was used for signal recovery from noisy proper rotation components in the wavelet domain. Additionally, MF-DFA was used to reveal the multi-fractality present in the instantaneous amplitude of the proper rotation components. Finally, linear local tangent space alignment was applied for feature dimension reduction and to obtain fault characteristics of different types, further improving identification accuracy. The performance of the proposed method is determined to be superior to that of the ITD-MF-DFA method.
Published: 2020

9. Intelligent Recognition of Corona Discharges by Visible Images

Author: Mengting Han, Zhenpeng Tang, Yanbin Xu, Qizheng Ye, Zhe Yuan, and Du Wenjiao
Subjects: Feature data, Computer science, business.industry, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Filter (signal processing), Unsupervised learning, RGB color model, Noise (video), Artificial intelligence, business, Image resolution, Corona discharge
Abstract: This paper proposed a corona discharge intelligent diagnosis method based on visible images and machine learning. Through multiple groups corona discharge experiments, a visible image library containing corona discharge under different gap distances was established. The color, morphological, and gray features of the image were extracted to form 7-dimensional feature data. At the same time, the RGB gray level histogram (RGB-GLHs) and the gray level histogram (GLHs) were extracted as contrast features. Unsupervised learning method was used to divide the image library according to the severities; then supervised learning algorithms were applied to train intelligent diagnosis models, which were able to identify the discharge stage corresponding to the image automatically. The recognition results showed that the 7-dimensional feature data had a higher recognition accuracy than RGB-GLHs and GLHs. This showed that the 7-dimensional feature data could effectively cover the discharge information contained in original images, and filter out useless information and noise, improving recognition accuracy. The use of high-resolution visible images solves the problems of low spatial resolution and severe noise interference in ultraviolet imaging diagnosis method, and enables the state recognition and fault location to be finished at the same time. The application of machine learning eliminated the interference of human subjective factors in traditional diagnosis methods, greatly improving efficiency and accuracy of diagnoses.
Published: 2020
Full Text: View/download PDF

10. A 112-765 GOPS/W FPGA-based CNN Accelerator using Importance Map Guided Adaptive Activation Sparsification for Pix2pix Applications

Author: Huazhong Yang, Wenyu Sun, Chen Tang, Zhuqing Yuan, Zhe Yuan, and Yongpan Liu
Subjects: business.industry, Computer science, Importance map, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Power (physics), Acceleration, External data, Transmission (telecommunications), Power consumption, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, business, Field-programmable gate array, Computer hardware, 0105 earth and related environmental sciences, Efficient energy use
Abstract: This paper proposes an algorithm and hardware co-design methodology to accelerate CNNs for pix2pix tasks. An importance map is introduced to train an activation-sparse CNN model, which can effectively reduce the computing cost and external data transmission. Moreover, the model also supports sparse controlling by means of the importance map, making it adaptive for applications with different precision/power requirements. An FPGA-based accelerator with adaptive sparse controlling is designed to support such importance map guided activation sparsity, and demonstrated for super-resolution (SR) application as an example. The accelerator shows advantages in both model accuracy and power consumption. It achieves up to 765 GOPS/W energy efficiency, which is 5.28× batter than the previous FPGA-based SR accelerator.
Published: 2020
Full Text: View/download PDF

11. A Temperature Identification Method Based on Chromaticity Statistical Features of Raw Format Visible Image and K-nearest Neighbor Algorithm

Author: Zhe Yuan, Qizheng Ye, Yang He, and Wenmao Li
Subjects: Pixel, Infrared, Computer science, business.industry, Feature vector, Feature extraction, Pattern recognition, Filter (signal processing), Fault (power engineering), k-nearest neighbors algorithm, Computer Science::Computer Vision and Pattern Recognition, Artificial intelligence, Chromaticity, business
Abstract: Temperature monitoring is important to ensure the safe operation of power grid. The fault temperature is generally in the normal range; therefore, infrared detection is generally used. In this paper, the chromaticity information of raw format visible images of aluminum plate is studied. First, establish image library of aluminum plate at different temperatures, extract gray values of R, G, and B components of images according to the pixel arrangement of filter, and calculate gray frequency to obtain the gray frequency distribution. Then the statistical features of the gray frequency distribution are calculated and selected by Fisher discrimination. Finally, the selected features are combined into input feature vector, and the KNN algorithm is used for temperature identification. The results show that the accuracy of temperature prediction model is about 1.1 °C. The above results provide a new technical route for detecting normal temperature using visible image information.
Published: 2020
Full Text: View/download PDF

12. RL Based Network Accelerator Compiler for Joint Compression Hyper-Parameter Search

Author: Jinshan Yue, Yongpan Liu, Xiaoyu Feng, Huazhong Yang, and Zhe Yuan
Subjects: Quantization (physics), Computer science, Compression (functional analysis), Reinforcement learning, Energy consumption, Pruning (decision trees), Joint (audio engineering), Algorithm, Computer Science::Databases, Energy (signal processing), Efficient energy use
Abstract: Although compression techniques like pruning or quantization are beneficial for accelerators' energy efficiency, the large search space makes finding the appropriate compression scheme difficult. Besides, most existing works ignore the combination of both pruning and quantization. In this paper, we propose a reinforcement learning (RL) based joint compression framework to find the appropriate pruning ratio and quantization bit-width for accelerators. By interacting with the energy model of the target accelerator, the RL agent can learn the effect of compression scheme on both accuracy and energy efficiency. Through a long trial-and-error process, the agent can finally reach an optimal trade-off between accuracy and energy efficiency. Compared with control groups whose compression hyper-parameters are not jointly optimized, the proposed framework can achieve at least 25% energy reduction with higher accuracy or much higher accuracy with small disadvantages on energy. Compared with 8-bit quantized baseline, the framework can achieve 90% and 85% energy reduction on Cifar10 and Cifar100 respectively.
Published: 2020
Full Text: View/download PDF

13. Multi-channel precision-sparsity-adapted inter-frame differential data codec for video neural network processor

Author: Zhuqing Yuan, Fanyang Cheng, Zhe Yuan, Yongpan Liu, Fang Su, Yixiong Yang, and Huazhong Yang
Subjects: Artificial neural network, Computer science, business.industry, 020208 electrical & electronic engineering, Inter frame, Data compression ratio, Data_CODINGANDINFORMATIONTHEORY, 02 engineering and technology, ENCODE, 020202 computer hardware & architecture, Encoding (memory), 0202 electrical engineering, electronic engineering, information engineering, Codec, business, Computer hardware, Communication channel, Coding (social sciences)
Abstract: Activation I/O traffic is a critical bottleneck of video neural network processor. Recent works adopted an inter-frame difference method to reduce activation size. However, current methods can't fully adapt to the various precision and sparsity in differential data. In this paper, we propose the multi-channel precision-sparsity-adapted codec, which will separate the differential activation and encode activation in multiple channels. We analyze the most adapted encoding of each channel, and select the optimal channel number with the best performance. A two-channel codec hardware has been implemented in the ASIC accelerator, which can encode/decode activations in parallel. Experiment results show that our coding achieves 2.2x-18.2x compression rate in three scenarios with no accuracy loss, and the hardware has 42x/174x improvement on speed and energy-efficiency compared with the software codec.
Published: 2020
Full Text: View/download PDF

14. High PE Utilization CNN Accelerator with Channel Fusion Supporting Pattern-Compressed Sparse Neural Networks

Author: Xueqing Li, Zhuqing Yuan, Zhe Yuan, Jinshan Yue, Huazhong Yang, Songming Yu, Jingyu Wang, and Yongpan Liu
Subjects: 010302 applied physics, Artificial neural network, Computer science, 02 engineering and technology, Huffman coding, 01 natural sciences, 020202 computer hardware & architecture, Convolution, Power (physics), Reduction (complexity), symbols.namesake, Computer engineering, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, symbols, Pruning (decision trees), Communication channel
Abstract: Recently CNN-based methods have made remarkable progress in broad fields. Both network pruning algorithms and hardware accelerators have been introduced to accelerate CNN. However, existing pruning algorithms have not fully studied the pattern pruning method, and current index storage scheme of sparse CNN is not efficient. Furthermore, the performance of existing accelerators suffers from no-load PEs on sparse networks. This work proposes a software-hardware co-design to address these problems. The software includes an ADMM-based method which compresses the patterns of convolution kernels with acceptable accuracy loss, and a Huffman encoding method which reduces index storage overhead. The hardware is a fusion-enabled systolic architecture, which can reduce PEs’ no-load rate and improve performance by supporting the channel fusion. On CIFAR-10, this work achieves 5.63x index storage reduction with 2-7 patterns among different layers with 0.87% top-1 accuracy loss. Compared with the state-of-art accelerator, this work achieves 1.54x-1.79x performance and 25%-34% reduction of no-load rate with reasonable area and power overheads.
Published: 2020
Full Text: View/download PDF

15. Robotic mobile fulfilment systems considering customer classes

Author: Zhe Yuan, Yeming Gong, Mingzhou Jin, emlyon business school, and business school, emlyon
Subjects: 0209 industrial biotechnology, Order picking, 021103 operations research, order picking, Computer science, Warehouse management, Strategy and Management, 0211 other engineering and technologies, 02 engineering and technology, Management Science and Operations Research, warehouse management, [SHS.ECO]Humanities and Social Sciences/Economics and Finance, Industrial engineering, Industrial and Manufacturing Engineering, Supply Chain Delivery, robotic mobile fulfilment systems, 020901 industrial engineering & automation, Order (business), Robot, [SHS.GESTION]Humanities and Social Sciences/Business administration, [SHS.ECO] Humanities and Social Sciences/Economics and Finance, [SHS.GESTION] Humanities and Social Sciences/Business administration, facility planning and design
Abstract: International audience; This paper studies a Robotic Mobile Fulfilment System (RMFS), featured by a number of robots lifting and transporting movables storage shelves from storage grids to order pickers. In such systems, online retailers often classify their customers by two major classes ‘expedited shipping’ and ‘standard shipping’. We build high-dimension Markov models to describe this system with customer classes, calculate the throughput of this system given the number of robots and provide design rules to determine the optimal number of robots and their capacities considering the trade-off between capacities of picker stations and robots. We verify the analytic results of Markov models with simulations. We further consider multiple-picker RMFS and study its optimal design.
Published: 2020

16. A 3.77TOPS/W Convolutional Neural Network Processor With Priority-Driven Kernel Optimization

Author: Jinshan Yue, Zhe Yuan, Chengmo Yang, Qingwei Guo, Huazhong Yang, Zhibo Wang, Jinyang Li, and Yongpan Liu
Subjects: Adder, Contextual image classification, Computer science, 020208 electrical & electronic engineering, 02 engineering and technology, Parallel computing, Convolutional neural network, Kernel (image processing), Application-specific integrated circuit, Convolutional code, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, System on a chip, Electrical and Electronic Engineering, Efficient energy use
Abstract: Convolutional neural network (CNN) has become very popular in image classification tasks. With the increasing demand on intelligent classification on battery-powered devices, energy-efficient ASICs for CNN are badly needed. While previous CNN ASIC processors support operations of different kernel sizes, they sacrifice efficiency to support flexible convolution operations. In fact, convolution operations with a certain kernel size are dominating in many real-case CNNs. This brief proposes a kernel-optimized architecture for $3\,{\times }\,3$ kernels (KOP3), which are dominating operations in mainstream image classification CNNs. Although KOP3 aims at $3\,{\times }\,3$ kernel operations, it also provides programmability to support arbitrary kernel sizes. KOP3 achieves average energy efficiency of 3.77TOPS/W, which is $4.01{ \times }$ better than the best state-of-the-art CNN ASIC processor.
Published: 2019
Full Text: View/download PDF

17. Anticipative Tracking with the Short-Term Synaptic Plasticity of Spintronic Devices

Author: Zhe Yuan, Qi Zheng, Yuanyuan Mi, Ke Xia, and Xiaorui Zhu
Subjects: Data processing, Elementary cognitive task, Quantitative Biology::Neurons and Cognition, Computer science, Computation, FOS: Physical sciences, General Physics and Astronomy, Applied Physics (physics.app-ph), Disordered Systems and Neural Networks (cond-mat.dis-nn), 02 engineering and technology, Physics - Applied Physics, Condensed Matter - Disordered Systems and Neural Networks, 021001 nanoscience & nanotechnology, Tracking (particle physics), 01 natural sciences, Term (time), Synaptic weight, Neuromorphic engineering, 0103 physical sciences, Synaptic plasticity, Electronic engineering, 010306 general physics, 0210 nano-technology
Abstract: Real-time tracking of high-speed objects in cognitive tasks is challenging in the present artificial intelligence techniques because the data processing and computation are time-consuming resulting in impeditive time delays. A brain-inspired continuous attractor neural network (CANN) can be used to track quickly moving targets, where the time delays are intrinsically compensated if the dynamical synapses in the network have the short-term plasticity. Here, we show that synapses with short-term depression can be realized by a magnetic tunnel junction, which perfectly reproduces the dynamics of the synaptic weight in a widely applied mathematical model. Then, these dynamical synapses are incorporated into one-dimensional and two-dimensional CANNs, which are demonstrated to have the ability to predict a moving object via micromagnetic simulations. This portable spintronics-based hardware for neuromorphic computing needs no training and is therefore very promising for the tracking technology for moving targets., 9 pages, 6 figures
Published: 2020

18. PATH: Performance-Aware Task Scheduling for Energy-Harvesting Nonvolatile Processors

Author: Jinshan Yue, Jinyang Li, Chenchen Fu, Yongpan Liu, Xiaoyu Feng, Chun Jason Xue, Hehe Li, Jingtong Hu, Huazhong Yang, and Zhe Yuan
Subjects: Job shop scheduling, Linear programming, Computer science, business.industry, 020208 electrical & electronic engineering, Processor scheduling, 02 engineering and technology, 020202 computer hardware & architecture, Scheduling (computing), Non-volatile memory, Hardware and Architecture, Embedded system, 0202 electrical engineering, electronic engineering, information engineering, Task analysis, Electrical and Electronic Engineering, Standby power, business, Energy harvesting, Integer programming, Software
Abstract: Nonvolatile processors (NVPs) have strong vitality in battery-less energy-harvesting sensor nodes (EHSNs) due to their characteristics of zero standby power, resilience to power failures, and fast read/write operations. However, I/O and sensing operations cannot store their system states after power OFF; hence, they are sensitive to power failures and high power switching overhead is induced during power oscillation, which significantly degrades the system performance. In this paper, we propose a novel performance-aware task scheduling technique considering power switching overhead for energy-harvesting NVPs. We first present the analysis of the power switching overhead on EHSNs. Then, the scheduling problem is formulated by mixed-integer linear programming (MILP). Furthermore, offline and online performance-aware heuristic scheduling algorithms with the task splitting (TS) strategy are proposed to solve the scheduling problem efficiently. Experimental results show that comparing with the state-of-the-art energy-harvesting oblivious scheduling strategy, the proposed MILP scheduling approach can improve the performance by 16% on average, and the proposed scheduling algorithm with the TS strategy can reduce the average execution time by 24.8% and 22.5%.
Published: 2018
Full Text: View/download PDF

19. Relational variable for more accurate prediction of models

Author: Ruinan Yang, Qi Zhang, Liangxiao Zhang, Zhe Yuan, Peiwu Li, and Jin Mao
Subjects: 0301 basic medicine, Computer science, Feature selection, Machine learning, computer.software_genre, 01 natural sciences, Analytical Chemistry, 03 medical and health sciences, Data acquisition, Software, Preprocessor, MATLAB, Spectroscopy, computer.programming_language, Valuation (algebra), business.industry, Process Chemistry and Technology, 010401 analytical chemistry, 0104 chemical sciences, Computer Science Applications, Nonlinear system, Variable (computer science), 030104 developmental biology, Artificial intelligence, business, computer
Abstract: In natural science, models could grant new insights into phenomena or scientific problems which are hard to be observed or otherwise explained to overcome the limitations of human beings. Routinely, scientists strive to develop new methods for data acquisition, preprocessing, variable selection, modeling and valuation with the help of statistics and machine learning theories. Theoretically, the aim of these methods is global or local optimization in the space of variables and linear/nonlinear combinations for classification or regression. However, the relationships between responses and features are often complex and therefore sometimes far from linear or fixed nonlinear model. In this study, we proposed the relational variable (e.g. ratio between two variables) for more accurate prediction performance of models and illustrated its application on three classic data. We found that the selected relational variables could significantly improve the accuracy of prediction. The software was complemented on the MATLAB R2015a platform in Windows Server 2012 R2 standard. The Matlab codes used in this study are publicly available at http://www.libpls.net .
Published: 2018
Full Text: View/download PDF

20. Mode division multiple access: a new scheme based on orbital angular momentum in millimetre wave communications for fifth generation

Author: Hikmet Sari, Fa Jiang, Zhe Yuan, Lei Wang, Guan Gui, and Jie Yang
Subjects: Angular momentum, Computer science, Frequency band, Degrees of freedom (statistics), Phase (waves), 020206 networking & telecommunications, 02 engineering and technology, Division (mathematics), Communications system, Topology, 01 natural sciences, Electromagnetic radiation, Computer Science Applications, Orthogonality, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, 010306 general physics
Abstract: Compared with the conventional degrees of freedom, the orbital angular momentum (OAM), which describes the helical phase structure of electromagnetic wave, provides a new degree of freedom. As a new multiple access scheme, mode division multiple access (MDMA) is constructed in millimetre wave frequency band utilising the orthogonality and high dimensionality in this study. Various traditional resources such as frequency, time and code pattern have been shared. Therefore, addresses of signals from different terminal users can be distinguished by OAM mode to realise multi-address connection. In this study, the theoretical analysis of the number of terminals in MDMA scheme is carried out. According to the analysis results, infinite terminals can be connected together in the ideal case. Moreover, the simulation results show that compared with the conventional multi-input multi-output millimetre wave communication systems, the performance indicators of MDMA millimetre wave communication systems are improved remarkably.
Published: 2018
Full Text: View/download PDF

21. A new jig-shape optimization method for the high aspect ratio wing

Author: Shihui Huo, Jianting Ren, and Zhe Yuan
Subjects: Wing, business.industry, Computer science, 0211 other engineering and technologies, Aerospace Engineering, Torsion (mechanics), Inversion (meteorology), Single parameter, 02 engineering and technology, Aerodynamics, Structural engineering, Quadratic function, 020303 mechanical engineering & transports, 0203 mechanical engineering, Shape optimization, Structural deformation, business, 021106 design practice & management
Abstract: Purpose Computational efficiency is always the major concern in aircraft design. The purpose of this research is to investigate an efficient jig-shape optimization design method. A new jig-shape optimization method is presented in the current study and its application on the high aspect ratio wing is discussed. Design/methodology/approach First, the effects of bending and torsion on aerodynamic distribution were discussed. The effect of bending deformation was equivalent to the change of attack angle through a new equivalent method. The equivalent attack angle showed a linear dependence on the quadratic function of bending. Then, a new jig-shape optimization method taking integrated structural deformation into account was proposed. The method was realized by four substeps: object decomposition, optimization design, inversion and evaluation. Findings After the new jig-shape optimization design, both aerodynamic distribution and structural configuration have satisfactory results. Meanwhile, the method takes both bending and torsion deformation into account. Practical implications The new jig-shape optimization method can be well used for the high aspect ratio wing. Originality/value The new method is an innovation based on the traditional single parameter design method. It is suitable for engineering application.
Published: 2018
Full Text: View/download PDF

22. Optimizing linearity of weight updating in TaO -based memristors by depression pulse scheme for neuromorphic computing

Author: He-Ming Huang, Xin Guo, and Zhe-Yuan Shao
Subjects: Artificial neural network, Computer science, Linearity, General Chemistry, Memristor, Condensed Matter Physics, law.invention, Pulse (physics), Synaptic weight, Nonlinear system, Neuromorphic engineering, law, Electronic engineering, General Materials Science, MNIST database
Abstract: To achieve high classification accuracy in the neuromorphic computing, the good linearity of the weight updating in memristors is critical. However, it is very challenging to realize a good linearity behavior in filamentary memristors. In this work, we develop a unidirectional depression pulse scheme to optimize the linearity in TaOx-based memristors. Devices with the conventional identical pulse scheme perform high nonlinearity because of the abrupt SET process, while the proposed scheme offers an optimized linearity and uses two devices to define the synaptic weight. Moreover, the classification accuracy of MNIST handwritten digits by the neural network based on the TaOx memristors is enhanced from 45% to 94% with this pulse scheme.
Published: 2021
Full Text: View/download PDF

23. Performance evaluation of organizations considering economic incentives for emission reduction: A carbon emission permit trading approach

Author: Junfei Chu, Zhe Yuan, Jie Wu, Ali Emrouznejad, and Caifeng Shao
Subjects: Economics and Econometrics, Computer science, Astrophysics::High Energy Astrophysical Phenomena, 020209 energy, 05 social sciences, chemistry.chemical_element, Thermal power station, 02 engineering and technology, Environmental economics, Reduction (complexity), General Energy, Incentive, chemistry, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, Data envelopment analysis, Production (economics), Emissions trading, 050207 economics, Carbon
Abstract: The emissions trading system allows organizations to transact emission permits to fit their production practice. This paper develops a new nonparametric methodology for performance evaluation of organizations (or decision-making units, DMUs) considering carbon emission permit trading. Explicit production axioms are discussed, and a new production technology considering carbon emission permit trading is proposed. Models based on the new production technology are established for evaluating the carbon emission reduction potential and performance of the DMUs. Comparing the proposed models with previous ones, the adoption of carbon emission permit trading increases the potentials of DMUs to reduce carbon dioxide emission and improve inputs and outputs. In addition, a proper increase of the carbon emission permit trading price can increase the potential of DMUs to reduce carbon dioxide emissions. The proposed approach contributes to the literature by explicitly explaining how adopting carbon emission permit trading affects production technology. A numeral example illustrates the proposed approach while the usefulness and practicality of the models are explained by applying them to China's thermal power industry.
Published: 2021
Full Text: View/download PDF

24. A 65-nm ReRAM-Enabled Nonvolatile Processor With Time-Space Domain Adaption and Self-Write-Termination Achieving $> 4\times $ Faster Clock Frequency and $> 6\times $ Higher Restore Speed

Author: Wei-Hao Chen, Chrong Jung Lin, Kang-Lung Wang, Albert Lee, Meng-Fan Chang, Jinyang Li, Ya-Chin King, Zhibo Wang, Fang Su, Chieh-Pu Lo, Huazhong Yang, Zhe Yuan, Pedram Khalili Amiri, Yongpan Liu, Hsiao-Yun Chiu, Chien-Chen Lin, and Wei-En Lin
Subjects: Random access memory, Hardware_MEMORYSTRUCTURES, Computer science, business.industry, 020208 electrical & electronic engineering, Clock rate, 02 engineering and technology, 020202 computer hardware & architecture, Resistive random-access memory, Non-volatile memory, nvSRAM, Data redundancy, Ferroelectric RAM, 0202 electrical engineering, electronic engineering, information engineering, Static random-access memory, Electrical and Electronic Engineering, business, Computer hardware, Efficient energy use
Abstract: With an ever-increasing demand for energy efficiency, processors with instant-on and zero leakage features are highly appreciated in energy harvesting as well as “normally off” applications. Recently, zero-standby power and fast switching nonvolatile processors (NVPs) have been proposed based on emerging nonvolatile memories (NVMs), such as ferroelectric RAM or spin-transfer-torque magnetic RAM. However, previous NVPs store all data to NVM upon every power interruption, resulting in high-energy consumption and degraded NVM endurance. This paper presents a 65-nm fully CMOS-logic-compatible ReRAM-based NVP supporting time-space domain adaption. It incorporates adaptive nonvolatile controller, nonvolatile flip-flops, and nonvolatile static random access memory (nvSRAM) with self-write termination. Data redundancy in both time and space domain is fully exploited to reduce store/restore time/energy and boost clock frequency. The NVP operates at >100 MHz and achieves 20 ns/0.45 nJ restore time/energy, realizing >6 $\times $ and >6000 $\times $ higher speed and energy efficiency of restore and >4 $\times $ faster operating frequency compared with that of state of the art.
Published: 2017
Full Text: View/download PDF

25. A ReRAM-Based Nonvolatile Flip-Flop With Self-Write-Termination Scheme for Frequent-OFF Fast-Wake-Up Nonvolatile Processors

Author: Fang Su, Chrong Jung Lin, Chien-Chen Lin, Qi Wei, Yongpan Liu, Yu Wang, Kang-Lung Wang, Zhe Yuan, Chieh-Pu Lo, Albert Lee, Pedram Khalili Amiri, Wei-Hao Chen, Meng-Fan Chang, K. C. Hsu, Zhibo Wang, Ya-Chin King, Hochul Lee, and Huazhong Yang
Subjects: 010302 applied physics, Computer science, 020208 electrical & electronic engineering, 02 engineering and technology, Parallel computing, Topology, 01 natural sciences, Power (physics), law.invention, Resistive random-access memory, Non-volatile memory, law, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Flip-flop, Voltage
Abstract: Nonvolatile flip-flops (nvFFs) enable frequent-off processors to achieve fast power-off and wake-up time while maintaining critical local computing states through parallel data movement between volatile FFs and local nonvolatile memory (NVM) devices. However, current nvFFs face challenges in large store energy ( $\text{E}_{\mathrm {S}}$ ) and long voltage stress time on the device ( $\text{T}_{\mathrm {STRESS}}$ ), due to wide distribution in the write time of NVM device as well as unnecessary writes. Moreover, heavy parasitic load on the power rail cause long wake-up time for restoring data from NVM to FFs. This paper proposes the resistive RAM (ReRAM)-based nvFF with self-write termination (SWT) and reduced loading on power rail to: 1) reduce 93+% waste of $\text{E}_{\mathrm {S}}$ from fast switching or matched cells; 2) suppress endurance and reliability degradation resulted from overprogramming and long $\text{T}_{\mathrm {STRESS}}$ ; and 3) achieve reliable and 26+ times faster restore operation compared with previous nvFFs. We have fabricated a nonvolatile processor and a test chip with SWT-nvFFs using logic-process ReRAM in a 65-nm CMOS process. Measured results show sub-2-ns termination response time and sub-20-ns chip-level restore time.
Published: 2017
Full Text: View/download PDF

26. CP-FPGA: Energy-Efficient Nonvolatile FPGA With Offline/Online Checkpointing Optimization

Author: Jingtong Hu, Huazhong Yang, Chun Jason Xue, Zhe Yuan, Yongpan Liu, and Jinyang Li
Subjects: 010302 applied physics, Computer science, business.industry, 02 engineering and technology, Energy consumption, 01 natural sciences, Reconfigurable computing, 020202 computer hardware & architecture, Non-volatile memory, Hardware and Architecture, Backup, Embedded system, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), Electrical and Electronic Engineering, Field-programmable gate array, business, Software, Rollback, Volatile memory, Efficient energy use
Abstract: Field-programmable gate arrays (FPGAs) have drawn lots of attentions due to their programmability and high performance. Recently, ultralow-power FPGAs for Internet of Things, together with energy-harvesting technique, have become an emerging self-powered computing platform. However, volatile memory in FPGA will lose their states under unstable power supplies and cannot work efficiently. Nonvolatile FPGA becomes a promising alternative. This paper proposes a hardware/software codesign nonvolatile FPGA with efficient offline/online checkpointing strategy (CP-FPGA). Backup energy is reduced by offline selecting proper checkpointing locations to minimize backup data. An online scheduler is further proposed to balance computation rollback overhead against backup energy. Experimental results show that the proposed CP-FPGA reduces 39.5% energy consumption on average compared with the state-of-the-art techniques.
Published: 2017
Full Text: View/download PDF

27. Spintronic devices for neuromorphic computing

Author: YaJun Zhang, Xiaorui Zhu, Zhe Yuan, Ke Xia, and Qi Zheng
Subjects: Tunnel magnetoresistance, Artificial neural network, Spintronics, Neuromorphic engineering, Computer science, business.industry, Electrical engineering, General Physics and Astronomy, business
Published: 2020
Full Text: View/download PDF

28. 14.3 A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse

Author: Meng-Fan Chang, Jinshan Yue, Xin Si, Zhixiao Zhang, Yifan He, Xiaoyu Feng, Huazhong Yang, Zhe Yuan, Yongpan Liu, Ruhui Liu, and Xueqing Li
Subjects: Random access memory, Computer engineering, Artificial neural network, Computer science, ComputerApplications_COMPUTERSINOTHERSYSTEMS, Data compression ratio, Macro, Efficient energy use, Sparse matrix, Power (physics)
Abstract: Computing-in-Memory (CIM) is a promising solution for energy-efficient neural network (NN) processors. Previous CIM chips [1], [4] mainly focus on the memory macro itself, lacking insight on the overall system integration. Recently, a CIM-based system processor [5] for speech recognition demonstrated promising energy efficiency. No prior work systematically explores sparsity optimization for a CIM processor. Directly mapping sparse NN models onto regular CIM macros is ineffective, since sparse data is usually randomly distributed and CIM macros cannot be power gated even when many zeros exist. For a high compression rate and high efficiency, the granularity of sparsity [6] needs to be explored based on CIM characteristics. Moreover, system-level weight mapping to a CIM macro and data-reuse strategies are not well explored - these directions are important for CIM macro utilization and energy efficiency.
Published: 2020
Full Text: View/download PDF

29. 14.2 A 65nm 24.7µJ/Frame 12.3mW Activation-Similarity-Aware Convolutional Neural Network Video Processor Using Hybrid Precision, Inter-Frame Data Reuse and Mixed-Bit-Width Difference-Frame Data Codec

Author: Yixiong Yang, Jinshan Yue, Xiulong Wu, Xueqing Li, Ruoyang Liu, Zhiting Lin, Huazhong Yang, Yongpan Liu, Xiaoyu Feng, and Zhe Yuan
Subjects: 010302 applied physics, Similarity (geometry), business.industry, Computer science, 020208 electrical & electronic engineering, Frame (networking), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Process (computing), Inter frame, 02 engineering and technology, 01 natural sciences, Convolutional neural network, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Codec, Computer vision, Artificial intelligence, business, Data transmission
Abstract: Convolutional Neural Networks (CNNs) have become widely used in image signal processing, such as tracking, classification and post-processing. Modern CNNs use millions of weights and activations, leading to critical challenges for both computation and data transmission. Video applications, such as autopilot and surveillance cameras, have to process a large number of sequential images/frames within limited time, making the situation even worse. As shown in Fig. 14.2.1, adjacent activation frames of typical video applications are similar to each other most of the time, providing an opportunity to reduce both computing and data transmission complexity significantly.
Published: 2020
Full Text: View/download PDF

30. Incipient Fault Detection of Rolling Element Bearings Based on Deep EMD-PCA Algorithm

Author: Zhenpeng Liu, Maxiao Hou, Jie Sun, Huaitao Shi, Zhe Yuan, and Jin Guo
Subjects: 0209 industrial biotechnology, Article Subject, Computer science, Noise (signal processing), Physics, QC1-999, Mechanical Engineering, 020208 electrical & electronic engineering, 02 engineering and technology, Geotechnical Engineering and Engineering Geology, Condensed Matter Physics, Fault (power engineering), Signal, Fault detection and isolation, Hilbert–Huang transform, Vibration, 020901 industrial engineering & automation, Mechanics of Materials, Principal component analysis, 0202 electrical engineering, electronic engineering, information engineering, Point (geometry), Algorithm, Civil and Structural Engineering
Abstract: Due to the relatively weak early fault characteristics of rolling bearings, the difficulty of early fault detection increases. For unsolving this problem, an incipient fault detection method based on deep empirical mode decomposition and principal component analysis (Deep EMD-PCA) is proposed. In this method, multiple data processing layers are created to extract weak incipient fault features, and EMD is used to decompose the vibration signal. This method establishes an accurate data mode, which can improve the incipient fault detection capability. It overcomes the difficulties of incipient fault detection, in which weak fault features can be extracted from the background of strong noise. From a theoretical point of view, this paper proves that the Deep EMD-PCA method can retain more variance information and has a good early fault detection ability. The experiment results indicate that the detection rate of Deep EMD-PCA is about 85%, and the failure detection delay time is almost zero. The incipient faults of rolling element bearings can be detected accurately and timely by Deep EMD-PCA. The method effectively improves the accuracy and timeliness of fault detection under actual working conditions and has good practical application value.
Published: 2020
Full Text: View/download PDF

31. Ultra-Reliability Connectivity with Redundant D2D Transmission Scheme for Tactile Internet

Author: Xin Wei, Wenqin Zhuang, Liang Zhou, Bin Kang, Jianxin Chen, and Zhe Yuan
Subjects: Base station, Computer science, business.industry, The Internet, Latency (engineering), business, 5G, Computer network
Abstract: The Tactile Internet, as a promising communication infrastructure, will be rapidly developed in the 5G era. It supports real-time human-to-machine interactions through delivery of touch and sensation information. The realization of Tactile Internet is associated with the demands of ultra-reliable and ultra-low latency connectivity, which is an important issue needing to be handled. Device-to-device (D2D) communication is a short-distance ultra-low latency transmission mode in 5G, which is capable of D2D edge nodes communicating with each other directly without an infrastructure of access point. In addition, D2D communication can reuse the licensed band resources under the management of the base station. Therefore, to meet the reliability and latency demands, D2D can be advocated as a significant component of the Tactile Internet. This paper focuses on a reliability design of D2D which can support ultra-high reliability and ultra-low latency connectivity in the Tactile Internet. Specifically, we firstly simulate the reliability of D2D communication as a birth and death process. Then, we propose a redundant D2D transmission scheme, which enhances D2D reliability under the conditions of latency and resources. Finally, we design a dynamic programming (DP) algorithm to solve the problem of redundant D2D link selection. Simulation results about the transmission reliability and latency show that our proposed scheme can have better performance than the competing schemes, which can satisfy connectivity requirements of the Tactile Internet.
Published: 2019
Full Text: View/download PDF

32. A Sparse-Adaptive CNN Processor with Area/Performance balanced N-Way Set-Associate PE Arrays Assisted by a Collision-Aware Scheduler

Author: Zhibo Wang, Zhe Yuan, Yixiong Yang, Jinshan Yue, Xiaoyu Feng, Yanzhi Wang, Jingyu Wang, Xueqing Li, Yongpan Liu, and Huazhong Yang
Subjects: Computer science, Quantization (signal processing), Computation, 020208 electrical & electronic engineering, 02 engineering and technology, TOPS, Chip, Collision, Convolutional neural network, 020202 computer hardware & architecture, Computer engineering, 0202 electrical engineering, electronic engineering, information engineering, Efficient energy use, Electronic circuit
Abstract: Convolutional Neural Networks give heavy storage and computation burden to accelerators, whose energy efficiency can be improved by leveraging their sparsity. However, using sparsity in networks will introduce large overhead especially when networks have various sparse situations and multiple quantization. This work named STICKER-II firstly proposes an area/performance balanced chip with simultaneously sparsi-ty/quantization adaptive capability, enabled by multi-sparsity and multi-quantization compatible storage and computation circuits. Further more, N-way set-associate PE architecture is explored to trade off its performance and area with the collision-aware hardware scheduler. This chip achieves 4.64 TOPS/W energy efficiency on Alexnet, 1.65x better than the state-of-the-art sparse processors, along with 19.4% PE area reduction.
Published: 2019
Full Text: View/download PDF

33. Learning One-class Support Vector Machine by Using Artificial Bee Colony Algorithm and Its Application for Disease Classification

Author: Chen-Yu Hong, Yung-Nien Sun, Yu-Lun Hong, Ming-Huwi Horng, and Zhe-Yuan Zhan
Subjects: Computer science, business.industry, Document classification, Sample (statistics), Information repository, computer.software_genre, Machine learning, Class (biology), Signature (logic), Artificial bee colony algorithm, Set (abstract data type), Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, Artificial intelligence, business, computer
Abstract: The one-classification support vector (OCSVM) is a variant of SVM which only uses the positive class sample set in training stage. It has been widely used in the applications of disease diagnose, handwritten signature verification, remote sensing and document classification. However, there are many parameters needed to regulate. The mistake of parameter setting makes OCSVM it to be not effectiveness. Therefore, in this paper we proposed a learning algorithm based on the artificial bee colony algorithm to select the parameters. The construction algorithm of OSCVM is called the artificial bee colony based OSCVM (ABC-OCSVM) algorithm. Experimental results of two medical datasets of UCI data repository showed that our proposed ABC-OCSVM method outperforms the conventional LIBSVM package.
Published: 2019
Full Text: View/download PDF

34. Joint Social-Aware and Mobility-Aware Caching in Cooperative D2D

Author: Zhe Yuan, Liang Zhou, Wenqin Zhuang, and Xin Wei
Subjects: Computer science, business.industry, Retransmission, Reliability (computer networking), media_common.quotation_subject, 020206 networking & telecommunications, 02 engineering and technology, Network topology, Transmission (telecommunications), Knapsack problem, 0202 electrical engineering, electronic engineering, information engineering, Selfishness, 020201 artificial intelligence & image processing, Cache, business, Greedy algorithm, media_common, Computer network
Abstract: The cooperative D2D content sharing mode is that multiple users can share content with each in a cooperative way. In this mode, we can improve the transmission efficiency and transmission stability of communication. Although there are many literatures on cooperative D2D, the mobility problem in this transmission mode has been ignored by many people. Moreover, traditional D2D communication is a one-to-one transmission mode, but cooperative D2D is a many-to-one transmission mode. There is a big difference of link establishment and content transmission between these two modes. And the impact of mobility on network topology is also very different. Therefore, the existing caching scheme and retransmission mechanism for D2D can not be applied to cooperative D2D. Meanwhile, whether in D2D or cooperative D2D, the success of the user requesting content is closely related to the social relationship between users, due to the social selfishness of D2D users. Therefore, in order to improve the performance of cooperative D2D content sharing, we exploit user interest similarity and mobility and propose a social-and-mobility-aware caching strategy for collaborative D2D scenarios. Not only that, we model the retransmission problem as a Knapsack problem and design the retransmission mechanism when the transmission link is interrupted to ensure the reliability of content sharing with the greedy algorithm. Finally, our simulation results show that our proposed caching placement scheme and retransmission scheme can improve the performance of cooperative D2D content sharing. In addition, we achieve a valuable caching guideline in cooperative D2D scenarios.
Published: 2019
Full Text: View/download PDF

35. Selecting common projection direction in DEA directional distance function based on directional extensibility

Author: Zhe Yuan, Jie Wu, Fangqing Wei, and Junfei Chu
Subjects: 021103 operations research, Similarity (geometry), General Computer Science, Computer science, 0211 other engineering and technologies, General Engineering, 02 engineering and technology, Extensibility, Zero (linguistics), Set (abstract data type), Projection direction, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Algorithm
Abstract: This paper develops a new approach to select a common projection direction for performance evaluation of decision-making units (DMUs) using the data envelopment analysis (DEA) directional distance function. First, we define the concept of directional extensibility of a specific projection direction with respect to a set of inefficient DMUs. The concept shows the projection direction’s ability in reducing inefficiencies, simultaneously, for all inefficient DMUs. Using this concept, we propose a model which identifies the common projection direction that has the maximum directional extensibility. However, this common projection direction may contain zero elements. To avoid this problem and to select the final common projection direction, an algorithm is developed to select the common projection direction which is resulted in a trade-off between the directional extensibility and the similarity among the elements of the common projection direction. Our developments contribute by formally defining a concept (i.e., the directional extensibility) which can well reflect a common projection direction’s ability in reducing the DMUs’ inefficiencies. Moreover, the selected final common projection direction has both good directional extensibility and no zero elements, which helps to guide the inefficient DMUs to develop in a balanced way because the direction suggests the DMUs to improve in all inputs and outputs. Finally, we demonstrate the usefulness of our proposed approach by using an empirical case of 18 logistics companies listed among China’s top 500 enterprises in 2018.
Published: 2021
Full Text: View/download PDF

36. A rapid aeroelasticity optimization method based on the stiffness characteristics

Author: Zhe Yuan, Jianting Ren, and Shihui Huo
Subjects: Engineering, Civil, computational efficiency, Computer science, stiffness characteristics, Engineering, Multidisciplinary, Aeroelasticity optimization method, Matemàtiques i estadística::Anàlisi numèrica [Àrees temàtiques de la UPC], medicine, Engineering, Geological, Engineering, Ocean, Engineering, Aerospace, Anàlisi numèrica, aerodynamic characteristics, business.industry, Applied Mathematics, General Engineering, Stiffness, Structural engineering, Aeroelasticity, Engineering, Marine, Engineering, Mechanical, Engineering, Manufacturing, divergent velocity, Engineering, Industrial, Mathematical & Computational Biology, medicine.symptom, business, Numerical analysis
Abstract: A rapid aeroelasticity optimization method based on the stiffness characteristics was proposed in the present study. Large time expense in static aeroelasticity analysis based on traditional time domain aeroelasticity method is solved. Elastic axis location and torsional stiffness are discussed firstly. Both torsional stiffness and the distance between stiffness center and aerodynamic center have a direct impact on divergent velocity. The divergent velocity can be adjusted by changing the correlative structural parameters. The relation between structural parameters and divergent velocity is introduced to aeroelasticity optimization design as a constraint condition. After optimization, the structural and aerodynamic characteristics have a large improvement while satisfying the constraint conditions. The optimization method can be well used in high aspect ratio wing and has great computational efficiency.
Published: 2019

37. 7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1 × Higher TOPS/mm2and 6T HBST-TRAM-Based 2D Data-Reuse Architecture

Author: Yung-Ning Tu, Ao Ren, Zhe Yuan, Ruoyang Liu, Xueqing Li, Jinshan Yue, Wenyu Sun, Yongpan Liu, Meng-Fan Chang, Yanzhi Wang, Yi-Ju Chen, Zhibo Wang, and Huazhong Yang
Subjects: 010302 applied physics, Artificial neural network, Computer science, 020208 electrical & electronic engineering, 02 engineering and technology, Parallel computing, TOPS, 01 natural sciences, Domain (software engineering), Acceleration, Transpose, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Architecture, Circulant matrix, Block (data storage)
Abstract: Energy-efficient neural-network (NN) processors have been proposed for battery-powered deep-learning applications, where convolutional (CNN), fully-connected (FC) and recurrent NNs (RNN) are three major workloads. To support all of them, previous solutions [1–3] use either area-inefficient heterogeneous architectures, including CNN and RNN cores, or an energy-inefficient reconfigurable architecture. A block-circulant algorithm [4] can unify CNN/FC/RNN workloads with transpose-domain acceleration, as shown in Fig. 7.5.1. Once NN weights are trained using the block-circulant pattern, all workloads are transformed into consistent matrix-vector multiplications (MVM), which can potentially achieve 8 to-128$\times$ storage savings and a O($\mathrm{n}^{2}$)-to-O(nlog(n)) computation complexity reduction.
Published: 2019
Full Text: View/download PDF

38. AERIS

Author: Shuangchen Li, Fang Su, Zhibo Wang, Zhe Yuan, Huazhong Yang, Wenyu Sun, Yongpan Liu, Xueqing Li, and Jinshan Yue
Subjects: Power gating, XNOR gate, Computer science, Electronic engineering, Neural network system, Chip, Scheduling (computing), Efficient energy use, Electronic circuit, Resistive random-access memory
Abstract: ReRAM-based processing-in-memory (PIM) architecture is a promising solution for deep neural networks (NN), due to its high energy efficiency and small footprint. However, traditional PIM architecture has to use a separate crossbar array to store either positive or negative (P/N) weights, which limits both energy efficiency and area efficiency. Even worse, imbalance running time of different layers and idle ADCs/DACs even lower down the whole system efficiency. This paper proposes AERIS, an Area/Energy-efficient 1T2R ReRAM based processing-In-memory NN System-on-a-chip to enhance both energy and area efficiency. We propose an area-efficient 1T2R ReRAM structure to represent both P/N weights in a single array, and a reference current cancelling scheme (RCS) is also presented for better accuracy. Moreover, a layer-balance scheduling strategy, as well as the power gating technique for interface circuits, such as ADCs/DACs, is adopted for higher energy efficiency. Experiment results show that compared with state-of-the-art ReRAM-based architectures, AERIS achieves 8.5x/1.3x peak energy/area efficiency improvements in total, due to layer-balance scheduling for different layers, power gating of interface circuits, and 1T2R ReRAM circuits. Furthermore, we demonstrate that the proposed RCS compensates the non-ideal factors of ReRAM and improves NN accuracy by 5.2% in the XNOR net on CIFAR-10 dataset.
Published: 2019
Full Text: View/download PDF

39. An N-way group association architecture and sparse data group association load balancing algorithm for sparse CNN accelerators

Author: Huazhong Yang, Ruoyang Liu, Zhe Yuan, Jingyu Wang, and Yongpan Liu
Subjects: Computer science, 02 engineering and technology, Load balancing (computing), Collision, 020202 computer hardware & architecture, Performance limit, Application-specific integrated circuit, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Architecture, Algorithm, Sparse matrix, Collision rate, Efficient energy use
Abstract: In recent years, ASIC CNN Accelerators have attracted great attention among researchers for the high performance and energy efficiency. Some former works utilize the sparsity of CNN networks to improve the performance and the energy efficiency. However, these methods bring tremendous overhead to the output memory, and the performance suffers from the hash collision. This paper presents: 1) an N-Way Group Association Architecture to reduce the memory overhead for Sparse CNN Accelerators; 2) a Sparse Data Group Association Load Balancing Algorithm which is implemented by the Scheduler module in the architecture to reduce the collision rate and improve the performance. Compared with the state-of-art accelerator, this work achieves either 1) 1.74x performance with 50% memory overhead reduction in the 4-way associated design or 2) 1.91x performance without memory overhead reduction the 2-way associated design, which is close to the theoretical performance limit (without collision).
Published: 2019
Full Text: View/download PDF

40. Cold Chain Logistics Service Quality on the Willingness of Online Shopping Fresh Products Based on Logistic Regression Model

Author: Zhe Yuan, Ting Wu, and Shijun Tang
Subjects: Service (business), Service quality, 021103 operations research, Computer science, 05 social sciences, 0211 other engineering and technologies, ComputerApplications_COMPUTERSINOTHERSYSTEMS, 02 engineering and technology, Logistic regression, Affect (psychology), Product (business), 0502 economics and business, Cold chain, Marketing, 050203 business & management, Consumer behaviour
Abstract: Based on the background that fresh-chain e-commerce logistics is rapidly developed recently in China, this paper attempts to reveal factors that may affect the consumers’ willingness to purchase fresh products online and further obtain the corresponding data combining consumer behavior theory and cold-chain service elements. Logistic regression model was used to quantitatively analyze the influence of different cold chain service elements on consumers’ willingness to purchase fresh products online. Combining the empirical results, the corresponding improvement suggestions for fresh product cold chain logistics service quality were proposed.
Published: 2019
Full Text: View/download PDF

41. Design of Software-Defined-Satellite-based PID Attitude Control Application in Python

Author: Yu-Jia Zhai and Zhe-Yuan Lin
Subjects: Computer science, business.industry, PID controller, Satellite system, Control engineering, Python (programming language), Euler angles, Attitude control, symbols.namesake, Software, Physics::Space Physics, Python language, symbols, Satellite, business, computer, computer.programming_language
Abstract: Up to now, due to the inconvenience of updating the software of conventional satellite systems, we proposed a new satellite structure called software defined satellite system. Based on this system, we complete the design of a supported software application using Python language, in which the PID attitude controller meets the need to control the real time attitude described by Euler angles of satellite in orbit. The benefits along with the challenges of the new structure are proposed as well.
Published: 2018
Full Text: View/download PDF

42. Energy-Efficient Massive Content Delivery via Devices-to-Devices Communication

Author: Liang Zhou, Xuguang Zhang, Jianxin Chen, and Zhe Yuan
Subjects: Matching (statistics), Computer science, Network packet, business.industry, 020206 networking & telecommunications, 02 engineering and technology, Energy consumption, Distributed algorithm, Path (graph theory), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, business, Greedy algorithm, Mobile device, Efficient energy use, Computer network
Abstract: Conventionally, the D2D content sharing mode is “one-to-one” matching, i.e, one demander will select one provider to request files from it. Under this mode, it is hard to cope with the growing demand for multimedia services for mobile users due to limited battery capacity for mobile devices. In this work, we propose an energy-efficient content sharing system via a novel Devices-to-Devices Communication (Ds2Ds), which shares content among multiple users to reduce the power consumption per user device. The highlights of this work lie in three parts. Firstly, we model the match of providers to demanders as a maximum weighted matching problem, and find a distributed algorithm to solve it. Secondly, we design an optimal packet split algorithm for Ds2Ds under comprehensive consideration of two aspects of communication efficiency and energy consumption to solve the problem how many data packets each provider transfers. Thirdly, we model the file reconstruction problem of collaboration demanders as a NP-hard problem and develop a best-effort distributed greedy algorithm framework to find the shortest file reconstruction path. Finally, numerical results demonstrate that the proposed mechanism can greatly reduce the energy consumption of each device.
Published: 2018
Full Text: View/download PDF

43. Sticker: A 0.41-62.1 TOPS/W 8Bit Neural Network Processor with Multi-Sparsity Compatible Convolution Arrays and Online Tuning Acceleration for Fully Connected Layers

Author: Zhe Yuan, Jinyang Li, Yongpan Liu, Huanrui Yang, Yixiong Yang, Huazhong Yang, Meng-Fan Chang, Zhibo Wang, Jinshan Yue, Qingwei Guo, and Xueqing Li
Subjects: Speedup, Artificial neural network, Computer science, 020208 electrical & electronic engineering, Hash function, 02 engineering and technology, Chip, 020202 computer hardware & architecture, Convolution, Memory bank, Memory management, 0202 electrical engineering, electronic engineering, information engineering, Quantization (image processing), Algorithm
Abstract: Neural Networks (NNs) have emerged as a fundamental technology for machine learning. The sparsity of weight and activation in NNs varies widely from 5%-90% and can potentially lower computation requirements. However, existing designs lack a universal solution to efficiently handle different sparsity in various layers and neural networks. This work, named STICKER, first systematically explores NN sparsity for inference and online tuning operations. Its major contributions are: 1) autonomous NN sparsity detector that switches the processor modes; 2) Multi-sparsity compatible Convolution (CONV) PE arrays that contain a multi-mode memory supporting different sparsity, and the set-associative PEs supporting both dense and sparse operations and reducing 92% memory area compared with previous hash memory banks; 3) Online tuning PE for sparse FCs that achieves 32.5x speedup compared with conventional CPU, using quantization center-based weight updating and Compressed Sparse Column (CSC) based back propagations. Peak energy efficiency of the 65nm STICKER chip is up to 62.1 TOPS/W at 8bit data length.
Published: 2018
Full Text: View/download PDF

44. Sparse matrix-based image enhancement for target detection using deep learning

Author: Fu-Ming Yang, Zhe-Yuan Kao, Shih-Yu Chen, and Yen-Chung Chen
Subjects: Tree (data structure), Computer science, business.industry, Deep learning, food and beverages, Preprocessor, Pattern recognition, Artificial intelligence, Image enhancement, business, Sparse matrix, Image (mathematics), Constant false alarm rate
Abstract: Leaf maturation from initiation to senescence is a phenological event of plants that is a result of the influences of temperature and water availability on physiological activities during a life cycle. Detection of newly grown leaves (NGL) is therefore useful in diagnosis if growth of trees, tree stress and even climatic change. There are many important applications that can naturally be modeled as a low-rank plus a sparse contribution. This paper develop a new algorithm and application to detect NGL. It uses first sparse matrix as a preprocessing to enhance target and applied deep learning to segment the image. The experimental results show that our proposed method can detect targets effectively and decrease false alarm rate.
Published: 2018
Full Text: View/download PDF

45. A Gradient-Based Cuckoo Search Algorithm for a Reservoir-Generation Scheduling Problem

Author: Jianzhong Zhou, Li Mo, Jiang Wu, Zhe Yuan, Chao Wang, and Feng Yu
Subjects: lcsh:T55.4-60.8, Computer science, 020209 energy, 02 engineering and technology, Standard deviation, long-term hydropower generation scheduling, lcsh:QA75.5-76.95, Theoretical Computer Science, 0202 electrical engineering, electronic engineering, information engineering, lcsh:Industrial engineering. Management engineering, Differential (infinitesimal), Cuckoo search, cascade reservoirs, gradient-based cuckoo search algorithm, Jinsha River, Numerical Analysis, Job shop scheduling, Computational Mathematics, Electricity generation, Computational Theory and Mathematics, Lévy flight, Gradient based algorithm, Cascade, lcsh:Electronic computers. Computer science, Algorithm
Abstract: In this paper, a gradient-based cuckoo search algorithm (GCS) is proposed to solve a reservoir-scheduling problem. The classical cuckoo search (CS) is first improved by a self-adaptive solution-generation technique, together with a differential strategy for Levy flight. This improved CS is then employed to solve the reservoir-scheduling problem, and a two-way solution-correction strategy is introduced to handle variants’ constraints. Moreover, a gradient-based search strategy is developed to improve the search speed and accuracy. Finally, the proposed GCS is used to obtain optimal schemes for cascade reservoirs in the Jinsha River, China. Results show that the mean and standard deviation of power generation obtained by GCS are much better than other methods. The converging speed of GCS is also faster. In the optimal results, the fluctuation of the water level obtained by GCS is small, indicating the proposed GCS’s effectiveness in dealing with reservoir-scheduling problems.
Published: 2018
Full Text: View/download PDF

46. Clustered Underlay Device-to-Device Network: Modeling and Performance Analysis

Author: Lei Wang, Zhe Yuan, Xuguang Zhang, and Liang Zhou
Subjects: Computer science, Distributed computing, 020208 electrical & electronic engineering, 0202 electrical engineering, electronic engineering, information engineering, Resource allocation, 020206 networking & telecommunications, 02 engineering and technology, Spectral efficiency, Underlay, Cluster analysis, Network model, Power control
Abstract: With the emerging demands for the large-scale services, device-to-device (D2D) communications have been viewed as a promising technology to improve network capacity and efficiency and the benefit can be further improved by D2D clustering. The paper develops a new and realistic spatial model for clustered underlay D2D networks by modeling the location of the devices as a Poisson Point Process. In this model, content of interest for a typical device is available at a device chosen uniformly at random from the same cluster. Under the condition, we derive the exact expression and approximation for the coverage probability and the area spectral efficiency (ASE). Extensive simulation results demonstrate the impact of the number of interference on the typical device. In addition, we get the relationship between resource allocation, power control and the performance of the clustered underlay D2D network by analysis.
Published: 2017
Full Text: View/download PDF

47. The Role of Video Coding in D2D Communications

Author: Xuguang Zhang, Zhe Yuan, Jianxin Chen, and Feng Tian
Subjects: business.industry, Computer science, Multiple description coding, 020206 networking & telecommunications, 02 engineering and technology, Scalable Video Coding, Base station, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Quality of experience, business, Mobile device, Computer network, Coding (social sciences)
Abstract: The development of Device-to-Device (D2D) communication provides potential for distributing data in mobile devices, offloading large amount of data traffic from the base station to mobile devices. It greatly enhances the system capacity, and makes the mass of high-quality video services possible. The existing video transmission schemes for D2D communication seldom consider the unique structural feature of video stream. In this paper, we explore the possibility to develop the encoding structural feature of H.264/AVC, Scalable Video Coding (SVC) and Multiple Description Coding (MDC) schemes. Theoretical and simulation results show that MDC fits best to the D2D network in terms of data offloading and Quality of Experience (QoE) improving.
Published: 2017
Full Text: View/download PDF

48. CNN-based pattern recognition on nonvolatile IoT platform for smart ultraviolet monitoring: (Invited paper)

Author: Zhe Yuan, Jingtong Hu, Fang Su, Jinyang Li, Yongpan Liu, Jinshan Yue, Huazhong Yang, and Qingwei Guo
Subjects: Artificial neural network, Computer science, business.industry, Wearable computer, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Convolutional neural network, Reduction (complexity), Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Energy harvesting, Wearable technology
Abstract: Intelligent computing and maintenance-free powering are two desirable characteristics of wearable IoT devices. Energy harvesting nonvolatile intelligent processor (NIP) with neural network computation capability has the potential to advance these goals. Individual ultraviolet (UV) exposure monitoring progressively becomes one conspicuous application of wearable devices. In resource constrained wearable sensor nodes, we can alleviate the data transmission burden via convolutional neural networks (CNNs) based pattern recognition. Nevertheless, in spite of the substantially improved computing capability of NIP, typically computational and memory intensive CNNs are still too bulky for on-node implementation. We develop an CNN-based pattern recognition system for nonvolatile IoT platform for smart UV monitoring, and propose a optimization method to achieve extremely tiny and efficient CNNs. Experimental results show that the offline-trained CNN can recognize individual UV exposure patterns with accuracy of 85%, and the simplified on-node CNN can achieve 93.2% parameters reduction with only 5% accuracy loss.
Published: 2017
Full Text: View/download PDF

49. CORAL: Coarse-grained reconfigurable architecture for Convolutional Neural Networks

Author: Zhe Yuan, Jinyang Li, Yongpan Liu, Jinshan Yue, and Huazhong Yang
Subjects: Hardware architecture, business.industry, Computer science, Control reconfiguration, Reconfigurability, 020207 software engineering, 02 engineering and technology, Energy consumption, Convolutional neural network, Application-specific integrated circuit, Embedded system, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, business, Field-programmable gate array, Efficient energy use
Abstract: Convolutional Neural Network (CNN) has become one of the most successful technologies for visual classification and other applications. As CNN models continue to evolve and adopt different kernel sizes in various applications, it is necessary for the hardware architecture to support reconfigurability. Previous FPGAs and programmable ASICs are fine-grained reconfigurable but with energy efficiency compromise. Considering specific features of CNNs, this paper presents an energy efficient coarse-grained reconfigurable architecture, denoted as CORAL. An application-specific configuration neural block is proposed for convolution operations with reconfigurable data quantization to reduce both energy consumption and on-chip memory requirements. An optimal data loading strategy is presented for CORAL to achieve the best energy efficiency. Experimental results show that CORAL improves 80.0% energy efficiency while reduces 78.9% chip area and 81.0% reconfiguration time compared with the best up-to-date programmable ASIC solution.
Published: 2017
Full Text: View/download PDF

50. Recurrent neural networks made of magnetic tunnel junctions

Author: Qi Zheng, Yuanyuan Mi, Ke Xia, Zhe Yuan, and Xiaorui Zhu
Subjects: Exploit, Computer science, FOS: Physical sciences, General Physics and Astronomy, Applied Physics (physics.app-ph), 02 engineering and technology, Parallel computing, 01 natural sciences, symbols.namesake, Software, 0103 physical sciences, 010302 applied physics, Magnetization dynamics, Artificial neural network, Series (mathematics), business.industry, Efficient algorithm, Physics - Applied Physics, Disordered Systems and Neural Networks (cond-mat.dis-nn), Condensed Matter - Disordered Systems and Neural Networks, 021001 nanoscience & nanotechnology, lcsh:QC1-999, Recurrent neural network, symbols, 0210 nano-technology, business, lcsh:Physics, Von Neumann architecture
Abstract: Artificial intelligence based on artificial neural networks, which are originally inspired by the biological architectures of human brain, has mostly been realized using software but executed on conventional von Neumann computers, where the so-called von Neumann bottleneck essentially limits the executive efficiency due to the separate computing and storage units. Therefore, a suitable hardware platform that can exploit all the advantages of brain-inspired computing is highly desirable. Based upon micromagnetic simulation of the magnetization dynamics, we demonstrate theoretically and numerically that recurrent neural networks consisting of as few as 40 magnetic tunnel junctions can generate and recognize periodic time series after they are trained with an efficient machine-learning algorithm. With ultrahigh operating speed, nonvolatile memory and high endurance and reproducibility, spintronic devices are promising hardware candidates for neuromorphic computing., Comment: 5 pages, 4 figures
Published: 2020
Full Text: View/download PDF

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

85 results on '"Zhe Yuan"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources