3,943 results
Search Results
2. CGRA-ME: An Open-Source Framework for CGRA Architecture and CAD Research : (Invited Paper)
- Author
-
Xinyuan Wang, Xiaoyi Ling, Hsuan Hsiao, Rami Beidas, Omar Ragheb, Tianyi Yu, Vimal Chacko, and Jason H. Anderson
- Subjects
Computer science ,CAD ,Solid modeling ,computer.software_genre ,Software framework ,Application-specific integrated circuit ,Computer architecture ,Systems architecture ,Verilog ,ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS ,Field-programmable gate array ,computer ,computer.programming_language ,Abstraction (linguistics)
- Abstract
Coarse-grained reconfigurable arrays (CGRAs) are programmable hardware platforms that can be used to realize application-specific accelerators for higher performance and energy efficiency. A CGRA is a 2D array of configurable logic blocks & interconnect, where the logic blocks are typically large & ALU-like, and the interconnect is word-wide. CGRA-ME is a software framework that enables the modelling and exploration of CGRA architectures, as well as research on CGRA CAD algorithms. With CGRA-ME, an architect can specify a CGRA architecture at a high level of abstraction. A set of applications can be mapped onto the architecture to assess the mappability, power, performance and cost. CGRA-ME also allows one to generate synthesizable Verilog RTL for the modelled CGRA, permitting its implementation as an ASIC or FPGA overlay. In this paper, we describe the CGRA-ME framework [5] and overview its capabilities and current limitations. We discuss ongoing and prior research conducted with the framework, as well as outline future plans. We believe CGRA-ME will be a valuable contribution to the community, enabling new research on CGRA CAD & architectures.
- Published
- 2021
3. Reproducibility Companion Paper: Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network
- Author
-
Wei Hu, Bo Wu, Yueqi Zhong, Jan Zahálka, and Xin Wang
- Subjects
Reproducibility ,Experimental Replication ,Computer architecture ,Computer science ,business.industry ,Deep learning ,Compatibility (mechanics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,Artificial intelligence ,business ,Software package
- Abstract
This companion paper supports the experimental replication of the paper "Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network", which was presented at ACM Multimedia 2019. We provide the software package for replicating the implementation of the Multi-Layered Comparison Network (MCN), as well as the Polyvore-T dataset and the baseline methods compared in the original paper. This paper contains guides for reproducing the experimental results, including outfit compatibility prediction, outfit diagnosis and automatic outfit revision.
- Published
- 2020
4. High performance network components for scalable spaceborne processing needs: Poster, short paper
- Author
-
Richard W. Berger and Joseph R. Marshall
- Subjects
Engineering ,Random access memory ,Computer architecture ,business.industry ,Interface (Java) ,Embedded system ,Emphasis (telecommunications) ,Short paper ,Scalability ,Electromagnetic compatibility ,High performance network ,business ,SpaceWire
- Abstract
This paper will describe high performance interface building blocks, compare their networking features and show how they may be used in small and large systems, especially as they apply to SpaceVPX modules. Emphasis will be placed on their SpaceWire and other networking capabilities.
- Published
- 2016
5. Ruche Networks: Wire-Maximal, No-Fuss NoCs : Special Session Paper
- Author
-
Chun Zhao, Dustin Richmond, Scott Davidson, Dai Cheol Jung, and Michael Taylor
- Subjects
Standard cell ,Router ,Very-large-scale integration ,Computer science ,Network packet ,Mesh networking ,02 engineering and technology ,021001 nanoscience & nanotechnology ,Chip ,Column (database) ,020202 computer hardware & architecture ,Computer architecture ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Bandwidth (computing) ,0210 nano-technology
- Abstract
Network-On-Chip design has been an active area of academic research for two decades, but many proposed ideas have not been adopted in real chips because they have complex behavior or create significant risks in chip implementation. For this reason, many existing chips just employ fast, replicated vanilla dimension-ordered mesh NoCs. However, these networks do not come close to utilizing the full available VLSI wiring capabilities, and propagate packets at speeds that are significantly below the raw speed of wires. The ideal network would not require any custom circuits, and would decompose easily into a hierarchical CAD flow consisting of a top-level design instantiating a mesh of identical hardened tiles with short-wire neighbor connections. At the same time, this ideal network would easily scale to efficiently utilize the majority of the available chip wiring resources, and would offer a mechanism for scaling this wire usage up or down based on available bandwidth. Packets would spend a significant fraction of their time in wire delay rather than router delay. Finally, the NoC would be simple to understand. This paper proposes Ruche Networks, which fulfill these requirements. They are based on simple 2-D mesh networks but amplify the NoC bandwidth and reduce NoC diameter of tiled architectures by adding long-range physical channels from each tile to other tiles on the same row or column. The more distant the connections, the greater the bandwidth of the network and the lower the diameter. The distance is typically increased until all of the physical VLSI wiring bandwidth has been absorbed. We explain the rationale for this "ruching" and provide a simple methodology for designing and implementing these networks using a standard cell VLSI CAD flow. In this paper, we show the steps involved in ruching the HammerBlade Manycore’s mesh networks; these steps can easily apply to other designs.
- Published
- 2020
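The Ruche Networks abstract above states that adding long-range channels along rows and columns lowers the diameter of a 2-D mesh. The following is a minimal sketch of that effect only, not the paper's design: the function name `mesh_diameter` and the single link length `ruche` are assumptions made for illustration, and the model ignores bandwidth, routing policy and physical wiring.

```python
from collections import deque


def mesh_diameter(rows, cols, ruche=1):
    """Return the largest shortest-path hop count between any two tiles.

    Every tile links to its nearest neighbours; when ruche > 1 it also
    links to the tiles `ruche` positions away on the same row or column
    (a hypothetical, simplified stand-in for long-range channels).
    """
    steps = {1, ruche}

    def neighbours(r, c):
        for s in steps:
            for dr, dc in ((s, 0), (-s, 0), (0, s), (0, -s)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    yield nr, nc

    def eccentricity(start):
        # Breadth-first search gives hop counts from `start` to all tiles.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours(*node):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return max(dist.values())

    return max(eccentricity((r, c)) for r in range(rows) for c in range(cols))


print(mesh_diameter(8, 8))     # plain 8x8 mesh: 14 hops
print(mesh_diameter(8, 8, 3))  # with extra length-3 links: 6 hops
```

In this toy model an 8x8 mesh needs up to 14 hops corner to corner, while adding length-3 links cuts the worst case to 6, which matches the qualitative claim that longer connections reduce diameter.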
6. Ascend: a Scalable and Unified Architecture for Ubiquitous Deep Neural Network Computing : Industry Track Paper
- Author
-
Yuxing Hu, Jing Xia, Hu Liu, Xiping Zhou, Jiajin Tu, Honghui Yuan, and Heng Liao
- Subjects
Memory hierarchy ,business.industry ,Computer science ,020208 electrical & electronic engineering ,Symmetric multiprocessor system ,02 engineering and technology ,Data access ,Memory management ,Computer architecture ,Datapath ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data center ,business ,Heterogeneous network
- Abstract
Deep neural networks (DNNs) have been successfully applied to a great variety of applications, ranging from small IoT devices to large scale services in a data center. In order to improve the efficiency of processing these DNN models, dedicated hardware accelerators are required for all these scenarios. Theoretically, there exists an optimized acceleration architecture for each application. However, considering the cost of chip design and corresponding tool-chain development, researchers need to trade off between efficiency and generality. In this work, we demonstrate that it is practical to use a unified architecture, called Ascend, to support those applications, ranging from IoT devices to data-center services. We provide a lot of design details to explain that the success of Ascend relies on contributions from different levels. First, heterogeneous computing units are employed to support various DNN models. And the datapath is adapted according to the requirement of computing and data access. Second, when scaling the Ascend architecture from a single core to a cluster containing thousands of cores, it involves design efforts, such as memory hierarchy and system level integration. Third, a multi-tier compiler, which provides flexible choices for developers, is the last critical piece. Experimental results show that using accelerators based on the Ascend architecture can achieve comparable or even better performance in different applications. In addition, various chips based on the Ascend architecture have been successfully commercialized. More than 100 million chips have been used in real products.
- Published
- 2021
7. AWD: Best Paper Competition (AWD) Enabling Next Generation Video Applications on Consumer Integrated and Discrete Client GPUs
- Author
-
Jill Macdonald Boyce and Basel Salahieh
- Subjects
Competition (economics) ,Computer architecture ,Computer science ,Encoding (memory) ,Codec ,Content adaptive ,Graphics ,Implementation ,Transform coding ,Power optimization
- Abstract
This "success story" panel illustrates how next generation video applications are enabled on PCs today using consumer integrated and discrete GPUs launched in 2020. Intel's XeLP graphics technology with dedicated media hardware powers the integrated graphics in Intel's latest client processor, Tiger Lake, and Intel's first entry-level mainstream discrete graphics card, DG1. XeLP graphics in Tiger Lake and DG1 has democratized access to high performance implementations of the latest emerging video codec standards. Three topics will be covered: (i) 8K HEVC/AV1 Playback with Content Adaptive Power Optimization; (ii) Ludicrous Speed HEVC Encoding with Integrated + Discrete GPU; and (iii) MPEG Immersive Video (MIV) Playback on DG1.
- Published
- 2021
8. MAGICAL: Toward Fully Automated Analog IC Layout Leveraging Human and Machine Intelligence: Invited Paper
- Author
-
Mingjie Liu, Nan Sun, David Z. Pan, Biying Xu, Keren Zhu, Xiyuan Tang, Shaolan Li, and Yibo Lin
- Subjects
Heuristic (computer science) ,business.industry ,Computer science ,020208 electrical & electronic engineering ,Constraint (computer-aided design) ,02 engineering and technology ,Integrated circuit design ,Integrated circuit layout ,Automation ,020202 computer hardware & architecture ,Computer architecture ,Fully automated ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Netlist ,Routing (electronic design automation) ,business
- Abstract
Despite the tremendous advancement of digital IC design automation tools over the last few decades, analog IC layout is still heavily manual, which is very tedious and error-prone. This paper will first review the history, challenges, and current status of analog IC layout automation. Then, we will present MAGICAL, a human-intelligence inspired, fully-automated analog IC layout system currently being developed under the DARPA IDEA program. It starts from an unannotated netlist, performs automatic layout constraint extraction and device generation, then performs placement and post-placement optimization, followed by routing to obtain the final GDSII layout. Various analytical, heuristic, and machine learning algorithms will be discussed. MAGICAL has obtained promising preliminary results. We will conclude the paper with further discussions on challenges and future directions for fully-automated analog IC layout.
- Published
- 2019
9. Wavelength-Routed Optical NoCs: Design and EDA — State of the Art and Future Directions: Invited Paper
- Author
-
Ulf Schlichtmann, Alexandre Truppel, Mengchu Li, Mahdi Nikdast, and Tsun-Ming Tseng
- Subjects
Range (mathematics) ,Computer architecture ,Computer science ,020208 electrical & electronic engineering ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Electronic design automation ,02 engineering and technology ,State (computer science) ,Routing (electronic design automation) ,Component placement ,Waveguide (optics) ,020202 computer hardware & architecture
- Abstract
Wavelength-routed optical network-on-chip (WRONoC) design consists of topological and physical synthesis. It covers many interacting design aspects such as wavelength assignment, message routing, network construction, component placement, and waveguide routing. Due to the high complexity of the design problem, current manual design usually trades optimality for scalability and feasibility, which results in performance degradation and waste of resources. In this paper, we will present an overview of the existing design automation approaches that have demonstrated their effectiveness in customizing and optimizing application-specific WRONoC designs, and of the potential design automation directions to address a wider range of design challenges. We will also discuss the advantages of comprehensive optimization considering multiple design aspects simultaneously, and the possible barriers that need to be removed to achieve this goal.
- Published
- 2019
10. Deep neural networks compiler for a trace-based accelerator (short WIP paper)
- Author
-
Aliasger Zaidy, Eugenio Culurciello, Lukasz Burzawa, and Andre Xian Ming Chang
- Subjects
020203 distributed computing ,business.industry ,Computer science ,Dataflow ,Deep learning ,Image processing ,Memory bandwidth ,02 engineering and technology ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,Computer Graphics and Computer-Aided Design ,Computer architecture ,0202 electrical engineering, electronic engineering, information engineering ,Deep neural networks ,Artificial intelligence ,Compiler ,business ,Field-programmable gate array ,computer ,Software ,0105 earth and related environmental sciences ,TRACE (psycholinguistics)
- Abstract
Deep Neural Networks (DNNs) are the algorithm of choice for image processing applications. DNNs present highly parallel workloads that lead to the emergence of custom hardware accelerators. Deep Learning (DL) models specialized in different tasks require a programmable custom hardware and a compiler/mapper to efficiently translate different DNNs into an efficient dataflow in the accelerator. The goal of this paper is to present a compiler for running DNNs on Snowflake, which is a programmable hardware accelerator that targets DNNs. The compiler correctly generates instructions for various DL models: AlexNet, VGG, ResNet and LightCNN9. Snowflake, with a varying number of processing units, was implemented on FPGA to measure the compiler and Snowflake performance properties upon scaling up. The system achieves 70 frames/s and 4.5 GB/s of off-chip memory bandwidth for AlexNet without linear layers on Xilinx’s Zynq-SoC XC7Z045 FPGA.
- Published
- 2018
11. HPDM: A Survey Paper
- Author
-
Li Wang
- Subjects
MIMD ,Focus (computing) ,Computer architecture ,Shared memory ,Workstation ,law ,Computer science ,Carry (arithmetic) ,Component (UML) ,Parallelism (grammar) ,SIMD ,law.invention
- Abstract
This survey reviews several approaches to HPDM from many research groups worldwide. Modern computer hardware supports the development of high-performance applications for data analysis on many different levels. The focus is on modern multi-core processors built into today's commodity computers, which are typically found at university institutes as both small server and workstation computers. They are therefore deliberately not high-performance computers. Modern multi-core processors consist of several (2 to over 100) computer cores, which work independently of each other according to the principle of "multiple instruction multiple data" (MIMD). They have a common main memory (shared memory). Each of these computer cores has several (2-16) arithmetic-logic units, which can simultaneously carry out the same arithmetic operation on several data in a vector-like manner (single instruction multiple data, SIMD). HPDM algorithms must use both types of parallelism (SIMD and MIMD), with access to the main memory (centralized component) being the main barrier to increased efficiency.
- Published
- 2020
12. LSOracle: a Logic Synthesis Framework Driven by Artificial Intelligence: Invited Paper
- Author
-
Pierre-Emmanuel Gaillardon, Luca Amaru, Max Austin, Scott Temple, Xifan Tang, and Walter Lau Neto
- Subjects
Standard cell ,Computer science ,Context (language use) ,02 engineering and technology ,Integrated circuit ,020202 computer hardware & architecture ,law.invention ,Logic synthesis ,Computer architecture ,Application-specific integrated circuit ,law ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,Electronic design automation ,Hardware_LOGICDESIGN ,Electronic circuit
- Abstract
The increasing complexity of modern Integrated Circuits (ICs) leads to systems composed of various Intellectual Property (IP) blocks, known as System-on-Chip (SoC). Such complexity requires strong expertise from engineers, who rely on expensive commercial EDA tools. To overcome such a limitation, an automated open-source logic synthesis flow is required. In this context, this work proposes LSOracle: a novel automated mixed logic synthesis framework. LSOracle is the first to exploit state-of-the-art And-Inverter Graph (AIG) and Majority-Inverter Graph (MIG) logic optimizers and relies on a Deep Neural Network (DNN) to automatically decide which optimizer should handle different portions of the circuit. To do so, LSOracle applies k-way partitioning to split a DAG into multiple partitions and uses a DNN to choose the best-fit optimizer. Post-tech mapping ASIC results, targeting the 7nm ASAP standard cell library, for a set of mixed-logic circuits, show an average improvement in area-delay product of 6.87% (up to 10.26%) and 2.70% (up to 6.27%) when compared to AIG and MIG, respectively. In addition, we show that for the considered circuits, LSOracle achieves an area close to that of AIGs (which delivered smaller circuits) with performance similar to that of MIGs (which delivered faster circuits).
- Published
- 2019
13. Ultra-Low Power and Minimal Design Effort Interfaces for the Internet of Things: Invited paper
- Author
-
Orazio Aiello, Paolo Stefano Crovetti, and Massimo Alioto
- Subjects
Ultra low power ,Computer science ,business.industry ,020208 electrical & electronic engineering ,Design flow ,Digital-to-analog converter ,Reconfigurability ,020206 networking & telecommunications ,02 engineering and technology ,law.invention ,Software portability ,Computer architecture ,law ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Internet of Things ,business
- Abstract
This paper reviews the results of recent research aimed at extending the standard-cell based digital design flow to analog building blocks, so as to enhance scalability, reconfigurability and portability across technology nodes and to reduce design effort, time-to-market and costs. In this framework, the application of the proposed fully digital design approach to a wake-up oscillator and to a Digital-to-Analog Converter, two building blocks widely employed in IoT sensor nodes, is illustrated in detail.
- Published
- 2019
14. Short Paper: Neuromorphic Chip Embedded Electronic Systems to Expand Artificial Intelligence
- Author
-
Hamid Abdi and Lahiru L. Abeysekara
- Subjects
medicine.anatomical_structure ,Neuromorphic engineering ,Artificial neural network ,Application-specific integrated circuit ,Computer architecture ,Computer science ,medicine ,Human brain ,Applications of artificial intelligence ,Electronics ,Electronic hardware ,Chip
- Abstract
Neuromorphic chips are electronic hardware that mimics the neurons of the human brain in an electronic structure. These ASICs (Application Specific Integrated Circuits) provide artificial neural networks with computational power comparatively higher than that of most neural networks generated by software algorithms. 'CM1K' is an electronic chip in this family of products. It has a parallel neural network of 1024 neurons. These neurons provide K-Nearest Neighbor (KNN) data classification. The chip must be embedded in an electronic system to access all its capabilities. This paper delivers a novel hardware system embedding the CM1K neuromorphic chip. The system was implemented in image and video frame analysis for evaluation. The results show that the system could benefit various applications including security, asset management, home appliances, mail sorting and manufacturing. Since the embedded system provides an opportunity to integrate AI into simple electronics, it helps extend AI applications.
- Published
- 2019
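The abstract above mentions that the chip's neurons provide K-Nearest Neighbor (KNN) classification. As a hedged software illustration of that technique only, the sketch below classifies a query vector by majority vote among its k closest stored patterns. The L1 distance, the function names and the sample data are all assumptions for illustration; the sketch does not model the CM1K's hardware interface or behaviour.

```python
from collections import Counter


def l1_distance(a, b):
    """Sum of absolute component differences (assumed distance metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(stored, query, k=3):
    """Return the majority label among the k stored patterns nearest to query.

    `stored` is a list of (vector, label) pairs; it plays the role of the
    patterns held by the neurons in this simplified software analogue.
    """
    nearest = sorted(stored, key=lambda item: l1_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature patterns, e.g. mean intensities of image regions.
prototypes = [
    ((10, 12), "dark"),
    ((14, 9), "dark"),
    ((200, 190), "bright"),
    ((220, 240), "bright"),
    ((180, 210), "bright"),
]

print(knn_classify(prototypes, (15, 15)))    # dark
print(knn_classify(prototypes, (205, 200)))  # bright
```

A software loop like this compares the query against the stored patterns one at a time, whereas the abstract describes a parallel network of 1024 neurons.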
15. Exploiting reconfigurable computing in 5G: a case study of latency critical function: Invited Paper
- Author
-
Piero Castoldi, F. Civerchia, Maxime Pelcat, and Luca Valcarenghi
Affiliations: Scuola Universitaria Superiore Sant'Anna [Pisa] (SSSUP); Institut d'Electronique et de Télécommunications de Rennes (IETR); Institut Pascal - Clermont Auvergne (IP)
- Subjects
OpenCL ,business.industry ,Orthogonal frequency-division multiplexing ,Computer science ,Hardware Acceleration ,030204 cardiovascular system & hematology ,Reconfigurable computing ,[SPI]Engineering Sciences [physics] ,03 medical and health sciences ,0302 clinical medicine ,Software ,Computer architecture ,Reconfigurable Computing ,Hardware acceleration ,030212 general & internal medicine ,Mobile telephony ,Latency (engineering) ,business ,Field-programmable gate array ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,5G ,ComputingMilieux_MISCELLANEOUS
- Abstract
The fifth generation of mobile communications (5G) is expected to dramatically improve performance compared to preceding standards by offering very high bandwidths and low latencies. To provide this performance, heavy processing is required and must meet strong timing constraints. Reconfigurable computing, which manages processing in software and exploits reconfigurable hardware acceleration, is an innovative approach that should be considered for 5G for its capacity to combine high throughput and high flexibility. This paper presents a case study of reconfigurable offloading of Orthogonal Frequency Division Multiplexing (OFDM) computation onto a Field Programmable Gate Array (FPGA). The implementation is based on Open Computing Language (OpenCL), which represents a versatile solution, as this language can be compiled for several architectures, provided that a Host+Accelerator structure is used. The objective of our study is to demonstrate that, by means of hardware offloading, the 5G architecture resources can reach high computational load, avoiding processing stalls and latency increase. Results show that around 15% of the software processing can be freed through hardware acceleration and reallocated to support other tasks.
- Published
- 2019
16. Full-chip monolithic 3D IC design and power performance analysis with ASAP7 library: (Invited Paper)
- Author
-
Bon Woong Ku, Sung Kyu Lim, Kyungwook Chang, and Saurabh Sinha
- Subjects
Computer architecture ,Computer science ,020208 electrical & electronic engineering ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Power performance ,Process design ,Node (circuits) ,02 engineering and technology ,Chip ,3d ic design ,020202 computer hardware & architecture
- Abstract
In this paper, we present full-chip designs and their power, performance, and area (PPA) metrics using the ASAP7 process design kit (PDK) and library. A reliable cell library is a key element in evaluating new technological options such as monolithic 3D (M3D) ICs. Given an RTL, we conduct synthesis and place/route to obtain commercial-quality 2D and M3D IC designs and compare PPA. The ASAP7 library is highly useful for building high-quality designs that accurately reflect the 7nm technology node. In addition, the full front-end and back-end access provided in ASAP7 allows us to see the impact of various device and interconnect parameters at the full-chip level for both 2D and monolithic 3D ICs. This work demonstrates the critical role of an academic PDK and library in enabling high-quality research in disruptive technologies such as M3D integration.
- Published
- 2017
17. ASAP7 predictive design kit development and cell design technology co-optimization: Invited paper
- Author
-
Vinay Vashishtha, Manoj Vangala, and Lawrence T. Clark
- Subjects
010302 applied physics ,Standard cell ,Computer science ,Extreme ultraviolet lithography ,Process design ,01 natural sciences ,010309 optics ,Computer architecture ,0103 physical sciences ,Parasitic extraction ,Place and route ,Physical design ,Routing (electronic design automation) ,Lithography
- Abstract
This work discusses the ASAP7 predictive process design kit (PDK) and associated standard cell library. The necessity for multi-patterning (MP) techniques at advanced nodes results in the standard cell and SRAM architecture becoming entangled with design rules, mandating design-technology co-optimization (DTCO). This paper discusses the DTCO process involving standard cell physical design. An assumption of extreme ultraviolet (EUV) lithography availability in the PDK allows bi-directional M1 geometries that are difficult with MP. Routing and power distribution schemes for self-aligned quadruple patterning (SAQP) friendly, high density standard cell based blocks are shown. Restrictive design rules are required and supported by the automated place and route (APR) setup. Supporting sub-20 nm dimensions with academic tool licenses is described. The APR (QRC techfile) extraction shows high correlation with the Calibre extraction deck. Finally, use of the PDK for academic coursework and research is discussed.
- Published
- 2017
18. Standard cell library design and optimization methodology for ASAP7 PDK: (Invited paper)
- Author
-
Andrew Evans, Brian Cline, Xiaoqing Xu, Saurabh Sinha, Greg Yeric, and Nishi Shah
- Subjects
Standard cell ,Computer science ,Transistor ,Process design ,02 engineering and technology ,Integrated circuit ,021001 nanoscience & nanotechnology ,020202 computer hardware & architecture ,law.invention ,Computer architecture ,law ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Node (circuits) ,0210 nano-technology ,Design methods
- Abstract
Standard cell libraries are the foundation for the entire back-end design and optimization flow in modern application-specific integrated circuit designs. At the 7nm technology node and beyond, standard cell library design and optimization is becoming increasingly difficult due to extremely complex design constraints, as described in the ASAP7 process design kit (PDK). Notable complexities include discrete transistor sizing due to FinFETs, complicated design rules from lithography and restrictive layout space from modern standard cell architectures. The design methodology presented in this paper enables efficient and high-quality standard cell library design and optimization with the ASAP7 PDK. The key techniques include exhaustive transistor sizing for cell timing optimization, transistor placement with generalized Euler paths and back-end design prototyping for library-level explorations.
- Published
- 2017
19. Multi-broker based software-defined optical networks (Invited paper)
- Author
-
Xiaoliang Chen, Andrea Castro, Roberto Proietti, S.J.B. Yoo, and Zuqing Zhu
- Subjects
Network control ,business.industry ,Computer science ,Quality of service ,Topology (electrical circuits) ,02 engineering and technology ,Blocking (statistics) ,Service provisioning ,Reduction (complexity) ,020210 optoelectronics & photonics ,Software ,Computer architecture ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,business
- Abstract
This paper investigates the multi-broker based network control and management paradigm for realizing scalable and cost-effective service provisioning in multi-domain software-defined optical networks. Experimental results verify the feasibility of the proposal and demonstrate a ∼7.6× blocking reduction compared with the conventional single-broker based solution.
- Published
- 2017
20. Generating FPGA-based image processing accelerators with Hipacc: (Invited paper)
- Author
-
Richard Membarth, Oliver Reiche, Jürgen Teich, Frank Hannig, and M. Akif Ozkan
- Subjects
020203 distributed computing ,Source code ,Computer science ,media_common.quotation_subject ,Image processing ,02 engineering and technology ,computer.software_genre ,020202 computer hardware & architecture ,Domain (software engineering) ,Digital subscriber line ,Computer architecture ,0202 electrical engineering, electronic engineering, information engineering ,Compiler ,Field-programmable gate array ,computer ,media_common ,Abstraction (linguistics)
- Abstract
Domain-Specific Languages (DSLs) provide a high-level and domain-specific abstraction to describe algorithms within a certain domain concisely. Since a DSL separates the algorithm description from the actual target implementation, it offers high flexibility among heterogeneous hardware targets, such as CPUs and GPUs. With the recent rise of promising High-Level Synthesis (HLS) tools, like Vivado HLS and Altera OpenCL, FPGAs are becoming another attractive target architecture. Particularly in the domain of image processing, applications often come with stringent requirements regarding performance, energy efficiency, and power, for which FPGAs have been proven to be among the most suitable architectures. In this work, we present the Hipacc framework, a DSL and source-to-source compiler for image processing. We show that domain knowledge can be captured to generate tailored implementations for C-based HLS from a common high-level DSL description targeting FPGAs. Our approach includes FPGA-specific memory architectures for handling point and local operators, as well as several high-level transformations. We evaluate our approach by comparing the resulting hardware accelerators to GPU implementations, generated from exactly the same DSL source code.
- Published
- 2017
21. 2017 International Symposium on Computer Architecture Influential Paper Award
- Author
-
David Brooks
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer architecture ,Hardware and Architecture ,Computer science ,Hardware_INTEGRATEDCIRCUITS ,Hardware_PERFORMANCEANDRELIABILITY ,Electrical and Electronic Engineering ,Software ,Hardware_LOGICDESIGN
- Abstract
This article discusses the 2017 ACM SIGARCH/IEEE-CS TCCA Influential ISCA Paper Award, which was given to the 2002 ISCA paper, “Drowsy Caches: Simple Techniques for Reducing Leakage Power.”
- Published
- 2017
22. 4.2: Invited Paper: OLCD: a low cost, area‐scalable manufacturing process for flexible displays
- Author
-
Paul Cain
- Subjects
Computer architecture ,Manufacturing process ,Flexible display ,Computer science ,Scalability
- Published
- 2019
23. 2014 International Symposium on Computer Architecture Influential Paper Award; 2014 Maurice Wilkes Award Given to Ravi Rajwar
- Author
-
Dean M. Tullsen and Stephen W. Keckler
- Subjects
Computer architecture ,Hardware and Architecture ,Computer science ,Electrical and Electronic Engineering ,ComputingMilieux_MISCELLANEOUS ,GeneralLiterature_MISCELLANEOUS ,Software - Abstract
This column discusses two awards given in 2014: the International Symposium on Computer Architecture Influential Paper Award, which was given to the authors of the paper "PipeRench: A Coprocessor for Streaming Multimedia Acceleration," and the Maurice Wilkes Award, which was given to Ravi Rajwar.
- Published
- 2014
24. Performance analysis and benchmarking of all-spin spiking neural networks (Special session paper)
- Author
-
Kaushik Roy, Aayush Ankit, and Abhronil Sengupta
- Subjects
010302 applied physics ,Spiking neural network ,Network complexity ,Speedup ,Artificial neural network ,Computer science ,business.industry ,Node (networking) ,02 engineering and technology ,021001 nanoscience & nanotechnology ,01 natural sciences ,Bottleneck ,Synapse ,Computer architecture ,Embedded system ,0103 physical sciences ,Benchmark (computing) ,Crossbar switch ,0210 nano-technology ,business - Abstract
Spiking Neural Network based brain-inspired computing paradigms are becoming increasingly popular tools for various cognitive tasks. The sparse event-driven processing capability enabled by such networks can be potentially appealing for implementation of low-power neural computing platforms. However, the parallel and memory-intensive computations involved in such algorithms are in complete contrast to the sequential fetch, decode, execute cycles of conventional von Neumann processors. Recent proposals have investigated the design of spintronic “in-memory” crossbar based computing architectures driving “spin neurons” that can potentially alleviate the memory-access bottleneck of CMOS based systems and simultaneously offer the prospect of low-power inner product computations. In this article, we perform a rigorous system-level simulation study of such All-Spin Spiking Neural Networks on a benchmark suite of 6 recognition problems ranging in network complexity from 10k–7.4M synapses and 195–9.2k neurons. System level simulations indicate that the proposed spintronic architecture can potentially achieve ∼1292× energy efficiency and ∼235× speedup on average over the benchmark suite in comparison to an optimized CMOS implementation at 45nm technology node.
- Published
- 2017
25. A PAPER SURVEY ON THE IMPLEMENTATION OF THE PARALLEL FDTD ON MULTIPROCESSORS USING MPI
- Author
-
Oyku Akaydin, Adamu Abubakar Isah, and Mehmet Kusaf
- Subjects
Computer architecture ,Computer science ,Interface (computing) ,010401 analytical chemistry ,0202 electrical engineering, electronic engineering, information engineering ,Local area network ,Finite-difference time-domain method ,020206 networking & telecommunications ,02 engineering and technology ,General Medicine ,Parallel computing ,01 natural sciences ,0104 chemical sciences - Abstract
The research work explains a cost-effective, high-performance computing platform for the parallel implementation of the FDTD algorithm on PC clusters using the message-passing interface (MPI) library. A PC cluster is a local area network system consisting of multiple interconnected personal computers (PCs), and is already widely employed for parallel computing.
- Published
- 2017
26. Hybrid large-area systems and their interconnection backbone (invited paper)
- Author
-
Warren Rieutort-Louis, Yu Hen Hu, Josue Sanz-Robinson, Naveen Verma, Tiffany Moy, Liechao Huang, Yasmin Afsar, Sigurd Wagner, Levent E. Aygun, and James C. Sturm
- Subjects
010302 applied physics ,Interconnection ,business.industry ,Computer science ,020208 electrical & electronic engineering ,02 engineering and technology ,Integrated circuit ,Modular design ,01 natural sciences ,law.invention ,CMOS ,Computer architecture ,law ,Hybrid system ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Electronics ,Telecommunications ,business - Abstract
Hybrid systems combine Large-Area Electronics (LAE) with high-performance technologies (e.g., silicon CMOS) [1]. With architectural concepts for hybrid systems broadening to match the range of emerging applications, this paper examines modular approaches for multi-sheet, multi-technology integration. It identifies the interfaces required as a critical backbone. For interfaces associated with various system functionalities (sensing, processing, powering), specific approaches are surveyed and analyzed, drawing on insights derived from several previous experimental demonstrations of complete hybrid systems.
- Published
- 2016
27. Hardware optimizations for crypto implementations (Invited paper)
- Author
-
Sandeep K. Shukla and M. Mohamed Asan Basiri
- Subjects
Very-large-scale integration ,Cryptographic primitive ,Computer science ,business.industry ,020208 electrical & electronic engineering ,02 engineering and technology ,Fault injection ,Encryption ,Multiplexing ,Multiplexer ,020202 computer hardware & architecture ,Computer architecture ,Embedded system ,0202 electrical engineering, electronic engineering, information engineering ,Side channel attack ,Elliptic curve cryptography ,business ,Computer hardware - Abstract
Latency, Area, and Power are three important metrics that a VLSI designer wants to optimize. However, often one of these may have to be optimized at the cost of another or the other two. Depending on the application scenario, choice of the metric to optimize is made. In this paper, we consider hardware implementations of a number of cryptographic primitives and present a number of optimizations. We consider three areas of cryptoengineering: building physical unclonable functions (PUFs), implementing encryption/decryption algorithms, and side-channel-proof crypto implementations. The techniques we employ range from area optimization through customized multiplexer design, fusing multiple operations into a single hardware element, folding and unrolling of iterative algorithms, creating reconfigurable implementations to achieve multiple operations with the same set of hardware elements, to techniques of obfuscation to defeat fault injection based attacks on the crypto implementation. All the proposed and existing designs are implemented with a 45 nm CMOS library.
- Published
- 2016
28. Neuromorphic hardware acceleration enabled by emerging technologies (Invited paper)
- Author
-
Mengjie Mao, Qing Wu, Yi Chen, Xiaoxiao Liu, Hai Li, and Mark Bamell
- Subjects
Speedup ,Artificial neural network ,business.industry ,Computer science ,Symmetric multiprocessor system ,Memristor ,law.invention ,symbols.namesake ,Neuromorphic engineering ,Computer architecture ,law ,Embedded system ,Scalability ,symbols ,Unconventional computing ,business ,Von Neumann architecture - Abstract
The explosion of big data applications imposes severe challenges of data processing speed and scalability on traditional computer systems. However, the performance of the von Neumann machine is greatly hindered by the increasing performance gap between CPU and memory, motivating the active research on new or alternative computing architectures. As one important instance, neuromorphic computing systems inspired by the working mechanism of human brains have gained considerable attention. In this work, we propose a heterogeneous computing system with neuromorphic computing accelerators (NCAs) that are built with emerging memristor technology. In the proposed system, NCA is designed to speed up the artificial neural network (ANN) executions in many high-performance applications by leveraging the extremely efficient mixed-signal computation capability of nanoscale memristor-based crossbar (MBC) arrays. The hierarchical MBC arrays of the NCA can be flexibly configured to different ANN topologies through the help of an analog Network-on-Chip (A-NoC). A general approach which translates the target codes within a program to the corresponding NCA instructions is also developed to facilitate the utilization of the NCA. Our simulation results show that compared to the baseline general purpose processor, the proposed heterogeneous system can achieve on average 18.2x performance speedup and 20.1x energy reduction over nine representative applications while constraining the computation accuracy degradation within an acceptable range.
- Published
- 2014
29. An Integrated, Scalable, Electronic Video Consent Process to Power Precision Health Research: Large, Population-Based, Cohort Implementation and Scalability Study
- Author
-
Clara Lajonchere, Arash Naeim, Sarah Dry, Neil Wenger, David Elashoff, Sitaram Vangala, Antonia Petruse, Maryam Ariannejad, Clara Magyar, Liliana Johansen, Gabriela Werre, Maxwell Kroloff, and Daniel Geschwind
- Subjects
Adult ,data collection ,Adolescent ,Computer science ,precision medicine ,Large population ,Health Informatics ,privacy ,video ,electronic consent ,Cohort Studies ,biobanking ,research methods ,Humans ,scalability ,validation ,Original Paper ,patient privacy ,research ,Informed Consent ,Power (physics) ,recruitment ,Computer architecture ,Scalability ,Cohort ,consent ,clinical data ,eHealth ,Preprint ,Electronics ,population health ,Laboratories, Clinical - Abstract
Background: Obtaining explicit consent from patients to use their remnant biological samples and deidentified clinical data for research is essential for advancing precision medicine. Objective: We aimed to describe the operational implementation and scalability of an electronic universal consent process that was used to power an institutional precision health biobank across a large academic health system. Methods: The University of California, Los Angeles, implemented the use of innovative electronic consent videos as the primary recruitment tool for precision health research. The consent videos targeted patients aged ≥18 years across ambulatory clinical laboratories, perioperative settings, and hospital settings. Each of these major areas had slightly different workflows and patient populations. Sociodemographic information, comorbidity data, health utilization data (ambulatory visits, emergency room visits, and hospital admissions), and consent decision data were collected. Results: The consenting approach proved scalable across 22 clinical sites (hospital and ambulatory settings). Over 40,000 participants completed the consent process at a rate of 800 to 1000 patients per week over a 2-year time period. Participants were representative of the adult University of California, Los Angeles, Health population. The opt-in rates in the perioperative (16,500/22,519, 73.3%) and ambulatory clinics (2308/3390, 68.1%) were higher than those in clinical laboratories (7506/14,235, 52.7%; P Conclusions: This is one of the few large-scale, electronic video–based consent implementation programs that reports a 65.5% (26,314/40,144) average overall opt-in rate across a large academic health system. This rate is higher than those previously reported for email (3.6%) and electronic biobank (50%) informed consent rates. This study demonstrates a scalable recruitment approach for population health research.
- Published
- 2021
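The opt-in figures quoted in the abstract of entry 29 can be cross-checked with a few lines of arithmetic. The sketch below only reuses the counts printed in that abstract; the names `SETTINGS`, `opt_in_rate`, and `overall` are illustrative and do not come from the paper, and the overall figure is assumed here to be a pooled rate (total opt-ins divided by total participants).

```python
# Counts (opted in, completed the consent process) copied from the abstract.
# The dictionary and function names are hypothetical, for illustration only.
SETTINGS = {
    "perioperative": (16_500, 22_519),
    "ambulatory clinics": (2_308, 3_390),
    "clinical laboratories": (7_506, 14_235),
}


def opt_in_rate(opted_in, total):
    """Return the opt-in rate as a percentage rounded to one decimal place."""
    return round(100 * opted_in / total, 1)


def overall(settings):
    """Pool all settings and return (opted_in, total, pooled rate in percent)."""
    opted_in = sum(o for o, _ in settings.values())
    total = sum(t for _, t in settings.values())
    return opted_in, total, opt_in_rate(opted_in, total)


if __name__ == "__main__":
    for name, (opted_in, total) in SETTINGS.items():
        print(f"{name}: {opt_in_rate(opted_in, total)}%")
    print("overall:", overall(SETTINGS))
```

Under the pooling assumption, the three per-setting counts add up to the 26,314/40,144 total reported in the Conclusions.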
30. ASIP design for multiuser MIMO broadcast precoding
- Author
-
Olli Silven, Markku Juntti, and Shahriar Shahabuddin
- Subjects
Computer science ,business.industry ,ASIC ,MU-MIMO ,MIMO ,Precoding ,020206 networking & telecommunications ,Data_CODINGANDINFORMATIONTHEORY ,02 engineering and technology ,Transport triggered architecture ,020202 computer hardware & architecture ,Scheduling (computing) ,Base station ,ASIP ,Computer architecture ,Embedded system ,0202 electrical engineering, electronic engineering, information engineering ,Zero-forcing precoding ,Dirty paper coding ,Single-core ,TTA ,business - Abstract
This paper presents an application-specific instruction-set processor (ASIP) for multiuser multiple-input multiple-output (MU-MIMO) broadcast precoding. The ASIP is designed for a base station (BS) with four antennas to perform user scheduling and precoding. Transport triggered architecture (TTA) is used as the processor template and a high-level language is used to program the ASIP. Several special function units (SFU) are designed to accelerate norm-based greedy user scheduling and minimum mean square error (MMSE) precoding. We also program zero forcing dirty paper coding (ZF-DPC) to demonstrate the reusability of the ASIP. A single core provides a throughput of 52.17 Mbps for MMSE precoding and takes an area of 87.53 kgates at 200 MHz on 90 nm technology.
- Published
- 2017
31. Extending the Flexibility of Case-Based Design Support Tools: A Use Case in the Architectural Domain
- Author
-
Klaus-Dieter Althoff, Viktor Ayzenshtadt, Frank Petzold, Christoph Langenhan, Andreas Dengel, and Syed Saqib Bukhari
- Subjects
Conceptualization ,Computer science ,business.industry ,User modeling ,0211 other engineering and technologies ,02 engineering and technology ,Building design ,Business process modeling ,Architectural geometry ,Business Process Model and Notation ,Computer architecture ,021105 building & construction ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Architecture ,Software engineering ,business ,Paper prototyping - Abstract
This paper presents results of a user study into extending the functionality of an existing case-based search engine for similar architectural designs to a flexible process-oriented case-based support tool for the architectural conceptualization phase. Based on research examining the target group’s (architects) thinking and working processes during the early conceptualization phase (especially during the search for similar architectural references), we identified common features for defining retrieval strategies for a more flexible case-based search for similar building designs within our system. Furthermore, we were also able to infer a definition for implementing these strategies into the early conceptualization process in architecture, that is, to outline a definition for this process as a wrapping structure for a user model. The study was conducted among the target group representatives (architects, architecture students and teaching personnel) by means of applying the paper prototyping method and Business Process Model and Notation (BPMN). The results of this work are intended as a foundation for our upcoming research, but we also think it could be of wider interest for the case-based design research area.
- Published
- 2017
32. A Roadmap and Plan of Action for Community-Supported Empirical Evaluation in Computer Architecture
- Author
-
Daniel Mosse, Bruce R. Childers, and Alex K. Jones
- Subjects
Computer architecture ,Computer science ,Interoperability ,General Earth and Planetary Sciences ,Leverage (statistics) ,Position paper ,General Environmental Science - Abstract
A framework of open interoperable simulators for computer architecture is long overdue. Today there are many separate, uncoordinated efforts to develop simulation and modeling artifacts (tools) for computer architecture research. The artifacts are used to empirically evaluate new computer architecture innovations and compare them with the state of the art. The artifacts are usually developed by individual groups, often for a specific purpose, and may not be publicly released. Consequently, it is difficult to leverage investment in artifact development and to repeat or reproduce experiments. In this position paper, we present recommendations and a roadmap for sharing and building open-source, interoperable simulation and modeling artifacts. The recommendations are the outcome of a community workshop involving industry, government and academia to determine how to coordinate effort, share tools and improve methodology.
- Published
- 2015
33. Analysis of High Performance Applications Using Workload Requirements
- Author
-
Giacomo V. Mc Evoy, Mariza Ferro, and Bruno Schulze
- Subjects
010101 applied mathematics ,Computer architecture ,Computer science ,Short paper ,Virtual cluster ,Performance prediction ,Workload ,010103 numerical & computational mathematics ,0101 mathematics ,Virtualization ,computer.software_genre ,01 natural sciences ,computer - Abstract
This short paper proposes two novel methodologies for analyzing scientific applications in distributed environments, using workload requirements. The first explores the impact of features such as problem size and programming language, over different computational architectures. The second explores the impact of mapping virtual cluster resources on the performance of parallel applications.
- Published
- 2017
34. Tree-Shaped Formats of Address Programming Language
- Author
-
Yuschenko, Yury
- Subjects
dereference operator ,programming history ,programming ,arrays ,computer architecture ,abstract data types ,dereference ,indirection ,tree-shaped formats ,computer “Kyiv” ,pointers ,Address Programming ,multiple indirection ,IT history ,Address Programming Language ,Ф-operation ,lists - Abstract
In the Address Programming Language (1955), the concept of indirect addressing of higher ranks (Pointers) was introduced, which allows the arbitrary connection of the computer’s RAM cells. This connection is based on standard sequences of the cell addresses in RAM and addressing sequences, which are determined by the programmer through indirect addressing. Two types of sequences allow programmers to determine an arbitrary connection of RAM cells with arbitrary content: data, addresses, subroutines, program labels, etc. Therefore, the formed connections of cells can relate to each other. The result of connecting cells with arbitrary content and any structure is called tree-shaped formats. Tree-shaped formats allow programmers to combine data into complex data structures that are like abstract data types. For tree-shaped formats, the concept of “review scheme” is defined, which is like the concept of traversing (“bypassing”) trees. Programmers can define multiple review schemes for a single tree-shaped format. Programmers can create tree-shaped formats over the connected cells to define the desired review schemes for these connected cells. The work gives a modern interpretation of the concept of tree-shaped formats in Address Programming. Tree-shaped formats are based on “stroke-operation” (pointer dereference), which was implemented in hardware in the command system of computer “Kyiv”. Group operations of modernization of computer “Kyiv” addresses accelerate the processing of tree-shaped formats and are designed as organized cycles, like those in high-level imperative programming languages. The commands of computer “Kyiv”, due to operations with indirect addressing, have more capabilities than the first high-level programming language – Plankalkül. Machine commands of the computer “Kyiv” allow direct access to the i-th element of the “list” by its serial number in the same way as such access is obtained to the i-th element of the array by its index.
The given examples of singly linked lists show the features of tree-shaped formats and their differences from abstract data types. The article opens a new branch of theoretical research, the purpose of which is to analyze the expediency of partial inclusion of Address Programming in modern programming languages., In Address Programming, the concept of indirect addressing of higher ranks (Pointers) was introduced, which makes it possible to connect the cells of the computer’s RAM in an arbitrary way. The basis of this connection is the standard sequence of cell addresses in RAM and the address sequence, which is determined by indirect addressing. Using these two relations makes it possible to define an arbitrary union of memory cells with any content. Machine commands of the computer “Kyiv” provide direct access to an element of a “list” by its serial number. Using examples of singly linked “lists”, the article demonstrates the features of tree-shaped formats and their differences from abstract data types. The article opens a new branch of theoretical research, the purpose of which is to analyze the expediency of including certain capabilities of Address Programming in modern programming languages.
- Published
- 2021
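The idea of indirect addressing of higher ranks described in the abstract of entry 34 can be illustrated with a small modern sketch. This is not the notation of the Address Programming Language or the instruction set of the computer “Kyiv”: the flat `RAM` list, the two-cell node layout, and the function names `deref` and `list_element` are all assumptions made for illustration.

```python
# Hypothetical memory model: RAM is a flat list of integer cells.
# A list node occupies two consecutive cells: [address of next node, data].
# Cell 0 holds the address of the first node; a next-address of 0 ends the list.
RAM = [4, 0, 0, 30, 7, 10, 0, 2, 20, 0]
#      ^head  ^node at 2   ^node at 4  ^node at 7


def deref(ram, addr, rank):
    """Follow `rank` levels of indirection starting from address `addr`.

    Rank 0 yields the address itself, rank 1 the content of that cell,
    rank 2 the content of the cell that content points to, and so on.
    """
    value = addr
    for _ in range(rank):
        value = ram[value]
    return value


def list_element(ram, head_addr, i):
    """Return the data of the i-th node (0-based) of a singly linked list.

    The node address is obtained by an (i + 1)-fold dereference of the head
    cell, so the serial number plays the same role as an array index.
    """
    node_addr = deref(ram, head_addr, i + 1)
    return ram[node_addr + 1]


if __name__ == "__main__":
    print([list_element(RAM, 0, i) for i in range(3)])
```

In this model the rank of the dereference stands in for the serial number of the list element, which is one way to read the abstract's comparison between list access and array indexing.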
35. Challenges and Trends of SRAM-Based Computing-In-Memory for AI Edge Devices
- Author
-
Meng-Fan Chang, Je-Min Hung, Chuan-Jia Jhang, Cheng-Xin Xue, and Fu-Chun Chang
- Subjects
Hardware_MEMORYSTRUCTURES ,Edge device ,Computer science ,Bottleneck ,Computing architecture ,symbols.namesake ,Computer architecture ,Memory wall ,symbols ,Static random-access memory ,Electrical and Electronic Engineering ,Macro ,Von Neumann architecture ,Efficient energy use - Abstract
When applied to artificial intelligence edge devices, the conventional von Neumann computing architecture imposes numerous challenges (e.g., improving the energy efficiency), due to the memory-wall bottleneck involving the frequent movement of data between the memory and the processing elements (PE). Computing-in-memory (CIM) is a promising candidate approach to breaking through this so-called memory wall bottleneck. SRAM cells provide unlimited endurance and compatibility with state-of-the-art logic processes. This paper outlines the background, trends, and challenges involved in the further development of SRAM-CIM macros. This paper also reviews recent silicon-verified SRAM-CIM macros designed for logic and multiplication-accumulation (MAC) operations.
- Published
- 2021
36. Incremental Delta-Sigma ADCs: A Tutorial Review
- Author
-
Zhichao Tan, Youngcheol Chae, Chia-Hung Chen, and Gabor C. Temes
- Subjects
Decimation ,Computer science ,020208 electrical & electronic engineering ,020206 networking & telecommunications ,Topology (electrical circuits) ,02 engineering and technology ,Delta-sigma modulation ,Computer architecture ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Electrical and Electronic Engineering ,Efficient energy use ,Block (data storage) - Abstract
In many sensor applications, a high-resolution analog-to-digital converter (ADC) is a key block. The use of an incremental delta-sigma ADC (IADC) is often well suited for such applications. While the energy efficiency of IADCs has improved by several orders of magnitude over the past decade, the implementation of high-performance IADCs, especially in battery-powered systems, is still challenging. This paper presents a tutorial review on energy-efficient IADCs and addresses the progress in this area. This paper describes the fundamentals of IADCs and energy-efficient hybrid IADC architectures. Various design techniques for improving the energy efficiency of the IADCs are described. This paper is intended to serve as a starting point for the development of a new energy-efficient IADC.
- Published
- 2020
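As background for entry 36, the basic operating principle of an incremental delta-sigma converter can be sketched with a textbook first-order behavioural model: the integrator is reset before each conversion, the modulator runs for a fixed number of clock cycles, and a counter serves as the decimation filter. The sketch below is not taken from the paper; the function name `first_order_iadc`, the normalized input range, and the ideal (noise-free) arithmetic are all assumptions.

```python
def first_order_iadc(x, n_cycles):
    """Behavioural model of one conversion of a first-order incremental ADC.

    x is the input normalized to the reference (0 <= x < 1). The integrator
    starts from zero for every conversion; each cycle it accumulates the
    input, and whenever it reaches the reference the comparator outputs a 1
    and the reference is subtracted. The count of 1s divided by the number
    of cycles is the digital estimate of x.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must satisfy 0 <= x < 1")
    integrator = 0.0  # reset at the start of the conversion
    count = 0         # counter acting as the decimation filter
    for _ in range(n_cycles):
        integrator += x
        if integrator >= 1.0:
            integrator -= 1.0
            count += 1
    return count / n_cycles


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, first_order_iadc(0.3, n))
```

In this idealized model the residue left in the integrator bounds the error by one count, so the resolution grows with the number of cycles per conversion. Higher-order and hybrid architectures, which the review covers, reach a given resolution in fewer cycles, and they are not modelled here.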
37. The Development of Silicon for AI: Different Design Approaches
- Author
-
Hoi-Jun Yoo, Jinmook Lee, Sungpill Choi, and Kyuho Jason Lee
- Subjects
Digital electronics ,Artificial neural network ,Machine vision ,business.industry ,Computer science ,media_common.quotation_subject ,020208 electrical & electronic engineering ,02 engineering and technology ,Perceptron ,020202 computer hardware & architecture ,Neuromorphic engineering ,Computer architecture ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Function (engineering) ,business ,Implementation ,media_common ,Block (data storage) - Abstract
This paper provides a review of design approaches towards artificial intelligence (AI) System-on-Chip. AI algorithms have progressed over the past decades from perceptron-based neural network (NN) and neuro-fuzzy (NF) system to today’s deep neural network (DNN) and neuromorphic computing. Recent DNN hardware accelerators focus on energy-efficient integration of digital circuits to realize real-time DNN operation while neuromorphic processors deploy new memory technologies with analog computation for low power consumption. However, different design approaches can be applied to such processor implementation with their pros and cons. This paper reviews designs ranging from the early processors for NN and NF, in both mixed-mode and digital implementations, to the recent DNN SoC designs that we have proposed over the past decade. The former content deals with NN and NF processors used as a functional building block of a machine vision SoC, while the latter concentrates on integration of the whole DNN function. We also provide a discussion on the approaches, and provide perspective on future research directions.
- Published
- 2020
38. Ultra-Low-Power FDSOI Neural Circuits for Extreme-Edge Neuromorphic Intelligence
- Author
-
Rubino, Arianna, Livanelioglu, Can, Qiao, Ning, Payvand, Melika, Indiveri, Giacomo, University of Zurich, and Rubino, Arianna
- Subjects
FOS: Computer and information sciences ,Computer science ,B.7 ,C.3 ,FOS: Physical sciences ,Computer Science - Emerging Technologies ,02 engineering and technology ,symbols.namesake ,Computer Science::Emerging Technologies ,0202 electrical engineering, electronic engineering, information engineering ,Neural and Evolutionary Computing (cs.NE) ,Edge computing ,silicon neurons ,FDSOI ,ultralow-power ,slow synaptic dynamics ,IoT ,real-time ,analog circuit ,Electrical and Electronic Engineering ,10194 Institute of Neuroinformatics ,Electronic circuit ,Spiking neural network ,Digital electronics ,Quantitative Biology::Neurons and Cognition ,business.industry ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,Computer Science - Neural and Evolutionary Computing ,Nonlinear Sciences - Adaptation and Self-Organizing Systems ,Emerging Technologies (cs.ET) ,Computer architecture ,Neuromorphic engineering ,Integrator ,Logic gate ,symbols ,570 Life sciences ,biology ,business ,Adaptation and Self-Organizing Systems (nlin.AO) ,Von Neumann architecture - Abstract
Recent years have seen an increasing interest in the development of artificial intelligence circuits and systems for edge computing applications. In-memory computing mixed-signal neuromorphic architectures provide promising ultra-low-power solutions for edge-computing sensory-processing applications, thanks to their ability to emulate spiking neural networks in real-time. The fine-grain parallelism offered by this approach allows such neural circuits to process the sensory data efficiently by adapting their dynamics to the ones of the sensed signals, without having to resort to the time-multiplexed computing paradigm of von Neumann architectures. To reduce power consumption even further, we present a set of mixed-signal analog/digital circuits that exploit the features of advanced Fully-Depleted Silicon on Insulator (FDSOI) integration processes. Specifically, we explore the options of advanced FDSOI technologies to address analog design issues and optimize the design of the synapse integrator and of the adaptive neuron circuits accordingly. We present circuit simulation results and demonstrate the circuit's ability to produce biologically plausible neural dynamics with compact designs, optimized for the realization of large-scale spiking neural networks in neuromorphic processors., 11 pages, 9 figures, TCAS submission
- Published
- 2021
39. Investigation of asynchronous pipeline circuits based on bundled-data encoding: Implementation styles, behavioral modeling, and timing analysis
- Author
-
Yu Zhou
- Subjects
Very-large-scale integration ,Handshaking ,Multidisciplinary ,Computer architecture ,Handshake ,Computer science ,Asynchronous communication ,Pipeline (computing) ,Static timing analysis ,Behavioral modeling ,Electronic circuit - Abstract
As VLSI technology enters the post-Moore era, there has been an increasing interest in asynchronous design because of its potential advantages in power consumption, electromagnetic emission, and automatic speed scaling capacity under supply voltage variations. In most practical asynchronous circuits, a pipeline forms the micro-architecture backbone, and its characteristics play a vital role in determining the overall circuit performance. In this paper, we investigate a series of typical asynchronous pipeline circuits based on bundled-data encoding, spanning different handshake signaling protocols such as 2-phase (micropipeline, Mousetrap, and Click), 4-phase (simple, semi-decoupled, and fully-decoupled), and single-track (GasP). An in-depth review of each selected circuit is conducted regarding the handshaking and data latching mechanisms behind the circuit implementations, as well as the analysis of its performance and timing constraints based on formal behavior models. Overall, this paper aims at providing a survey of asynchronous bundled-data pipeline circuits, and it will be a reference for designers interested in experimenting with asynchronous circuits.
- Published
- 2022
40. Innovative Folding Bed Cum Chair Based on IoT-Cloud Technology
- Author
-
Kaushal Rajesh, Badotra Sumit, Narayan Panda Surya, Singh Simranjeet, and Kumar Naveen
- Subjects
Control and Optimization ,Computer architecture ,Computer Networks and Communications ,Computer science ,business.industry ,Cloud computing ,Folding (DSP implementation) ,Electrical and Electronic Engineering ,Internet of Things ,business ,Computer Science Applications - Abstract
Aim: The method of utilization of IoT and other evolving techniques in the medical equipment design field is discussed in the present paper. A remotely managed interface is embedded in a wheelchair cum bed for elderly or physically challenged people. With the help of a camera embedded in the proposed solution, real-time remote monitoring of the patient is achieved by the concerned person through an Android application. For this purpose, linear actuators are used. This paper further aims to explore the hidden potentials of the merger of all these fields to benefit the end users. Objectives: Remote monitoring of health of the patient through a cloud-based Android application. Automatic adjustment of the wheelchair into bed and vice-versa. Automatic stool passing chamber facility is available under the proposed model. Injuries during transportation of the patient from one chair to another (or chair to bed) have been limited in our designed model. Methods: The basic mechanism of the proposed wheelchair has been designed using computer-aided design software. The basic methodology adopted for development of the prototype & subsequent “user review analysis” is displayed. The CAD model of the wheelchair cum stretcher was designed using SolidWorks solid modelling techniques. The basic structure has been designed with several modifications when compared to the conventional wheelchair design. The computer-made design was then utilized for final fabrication of the prototype. The prototype was tested for endurance, load bearing capacity and customer comfort during various phases of development. Feedback from several subjects was recorded for future utilization in improved design & fabrication. Results: The proposed model is of utmost importance as the number of critical patients like accident cases, critical pre and post-surgery cases is increasing day by day.
Sometimes these patients need timely medication during transit in an ambulance while they are picked up from houses and referred to nearby big hospitals. During transit or in hospital, critical patients can be handled efficiently by a specialist doctor through his/her smartphone applications. It also optimizes the services of specialist doctors, given the shortage of specialists in Indian hospitals. In a nutshell, this wheelchair system can be moved anywhere due to its portability. Following are the most highlighted features: 1. Authorized relatives and doctors can see and interact with the patient remotely at any time on their smartphones. 2. Authorized relatives and doctors can see the patient's vital signs, such as BP, ECG, and pulse, at any time through a smartphone. 3. The doctor can instruct the caretaker to release the emergency drugs through an infusion pump. 4. The doctor can plan the exceptions, drug infusion, alarm, etc. 5. The system is portable and can easily be shifted to an ambulance. All transmissions are wireless, so there is no hassle of wires and connectivity. Conclusion: The presented work is limited to the design and fabrication of a new model of wheelchair, which works as a stretcher and has locomotive capabilities. The key feature of the design is its versatility and adaptability to various working conditions. The feedback obtained from various subjects during the testing of the wheelchair shows their confidence and a fair degree of comfort which they felt while using the wheelchair. The easy and user-friendly Android application helps to monitor the health of the patient. The smartphone camera helped to obtain this data. The analysis of the data can be done in the cloud-based station. The linear actuator has proved to be low-cost and highly reliable equipment to propel the wheelchair.
- Published
- 2022
41. A survey on improving the wireless communication with adaptive antenna selection by intelligent method
- Author
-
Chin-Feng Lai and ChienHsiang Wu
- Subjects
Beamforming ,Feature data ,Computer Networks and Communications ,Wireless network ,business.industry ,Computer science ,Phased array ,ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS ,Field (computer science) ,Transmission (telecommunications) ,Computer architecture ,Wireless ,Antenna (radio) ,business - Abstract
Transmission applications in wireless networks have brought unprecedented demands, and the demand for high-performance wireless transmission is increasing day by day. Antenna technology is an indispensable part of the development of wireless communication. One potential solution is to resort to intelligent learning techniques to help achieve breakthroughs in the limited antenna technical field. This approach is based on an adaptive antenna using intelligent learning, and it has laid the foundation for signal strength adjustment to enhance wireless transmission efficiency. This paper evaluates the most advanced literature and techniques. A comprehensive description from different perspectives covers several adaptive antenna structures, including diversity antennas, phased array antennas, and beamforming specific learning methods. The paper then divides the work into categories, analyzing and discussing it from the perspectives of intelligent learning algorithms and feature data. This article aims to help readers understand the latest intelligent technology based on adaptive antennas. Further, it sheds new light on future research directions to meet the development needs of adaptive antennas for future wireless networks.
- Published
- 2022
42. A Review of Design Approaches for Enhancing the Performance of NoCs at Communication Centric Level
- Author
-
Roohie Naaz Mir, Misbah Manzoor, and Najeeb-ud-din Hakim
- Subjects
General Computer Science ,Computer architecture ,Computer science ,Hardware_INTEGRATEDCIRCUITS ,Hardware_PERFORMANCEANDRELIABILITY ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Latency (engineering) ,Throughput (business) - Abstract
As the trend of technology shrinking continues, a vast number of processors are being incorporated in a limited space. As a result, almost half of the chip area in Multi-Processor Systems-on-Chips (MPSoCs) is occupied by interconnections, which poses a big problem for communication. Network-on-Chips (NoCs) evolved as a significant scalable solution for removing wiring congestion and communication problems in MPSoCs. NoCs provide the advantages of customized architecture, increased scalability and bandwidth. An NoC is a structured framework where communication is the prime concern. In this review paper we present an overview of research and design approaches in the communication-centric areas of NoCs. We discuss and enumerate most of the available work on communication in 2D NoCs. The paper gives insight into the different attributes and performance parameters of NoCs. Further, it gives a detailed description of how topology, flow control and routing mechanisms can affect the qualitative aspects (performance) of NoCs. It then explains how various attributes of routing can help in increasing the efficacy of NoCs. Subsequently, a brief review of different simulators used for NoCs is given. All of this is based on a survey of academic, theoretical and experimental approaches presented in the past. Finally, some suggestions for future work are given.
- Published
- 2021
43. Scalable Fully Pipelined Hardware Architecture for In-Network Aggregated AllReduce Communication
- Author
-
Shuo Liu, Ray C. C. Cheung, Qiaoling Wang, Yao Liu, Junyi Zhang, and Wangchen Dai
- Subjects
Hardware architecture ,Speedup ,Computer architecture ,Computer science ,Retransmission ,Server ,Scalability ,Bandwidth (computing) ,Electrical and Electronic Engineering ,Field-programmable gate array ,Network topology - Abstract
The Ring-AllReduce framework is currently the most popular solution for deploying industry-level distributed machine learning tasks. However, only about half of the maximum bandwidth can be achieved even under optimal conditions. In recent years, several in-network aggregation frameworks have been proposed to overcome this drawback, but limited hardware information has been disclosed. In this paper, we propose a scalable fully-pipelined architecture that handles tasks like forwarding, aggregation and retransmission with no bandwidth loss. The architecture is implemented on a Xilinx Ultrascale FPGA that connects to 8 working servers with 10 Gb/s network adapters, and it is able to scale to more complicated scenarios involving more workers. Compared with Ring-AllReduce, using AllReduce-Switch improves the efficient bandwidth of AllReduce communication by a factor of $1.75\times$. In image training tasks, the proposed hardware architecture helps to achieve up to $1.67\times$ speedup of the training process. For computing-intensive models, the speedup from communication may be partially hidden by computing. In particular, for ResNet-50, AllReduce-Switch improves the training process with MPI and NCCL by $1.30\times$ and $1.04\times$ respectively.
- Published
- 2021
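The "about half of the maximum bandwidth" remark in the abstract of entry 43 can be checked with a little arithmetic. The sketch below assumes the textbook ring algorithm (a reduce-scatter pass followed by an all-gather pass), which the abstract itself does not spell out; the function names are illustrative and not taken from the paper.

```python
def ring_allreduce_bytes_sent(num_workers: int, payload_bytes: float) -> float:
    """Bytes each worker transmits in a textbook Ring-AllReduce.

    Assumption: reduce-scatter and all-gather each take (N - 1) steps,
    and every step sends a chunk of payload_bytes / N.
    """
    if num_workers < 2:
        raise ValueError("a ring needs at least two workers")
    return 2 * (num_workers - 1) / num_workers * payload_bytes


def ring_allreduce_bandwidth_fraction(num_workers: int) -> float:
    """Fraction of the link bandwidth that ends up as useful AllReduce throughput.

    Useful throughput is payload / time, and time is bytes_sent / link_bandwidth,
    so the fraction is N / (2 * (N - 1)), which tends to 1/2 as N grows.
    """
    if num_workers < 2:
        raise ValueError("a ring needs at least two workers")
    return num_workers / (2 * (num_workers - 1))


if __name__ == "__main__":
    for n in (2, 8, 64):
        print(n, round(ring_allreduce_bandwidth_fraction(n), 3))
```

For 8 workers the fraction is 4/7, roughly 0.57, and it approaches 0.5 as the ring grows. Its inverse, 1.75, coincides with the $1.75\times$ improvement quoted for 8 servers, although the abstract does not say how that figure was obtained.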
44. Blockchain-based Service Orchestration for 5G Vertical Industries in Multi-Cloud Environment
- Author
-
Engin Zeydan, Jorge Baranda, Josep Nangues, Y. Turk, and S. B. Ozturk
- Subjects
Distributed ledger ,Cloud-computing ,Vertical service ,Internet of things ,Telecommunications infrastructures ,Network architecture ,Network management ,Block-chain ,Telecommunication services ,5G mobile communication systems ,Blockchain ,Multi-clouds ,Cloud environments ,Networks management ,Security ,Service orchestration ,Computer architecture - Abstract
Blockchain technologies are gradually being deployed in a variety of industries, including telecommunications. In this paper, due to the strict governance of telecommunication infrastructure, we propose a permissioned distributed ledger (PDL)-based blockchain supported architecture for a network management and orchestration platform. The work focuses on creating a trusted environment for both Cloud Service Providers (CSPs) and Mobile Network Operators (MNOs) for managing the lifecycle of network services (e.g., instantiation, scaling, termination, etc.) in a multi-cloud environment. We also validate our proposed approach with an experimental scenario using the Quorum blockchain network (BCN) to measure various performance metrics (e.g., number of transactions and blocks, time to write, and transactions per second) of different service orchestrator (SO)-related instantiation metrics. Our evaluation results show that the values for the service instantiation time and the corresponding BCN metrics can be completely different, suggesting that some logs arrive very quickly and generate a high transaction load, while others take longer and generate a low number of transactions. As a solution, at the end of the paper, we also provide some recommendations for appropriate optimizations during transfer of SO-related logs to BCNs and some observed challenges.
- Published
- 2022
45. Cost Effective Reconfigurable Architecture for Stream Processing Applications
- Author
-
Lev Kirischian, Valeri Kirischian, and Vadim Geurkov
- Subjects
Stream processing ,Computer architecture ,Application-specific integrated circuit ,business.industry ,Computer science ,Adaptive system ,Embedded system ,Image processing ,Algorithm design ,Architecture ,Field-programmable gate array ,business ,Reconfigurable computing - Abstract
This paper presents an approach for the development of cost-effective custom video/image processing systems. The approach utilizes the concept of temporal partitioning of resources in partially reconfigurable FPGA devices. The paper proposes an architecture for a multi-mode video-stream processor with a cyclically reconfigurable structure. The cost-effectiveness of the proposed approach has been analyzed on the basis of experiments conducted on the Multi-mode Adaptive Reconfigurable System (MARS) platform, which was developed for that purpose. Video-processing cores associated with stereo-vision algorithms have been developed, tested and analyzed. The experiments have shown that the cost-effectiveness of systems based on the proposed approach can be better than that of traditional approaches based on large statically configured FPGAs.
- Published
- 2022
46. A Newly Proposed Prospective and Robust Computer Networking Model Architecture Based on the Infrastructure of Cloud Computing Contrivance
- Author
-
Shivankur Thapliyal
- Subjects
Model architecture ,Computer architecture ,Computer science ,business.industry ,Cloud computing ,business - Abstract
Computer networking plays a major role in data communication, data sharing and data transmission between geographically distinct locations. In today's scenario, however, the primary concern is not only to transfer data but also to utilize all resources with greater efficiency, to preserve the confidentiality and integrity of messages with respect to speed and time at lower bandwidth, and to keep computational cost and power consumption low, moving towards optimality. Cloud computing also plays a significant role in accessing data at geographically different locations. In this paper we therefore fuse computer networking architecture with cloud computing architecture and present a fundamentally strong cloud-computing-based computer networking model that works on the concept of ‘virtualization’. When the number of hardware components (servers) increases drastically, all the factors that make networking among nodes possible also consume resources at an extreme level, and networking becomes complex and slow; that is why we use the concept of the virtual machine. In this paper we propose a computer networking model using the concepts of cloud computing. The model is suitable for data transmission and also addresses the most significant feature of computer networking, data security. It uses proxy servers/firewalls to provide some security mechanisms. We also propose a communication-oriented model among inter-cluster domains, describing how a node belonging to one CLOUD cluster can communicate with other InterCLOUD clusters with respect to data security measures. We propose three models related to this networking model: CLOUD Networking Infrastructure, the Connection Oriented model, and the Communication Oriented model.
Detailed descriptions of all three models are given in the upcoming sections of this paper. Keywords: Cloud computing based computer networking model, A virtual model for computer networking, Computer Networking model based on virtualization, Virtualization based computer networking model.
- Published
- 2021
47. QoS aware web service selection using orthogonal array learning on fruit fly optimization approach
- Author
-
Manik Chandra and Rajdeep Niyogi
- Subjects
General Computer Science ,Computer architecture ,Computer science ,Orthogonal array ,Web service ,computer.software_genre ,computer ,Selection (genetic algorithm) ,Qos aware ,Theoretical Computer Science - Abstract
Purpose: This paper aims to solve the web service selection problem using an efficient meta-heuristic algorithm. The problem of selecting a set of web services from a large-scale service environment (web service repository) while maintaining Quality-of-Service (QoS) is referred to as web service selection (WSS). With the explosive growth of internet services, managing and selecting the proper web services has become a pertinent research issue. Design/methodology/approach: In this paper, to address the WSS problem, the authors propose a new modified fruit fly optimization approach, called orthogonal array-based learning in fruit fly optimizer (OL-FOA). In OL-FOA, they adopt a chaotic map to initialize the population; they add the adaptive DE/best/2 mutation operator to improve the exploration capability of the fruit fly approach; and finally, to improve the efficiency of the search process (by reducing the search space), they use the orthogonal learning mechanism. Findings: To test the efficiency of the proposed approach, a test suite of 2500 web services is chosen from the public repository. To establish the competitiveness of the proposed approach, it is compared against four other meta-heuristic approaches (including classical as well as state-of-the-art), namely, fruit fly optimization (FOA), differential evolution (DE), modified artificial bee colony algorithm (mABC) and global-best ABC (GABC). The empirical results show that the proposed approach outperforms its counterparts in terms of response time, latency, availability and reliability. Originality/value: In this paper, the authors have developed a novel population-based approach (OL-FOA) for QoS-aware web service selection (WSS).
To justify the results, the authors compared it against four other meta-heuristic approaches (including classical as well as state-of-the-art), namely, fruit fly optimization (FOA), differential evolution (DE), modified artificial bee colony algorithm (mABC) and global-best ABC (GABC), over the four QoS parameters: response time, latency, availability and reliability. The authors found that the approach outperforms all the competing approaches. To satisfy all objectives simultaneously, the authors would like to extend this approach to the multi-objective WSS optimization problem. Further, it is declared that this paper has not been submitted to any other journal and is not under review.
- Published
- 2021
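Entry 47 mentions a DE/best/2 mutation operator. As a rough illustration, the sketch below implements the classical, non-adaptive DE/best/2 rule from the differential evolution literature, v = x_best + F(x_r1 - x_r2) + F(x_r3 - x_r4). The adaptive variant used in OL-FOA is not described in the abstract, so the fixed scale factor, the function name and the minimization convention here are all assumptions.

```python
import random


def de_best_2_mutation(population, fitness, scale=0.5, rng=None):
    """Return one DE/best/2 mutant vector per individual.

    population: list of equal-length lists of floats
    fitness:    one score per individual; lower is better (assumed convention)
    scale:      the DE scale factor F, fixed here rather than adapted
    """
    rng = rng or random.Random()
    size = len(population)
    if size < 5:
        raise ValueError("DE/best/2 needs at least 5 individuals")

    # The base vector is the best individual found so far.
    best_index = min(range(size), key=lambda i: fitness[i])
    best = population[best_index]

    mutants = []
    for i in range(size):
        # Four mutually distinct donors, none equal to the target index i.
        candidates = [j for j in range(size) if j != i]
        r1, r2, r3, r4 = rng.sample(candidates, 4)
        mutant = [
            best[d]
            + scale * (population[r1][d] - population[r2][d])
            + scale * (population[r3][d] - population[r4][d])
            for d in range(len(best))
        ]
        mutants.append(mutant)
    return mutants
```

Adding two difference vectors to the best individual gives a wider perturbation than a single difference would, which is the usual argument for using this operator to strengthen exploration.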
48. An overview of rate control techniques in HEVC and SHVC video encoding
- Author
-
Viswanathan Swaminathan, Saifullah Khalid, Ishfaq Ahmad, and Alexander Aved
- Subjects
Computer Networks and Communications ,business.industry ,Image quality ,Computer science ,Rate control ,Ranging ,Variety (cybernetics) ,Computer architecture ,Hardware and Architecture ,Video encoding ,Scalability ,Media Technology ,Internet of Things ,business ,Software ,Coding (social sciences) - Abstract
Video standards are crucial for exchanging video content, enabling a myriad of services and supporting a wide variety of devices ranging from personal devices to clouds and IoT. One of the core requirements in video standards is rate control, which regulates bit allocation and picture quality. This paper presents an overview of rate control techniques in the HEVC video coding standard. While providing insight into the rate control mechanism specific to HEVC, it describes the basic operating principle of rate control algorithms, including their essential parameters, outputs, and performance measures. We review rate control in past coding standards and bring out the basic features of HEVC that drive the need for new rate control algorithms. Alongside, we delineate the Rate-Distortion model-based taxonomy of various algorithms, including their classification criteria. The paper gives another classification of the rate control algorithms based on their basic principles and mechanisms. The article also explains the scalable extension of HEVC, namely SHVC, while highlighting some of the possible SHVC rate control design challenges. Finally, we present some of the unresolved research issues in HEVC rate control and outline possible future research directions.
- Published
- 2021
49. Digital Media Application Technology of Mobile Terminals Based on Edge Computing and Virtual Reality
- Author
-
Tao Jiang
- Subjects
Network architecture ,Article Subject ,Computer Networks and Communications ,Computer science ,business.industry ,Latency (audio) ,020206 networking & telecommunications ,TK5101-6720 ,02 engineering and technology ,Virtual reality ,Facial recognition system ,Computer Science Applications ,Digital media ,Cloud computing architecture ,Computer architecture ,User experience design ,Telecommunication ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,Edge computing - Abstract
With the maturity of the most advanced technology and virtual reality, the application of digital media technology in various fields has become more and more extensive. Digital media technology has greatly promoted the development of all classes of society. Therefore, how to develop digital media technology applications has become a major issue. This article aims to study the digital media application technology of mobile terminals based on edge computing and virtual reality. In this paper, we use edge computing and virtual reality methods to study the digital media application technology of mobile terminals. We used the SD-CEN architecture and FWA in edge computing to conduct simulation experiments and compared the delay time performance of the FWA with that of the PSO-CO algorithm, WRR algorithm, and Pick-KX algorithm. The results of this paper show that it can effectively reduce the response delay of real-time face recognition services and improve user experience. Compared with the traditional cloud computing architecture and a single MEC device, the SD-CEN network architecture based on FWA strategy has more advantages. The latency of the FWA is increased by 61%, 46%, and 17%, respectively, compared with that of the WRR, Pick-KX, and PSO-CO algorithms.
- Published
- 2021
50. RRAM for Compute-in-Memory: From Inference to Training
- Author
-
Yandong Luo, Shimeng Yu, Xiaochen Peng, and Wonbo Shim
- Subjects
Computer science ,020208 electrical & electronic engineering ,Inference ,02 engineering and technology ,Resistive random-access memory ,Computer architecture ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,Hardware acceleration ,Electrical and Electronic Engineering ,Inference engine ,Throughput (business) ,Volatile memory - Abstract
To efficiently deploy machine learning applications to the edge, a compute-in-memory (CIM) based hardware accelerator is a promising solution with improved throughput and energy efficiency. Instant-on inference is further enabled by emerging non-volatile memory technologies such as resistive random access memory (RRAM). This paper reviews recent progress in RRAM-based CIM accelerator design. First, the multilevel-state RRAM characteristics are measured from a test vehicle to examine the key device properties for inference. Second, a benchmark is performed to study the scalability of the RRAM CIM inference engine and the feasibility of monolithic 3D integration that stacks RRAM arrays on top of an advanced logic process node. Third, grand challenges associated with in-situ training are presented. To support accurate and fast in-situ training and enable subsequent inference in an integrated platform, a hybrid precision synapse that combines RRAM with volatile memory (e.g. capacitor) is designed and evaluated at system level. Prospects and future research needs are discussed.
- Published
- 2021