NnCore: A parameterized non-linear function generator for machine learning applications in FPGAs
- Author
- Hayden K.-H. So and Sam M. H. Ho
- Subjects
- Polynomial, Generator (computer programming), Artificial neural network, Computer science, Clock rate, Approximation algorithm, Parameterized complexity, Machine learning, Piecewise, Artificial intelligence, Field-programmable gate array
- Abstract
Efficient implementation of machine learning applications on FPGAs often requires non-linear numerical functions at non-standard numerical precisions that are not readily available from vendor-provided standard libraries. While application-specific designs of such functions can achieve superior numerical accuracy and area efficiency compared to ad-hoc composition from vendor-provided primitives, the effort devoted to this challenging task is hardly portable to other, similar applications. In this work, we present an open-source generator, NnCore, for floating-point non-linear operator cores built from fixed-point piecewise polynomial segments. The proposed framework takes advantage of properties such as oddness/evenness and intercept-at-origin, often found in the numerical functions commonly used in machine learning applications, and applies an improved segmentation algorithm that specifically handles “outlier” segments, to reduce the memory required for storing polynomial coefficients. Experimental results show that, at single precision, NnCore-generated cores use up to 65% fewer BRAMs and 63% fewer shift registers, and run at up to 2.2× the clock speed, compared with cores generated by a previous generic function generator. At half precision, cores can run at around 1.2× higher clock speed at the cost of higher resource usage, or use a comparable amount of resources but run at 12% to 45% lower clock speed. The use of HLS C++ as the output format allows core integration into modern high-level workflows such as Xilinx SDAccel.
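To illustrate the symmetry property the abstract describes, here is a minimal C++ sketch of a piecewise-quadratic tanh approximation that exploits odd symmetry (tanh(-x) = -tanh(x)) so that coefficients are stored for x ≥ 0 only, halving the coefficient table. This is not NnCore's generated code: the segment count, boundaries, and coefficients below are hypothetical placeholders fitted roughly by hand, and plain `float` is used for brevity where NnCore would emit fixed-point HLS C++.

```cpp
#include <cmath>
#include <cstdio>

// Illustrative sketch only: piecewise-quadratic tanh on uniform segments,
// with odd symmetry used to fold negative inputs onto [0, +inf) so the
// coefficient table covers only x >= 0.
namespace sketch {

constexpr int kNumSegments = 4;
constexpr float kSegWidth = 1.0f;  // segments [0,1), [1,2), [2,3), [3,4)

// Per-segment coefficients of c0 + c1*t + c2*t^2, where t is the offset
// within the segment. Placeholder values fitted roughly to tanh; a real
// generator would minimize maximum error for the target precision.
constexpr float kCoeff[kNumSegments][3] = {
    {0.0000f, 1.0000f, -0.2384f},
    {0.7616f, 0.4200f, -0.2176f},
    {0.9640f, 0.0707f, -0.0396f},
    {0.9951f, 0.0098f, -0.0056f},
};

float tanh_pw(float x) {
    float ax = std::fabs(x);               // fold onto [0, +inf)
    float sign = (x < 0.0f) ? -1.0f : 1.0f;
    if (ax >= kNumSegments * kSegWidth)    // saturate beyond the table range
        return sign;
    int seg = static_cast<int>(ax / kSegWidth);
    float t = ax - seg * kSegWidth;        // offset within the segment
    const float* c = kCoeff[seg];
    float y = c[0] + t * (c[1] + t * c[2]);  // Horner evaluation
    return sign * y;                       // restore sign via odd symmetry
}

}  // namespace sketch

int main() {
    for (float x = -3.0f; x <= 3.0f; x += 1.0f)
        std::printf("x=%+.1f  approx=%+.4f  tanh=%+.4f\n",
                    x, sketch::tanh_pw(x), std::tanh(x));
    return 0;
}
```

The same fold-and-restore structure applies to even functions (store one half, drop the sign restore) and to intercept-at-origin functions (the first segment's constant term is known to be zero and need not be stored), which is how such properties translate into smaller coefficient memories.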
- Published
- 2017