79 results on '"M. Pierre"'
Search Results
2. An Energy-Efficient Accelerator Architecture with Serial Accumulation Dataflow for Deep CNNs
- Author
-
J. M. Pierre Langlois, Mehdi Ahmadi, and Shervin Vakili
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Dataflow ,business.industry ,Computer science ,Deep learning ,020208 electrical & electronic engineering ,02 engineering and technology ,Energy consumption ,Convolutional neural network ,020202 computer hardware & architecture ,Convolution ,Embedded system ,Hardware Architecture (cs.AR) ,FOS: Electrical engineering, electronic engineering, information engineering ,0202 electrical engineering, electronic engineering, information engineering ,Hardware acceleration ,Artificial intelligence ,Electrical Engineering and Systems Science - Signal Processing ,Computer Science - Hardware Architecture ,business ,Dram ,Efficient energy use - Abstract
Convolutional Neural Networks (CNNs) have shown outstanding accuracy for many vision tasks during recent years. When deploying CNNs on portable devices and embedded systems, however, the large number of parameters and computations result in long processing time and low battery life. An important factor in designing CNN hardware accelerators is to efficiently map the convolution computation onto hardware resources. In addition, to save battery life and reduce energy consumption, it is essential to reduce the number of DRAM accesses since DRAM consumes orders of magnitude more energy compared to other operations in hardware. In this paper, we propose an energy-efficient architecture which maximally utilizes its computational units for convolution operations while requiring a low number of DRAM accesses. The implementation results show that the proposed architecture performs one image recognition task using the VGGNet model with a latency of 393 ms and only 251.5 MB of DRAM accesses. Comment: 4 pages
- Published
- 2020
3. Heterogeneous Distributed SRAM Configuration for Energy-Efficient Deep CNN Accelerators
- Author
-
Mehdi Ahmadi, Shervin Vakili, and J. M. Pierre Langlois
- Subjects
Deep cnn ,Hardware_MEMORYSTRUCTURES ,Computer science ,020208 electrical & electronic engineering ,02 engineering and technology ,Parallel computing ,Convolutional neural network ,020202 computer hardware & architecture ,Visual recognition ,Transfer (computing) ,0202 electrical engineering, electronic engineering, information engineering ,Static random-access memory ,Dram ,Data transmission ,Efficient energy use - Abstract
Convolutional Neural Networks (CNNs) are often the first choice for visual recognition systems due to their high, even superhuman, recognition accuracy. The memory configuration of CNN accelerators highly impacts their area and energy efficiency, and employing on-chip memories such as SRAMs is unavoidable. SRAMs can reduce the number of energy-hungry DRAM accesses by storing a large amount of data locally. In this paper, we propose a new on-chip memory configuration for a certain class of CNN accelerators that divides the memories into two groups. The first group consists of shallow but wide SRAMs into which parallel computational units accumulate intermediate results. The second group includes narrow but deep SRAMs shared between adjacent computational units to store and then transfer final results to the external DRAM without interrupting the computation process. Implementation results show that the proposed configuration reduces the area by 21% and improves the energy efficiency by 18% compared to designs which use an ordinary ping-pong structure for SRAM-DRAM data transfer.
- Published
- 2020
4. Heterogeneous Distributed SRAM Configuration for Energy-Efficient Deep CNN Accelerators
- Author
-
Ahmadi, Mehdi, primary, Vakili, Shervin, additional, and Langlois, J. M. Pierre, additional
- Published
- 2020
- Full Text
- View/download PDF
5. An Energy-Efficient Accelerator Architecture with Serial Accumulation Dataflow for Deep CNNs
- Author
-
Ahmadi, Mehdi, primary, Vakili, Shervin, additional, and Langlois, J. M. Pierre, additional
- Published
- 2020
- Full Text
- View/download PDF
6. POLYCiNN: Multiclass Binary Inference Engine using Convolutional Decision Forests
- Author
-
J. M. Pierre Langlois, Jean-Pierre David, Ahmed Elsheikh, and Ahmed M. Abdelsalam
- Subjects
Contextual image classification ,Computer science ,business.industry ,Deep learning ,020208 electrical & electronic engineering ,Decision tree ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Convolutional neural network ,Random forest ,Kernel (image processing) ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Inference engine ,business ,computer ,MNIST database ,0105 earth and related environmental sciences - Abstract
Convolutional Neural Networks (CNNs) have achieved significant success in image classification. One of the main reasons that CNNs achieve state-of-the-art accuracy is using many multi-scale learnable windowed feature detectors called kernels. Fetching of kernel feature weights from memory and performing the associated multiply and accumulate computations consume massive amount of energy. This hinders the widespread usage of CNNs, especially in embedded devices. In comparison with CNNs, decision forests are computationally efficient since they are composed of decision trees, which are binary classifiers by nature and can be implemented using AND-OR gates instead of costly multiply and accumulate units. In this paper, we investigate the migration of CNNs to decision forests as one of the promising approaches for reducing both execution time and power consumption while achieving acceptable accuracy. We introduce POLYCiNN, an architecture composed of a stack of decision forests. Each decision forest classifies one of the overlapped sub-images of the original image. Then, all decision forest classifications are fused together to classify the input image. In POLYCiNN, each decision tree is implemented in a single 6-input Look-Up Table and requires no memory access. Therefore, POLYCiNN can be efficiently mapped to simple and densely parallel hardware designs. We validate the performance of POLYCiNN on the benchmark image classification tasks of the MNIST, CIFAR-10 and SVHN datasets.
- Published
- 2019
7. Module-per-Object: A Human-Driven Methodology for C++-Based High-Level Synthesis Design
- Author
-
Jeferson Santiago da Silva, Francois-Raymond Boyer, and J. M. Pierre Langlois
- Subjects
FOS: Computer and information sciences ,020203 distributed computing ,Domain-specific language ,business.industry ,Computer science ,Code reuse ,02 engineering and technology ,Software quality ,020202 computer hardware & architecture ,Abstraction layer ,Software ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computer architecture ,High-level synthesis ,VHDL ,0202 electrical engineering, electronic engineering, information engineering ,Code generation ,Distributed, Parallel, and Cluster Computing (cs.DC) ,business ,computer ,computer.programming_language - Abstract
High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar with hardware design. However, achieving the highest Quality-of-Results (QoR) with HLS is still unattainable for most programmers. This requires detailed knowledge of FPGA architecture and hardware design in order to produce FPGA-friendly codes. Moreover, these codes are normally in conflict with best coding practices, which favor code reuse, modularity, and conciseness. To overcome these limitations, we propose Module-per-Object (MpO), a human-driven HLS design methodology intended for both hardware designers and software developers with limited FPGA expertise. MpO exploits modern C++ to raise the abstraction level while improving QoR, code readability and modularity. To guide HLS designers, we present the five characteristics of MpO classes. Each characteristic exploits the power of HLS-supported modern C++ features to build C++-based hardware modules. These characteristics lead to high-quality software descriptions and efficient hardware generation. We also present a use case of MpO, where we use C++ as the intermediate language for FPGA-targeted code generation from P4, a packet processing domain specific language. The MpO methodology is evaluated using three design experiments: a packet parser, a flow-based traffic manager, and a digital up-converter. Based on experiments, we show that MpO can be comparable to hand-written VHDL code while keeping a high abstraction level, human-readable coding style and modularity. Compared to traditional C-based HLS design, MpO leads to more efficient circuit generation, both in terms of performance and resource utilization. Also, the MpO approach notably improves software quality, augmenting parametrization while eliminating the incidence of code duplication. 9 pages. Paper accepted for publication at The 27th IEEE International Symposium on Field-Programmable Custom Computing Machines, San Diego CA, April 28 - May 1, 2019
- Published
- 2019
8. An Efficient FPGA-based Overlay Inference Architecture for Fully Connected DNNs
- Author
-
J. M. Pierre Langlois, Felix Boulet, Ahmed M. Abdelsalam, Gabriel Demers, and Farida Cheriet
- Subjects
Artificial neural network ,business.industry ,Computer science ,Deep learning ,020208 electrical & electronic engineering ,02 engineering and technology ,Overlay ,Computer architecture ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,020201 artificial intelligence & image processing ,Artificial intelligence ,Layer (object-oriented design) ,Bitstream ,business ,Field-programmable gate array ,Throughput (business) - Abstract
Deep Neural Networks (DNNs) have gained significant popularity in several classification and regression applications. The massive computation and memory requirements of DNNs pose special challenges for FPGA implementation. Moreover, programming FPGAs requires hardware-specific knowledge that many machine-learning researchers do not possess. To make the power and versatility of FPGAs available to a wider DNN user community and to improve DNN design efficiency, we introduce a Single hidden layer Neural Network (SNN) multiplication-free overlay architecture with fully connected DNN-level performance. This FPGA inference overlay can be used for applications that are normally solved with fully connected DNNs. The overlay avoids the time needed to synthesize, place, route and regenerate a new bitstream when the application changes. The SNN overlay inputs and activations are quantized to power-of-two values, which allows utilizing shift units instead of multipliers. Since the overlay is a SNN, we fill the FPGA chip with the maximum possible number of neurons that can work in parallel in the hidden layer. On a ZYNQ-7000 ZC706 FPGA, it is thus possible to implement 2450 neurons in the hidden layer and 30 neurons in the output layer. We evaluate the proposed architecture on typical benchmark datasets and demonstrate higher throughput with respect to the state-of-the-art while achieving the same accuracy.
- Published
- 2018
9. Incremental Lifelong Deep Learning for Autonomous Vehicles
- Author
-
John M. Pierre
- Subjects
Forgetting ,Training set ,Artificial neural network ,business.industry ,Computer science ,Deep learning ,Supervised learning ,Mobile robot ,Machine learning ,computer.software_genre ,Robot learning ,Incremental learning ,Benchmark (computing) ,Artificial intelligence ,business ,computer - Abstract
We investigate a deep learning methodology that can produce more accurate behaviors for autonomous vehicles with a much smaller amount of training data than by using supervised learning alone. In this paper, we develop a Correction-Based Incremental Learning (CBIL) algorithm that adds additional training examples strategically selected from cases where the autonomous vehicle has made mistakes, and is repeated over multiple iterations to dramatically improve mean time to failure. CBIL can be thought of as an online mistake bound learning model that reduces the number of training examples needed to define robust decision boundaries, and is trained offline to solve the problem of catastrophic forgetting. We quantitatively benchmark the performance of CBIL using several experiments related to autonomous platooning performed in truck driving simulations and in the laboratory with mobile robots.
- Published
- 2018
10. POLYBiNN: A Scalable and Efficient Combinatorial Inference Engine for Neural Networks on FPGA
- Author
-
J. M. Pierre Langlois, Ahmed M. Abdelsalam, Ahmed Elsheikh, and Jean-Pierre David
- Subjects
Artificial neural network ,Computer science ,business.industry ,Deep learning ,020208 electrical & electronic engineering ,Decision tree ,02 engineering and technology ,Convolutional neural network ,Computer engineering ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Inference engine ,business ,Field-programmable gate array ,MNIST database - Abstract
Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) have gained significant popularity in several classification and regression applications. The massive computation and memory requirements of deep NN architectures pose particular challenges for their FPGA implementation. Moreover, programming FPGAs requires hardware-specific knowledge that many machine-learning researchers do not possess. To make the power and versatility of FPGAs available to a wider deep learning user community and to improve DNN design efficiency, we introduce POLYBiNN, a scalable and efficient combinatorial inference engine for DNNs and CNNs. POLYBiNN is composed of a stack of decision trees, which are binary classifiers by nature, and utilizes AND-OR gates instead of multipliers and accumulators. POLYBiNN drastically cuts the hardware consumption down while maintaining high accuracy, and it is a memory free inference engine. We also propose a tool that generates automatically a low-level hardware description of the trained POLYBiNN for a given application. We evaluate POLYBiNN for the MNIST dataset when implemented in a ZYNQ-7000 ZC706 FPGA platform. The system achieves a throughput of up to 100 million image classifications per second with 90 ns latency on the MNIST dataset with 97.18% accuracy. The power consumption of POLYBiNN is less than 1.2 W. We also show how POLYBiNN can be used in the fully connected layers of a CNN and apply this approach to the CIFAR-10 dataset.
- Published
- 2018
11. Power Reduction in CNN Pooling Layers with a Preliminary Partial Computation Strategy
- Author
-
J. M. Pierre Langlois, Warren J. Gross, Shervin Vakili, and Mehdi Ahmadi
- Subjects
Contextual image classification ,Computer science ,Computation ,020208 electrical & electronic engineering ,Pooling ,02 engineering and technology ,Convolutional neural network ,020202 computer hardware & architecture ,Convolution ,Power (physics) ,Reduction (complexity) ,Feature (computer vision) ,0202 electrical engineering, electronic engineering, information engineering ,Algorithm - Abstract
Convolutional neural networks (CNNs) are responsible for many recent successes in the computer vision field and are now the dominant approach for image classification. However, CNN-based methods perform many convolution operations and have high power consumption which makes them difficult to deploy on mobile devices. In this paper, we propose a new method to reduce CNN power consumption by simplifying computations before max-pooling layers. The proposed method estimates the output of the max-pooling layer by approximating the preceding convolutional layer with a preliminary partial computation. Then, the method performs a complementary computation to generate an exact convolution output only for the selected feature. We also present an analysis of the approximation parameters. Simulation results show that the proposed method reduces the power consumption by 21% and the silicon area by 19% with negligible degradation in classification accuracy for the CIFAR-10 dataset.
- Published
- 2018
12. A Low-Latency Memory-Efficient IPv6 Lookup Engine Implemented on FPGA Using High-Level Synthesis
- Author
-
J. M. Pierre Langlois, Yvon Savaria, Normand Belanger, and Thibaut Stimpfling
- Subjects
Hardware architecture ,Ethernet ,Computer science ,020206 networking & telecommunications ,02 engineering and technology ,Data structure ,020202 computer hardware & architecture ,IPv6 ,Computer architecture ,High-level synthesis ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Field-programmable gate array ,5G - Abstract
The emergence of 5G networks and real-time applications across networks has a strong impact on the performance requirements of IP lookup engines. These engines must support not only high-bandwidth but also low-latency lookup operations. This paper presents the hardware architecture of a low-latency IPv6 lookup engine capable of supporting the bandwidth of current Ethernet links. The engine implements the SHIP lookup algorithm, which exploits prefix characteristics to build a compact and scalable data structure. The proposed hardware architecture leverages the characteristics of the data structure to support low-latency lookup operations, while making efficient use of memory. The architecture is described in C++, synthesized with a high-level synthesis tool, then implemented on a Virtex-7 FPGA. Compared to other well-known approaches, the proposed IPv6 lookup architecture reduces lookup latency by a factor of 2.3× and uses as much as 46% less memory per prefix for a synthetic prefix table holding 580 k entries. Compared to the proposed IPv6 lookup architecture, other well-known approaches use at least 87% more memory per prefix, while increasing the lookup latency by a factor of 2.3×.
- Published
- 2018
13. Spatio-temporal deep learning for robotic visuomotor control
- Author
-
John M. Pierre
- Subjects
Visual perception ,Artificial neural network ,Computer science ,business.industry ,Deep learning ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Mobile robot ,010501 environmental sciences ,Motion control ,01 natural sciences ,Robot learning ,Machine perception ,Robot control ,0103 physical sciences ,Computer vision ,Artificial intelligence ,010306 general physics ,business ,0105 earth and related environmental sciences - Abstract
To perform accurate and smooth behaviors in dynamic environments with moving objects, robotic visuomotor control should include the ability to process spatio-temporal information. We propose a system that uses a spatio-temporal deep neural network (DNN), with video camera pixels as the only input, to handle all the visual perception and visuomotor control functions needed to perform robotic behaviors such as leader following. Our approach combines: (1) end-to-end deep learning for inferring motion control outputs from visual inputs, (2) multi-task learning for simultaneously producing multiple control outputs with the same DNN, and (3) spatio-temporal deep learning for perceiving motion across multiple video frames. We use driving simulations to quantitatively show that spatio-temporal DNNs increase driving accuracy and driving smoothness by improving machine perception of scene kinematics. Experiments conducted with mobile robots in a laboratory test track show real-time embedded systems performance comparable to human reaction times to visual stimuli, and indicate that a spatio-temporal deep learning robot is able to follow a leader for long periods of time, while keeping within lanes and avoiding obstacles.
- Published
- 2018
14. POLYCiNN: Multiclass Binary Inference Engine using Convolutional Decision Forests
- Author
-
Abdelsalam, Ahmed M., primary, Elsheikh, Ahmed, additional, David, Jean-Pierre, additional, and Langlois, J. M. Pierre, additional
- Published
- 2019
- Full Text
- View/download PDF
15. CARLA: A Convolution Accelerator With a Reconfigurable and Low-Energy Architecture.
- Author
-
Ahmadi, Mehdi, Vakili, Shervin, and Langlois, J. M. Pierre
- Subjects
CONVOLUTIONAL neural networks ,COMPUTER architecture ,APPLICATION-specific integrated circuits ,RANDOM access memory ,DEEP learning - Abstract
Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required to enable fast and energy-efficient computation of costly convolution operations. Despite recent advances in hardware accelerator design for CNNs, two major problems have not yet been addressed effectively, particularly when the convolution layers have highly diverse structures: (1) minimizing energy-hungry off-chip DRAM data movements; (2) maximizing the utilization factor of processing resources to perform convolutions. This work thus proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs. The proposed approach is evaluated on convolutional layers of VGGNet-16 and ResNet-50. Results show that the architecture achieves a Processing Element (PE) utilization factor of 98% for the majority of 3 × 3 and 1 × 1 convolutional layers, while limiting latency to 396.9 ms and 92.7 ms when performing convolutional layers of VGGNet-16 and ResNet-50, respectively. In addition, the proposed architecture benefits from the structured sparsity in ResNet-50 to reduce the latency to 42.5 ms when half of the channels are pruned. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
16. Scalable memory-less architecture for string matching with FPGAs
- Author
-
Shervin Vakili, J. M. Pierre Langlois, Ideh Sarbishei, and Yvon Savaria
- Subjects
021110 strategic, defence & security studies ,Hardware_MEMORYSTRUCTURES ,Computer science ,Routing table ,String (computer science) ,0211 other engineering and technologies ,020206 networking & telecommunications ,02 engineering and technology ,String searching algorithm ,Parallel computing ,Prefix ,Lookup table ,0202 electrical engineering, electronic engineering, information engineering ,Longest prefix match ,Routing (electronic design automation) ,Field-programmable gate array ,Throughput (business) - Abstract
String matching hardware engines generally utilize Ternary Content Addressable Memories (TCAMs). Although TCAM-based solutions are fast, they are expensive and power hungry. This paper proposes a high-performance memory-less architecture for string matching called Split-Bucket. It offers a performance comparable to TCAM-based solutions. Moreover, it is reconfigurable and scalable to the size of the target string set and the width of the string. The architecture is characterized using the Longest Prefix Match problem for IP address lookup and is implemented on a Virtex-7 FPGA. For a real-world routing table with 524 k IPv4 prefixes, the Split-Bucket architecture achieves a throughput of 103.4 M packets per second and consumes 23% and 22% of the Look Up Tables and Flip-Flops of a Xilinx XC7V2000T chip, respectively.
- Published
- 2017
17. A Configurable FPGA Implementation of the Tanh Function Using DCT Interpolation
- Author
-
Ahmed M. Abdelsalam, J. M. Pierre Langlois, and Farida Cheriet
- Subjects
Computer science ,020208 electrical & electronic engineering ,Hyperbolic function ,02 engineering and technology ,Function (mathematics) ,Parallel computing ,Transfer function ,Lookup table ,0202 electrical engineering, electronic engineering, information engineering ,Discrete cosine transform ,020201 artificial intelligence & image processing ,Circuit complexity ,Field-programmable gate array ,Algorithm ,Interpolation - Abstract
Efficient implementation of non-linear activation functions is essential to the implementation of deep learning models on FPGAs. We introduce such an implementation based on the Discrete Cosine Transform Interpolation Filter (DCTIF). The proposed interpolation architecture combines simple arithmetic operations on the stored samples of the hyperbolic tangent function and on input data. It achieves almost 3× better precision than previous works while using a similar amount of computational resources and a small amount of memory. Various combinations of DCTIF parameters can be chosen to trade off the accuracy and the overall circuit complexity of the tanh function. In one case, the proposed architecture approximates the hyperbolic tangent activation function with 0.004 maximum error while requiring only 1.45 kbits BRAM memory and 21 LUTs of a Virtex-7 FPGA.
- Published
- 2017
18. Decimal floating-point multiplier with binary-decimal compression based fixed-point multiplier
- Author
-
Noureddine Chabini, Shuli Gao, Dhamin Al-Khalili, and J. M. Pierre Langlois
- Subjects
Computer science ,business.industry ,Decimal floating point ,Binary-coded decimal ,02 engineering and technology ,Fixed point ,Decimal ,020202 computer hardware & architecture ,Lookup table ,VHDL ,0202 electrical engineering, electronic engineering, information engineering ,Multiplier (economics) ,Hardware_ARITHMETICANDLOGICSTRUCTURES ,business ,Field-programmable gate array ,computer ,Computer hardware ,computer.programming_language - Abstract
This paper presents the design of pipelined IEEE 754-2008 decimal floating-point (DFP) multipliers targeting FPGAs. A key component of the architecture is the fixed-point multiplier function which impacts the overall performance and area utilization. In this paper, we propose a new method to realize this operation by carefully organizing the partial products and developing an algorithm for binary-decimal compression. The DFP multipliers with 5 to 12 pipeline stages are coded in VHDL and implemented on a Xilinx Virtex-5 FPGA. The overall design is compared with another approach based on fixed-point multipliers using a BCD-4221 compression technique. Using post layout extracted design data, our approach achieves a delay improvement in the range of 7.9% to 20.3% and an average LUT reduction of 5%.
- Published
- 2017
19. Power Reduction in CNN Pooling Layers with a Preliminary Partial Computation Strategy
- Author
-
Ahmadi, Mehdi, primary, Vakili, Shervin, additional, Langlois, J. M. Pierre, additional, and Gross, Warren, additional
- Published
- 2018
- Full Text
- View/download PDF
20. Custom Low Power Processor for Polar Decoding
- Author
-
Leonardon, Mathieu, primary, Leroux, Camille, additional, Binet, David, additional, Langlois, J. M. Pierre, additional, Jego, Christophe, additional, and Savaria, Yvon, additional
- Published
- 2018
- Full Text
- View/download PDF
21. A run-length encoding co-processor for retinal image texture analysis
- Author
-
Houssem Ben Tahar, Qifeng Gan, Farida Cheriet, J. M. Pierre Langlois, and Hamza Bendaoudi
- Subjects
ARM architecture ,Programmable logic device ,Software ,Coprocessor ,Speedup ,business.industry ,Image quality ,Computer science ,Encoding (memory) ,Run-length encoding ,Computer vision ,Artificial intelligence ,business - Abstract
This paper presents a Zynq-based system to compute Run-Length encoding Matrix features for retinal image texture analysis. In order to improve the performance of the software implementation, we propose a co-processor architecture implemented in the programmable logic portion of the Zynq platform. Experimental results show a speedup of 26.3× with respect to the software version implemented on the ARM processor alone, for 2496 × 1664 images. The additional area to implement the co-processor is limited to 13% of DSP48E1 slices and about 2% of LUTs and flip-flops.
- Published
- 2015
22. A Configurable FPGA Implementation of the Tanh Function Using DCT Interpolation
- Author
-
Abdelsalam, Ahmed M., primary, Langlois, J. M. Pierre, additional, and Cheriet, F., additional
- Published
- 2017
- Full Text
- View/download PDF
23. Designing customized microprocessors for fixed-point computation
- Author
-
J. M. Pierre Langlois, Shervin Vakili, and Guy Bois
- Subjects
Computer science ,business.industry ,Processor design ,Computation ,Embedded system ,Evolutionary algorithm ,Latency (engineering) ,Fixed point ,business ,Microarchitecture ,Personalization ,Electronic circuit - Abstract
This paper proposes a method to optimize application-specific microprocessors for fixed-point computations. Fixed-point word-length optimization is a well-known research area that aims to find the optimal trade-offs between accuracy and hardware cost in bitwidth allocation for signals in fixed-point circuits. This work proposes a methodology to combine word-length optimization with application-specific processor customization. The goal is to optimize the following parameters in the processor architecture: (1) datatype word-lengths, (2) size of register-files and (3) architecture of the functional units. Multi-level evolutionary algorithms are employed to perform the optimization. To facilitate evaluation, a new processor design environment was developed that supports necessary customization flexibility to realize and evaluate the proposed methodology. The experimental results show that for five evaluated benchmarks, the proposed methodology can reduce the number of consumed LUTs and flip-flops by an average of 11.9% and 5.1%, respectively, while reducing the latency by an average of 33.4%.
- Published
- 2015
24. Camera intrinsic blur kernel estimation: A reliable framework
- Author
-
Isabelle Begin, Ali Mosleh, Paul Green, Emmanuel Onzon, and J. M. Pierre Langlois
- Subjects
Point spread function ,business.industry ,Kernel density estimation ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Astrophysics::Instrumentation and Methods for Astrophysics ,Measure (mathematics) ,Image (mathematics) ,law.invention ,Lens (optics) ,Camera auto-calibration ,law ,Computer vision ,Artificial intelligence ,Noise (video) ,business ,Homography (computer vision) ,Mathematics - Abstract
This paper presents a reliable non-blind method to measure intrinsic lens blur. We first introduce an accurate camera-scene alignment framework that avoids erroneous homography estimation and camera tone curve estimation. This alignment is used to generate a sharp correspondence of a target pattern captured by the camera. Second, we introduce a Point Spread Function (PSF) estimation approach where information about the frequency spectrum of the target image is taken into account. As a result of these steps and the ability to use multiple target images in this framework, we achieve a PSF estimation method robust against noise and suitable for mobile devices. Experimental results show that the proposed method results in PSFs with more than 10 dB higher accuracy in noisy conditions compared with the PSFs generated using state-of-the-art techniques.
- Published
- 2015
25. SHIP: A Scalable High-Performance IPv6 Lookup Algorithm That Exploits Prefix Characteristics.
- Author
-
Stimpfling, Thibaut, Belanger, Normand, Langlois, J. M. Pierre, and Savaria, Yvon
- Subjects
DATA structures ,DYNAMIC positioning systems ,FIELD programmable gate arrays ,GATE array circuits - Abstract
Due to the emergence of new network applications, current IP lookup engines must support high bandwidth, low lookup latency, and the ongoing growth of IPv6 networks. However, the existing solutions are not designed to address jointly these three requirements. This paper introduces SHIP, an IPv6 lookup algorithm that exploits prefix characteristics to build a data structure designed to meet future application requirements. Based on the prefix length distribution and prefix density, prefixes are first clustered into groups sharing similar characteristics and then encoded in hybrid trie-trees. The resulting memory-efficient and scalable data structure can be stored in low-latency memories and allows the traversal process to be parallelized and pipelined in order to support high packet bandwidth in hardware. In addition, SHIP supports incremental updates. Evaluated on real and synthetic IPv6 prefix tables, SHIP has a logarithmic scaling factor in terms of the number of memory accesses and a linear memory consumption scaling. Compared with other well-known approaches, SHIP reduces the required amount of memory per prefix by 87%. When implemented on a state-of-the-art field-programmable gate array (FPGA), the proposed architecture can support processing 588 million packets per second. [ABSTRACT FROM AUTHOR]
- Published
- 2019
26. FDSOI to nanowires and single electron transistors
- Author
-
Romain Wacquez, R. Coquand, Romain Lavieville, Xavier Jehl, M. Vinet, Olivier Faynot, M. Pierre, B. Roche, Bernard Previtali, S. Barraud, Thierry Poiroux, Veeresh Deshpande, P. Perreau, M. Sanquer, O. Cueto, Laurent Grenouillet, and Benoit Voisin
- Subjects
Materials science ,business.industry ,Transistor ,Nanowire ,Electrical engineering ,Coulomb blockade ,Hardware_PERFORMANCEANDRELIABILITY ,Electron ,law.invention ,Planar ,CMOS ,law ,Hardware_INTEGRATEDCIRCUITS ,Optoelectronics ,Field-effect transistor ,business ,Hardware_LOGICDESIGN ,Electronic circuit - Abstract
This paper reviews how the evolution of FDSOI planar architecture towards Trigate Nanowires leads to a natural Single Electron Transistor and Field Effect Transistor convergence at room temperature. On one hand, this convergence sets up technological specifications to preserve CMOS operation. On the other hand it opens the path to room temperature hybrid circuits based on single electron transistors and MOSFETs. Further on, single electron effects can be downscaled to the ultimate single atom transistors and we demonstrate the practical performance of electron pumps for metrologic applications.
- Published
- 2014
27. A memory transaction model for Sparse Matrix-Vector multiplications on GPUs
- Author
-
Yvon Savaria, Thalie Léna Keklikian, and J. M. Pierre Langlois
- Subjects
Matrix (mathematics) ,CUDA ,Application domain ,Computer science ,CUDA Pinned memory ,Multiplication ,Parallel computing ,General-purpose computing on graphics processing units ,Row ,Sparse matrix - Abstract
The Sparse Matrix-Vector multiplication (SpMV) is an algorithm used in many fields. Since the introduction of CUDA and general purpose programming on GPUs, several efforts to optimize it have been reported. SpMV optimization is complex due to irregular memory accesses depending on the nonzero element distribution of the matrix. In this paper, we propose a model that predicts the number of memory transactions of SpMV for a matrix stored in the CSR format. With the number of memory transactions known in advance, the performance and the execution time can be estimated. The model can be used to select the best suited CUDA implementation for sparse matrices for a given application domain. Predicted results from the model are within 7.5% for the matrices of more than 1000 rows that we have tested on the NVIDIA Tesla K20c and GeForce GTX 670.
- Published
- 2014
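As an illustration of the CSR format and the irregular accesses mentioned in the abstract above, here is a minimal Python sketch. It is not the paper's transaction model: the helper `x_segments_touched` and its `elems_per_segment` parameter are hypothetical, and they only count how many aligned segments of the input vector each row reads, which is one rough reason why the nonzero distribution affects memory traffic.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def x_segments_touched(row_ptr, col_idx, elems_per_segment=32):
    """Count distinct aligned segments of x read by each row (illustrative only)."""
    counts = []
    for row in range(len(row_ptr) - 1):
        cols = col_idx[row_ptr[row]:row_ptr[row + 1]]
        counts.append(len({c // elems_per_segment for c in cols}))
    return counts


# Example matrix: [[1, 0, 2, 0], [0, 0, 0, 3], [4, 5, 0, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 3, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, values, x)
segments = x_segments_touched(row_ptr, col_idx, elems_per_segment=2)
```

Rows whose column indices are scattered touch more segments of `x` than rows whose nonzeros are adjacent, even when both rows hold the same number of nonzeros.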
28. Automatic detection of microaneurysms and haemorrhages in fundus images using dynamic shape features
- Author
-
J. M. Pierre Langlois, Jihed Chelbi, Timothée Faucon, Thomas Hurtut, Farida Cheriet, and Lama Seoud
- Subjects
Pixel ,Computer science ,business.industry ,fundus images ,Noise reduction ,computer aided detection ,Pattern recognition ,Fundus (eye) ,image processing ,features extraction ,Image texture ,Computer vision ,Artificial intelligence ,business - Abstract
This paper presents a novel approach for automatic detection of microaneurysms and haemorrhages in fundus images. First, it begins with a preprocessing stage for shade correction, contrast enhancement and denoising. Second, all regional minima with sufficient contrast are extracted and considered as candidates. Third, in an image flooding scheme, a new set of dynamic shape features is computed as a function of intensity. Finally, a Random Forest classifies the candidates into lesions and non-lesions. A set of 143 fundus images with an average of 2210 pixels in diameter was acquired using different cameras and used for training and testing. The proposed approach achieved a global score over the FROC curve of 0.393, while previous work with images of similar resolution reported a score of 0.233., 2014 IEEE 11th International Symposium on Biomedical Imaging, ISBI 2014, 29 April-2 May, 2014, Beijing, China
- Published
- 2014
29. FDSOI nanowires: An opportunity for hybrid circuit with field effect and single electron transistors
- Author
-
Pierre Perreau, O. Faynot, M. Pierre, Marc Sanquer, R. Coquand, Xavier Jehl, Benoit Voisin, S. Barraud, O. Cueto, Bernard Previtali, B. Roche, M. Vinet, L. Tosti, C. Vizioz, Thierry Poiroux, Veeresh Deshpande, Romain Wacquez, and L. Grenouillet
- Subjects
Materials science ,business.industry ,Transistor ,Nanowire ,Coulomb blockade ,Field effect ,Silicon on insulator ,Nanotechnology ,law.invention ,CMOS ,law ,MOSFET ,Optoelectronics ,business ,Electronic circuit - Abstract
Thanks to a well-controlled CMOS FDSOI technology we have recently been able to demonstrate breakthroughs in the combined use of field effect and Coulomb blockade phenomena. On one hand, we have demonstrated room temperature hybrid circuits based on single electron transistors and MOSFETs. On the other hand, we have shown the practical performance of electron pumps designed with a single silicided Coulomb island and MOSFETs as tunable barriers for metrologic applications.
- Published
- 2013
30. Memory efficient Multi-Scale Line Detector architecture for retinal blood vessel segmentation
- Author
-
Bendaoudi, Hamza, primary, Cheriet, Farida, additional, and Langlois, J. M. Pierre, additional
- Published
- 2016
31. Human-machine cooperation to design Intelligent Manufacturing Systems
- Author
-
Pacaux-Lemoine, M-Pierre, primary, Trentesaux, Damien, additional, and Rey, Gabriel Zambrano, additional
- Published
- 2016
32. Finite-precision error modeling using affine arithmetic
- Author
-
Guy Bois, Shervin Vakili, and J. M. Pierre Langlois
- Subjects
Affine shape adaptation ,Hazard (logic) ,Propagation of uncertainty ,Mathematical optimization ,Arbitrary-precision arithmetic ,Saturation arithmetic ,Multiplication ,Fixed-point arithmetic ,Algorithm ,Affine arithmetic ,Mathematics - Abstract
This paper introduces a new approach for finite-precision error modeling based on affine arithmetic. The paper demonstrates that there is a common hazard in affine arithmetic-based error modeling methods described in the literature. The hazard is linked to early substitution of the signal terms that emerge in operations such as multiplication and division. The paper proposes postponed substitution combined with function maximization to address this problem. The paper also proposes a modification in the error propagation process to enhance the error modeling accuracy. An existing word length optimization method is reproduced to evaluate the efficiency of this modification. The results demonstrate that the proposed modification can improve the hardware area results by up to 7.0% at the expense of negligible complexity overhead.
- Published
- 2013
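For readers unfamiliar with affine arithmetic, the following Python sketch shows the textbook representation that the abstract above builds on: a value is a center plus a sum of coefficients times noise symbols, each ranging over [-1, 1]. The class name `Affine` and its methods are hypothetical. The multiplication rule shown is the standard conservative one, which introduces a fresh noise symbol for the nonlinear term; it does not implement the postponed substitution the paper proposes.

```python
from itertools import count

_fresh = count()  # source of new noise-symbol ids (hypothetical helper)


class Affine:
    """Affine form: center + sum(coeff_i * eps_i), with each eps_i in [-1, 1]."""

    def __init__(self, center, terms=None):
        self.center = center
        self.terms = dict(terms or {})

    @classmethod
    def from_interval(cls, lo, hi):
        # One fresh noise symbol captures the whole uncertainty of the input.
        return cls((lo + hi) / 2, {next(_fresh): (hi - lo) / 2})

    def radius(self):
        return sum(abs(c) for c in self.terms.values())

    def interval(self):
        r = self.radius()
        return (self.center - r, self.center + r)

    def _combine(self, other, sign):
        terms = dict(self.terms)
        for sym, coeff in other.terms.items():
            terms[sym] = terms.get(sym, 0.0) + sign * coeff
        return Affine(self.center + sign * other.center, terms)

    def __add__(self, other):
        return self._combine(other, +1)

    def __sub__(self, other):
        return self._combine(other, -1)

    def __mul__(self, other):
        # Linear part: x0*y_i + y0*x_i for every shared or unshared symbol.
        terms = {}
        for sym in set(self.terms) | set(other.terms):
            terms[sym] = (self.center * other.terms.get(sym, 0.0)
                          + other.center * self.terms.get(sym, 0.0))
        # Nonlinear part is bounded and assigned to a fresh noise symbol.
        terms[next(_fresh)] = self.radius() * other.radius()
        return Affine(self.center * other.center, terms)


x = Affine.from_interval(1.0, 3.0)
difference = (x - x).interval()   # shared symbol cancels exactly
square = (x * x).interval()       # conservative bound on x squared
```

Because `x - x` shares a noise symbol, the result collapses to zero, whereas plain interval arithmetic would return [-2, 2]. The bound on `x * x` is wider than the true range [1, 9], which shows the over-approximation that the nonlinear term introduces.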
33. Explicit Ringing Removal in Image Deblurring.
- Author
-
Mosleh, Ali, Elmi Sola, Yasser, Zargari, Farzad, Onzon, Emmanuel, and Langlois, J. M. Pierre
- Subjects
IMAGE quality analysis ,MATHEMATICAL regularization ,COST functions ,QUANTITATIVE research ,COMPUTER algorithms - Abstract
In this paper, we present a simple yet effective image deblurring method to produce ringing-free deblurred images. Our work is inspired by the observation that large-scale deblurring ringing artifacts are measurable through a multi-resolution pyramid of low-pass filtering of the blurred-deblurred image pair. We propose to model such a quantification as a convex cost function and minimize it directly in the deblurring process in order to reduce ringing regardless of its cause. An efficient primal-dual algorithm is proposed as a solution to this optimization problem. Since the regularization is more biased toward ringing patterns, the details of the reconstructed image are prevented from over-smoothing. An inevitable source of ringing is sensor saturation which can be detected costlessly contrary to most other sources of ringing. However, dealing with the saturation effect in deblurring introduces a non-linear operator in optimization problem. In this paper, we also introduce a linear approximation as a solution to handling saturation in the proposed deblurring method. As a result of these steps, we significantly enhance the quality of the deblurred images. Experimental results and quantitative evaluations demonstrate that the proposed method performs favorably against state-of-the-art image deblurring methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
34. Transport measurement across single and coupled dopants implanted in a CMOS channel
- Author
-
M. Vinet, Xavier Jehl, S. De Franceschi, E. Dupont-Ferrier, M. Sanquer, M. Pierre, B. Roche, Benoit Voisin, and Romain Wacquez
- Subjects
Nanostructure ,Materials science ,Dopant ,business.industry ,Doping ,Transistor ,Analytical chemistry ,law.invention ,Condensed Matter::Materials Science ,CMOS ,law ,Condensed Matter::Superconductivity ,MOSFET ,Physics::Atomic and Molecular Clusters ,Optoelectronics ,Ionization energy ,business ,Microwave - Abstract
We access properties of single dopants embedded in an ultra-scaled MOSFET. In such nanostructures, the ionization energy of a single dopant is enhanced. We establish a new method to determine the energy spectrum of a single dopant by connecting two dopants in series and using one dopant as an energy probe for the second one. Gigahertz microwave driving of this double donor system reveals coherent charge transfer in this ultimate “atomic” transistor.
- Published
- 2012
35. A CNFET-based characterization framework for digital circuits
- Author
-
J. M. Pierre Langlois, Jacques L. Athow, Dhamin Al-Khalili, and Come Rozon
- Subjects
Digital electronics ,business.industry ,Computer science ,Process (computing) ,Hardware_PERFORMANCEANDRELIABILITY ,Power (physics) ,CMOS ,Logic gate ,MOSFET ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,Field-effect transistor ,business ,Hardware_LOGICDESIGN ,Voltage - Abstract
This paper introduces a framework to develop and characterize digital circuits using Carbon Nanotube Field Effect Transistors (CNFET). We define a 4-step process that involves design capture, pre-processing, circuit simulation and results extraction and interpretation. The initial work leading to this framework involves the selection of an appropriate CNFET model and model parameters, and determination of an optimized substrate voltage. Through a set of custom-designed automated scripts, various logic gates were simulated, data were compiled and characterization results were obtained. A complete approximate squarer circuit was also designed, implemented and characterized using the framework. To demonstrate the power of Carbon Nanotube technology, the same circuit was also implemented in 16 nm CMOS technology for comparison. An improvement by a factor of 17× in PDP was achieved with CNT.
- Published
- 2011
36. Customized embedded processor design for global photographic tone mapping
- Author
-
Guy Bois, Shervin Vakili, Diana C. Gil, Yvon Savaria, and J. M. Pierre Langlois
- Subjects
Reduced instruction set computing ,Logarithm ,business.industry ,Computer science ,Embedded system ,Overhead (computing) ,Algorithm design ,Tone mapping ,Performance improvement ,Graphics ,business ,Computer hardware ,Display device - Abstract
Tone-mapping (TM) aims to adapt high dynamic range images to conventional display devices. TM algorithms are usually implemented on general purpose processors and graphics processing units. Such platforms may not meet performance, area, power and flexibility constraints imposed by the embedded system domain. This paper presents the design and implementation of a customized processor for a global TM algorithm. Using an architecture description language, three custom instructions to calculate luminance, logarithm and maximum luminance were added to a 32-bit RISC-based processor. The logarithm was computed using an improved Mitchell approximation. Experimental results demonstrate a 169% performance improvement when adding all three instructions, with a hardware overhead of only 22%.
- Published
- 2011
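The abstract above mentions a logarithm computed with an improved Mitchell approximation. As background, here is a minimal Python sketch of the basic Mitchell approximation only, log2(n) ≈ k + (n / 2^k - 1) with k the position of the leading one bit. The function name `mitchell_log2` is hypothetical, and the paper's improvement is not reproduced here.

```python
import math


def mitchell_log2(n: int) -> float:
    """Basic Mitchell approximation of log2(n) for a positive integer n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1                     # position of the leading one bit
    mantissa = (n - (1 << k)) / (1 << k)       # fractional part in [0, 1)
    return k + mantissa                        # log2(1 + m) approximated by m


# Largest absolute error over a small range of inputs.
worst_error = max(math.log2(n) - mitchell_log2(n) for n in range(1, 1000))
```

In hardware, the appeal is that `k` comes from a leading-one detector and the mantissa from a shift, so no multiplier or lookup table is needed. The approximation never overestimates the true logarithm.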
37. A tracking algorithm suitable for embedded systems implementation
- Author
-
J. M. Pierre Langlois, Yvon Savaria, Rana Farah, Qifeng Gan, and Guillaume-Alexandre Bilodeau
- Subjects
Iterative method ,Robustness (computer science) ,Computer science ,business.industry ,Video tracking ,Histogram ,Embedded system ,Resampling ,Algorithm design ,Condensation algorithm ,business ,Particle filter ,Algorithm - Abstract
Particle filters have been widely used for video tracking due to their robustness. However, most particle filter algorithm implementations are computationally expensive, which makes them ill-suited for real-time embedded systems. There have been some attempts to provide hardware implementations for the particle filter, but none of them tried to simplify the algorithm first in order to make it more efficient for the hardware implementation. In this paper, a new sampling algorithm inspired by the particle filter methodology is proposed. It includes a resampling scheme that uses a new method to assign the number of particles between filter iterations and a criterion to reduce the number of processed samples, both in order to reduce the computational burden. Our experiments demonstrate that the algorithm can be as accurate as the CONDENSATION algorithm, while reducing the computational load by 30%.
- Published
- 2011
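To make the resampling step of a particle filter concrete, the sketch below shows standard systematic resampling in Python. This is a common textbook scheme, not the resampling scheme proposed in the paper above, whose details the abstract does not give. The function name and the optional `offset` argument (added so the example is reproducible) are assumptions.

```python
import random


def systematic_resample(weights, offset=None):
    """Return particle indices drawn by systematic resampling.

    weights: non-negative particle weights (need not be normalized).
    offset:  value in [0, 1); drawn at random when omitted.
    """
    n = len(weights)
    if offset is None:
        offset = random.random()
    step = sum(weights) / n
    indices = []
    i = 0
    cumulative = weights[0]
    for j in range(n):
        # Evenly spaced targets share a single random offset.
        target = (j + offset) * step
        while cumulative < target and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


picked = systematic_resample([1, 2, 6, 1], offset=0.5)
```

Particles with large weights are duplicated and particles with small weights tend to be dropped, in a single pass over the weights.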
38. RAT: Robust animal tracking
- Author
-
J. M. Pierre Langlois, Guillaume-Alexandre Bilodeau, and Rana Farah
- Subjects
Tracking error ,business.industry ,Computer science ,Video tracking ,Track (disk drive) ,Work (physics) ,Normal laboratory ,Computer vision ,Artificial intelligence ,Noise (video) ,business ,Tracking (particle physics) - Abstract
Determining the motion pattern of laboratory animals is very important in order to monitor their reaction to various stimuli. In this paper, we propose a robust method to track animals, and consequently determine their motion pattern. The method is designed to work under uncontrolled normal laboratory conditions. It consists of two steps. The first step tracks the animal coarsely, using the combination of four features, while the second step refines the boundaries of the tracked area, in order to fit more precisely the boundaries of the animal. The method achieves an average tracking error smaller than 5% for our test videos.
- Published
- 2011
39. Combining ISA extensions and subsetting for improved ASIP performance and cost
- Author
-
Diana C. Gil, Simon Rajotte, and J. M. Pierre Langlois
- Subjects
Speedup ,business.industry ,Computer science ,Application-specific instruction-set processor ,Image processing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Execution time ,Instruction set ,Application-specific integrated circuit ,Computer architecture ,Embedded system ,Encoding (memory) ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,business ,Field-programmable gate array - Abstract
This paper presents a fine-grained configurable processor model used to generate image processing Application Specific Instruction Set Processors (ASIPs). A methodology to develop a minimal instruction set ASIP with the processor model is also proposed. The methodology is based on using specialized instructions in conjunction with Instruction Set Architecture (ISA) subsetting to reduce hardware costs and improve execution time. The performance of an FPGA implementation of the proposed processor model is measured for a two-dimensional Gaussian filter and results are compared to a popular commercial soft core processor. With ISA subsetting and specialized instructions, the proposed processor uses up to 45% fewer slices while achieving a 1.57× speedup.
- Published
- 2011
40. Comparative analysis of contrast enhancement algorithms in surveillance imaging
- Author
-
Yvon Savaria, Diana C. Gil, Guillaume-Alexandre Bilodeau, J. M. Pierre Langlois, and Rana Farah
- Subjects
Brightness ,Pixel ,business.industry ,Computer science ,Image quality ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,Histogram ,Key (cryptography) ,Contrast (vision) ,Algorithm design ,Computer vision ,Artificial intelligence ,business ,media_common - Abstract
Image contrast enhancement methods play a key role in many image processing and vision applications. For surveillance applications, real-time contrast improvement over the whole image is required when videos are taken in poor lighting conditions. It is also necessary to highlight details in shadowed regions without introducing artifacts. In this paper, several state-of-the-art contrast enhancement methods are compared. Image quality is evaluated by means of objective metrics such as intensity contrast and brightness error, and by subjective assessment. Execution time is also measured. Experimental results show that a technique based on histogram modification presents a better trade-off considering both aspects.
- Published
- 2011
41. Single dopant impact on electrical characteristics of SOI NMOSFETs with effective length down to 10nm
- Author
-
Giuseppe C. Tettamanzi, Marc Sanquer, Xavier Jehl, J. Verduijn, N. Bove, O. Faynot, M. Pierre, C. Comboroure, B. Roche, Sven Rogge, Bernard Previtali, Veeresh Deshpande, Romain Wacquez, Maud Vinet, C. Vizioz, O. Cueto, and S. Pauliac-Vaujour
- Subjects
010302 applied physics ,Materials science ,Dopant ,Subthreshold conduction ,business.industry ,Transistor ,Doping ,Silicon on insulator ,02 engineering and technology ,021001 nanoscience & nanotechnology ,01 natural sciences ,Temperature measurement ,law.invention ,Threshold voltage ,law ,0103 physical sciences ,MOSFET ,Electronic engineering ,Optoelectronics ,0210 nano-technology ,business - Abstract
Although single dopant signatures have been observed at low temperature [1–2], the impact on transistor performance of a single dopant atom at room temperature is not yet well understood. Here, for the first time, we provide an in-depth understanding of single dopant influence on NMOSFETs characteristics by linking low and room temperature transport. We demonstrate that, for gate length of 30 nm and below (channel length down to 10 nm), the presence of a single dopant dramatically alters the subthreshold behaviour when the dopant is located in the middle of the channel. Moving the dopants away from the channel leads to enhanced variability above the threshold voltage Vt.
- Published
- 2010
42. Operation of a silicon CMOS electron pump
- Author
-
Maud Vinet, Romain Wacquez, Marc Sanquer, Xavier Jehl, M. Pierre, N. Feltin, B. Roche, and L. Devoille
- Subjects
Fabrication ,Materials science ,Yield (engineering) ,Silicon ,business.industry ,Electrical engineering ,Physics::Optics ,chemistry.chemical_element ,02 engineering and technology ,Electron ,021001 nanoscience & nanotechnology ,01 natural sciences ,Metrology ,CMOS ,chemistry ,0103 physical sciences ,Optoelectronics ,Microelectronics ,010306 general physics ,0210 nano-technology ,business - Abstract
We show the first measurements of a silicon electron pump produced in an industrial CMOS facility. Sample fabrication derived from state-of-the-art microelectronics processes results in mass production with high yield, very small sizes and easy operation up to high frequencies as these pumps do not suffer from cross-capacitance. Further data with metrological assessment will be shown, as well as a comparison of classical and non-adiabatic pumping modes within the same devices.
- Published
- 2010
43. Dielectric confinement and fluctuations of the local density of state in the source and drain of an ultra scaled SOI NMOS transistor
- Author
-
Veeresh Deshpande, Romain Wacquez, Marc Sanquer, Xavier Jehl, Maud Vinet, M. Pierre, B. Roche, O. Cueto, and Bernard Previtali
- Subjects
010302 applied physics ,Materials science ,Local density of states ,business.industry ,Doping ,Analytical chemistry ,Nanowire ,Silicon on insulator ,Dielectric ,01 natural sciences ,0103 physical sciences ,MOSFET ,Density of states ,Optoelectronics ,010306 general physics ,business ,NMOS logic - Abstract
We fabricated SOI nanowire MOSFETs with a very small channel volume and few dopants between the highly doped source and drain. The ionization energy of these isolated As dopants can be extracted. We found a much higher energy than the calculated value for As in bulk Si. This enhancement is due to the so-called dielectric confinement, because of the proximity of the buried oxide. Transport through this single dopant also enables probing the fluctuations of local density of states in the contacts.
- Published
- 2010
44. High performance ASIP implementation of PBDI — A new intra-field deinterlacing method
- Author
-
J. M. Pierre Langlois, Yvon Savaria, Hossein Mahvash Mohammadi, and Philippe Aubertin
- Subjects
Speedup ,Reduced instruction set computing ,Computer science ,business.industry ,Pipeline (computing) ,Application-specific instruction-set processor ,Memory bandwidth ,Instruction set ,Computer architecture ,Deinterlacing ,Very long instruction word ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,business ,Computer hardware - Abstract
We present techniques used to create a high performance application-specific instruction-set processor (ASIP) implementation of the Pattern-Based Directional Interpolation (PBDI) intra-field deinterlacing algorithm. The proposed techniques focus primarily on an efficient utilization of the available memory bandwidth. They include the use of Very Long Instruction Words (VLIW) and an appropriate choice of custom instructions and application-specific registers in order to form a processing pipeline. We report a speedup factor of 1351 in comparison with a software-only implementation of the algorithm running on a general-purpose 32-bit RISC processor.
- Published
- 2009
45. A design methodology for the implementation of embedded vehicle navigation systems
- Author
-
Azizul Islam, Aboelmagd Noureldin, and J. M. Pierre Langlois
- Subjects
Engineering ,business.industry ,Real-time computing ,Navigation system ,Gyroscope ,Kalman filter ,law.invention ,Microprocessor ,law ,Global Positioning System ,Static random-access memory ,Cache ,business ,Field-programmable gate array ,Computer hardware - Abstract
This paper presents a design methodology for the implementation of a GPS/INS navigation system on Field Programmable Gate Arrays (FPGA). The method proposed in this research is examined using data from three-axis accelerometers and gyroscopes integrated with GPS for a road test experiment in a land vehicle. The designs are described in software which is executed on an embedded microprocessor. Results show that the decentralized closed loop Kalman filter algorithm running on a single precision floating point embedded processor can produce acceptable results relative to those obtained from a desktop PC platform. A first version of the Kalman filter code (used for the optimal integration of INS and GPS) is executed from a 1 MB external SRAM supported by 8 KB of data cache and 4 KB of instruction cache. A second version is run from high speed 64 KB on-chip Block RAM. In the two memory configurations, the maximum sampling frequencies at which the code can be executed are 80 Hz and 119 Hz, respectively, while accelerometer and gyroscope sensors provide data at 75 Hz. Additionally, from the post synthesis timing analyses, the critical frequencies for the two hardware configurations were 63.3 MHz and 83.2 MHz, an enhancement of 26% and 66% respectively over the reference clock of 50 MHz.
- Published
- 2009
46. Iterative design method for video processors based on an architecture design language and its application to ELA deinterlacing
- Author
-
Yvon Savaria, G.-A.B. Ngoyi, and J. M. Pierre Langlois
- Subjects
Instruction set ,Reduced instruction set computing ,Computer architecture ,Iterative design ,Deinterlacing ,Computer science ,Application-specific instruction-set processor ,Process design ,Algorithm design ,Engineering design process - Abstract
This paper presents a design methodology for dedicated real-time video processors. The methodology begins with a basic processor that is progressively morphed into a specialized processor through five systematic steps. It differs from standard methodologies for ASIP design which place exclusive emphasis on the extension of the instruction set. The proposed methodology takes a global look at various processor and system considerations. The last step consists of removing unnecessary functionality from the instruction set. The required flexibility is attained by the use of an architectural description language. We use a basic deinterlacing algorithm to demonstrate the effectiveness of the methodology and present details of the various phases of the design process. Using ELA deinterlacing as a benchmark, the final processor uses 20% fewer logic elements, achieves a global acceleration by a factor of 11, and an improvement in area-delay product of 14, with respect to the basic processor.
- Published
- 2008
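The abstract above uses ELA (edge-based line average) deinterlacing as its benchmark. For orientation, here is a minimal Python sketch of the basic three-direction ELA rule for one missing line. It is a software illustration under assumed conventions (integer pixel values, borders handled by skipping directions that fall outside the line), not the processor implementation described in the paper. The function name is hypothetical.

```python
def ela_interpolate_line(above, below):
    """Interpolate a missing scan line from the lines above and below it."""
    width = len(above)
    out = []
    for x in range(width):
        best = None
        # Vertical direction first, so ties fall back to plain line averaging.
        for d in (0, -1, 1):
            xa, xb = x + d, x - d
            if 0 <= xa < width and 0 <= xb < width:
                diff = abs(above[xa] - below[xb])
                if best is None or diff < best[0]:
                    # Average along the direction with the smallest difference.
                    best = (diff, (above[xa] + below[xb]) // 2)
        out.append(best[1])
    return out


# A diagonal edge: it sits at x=3 in the line above and x=1 in the line below.
above = [10, 10, 10, 200, 200]
below = [10, 200, 200, 200, 200]
missing = ela_interpolate_line(above, below)
```

Plain vertical averaging would produce the blurred values 105 at x=1 and x=2, while the directional rule places a sharp edge between them.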
47. Low noise silicon CMOS single-electron transistors and electron pumps
- Author
-
M. Pierre, Marc Sanquer, Xavier Jehl, Simon Deleonibus, Gabriel Molas, Bernard Previtali, and Maud Vinet
- Subjects
Materials science ,Silicon ,business.industry ,Transistor ,Doping ,Nanowire ,chemistry.chemical_element ,Nanotechnology ,Hardware_PERFORMANCEANDRELIABILITY ,Electron ,law.invention ,Low noise ,CMOS ,chemistry ,Hardware_GENERAL ,law ,Modulation ,Hardware_INTEGRATEDCIRCUITS ,Optoelectronics ,business ,Hardware_LOGICDESIGN - Abstract
We design and fabricate single-electron transistors and electron pumps within an industrial CMOS platform. Based on silicon nanowire transistors, these devices allow very simple and stable single-electron control thanks to doping modulation along the wire.
- Published
- 2008
48. Characterization of a single resonant charge in a silicon nanowire device
- Author
-
M. Pierre, Simon Deleonibus, M. Vinet, M. Sanquer, G. Molas, and Xavier Jehl
- Subjects
Materials science ,Silicon ,Condensed matter physics ,Nanowire ,chemistry.chemical_element ,Coulomb blockade ,Charge (physics) ,Electron ,Electrometer ,Condensed Matter::Mesoscopic Systems and Quantum Hall Effect ,chemistry ,Quantum dot ,Coulomb ,Atomic physics - Abstract
We investigate the time-dependent transport properties of two very asymmetric coupled quantum dots: a single resonant charge and an electrometer made of a gated silicon nanowire in the Coulomb blockade regime. The occupation probability of the charge trap is obtained by noise measurements. We observe the predicted smearing of the Coulomb peaks at the resonance, the back-action of the electrometer on the single charge as well as a relatively large dip in the charging energy of the whole system.
- Published
- 2008
49. A video stream processor for real-time detection and correction of specular reflections in endoscopic images
- Author
-
J. M. Pierre Langlois, Farida Cheriet, and S. Tchoulack
- Subjects
Pixel ,Image quality ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Inpainting ,Stream processing ,NTSC ,Histogram ,Computer vision ,Artificial intelligence ,Field-programmable gate array ,business ,Auxiliary memory - Abstract
This paper presents the architecture and FPGA implementation of a video processor for detection and correction of specular reflections in endoscopic images by using an inpainting algorithm. Stream processing and parallelism are used to exceed real-time performance on NTSC format video without the need for an external memory. The system was implemented in a XC2VP30 FPGA and uses 91% of available slices. Image quality is significantly enhanced.
- Published
- 2008
50. A Threshold-Based Deinterlacing Algorithm Using Motion Compensation and Directional Interpolation
- Author
-
J. M. Pierre Langlois, Yvon Savaria, and Hossein Mahvash Mohammadi
- Subjects
Motion compensation ,Optimal matching ,Threshold limit value ,business.industry ,Estimator ,Quarter-pixel motion ,Deinterlacing ,Motion estimation ,Median filter ,Computer vision ,Artificial intelligence ,business ,Algorithm ,Mathematics - Abstract
In this paper we propose a new deinterlacing algorithm using motion compensation and directional interpolation. To limit the propagation error that is a major drawback of conventional motion compensated methods, motion estimation is performed using original lines only, for same and opposite parity fields. In addition, a threshold value is used during the search to recognize situations where the motion estimator fails to find an optimal matching block. Enhanced edge-based line average with median filtering is used in these situations. Experimental results show that the proposed method performs better than the traditional motion compensated method, based on objective and subjective criteria.
- Published
- 2006