76 results for "Tobi Delbruck"
Search Results
2. A 23μW Solar-Powered Keyword-Spotting ASIC with Ring-Oscillator-Based Time-Domain Feature Extraction
- Author
-
Kwantae Kim, Chang Gao, Rui Graca, Ilya Kiselev, Hoi-Jun Yoo, Tobi Delbruck, Shih-Chii Liu, and University of Zurich
- Subjects
2208 Electrical and Electronic Engineering ,570 Life sciences ,biology ,2504 Electronic, Optical and Magnetic Materials ,10194 Institute of Neuroinformatics - Published
- 2022
3. Event-Based Vision: A Survey
- Author
-
Guillermo Gallego, Brian Taba, Chiara Bartolozzi, Andrew J. Davison, Kostas Daniilidis, Andrea Censi, Stefan Leutenegger, Davide Scaramuzza, Jörg Conradt, Tobi Delbruck, Garrick Orchard, and University of Zurich
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,10009 Department of Informatics ,Computer Science - Artificial Intelligence ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,000 Computer science, knowledge & systems ,Machine Learning (cs.LG) ,Computer Science - Robotics ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,High dynamic range ,Feature detection (computer vision) ,10194 Institute of Neuroinformatics ,Spiking neural network ,bio-inspired vision ,low power ,Pixel ,Event (computing) ,business.industry ,Event cameras ,asynchronous sensor ,low latency ,high dynamic range ,Applied Mathematics ,Motion blur ,Process (computing) ,Robotics ,Artificial Intelligence (cs.AI) ,Computational Theory and Mathematics ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Neural Networks, Computer ,business ,Robotics (cs.RO) ,Software ,Algorithms - Abstract
Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 (1), ISSN:0162-8828, ISSN:1939-3539
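To make the event representation concrete, the sketch below integrates a stream of (timestamp, x, y, polarity) events into a per-pixel log-intensity-change estimate; this is a generic illustration, not code from the survey, and the resolution and contrast threshold are assumed values.

    import numpy as np

    H, W = 260, 346      # assumed sensor resolution (DAVIS346-like)
    THETA = 0.2          # assumed nominal contrast threshold in log-intensity units

    def integrate_events(events, shape=(H, W), theta=THETA):
        """events: iterable of (t, x, y, polarity) tuples with polarity in {+1, -1}.
        Each event marks one threshold crossing of log intensity at pixel (x, y), so
        summing signed events scaled by theta gives a crude brightness-change map."""
        log_change = np.zeros(shape, dtype=np.float32)
        for t, x, y, p in events:
            log_change[y, x] += p * theta
        return log_change

    frame = integrate_events([(0.001, 10, 20, +1), (0.002, 10, 20, +1), (0.003, 10, 20, -1)])
    print(frame[20, 10])   # ~0.2: net one positive threshold crossing at that pixel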
- Published
- 2022
4. Event-Driven Sensing for Efficient Perception: Vision and Audition Algorithms
- Author
-
Shih-Chii Liu, Enea Ceolini, Adrian E. G. Huber, Bodo Rueckauer, Tobi Delbruck, and University of Zurich
- Subjects
Signal processing ,Event (computing) ,Computer science ,2208 Electrical and Electronic Engineering ,Applied Mathematics ,media_common.quotation_subject ,Process (computing) ,Binary number ,020206 networking & telecommunications ,02 engineering and technology ,2604 Applied Mathematics ,Perception ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Signal processing algorithms ,1711 Signal Processing ,Electrical and Electronic Engineering ,Representation (mathematics) ,Algorithm ,10194 Institute of Neuroinformatics ,media_common ,Electronic circuit - Abstract
Event sensors implement circuits that capture partial functionality of biological sensors, such as the retina and cochlea. As with their biological counterparts, event sensors drive their own output: they produce dynamically sampled binary events in response to dynamically changing stimuli. Algorithms and networks that process this form of output representation are still in their infancy, but they show strong promise. This article illustrates the unique form of the data produced by the sensors and demonstrates how the properties of these sensor outputs make them useful for power-efficient, low-latency systems working in real time.
- Published
- 2019
5. v2e: From Video Frames to Realistic DVS Events
- Author
-
Yuhuang Hu, Tobi Delbruck, Shih-Chii Liu, and University of Zurich
- Subjects
FOS: Computer and information sciences ,1707 Computer Vision and Pattern Recognition ,Event (computing) ,Computer science ,business.industry ,2208 Electrical and Electronic Engineering ,Computer Vision and Pattern Recognition (cs.CV) ,Motion blur ,Bandwidth (signal processing) ,Cognitive neuroscience of visual object recognition ,Latency (audio) ,Computer Science - Computer Vision and Pattern Recognition ,Visualization ,Pattern recognition (psychology) ,570 Life sciences ,biology ,Computer vision ,Artificial intelligence ,Noise (video) ,business ,10194 Institute of Neuroinformatics - Abstract
To help meet the increasing need for dynamic vision sensor (DVS) event camera data, this paper proposes the v2e toolbox that generates realistic synthetic DVS events from intensity frames. It also clarifies incorrect claims about DVS motion blur and latency characteristics in recent literature. Unlike other toolboxes, v2e includes pixel-level Gaussian event threshold mismatch, finite intensity-dependent bandwidth, and intensity-dependent noise. Realistic DVS events are useful in training networks for uncontrolled lighting conditions. The use of v2e synthetic events is demonstrated in two experiments. The first experiment is object recognition with the N-Caltech 101 dataset. Results show that pretraining on various v2e lighting conditions improves generalization when a ResNet model is transferred to real DVS data. The second experiment shows that for night driving, a car detector trained with v2e events shows an average accuracy improvement of 40% compared to a YOLOv3 detector trained on intensity frames. Accepted at 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); Third International Workshop on Event-Based Vision
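A minimal sketch of the kind of frame-to-event conversion v2e performs is shown below; it is not the v2e code (which also models the finite intensity-dependent bandwidth and noise mentioned above), and the threshold statistics and variable names are assumptions.

    import numpy as np

    def frames_to_events(frames, times, theta_mean=0.2, theta_sigma=0.03, seed=0):
        """Emit (t, x, y, polarity) events from a stack of intensity frames by
        thresholding per-pixel log-intensity changes, with Gaussian threshold mismatch."""
        rng = np.random.default_rng(seed)
        log_frames = np.log(frames.astype(np.float32) + 1e-3)
        theta = rng.normal(theta_mean, theta_sigma, size=frames.shape[1:])  # per-pixel thresholds
        memory = log_frames[0].copy()        # last log intensity that triggered an event
        events = []
        for k in range(1, len(frames)):
            diff = log_frames[k] - memory
            n = np.floor(np.abs(diff) / theta).astype(int)   # threshold crossings per pixel
            for y, x in zip(*np.nonzero(n)):
                pol = 1 if diff[y, x] > 0 else -1
                events.extend([(times[k], x, y, pol)] * n[y, x])
                memory[y, x] += pol * n[y, x] * theta[y, x]
        return events

    frames = np.random.randint(0, 255, size=(3, 8, 8))
    print(len(frames_to_events(frames, times=[0.0, 0.01, 0.02])))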
- Published
- 2021
6. EILE: Efficient Incremental Learning on the Edge
- Author
-
Shih-Chii Liu, Chang Gao, Tobi Delbruck, Xi Chen, and University of Zurich
- Subjects
1707 Computer Vision and Pattern Recognition ,Computer science ,1708 Hardware and Architecture ,2208 Electrical and Electronic Engineering ,Memory bandwidth ,1702 Artificial Intelligence ,Parallel computing ,Backpropagation ,Application-specific integrated circuit ,Scalability ,1705 Computer Networks and Communications ,Memory footprint ,570 Life sciences ,biology ,Enhanced Data Rates for GSM Evolution ,Field-programmable gate array ,Throughput (business) ,10194 Institute of Neuroinformatics - Abstract
This paper proposes a fully-connected network training architecture called EILE targeting incremental learning on the edge. By using a novel reconfigurable processing element (PE) architecture, EILE avoids the explicit transposition of weight matrices required for backpropagation, preserving the same efficient memory access pattern for both the forward propagation (FP) and backward propagation (BP) phases. Experimental results on a Zynq XC7Z100 FPGA with 64 PEs show that EILE achieves 19.2 GOp/s peak throughput and maintains nearly 100% PE utilization efficiency for both FP and BP with batch sizes from 1 to 32. EILE's small on-chip memory footprint and scalability to match any available off-chip memory bandwidth make it an attractive ASIC architecture for energy-constrained training.
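For background on why transposition matters, the small numpy sketch below (not the EILE hardware design; the shapes are arbitrary) shows that a fully-connected layer reads W in the forward pass but the transpose of W in the backward pass, which is the access-pattern mismatch the PE architecture is designed to avoid.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 3))       # fully-connected layer weights (4 outputs, 3 inputs)
    x = rng.standard_normal(3)            # layer input
    grad_out = rng.standard_normal(4)     # gradient arriving from the next layer

    y = W @ x                     # forward propagation (FP): row-wise access to W
    grad_in = W.T @ grad_out      # backward propagation (BP): column-wise (transposed) access to W
    dW = np.outer(grad_out, x)    # weight gradient used for the training update
    print(y.shape, grad_in.shape, dW.shape)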
- Published
- 2021
7. Reducing latency in a converted spiking video segmentation network
- Author
-
Shih-Chii Liu, Qinyu Cheni, Bodo Rueckauer, Tobi Delbruck, Li Li, and University of Zurich
- Subjects
Spiking neural network ,Artificial neural network ,Computer science ,business.industry ,Frame (networking) ,Latency (audio) ,Pattern recognition ,Cognitive artificial intelligence ,Image segmentation ,Object detection ,Redundancy (engineering) ,570 Life sciences ,biology ,Artificial intelligence ,business ,Reset (computing) ,10194 Institute of Neuroinformatics - Abstract
Spiking Neural Networks (SNNs) of almost equivalent accuracy to Analog Neural Networks (ANNs) can be produced by various ANN-SNN conversion methods. Most of these methods are applied to classification and object detection networks tested on frame-based datasets. In this work, we demonstrate a converted SNN for image segmentation applied to a natural video dataset. Instead of resetting the network state with each input frame, we capitalize on the temporal redundancy between adjacent frames in a natural scene and propose an interval reset method in which the network state is reset after a fixed number of frames. We studied the trade-off between accuracy and latency as a function of the number of frames between resets. We also applied layer-specific normalization and early stopping to speed up network convergence and to reduce the latency. Our results show that the SNN achieved a 35.7x increase in convergence speed with only a 1.5% accuracy drop using an interval reset of 20 frames. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (Daegu, South Korea, 22-28 May 2021)
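The interval-reset idea can be summarized with the toy sketch below; the TinySNN class is a stand-in for the converted segmentation network, not the authors' implementation, and the reset interval of 20 frames is the value quoted in the abstract.

    class TinySNN:
        """Illustrative stand-in for a stateful converted SNN."""
        def __init__(self):
            self.state = 0.0
        def reset_state(self):
            self.state = 0.0                # zero the membrane potentials
        def step(self, frame):
            self.state += float(frame)      # placeholder for membrane-potential dynamics
            return self.state

    def run_with_interval_reset(snn, frames, reset_interval=20):
        """Reset the network state only every reset_interval frames, reusing the
        converged state across temporally redundant adjacent frames."""
        outputs = []
        for i, frame in enumerate(frames):
            if i % reset_interval == 0:
                snn.reset_state()
            outputs.append(snn.step(frame))
        return outputs

    print(run_with_interval_reset(TinySNN(), frames=[1, 1, 1, 1], reset_interval=2))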
- Published
- 2021
8. Feedback control of event cameras
- Author
-
Rui Graca, Tobi Delbruck, Marcin Paluch, and University of Zurich
- Subjects
FOS: Computer and information sciences ,Bandwidth management ,Pixel ,Automatic control ,1707 Computer Vision and Pattern Recognition ,business.industry ,Event (computing) ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,2208 Electrical and Electronic Engineering ,Computer Science - Computer Vision and Pattern Recognition ,Noise ,Variable (computer science) ,Control theory ,Range (statistics) ,Bandwidth (computing) ,570 Life sciences ,biology ,sense organs ,Artificial intelligence ,business ,10194 Institute of Neuroinformatics - Abstract
Dynamic vision sensor event cameras produce a variable data rate stream of brightness change events. Event production at the pixel level is controlled by threshold, bandwidth, and refractory period bias current parameter settings. Biases must be adjusted to match application requirements, and the optimal settings depend on many factors. As a first step towards automatic control of biases, this paper proposes fixed-step feedback controllers that use measurements of event rate and noise. The controllers regulate the event rate within an acceptable range using threshold and refractory period control, and regulate noise using bandwidth control. Experiments demonstrate model validity and feedback control. Accepted at 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); Third International Workshop on Event-Based Vision
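A minimal fixed-step controller in the spirit of the abstract might look like the sketch below; the setpoints, step size, and dictionary-based bias interface are assumptions, not the authors' code or the camera's API.

    def fixed_step_bias_control(event_rate, noise_rate, biases,
                                rate_low=1e5, rate_high=1e6, noise_max=1e4, step=0.05):
        """One control iteration: nudge threshold/refractory/bandwidth settings by a
        fixed relative step based on the measured event rate and noise rate."""
        b = dict(biases)
        if event_rate > rate_high:          # too many events: raise threshold and
            b["threshold"] *= 1 + step      # lengthen the refractory period
            b["refractory"] *= 1 + step
        elif event_rate < rate_low:         # too few events: do the opposite
            b["threshold"] *= 1 - step
            b["refractory"] *= 1 - step
        if noise_rate > noise_max:          # too noisy: reduce pixel bandwidth
            b["bandwidth"] *= 1 - step
        return b

    biases = {"threshold": 1.0, "refractory": 1.0, "bandwidth": 1.0}
    print(fixed_step_bias_control(event_rate=2e6, noise_rate=5e3, biases=biases))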
- Published
- 2021
9. DDD20 End-to-End Event Camera Driving Dataset: Fusing Frames and Events with Deep Learning for Improved Steering Prediction
- Author
-
Tobi Delbruck, Jonathan Binas, Yuhuang Hu, Shih-Chii Liu, Daniel Neil, and University of Zurich
- Subjects
FOS: Computer and information sciences ,0209 industrial biotechnology ,Brightness ,Computer Vision and Pattern Recognition (cs.CV) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer Science - Computer Vision and Pattern Recognition ,1702 Artificial Intelligence ,02 engineering and technology ,020901 industrial engineering & automation ,End-to-end principle ,0202 electrical engineering, electronic engineering, information engineering ,1802 Information Systems and Management ,Computer vision ,10194 Institute of Neuroinformatics ,CMOS sensor ,business.industry ,Event (computing) ,Deep learning ,Frame (networking) ,Steering wheel ,1801 Decision Sciences (miscellaneous) ,Neuromorphic engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,2611 Modeling and Simulation ,3304 Education - Abstract
Neuromorphic event cameras are useful for dynamic vision problems under difficult lighting conditions. To enable studies of using event cameras in automobile driving applications, this paper reports a new end-to-end driving dataset called DDD20. The dataset was captured with a DAVIS camera that concurrently streams both dynamic vision sensor (DVS) brightness change events and active pixel sensor (APS) intensity frames. DDD20 is the longest event camera end-to-end driving dataset to date, with 51 h of DAVIS event+frame camera and vehicle human control data collected from 4000 km of highway and urban driving under a variety of lighting conditions. Using DDD20, we report the first study of fusing brightness change events and intensity frame data using a deep learning approach to predict the instantaneous human steering wheel angle. Over all day and night conditions, the explained variance for human steering prediction from a ResNet-32 is significantly better with the fused DVS+APS frames (0.88) than with either DVS (0.67) or APS (0.77) data alone. Accepted in the 23rd IEEE International Conference on Intelligent Transportation Systems (Special Session: Beyond Traditional Sensing for Intelligent Transportation)
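For reference, the explained variance quoted above is the standard regression metric 1 - Var(residual)/Var(target); a one-function sketch (not from the paper) is given below.

    import numpy as np

    def explained_variance(y_true, y_pred):
        """EV = 1 - Var(y_true - y_pred) / Var(y_true); 1.0 means a perfect prediction."""
        y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
        return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

    print(explained_variance([0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8]))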
- Published
- 2020
10. Recurrent Neural Network Control of a Hybrid Dynamical Transfemoral Prosthesis with EdgeDRNN Accelerator
- Author
-
Rachel Gehlhar, Shih-Chii Liu, Chang Gao, Tobi Delbruck, Aaron D. Ames, and University of Zurich
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,0209 industrial biotechnology ,Inference ,PID controller ,2207 Control and Systems Engineering ,1702 Artificial Intelligence ,Systems and Control (eess.SY) ,02 engineering and technology ,010501 environmental sciences ,Dynamical system ,Electrical Engineering and Systems Science - Systems and Control ,01 natural sciences ,Computer Science - Robotics ,020901 industrial engineering & automation ,Control theory ,FOS: Electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Signal Processing ,Simulation ,0105 earth and related environmental sciences ,10194 Institute of Neuroinformatics ,2208 Electrical and Electronic Engineering ,Work (physics) ,1712 Software ,Recurrent neural network ,Trajectory ,570 Life sciences ,biology ,Robotics (cs.RO) ,Energy (signal processing) - Abstract
Lower-leg prostheses could improve the quality of life of amputees by increasing comfort and reducing the energy required to locomote, but current control methods are limited in modulating behaviors based upon the human's experience. This paper describes the first steps toward learning complex controllers for dynamical robotic assistive devices. We provide the first example of behavioral cloning to control a powered transfemoral prosthesis using a Gated Recurrent Unit (GRU) based recurrent neural network (RNN) running on a custom hardware accelerator that exploits temporal sparsity. The RNN is trained on data collected from the original prosthesis controller. The RNN inference is realized by a novel EdgeDRNN accelerator in real time. Experimental results show that the RNN can replace the nominal PD controller to realize end-to-end control of the AMPRO3 prosthetic leg walking on flat ground and unforeseen slopes with comparable tracking accuracy. EdgeDRNN computes the RNN about 240 times faster than real time, opening the possibility of running larger networks for more complex tasks in the future. Implementing an RNN on this real-time dynamical system with impacts sets the groundwork to incorporate other learned elements of the human-prosthesis system into prosthesis control. Comment: Accepted at 2020 International Conference on Robotics and Automation (ICRA 2020)
- Published
- 2020
11. EdgeDRNN: Enabling Low-latency Recurrent Neural Network Edge Inference
- Author
-
Shih-Chii Liu, Xi Chen, Antonio Rios-Navarro, Tobi Delbruck, Chang Gao, University of Zurich, and Universidad de Sevilla. Departamento de Arquitectura y Tecnología de Computadores
- Subjects
Signal Processing (eess.SP) ,0209 industrial biotechnology ,Computer science ,GRU ,1702 Artificial Intelligence ,02 engineering and technology ,Parallel computing ,USB ,RNN ,law.invention ,embedded system ,020901 industrial engineering & automation ,edge computing ,law ,FOS: Electrical engineering, electronic engineering, information engineering ,1706 Computer Science Applications ,0202 electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Signal Processing ,Field-programmable gate array ,FPGA ,Edge computing ,10194 Institute of Neuroinformatics ,Spiking neural network ,business.industry ,1708 Hardware and Architecture ,2208 Electrical and Electronic Engineering ,Deep learning ,delta network ,deep learning ,Recurrent neural network ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
This paper presents a Gated Recurrent Unit (GRU) based recurrent neural network (RNN) accelerator called EdgeDRNN designed for portable edge computing. EdgeDRNN adopts the spiking neural network inspired delta network algorithm to exploit temporal sparsity in RNNs. It reduces off-chip memory access by a factor of up to 10x with tolerable accuracy loss. Experimental results on a 10 million parameter 2-layer GRU-RNN, with weights stored in DRAM, show that EdgeDRNN computes the network in under 0.5 ms. With 2.42 W wall plug power on an entry-level USB-powered FPGA board, it achieves latency comparable with a 92 W Nvidia 1080 GPU. It outperforms the NVIDIA Jetson Nano, Jetson TX2, and Intel Neural Compute Stick 2 in latency by 6X. For a batch size of 1, EdgeDRNN achieves a mean effective throughput of 20.2 GOp/s and a wall plug power efficiency that is over 4X higher than all other platforms. Comment: This paper has been accepted for publication at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Genoa, 2020
- Published
- 2020
12. Incremental Learning of Hand Symbols Using Event-Based Cameras
- Author
-
Shih-Chii Liu, Tobi Delbruck, Iulia-Alexandra Lungu, and University of Zurich
- Subjects
0209 industrial biotechnology ,Computer science ,business.industry ,Event (computing) ,Deep learning ,2208 Electrical and Electronic Engineering ,Latency (audio) ,02 engineering and technology ,Object (computer science) ,Frame rate ,Convolutional neural network ,Symbol (chemistry) ,020901 industrial engineering & automation ,Asynchronous communication ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,10194 Institute of Neuroinformatics - Abstract
Conventional cameras create redundant output, especially when the frame rate is high. Dynamic vision sensors (DVSs), on the other hand, generate asynchronous and sparse brightness change events only when an object in the field of view is in motion. Such event-based output can be processed as a 1D time sequence, or it can be converted to 2D frames that resemble conventional camera frames. Frames created, e.g., by accumulating a fixed number of events, can be used as input for conventional deep learning algorithms, thus upgrading existing computer vision pipelines through low-power, low-redundancy sensors. This paper describes a hand symbol recognition system that can quickly be trained to incrementally learn new symbols recorded with an event-based camera, without forgetting previously learned classes. By using the iCaRL incremental learning algorithm, we show that we can learn up to 16 new symbols using only 4000 samples for each symbol and achieve a final symbol accuracy of over 80%. The system achieves a latency of under 0.5 s, and training requires 3 minutes for 5 epochs on an NVIDIA 1080TI GPU.
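A sketch of the fixed-event-count frame construction mentioned in the abstract is shown below; the frame size, event count, and polarity handling are illustrative assumptions.

    import numpy as np

    def events_to_frames(events, n_events=2000, shape=(180, 240)):
        """Accumulate every n_events DVS events into one 2D histogram frame that a
        conventional CNN (e.g. the incrementally trained classifier) can consume."""
        frames, frame = [], np.zeros(shape, dtype=np.float32)
        for count, (t, x, y, p) in enumerate(events, start=1):
            frame[y, x] += 1.0                  # count events regardless of polarity
            if count % n_events == 0:
                frames.append(frame)
                frame = np.zeros(shape, dtype=np.float32)
        return frames

    demo = [(i * 1e-4, i % 240, i % 180, 1) for i in range(4000)]
    print(len(events_to_frames(demo)))          # 2 frames of 2000 events each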
- Published
- 2019
13. Temperature and Parasitic Photocurrent Effects in Dynamic Vision Sensors
- Author
-
Tobi Delbruck, Yuji Nozaki, and University of Zurich
- Subjects
Photocurrent ,business.industry ,2208 Electrical and Electronic Engineering ,Photoconductivity ,020208 electrical & electronic engineering ,Temperature independent ,2504 Electronic, Optical and Magnetic Materials ,junction leakage ,CMOS image sensors ,photocurrent ,dark current ,vision sensor ,02 engineering and technology ,Temperature measurement ,Electronic, Optical and Magnetic Materials ,Photodiode ,law.invention ,law ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Optoelectronics ,020201 artificial intelligence & image processing ,Quantum efficiency ,Electrical and Electronic Engineering ,business ,10194 Institute of Neuroinformatics ,Leakage (electronics) ,Dark current - Abstract
The effect of temperature and parasitic photocurrent on event-based dynamic vision sensors (DVS) is important because of their use in uncontrolled robotic, automotive, and surveillance applications. This paper considers the temperature dependence of the DVS threshold temporal contrast (TC), dark current, and background activity caused by junction leakage. New theory shows that if the bias currents have a constant ratio, then ideally the DVS threshold TC is temperature independent, but the presence of temperature-dependent junction leakage currents causes nonideal behavior at elevated temperature. Both the measured photodiode dark current and the leakage-induced event activity follow Arrhenius activation. This paper also defines a new metric for parasitic photocurrent quantum efficiency and measures the sensitivity of DVS pixels to parasitic photocurrent. ISSN:0018-9383 ISSN:1557-9646
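Arrhenius activation here means the leakage scales roughly as exp(-Ea/kT); the snippet below evaluates that scaling with an assumed activation energy of 0.75 eV (an illustrative number, not the paper's measured value).

    import numpy as np

    K_B = 8.617e-5   # Boltzmann constant in eV/K

    def arrhenius_scale(T_celsius, T_ref_celsius=25.0, Ea_eV=0.75):
        """Relative increase of an Arrhenius-activated leakage current,
        I(T)/I(T_ref) = exp(-Ea/kT) / exp(-Ea/kT_ref)."""
        T, T_ref = T_celsius + 273.15, T_ref_celsius + 273.15
        return np.exp(-Ea_eV / (K_B * T)) / np.exp(-Ea_eV / (K_B * T_ref))

    print(arrhenius_scale(85.0))   # roughly two orders of magnitude above the 25 C value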
- Published
- 2017
14. DHP19: Dynamic Vision Sensor 3D Human Pose Dataset
- Author
-
Luca Longinotti, Sophie Skriabine, Gemma Taverni, Christopher Awai Easthope, Enrico Calabrese, Kynan Eng, Federico Corradi, Tobi Delbruck, and University of Zurich
- Subjects
1707 Computer Vision and Pattern Recognition ,Pixel ,business.industry ,2208 Electrical and Electronic Engineering ,Deep learning ,020208 electrical & electronic engineering ,Frame (networking) ,02 engineering and technology ,Solid modeling ,3D pose estimation ,Convolutional neural network ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,Pose ,10194 Institute of Neuroinformatics - Abstract
Human pose estimation has dramatically improved thanks to the continuous developments in deep learning. However, marker-free human pose estimation based on standard frame-based cameras is still slow and power hungry for real-time feedback interaction because of the huge number of operations necessary for large Convolutional Neural Network (CNN) inference. Event-based cameras such as the Dynamic Vision Sensor (DVS) quickly output sparse moving-edge information. Their sparse and rapid output is ideal for driving low-latency CNNs, thus potentially allowing real-time interaction for human pose estimators. Although the application of CNNs to standard frame-based cameras for human pose estimation is well established, their application to event-based cameras is still under study. This paper proposes a novel benchmark dataset of human body movements, the Dynamic Vision Sensor Human Pose dataset (DHP19). It consists of recordings from 4 synchronized 346x260 pixel DVS cameras, for a set of 33 movements with 17 subjects. DHP19 also includes a 3D pose estimation model that achieves an average 3D pose estimation error of about 8 cm, despite the sparse and reduced input data from the DVS.
- Published
- 2019
15. A 132 by 104 10μm-Pixel 250μW 1kefps Dynamic Vision Sensor with Pixel-Parallel Noise and Spatial Redundancy Suppression
- Author
-
Tobi Delbruck, Chenghan Li, Federico Corradi, Luca Longinotti, and University of Zurich
- Subjects
Pixel ,business.industry ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,2504 Electronic, Optical and Magnetic Materials ,020206 networking & telecommunications ,02 engineering and technology ,Frame rate ,Chip ,Power (physics) ,0202 electrical engineering, electronic engineering, information engineering ,Redundancy (engineering) ,570 Life sciences ,biology ,Noise (video) ,business ,Throughput (business) ,Computer hardware ,Frame rate control ,10194 Institute of Neuroinformatics - Abstract
This paper reports a 132 by 104 dynamic vision sensor (DVS) with a 10 μm pixel in a 65 nm logic process and a synchronous address-event representation (SAER) readout capable of 180 Meps throughput. The SAER architecture allows adjustable event frame rate control and supports pre-readout pixel-parallel noise and spatial redundancy suppression. The chip consumes 250 μW with 100 keps running at 1k event frames per second (efps), 3-5 times more power efficient than the prior art using normalized power metrics. The chip is aimed at low-power IoT and real-time high-speed smart vision applications.
- Published
- 2019
16. CNN-based Object Detection on Low Precision Hardware: Racing Car Case Study
- Author
-
Nicolo De Rita, Alessandro Aimar, Tobi Delbruck, and University of Zurich
- Subjects
Artificial neural network ,business.industry ,Deep learning ,020208 electrical & electronic engineering ,Detector ,02 engineering and technology ,Convolutional neural network ,Power budget ,Object detection ,2203 Automotive Engineering ,1706 Computer Science Applications ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Hardware acceleration ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Field-programmable gate array ,Computer hardware ,10194 Institute of Neuroinformatics ,2611 Modeling and Simulation - Abstract
Increasing interest in deep learning and convolutional neural networks has resulted in recent years in multiple techniques aiming to improve their accuracy, training speed, and inference speed. At the same time, their computational cost has triggered the design of several dedicated hardware architectures that aim to handle the elevated number of operations neural networks require within a minimal power budget, often exploiting reduced-precision arithmetic. In this case study, we analyzed how several techniques can be merged in the design of a track detector for a self-driving racing car, illustrating the step-by-step procedure required to adapt several theoretical works to a real-world scenario. Compared with the best previous detector, the new Proteins cone detector is optimized for low-precision deep learning accelerators. It runs 50% faster on GPU than the previous detector, runs at a simulated 272.5 FPS on a 1 W ASIC or at 16.4 FPS on a 12 W FPGA, and achieves a detection score 16% higher than the previous implementation.
- Published
- 2019
17. Lip Reading Deep Network Exploiting Multi-Modal Spiking Visual and Auditory Sensors
- Author
-
Xiaoya Li, Shih-Chii Liu, Daniel Neil, Tobi Delbruck, and University of Zurich
- Subjects
Modality (human–computer interaction) ,Artificial neural network ,business.industry ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,Feature extraction ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Grid ,Sensor fusion ,Asynchronous communication ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS ,020201 artificial intelligence & image processing ,Spike (software development) ,Computer vision ,Artificial intelligence ,Mel-frequency cepstrum ,business ,10194 Institute of Neuroinformatics - Abstract
This work presents a lip reading deep neural network that fuses the asynchronous spiking outputs of two bio-inspired silicon multimodal sensors: the Dynamic Vision Sensor (DVS) and the Dynamic Audio Sensor (DAS). The fusion network is tested on the GRID visual-audio lipreading dataset. Classification is carried out using event-based features generated from the spikes of the DVS and DAS. Networks are trained separately on the two modalities and also jointly trained on both modalities. The jointly trained network, when tested on DVS spike frames alone, showed a relative increase in accuracy of around 23% over that of the single DVS modality network.
- Published
- 2019
18. Live Demonstration: Real-Time Spoken Digit Recognition using the DeltaRNN Accelerator
- Author
-
Shih-Chii Liu, Chang Gao, Jithendar Anumula, Stefan Braun, Ilya Kiselev, Tobi Delbruck, and University of Zurich
- Subjects
business.product_category ,Microphone ,Speech recognition ,Computation ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,Word error rate ,020206 networking & telecommunications ,02 engineering and technology ,Recurrent neural network ,Laptop ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Digit recognition ,Latency (engineering) ,Field-programmable gate array ,business ,10194 Institute of Neuroinformatics - Abstract
This demonstration shows a real-time continuous speech recognition hardware system using our previously published DeltaRNN accelerator that enables low latency recurrent neural network (RNN) computation. The network is trained on augmented audio samples from the TIDIGITS dataset to achieve a label error rate (LER) of 2.31%. It is implemented on a Xilinx Zynq-7100 FPGA running at 1 MHz. The incremental RNN power consumption is 30 mW. Visitors interact with the system by speaking digits into a microphone connected to the FPGA system and the classification outputs of the network are continuously displayed on a laptop screen in real time.
- Published
- 2019
19. Incremental Learning Meets Reduced Precision Networks
- Author
-
Tobi Delbruck, Shih-Chii Liu, Yuhuang Hu, and University of Zurich
- Subjects
Neuroinformatics ,Feature extraction ,02 engineering and technology ,Machine learning ,computer.software_genre ,Image (mathematics) ,Hardware ,0202 electrical engineering, electronic engineering, information engineering ,Training ,Quantization (signal) ,10194 Institute of Neuroinformatics ,Artificial neural network ,business.industry ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,Training methods ,020202 computer hardware & architecture ,Incremental learning ,Deep neural networks ,570 Life sciences ,biology ,Artificial intelligence ,business ,computer ,Neural networks ,Efficient energy use - Abstract
Hardware accelerators for Deep Neural Networks (DNNs) that use reduced precision parameters are more energy efficient than the equivalent full precision networks. While many studies have focused on reduced precision training methods for supervised networks with the availability of large datasets, less work has been reported on incremental learning algorithms that adapt the network for new classes, and on the consequences reduced precision has for these algorithms. This paper presents an empirical study of how reduced precision training methods impact the iCaRL incremental learning algorithm. The incremental network accuracies on the CIFAR-100 image dataset show that weights can be quantized to 1 bit (2.39% drop in accuracy), but when activations are quantized to 1 bit, the accuracy drops much more (12.75%). Quantizing gradients from 32 to 8 bits affects the accuracy of the trained network by less than 1%. These results are encouraging for hardware accelerators that support incremental learning algorithms.
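The sketch below shows generic 1-bit and uniform 8-bit quantizers of the sort such studies apply to weights, activations, and gradients; it is a simplified illustration, not the exact quantization scheme used in the paper.

    import numpy as np

    def binarize(x):
        """1-bit quantization: keep only the sign, scaled by the mean magnitude."""
        return np.sign(x) * np.mean(np.abs(x))

    def quantize_uniform(x, bits=8):
        """Symmetric uniform quantization to the given bit width."""
        levels = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(x)) / levels
        return np.round(x / scale) * scale

    w = np.random.default_rng(0).standard_normal(5)
    print(w)
    print(binarize(w))
    print(quantize_uniform(w, bits=8))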
- Published
- 2019
20. Real-Time Speech Recognition for IoT Purpose using a Delta Recurrent Neural Network Accelerator
- Author
-
Ilya Kiselev, Stefan Braun, Shih-Chii Liu, Tobi Delbruck, Chang Gao, Jithendar Anumula, and University of Zurich
- Subjects
Microphone ,2208 Electrical and Electronic Engineering ,Speech recognition ,020208 electrical & electronic engineering ,Feature extraction ,02 engineering and technology ,Filter bank ,Recurrent neural network ,Asynchronous communication ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Latency (engineering) ,Field-programmable gate array ,Throughput (business) ,10194 Institute of Neuroinformatics - Abstract
This paper describes a continuous speech recognition hardware system that uses a delta recurrent neural network accelerator (DeltaRNN) implemented on a Xilinx Zynq-7100 FPGA to enable low latency recurrent neural network (RNN) computation. The implemented network consists of a single-layer RNN with 256 gated recurrent unit (GRU) neurons and is driven by input features generated either from the output of a filter bank running on the ARM core of the FPGA in a PmodMic3 microphone setup or from the asynchronous outputs of a spiking silicon cochlea circuit. The microphone setup achieves 7.1 ms minimum latency and 177 frames-per-second (FPS) maximum throughput while the cochlea setup achieves 2.9 ms minimum latency and 345 FPS maximum throughput. The low latency and 70 mW power consumption of the DeltaRNN makes it suitable as an IoT computing platform.
- Published
- 2019
21. Slasher: Stadium racer car for event camera end-to-end learning autonomous driving experiments
- Author
-
Hong Ming Chen, Yuhuang Hu, Tobi Delbruck, and University of Zurich
- Subjects
0209 industrial biotechnology ,Positioning system ,Event (computing) ,Computer science ,1708 Hardware and Architecture ,2208 Electrical and Electronic Engineering ,Real-time computing ,1702 Artificial Intelligence ,02 engineering and technology ,Convolutional neural network ,Learning to control ,020901 industrial engineering & automation ,Neuromorphic engineering ,Control theory ,Joystick ,Autonomous driving ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Robotic platform ,10194 Institute of Neuroinformatics - Abstract
Slasher is the first open 1/10 scale autonomous driving platform for exploring the use of neuromorphic event cameras for fast driving in unstructured indoor and outdoor environments. Slasher features a DAVIS event-based camera and ROS computer for perception and control. The DAVIS camera provides high dynamic range, sparse output, and sub-millisecond latency output for the quick visual control needed for fast driving. A race controller and Bluetooth remote joystick are used to coordinate different processing pipelines, and a low-cost ultra-wide-band (UWB) positioning system records trajectories. The modular design of Slasher can easily integrate additional features and sensors. In this paper, we show its application in a reflexive Convolutional Neural Network (CNN) steering controller trained by end-to-end learning. We present preliminary experiments in closed-loop indoor and outdoor trail driving.
- Published
- 2019
22. Fast event-driven incremental learning of hand symbols
- Author
-
Iulia-Alexandra Lungu, Tobi Delbruck, Shih-Chii Liu, and University of Zurich
- Subjects
0209 industrial biotechnology ,Computer science ,Event (computing) ,business.industry ,1708 Hardware and Architecture ,2208 Electrical and Electronic Engineering ,1702 Artificial Intelligence ,Context (language use) ,Robotics ,02 engineering and technology ,Frame rate ,Object (computer science) ,020901 industrial engineering & automation ,Neuromorphic engineering ,Incremental learning ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,10194 Institute of Neuroinformatics - Abstract
This paper describes a hand symbol recognition system that can quickly be trained to incrementally learn to recognize new symbols using about 100 times less data and time than conventional training. It is driven by frames from a Dynamic Vision Sensor (DVS) event camera. Conventional cameras have very redundant output, especially at high frame rates. Dynamic vision sensors output sparse and asynchronous brightness change events that occur when an object or the camera is moving. Images consisting of a fixed number of events from a DVS drive recognition and incremental learning of new hand symbols in the context of a RoShamBo (rock-paper-scissors) demonstration. Conventional training on the original RoShamBo dataset requires about 12.5 h of compute time on a desktop GPU using the 2.5 million images in the base dataset. Novel symbols that a user shows to the system for a few tens of seconds can be learned on-the-fly using the iCaRL incremental learning algorithm with 3 minutes of training time on a desktop GPU, while preserving the recognition accuracy of previously trained symbols. Our system runs a residual network with 32 layers and maintains an overall accuracy of 88.4% after 100 epochs (77% after 5 epochs) after 4 incremental training stages. Each stage adds an additional 2 novel symbols to the base 4 symbols. The paper also reports an inexpensive robot hand used for live demonstrations of the base RoShamBo game.
- Published
- 2019
23. NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
- Author
-
Tobi Delbruck, Alessandro Aimar, Moritz B. Milde, Ricardo Tapiador-Morales, Enrico Calabrese, Shih-Chii Liu, Hesham Mostafa, Federico Corradi, Alejandro Linares-Barranco, Antonio Rios-Navarro, Iulia-Alexandra Lungu, Universidad de Sevilla. Departamento de Arquitectura y Tecnología de Computadores, Universidad de Sevilla. TEP-108: Robótica y Tecnología de Computadores Aplicada a la Rehabilitación, and University of Zurich
- Subjects
FOS: Computer and information sciences ,Artificial intelligence ,Computer Networks and Communications ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,Real-time computing ,Feature extraction ,Computer Science - Computer Vision and Pattern Recognition ,02 engineering and technology ,Convolutional neural network ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Neural and Evolutionary Computing (cs.NE) ,Auxiliary memory ,FPGA ,10194 Institute of Neuroinformatics ,Very-large-scale integration ,Computer Science - Neural and Evolutionary Computing ,Computer Science Applications ,VLSI ,medicine.anatomical_structure ,Convolutional Neural Networks (CNN) ,Neuromorphic engineering ,Computer engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer vision ,Neuron ,Software - Abstract
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphics Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28 nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.
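As a software analogy for the zero-skipping that NullHop performs in hardware (not the accelerator's RTL), the sketch below accumulates a matrix-vector product only over nonzero activations, which is the kind of work a sparse feature-map representation makes cheap to skip.

    import numpy as np

    def sparse_dense_matvec(weights, activations):
        """Accumulate only over nonzero activations, skipping the zero entries the
        way a compressed feature-map format lets a MAC array skip them."""
        out = np.zeros(weights.shape[0])
        for j in np.nonzero(activations)[0]:          # indices of nonzero activations
            out += weights[:, j] * activations[j]     # one MAC column per nonzero input
        return out

    rng = np.random.default_rng(0)
    a = rng.standard_normal(16)
    a[a < 0.5] = 0.0                                  # ReLU-like sparsity
    W = rng.standard_normal((4, 16))
    print(np.allclose(sparse_dense_matvec(W, a), W @ a))   # True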
- Published
- 2019
24. Live Demonstration: A Real-Time Event-Based Fast Corner Detection Demo Based on FPGA
- Author
-
Wei-Tse Kao, Tobi Delbruck, Min Liu, and University of Zurich
- Subjects
1707 Computer Vision and Pattern Recognition ,business.industry ,Event (computing) ,Event based ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,Real-time computing ,Corner detection ,Scale-invariant feature transform ,02 engineering and technology ,Drone ,020202 computer hardware & architecture ,Factor (programming language) ,Component (UML) ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Artificial intelligence ,business ,Field-programmable gate array ,computer ,computer.programming_language ,10194 Institute of Neuroinformatics - Abstract
Corner detection is widely used as a pre-processing step for many computer vision (CV) problems. It is well studied in the conventional CV community, and many popular methods such as Harris, FAST, and SIFT are still used today. For event cameras like the Dynamic Vision Sensor (DVS), similar approaches have been proposed in recent years; two of them are event-based Harris (eHarris) and event-based FAST (eFAST). This demo presents our recent work in which we implement eFAST on a MiniZed FPGA. The power consumption of the whole system is less than 4 W, and the hardware eFAST consumes about 0.9 W. The demo processes at least 5M events per second and achieves a power-speed improvement factor of more than 30X compared with the CPU implementation of eFAST. This embedded component could be suitable for integration into applications such as drones and autonomous cars that produce high event rates.
- Published
- 2019
25. Low Latency Event-Based Filtering and Feature Extraction for Dynamic Vision Sensors in Real-Time FPGA Applications
- Author
-
Diederik Paul Moeys, Fernando Perez-Peña, F. Gomez-Rodriguez, Gabriel Jimenez-Moreno, Tobi Delbruck, Shih-Chii Liu, Alejandro Linares-Barranco, University of Zurich, Ingeniería en Automática, Electrónica, Arquitectura y Redes de Computadores, Universidad de Sevilla. Departamento de Arquitectura y Tecnología de Computadores, and Universidad de Sevilla. TEP-108: Robótica y Tecnología de Computadores Aplicada a la Rehabilitación
- Subjects
General Computer Science ,Address-event-representation ,Computer science ,Field programmable gate arrays (FPGA) ,framefree vision ,Feature extraction ,Real-time computing ,address-event-representation (AER) ,frame-free vision ,Event-based filters ,VHDL ,eld programmable gate arrays (FPGA) ,dynamic vision ,General Materials Science ,Framefree vision ,Neuromorphic engineering ,Address-event-representation (AER) ,Dynamic vision ,Event-based processing ,1700 General Computer Science ,event-based lters ,Field-programmable gate array ,computer.programming_language ,10194 Institute of Neuroinformatics ,event-based filters ,Pixel ,General Engineering ,Filter (signal processing) ,event-based processing ,2500 General Materials Science ,Stereopsis ,Asynchronous communication ,Video tracking ,2200 General Engineering ,570 Life sciences ,biology ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,lcsh:TK1-9971 ,computer - Abstract
Dynamic Vision Sensor (DVS) pixels produce an asynchronous, variable-rate address-event output that represents brightness changes at the pixel. Since these sensors produce frame-free output, they are ideal for real-time dynamic vision applications with tight latency and power constraints. Event-based filtering algorithms have been proposed to post-process the asynchronous event output to reduce sensor noise, extract low-level features, and track objects, among other tasks. These post-processing algorithms help to increase the performance and accuracy of further processing for tasks such as classification using spike-based learning (e.g., ConvNets), stereo vision, and visually servoed robots. This paper presents an FPGA-based library of these post-processing event-based algorithms with implementation details, specifically background activity (noise) filtering, pixel masking, object motion detection, and object tracking. The latencies of these filters on the Field Programmable Gate Array (FPGA) platform are below 300 ns, with an average latency improvement of 188% (maximum of 570%) over the software versions running on a desktop PC CPU. This open-source event-based filter IP library for FPGA has been tested on two different platforms and scenarios using different synthesis and implementation tools for Lattice and Xilinx vendors. IEEE Access, 7, ISSN:2169-3536
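A plain-software version of a background-activity (noise) filter like the first one in the library might look like the sketch below; the resolution and correlation time are assumed values, and the FPGA versions in the paper use hardware-oriented memory layouts instead.

    import numpy as np

    def background_activity_filter(events, shape=(180, 240), dt=0.01):
        """Keep an event only if some pixel in its 3x3 neighbourhood fired within the
        last dt seconds; isolated events are treated as noise and dropped."""
        last_ts = np.full(shape, -np.inf)
        kept = []
        for t, x, y, p in events:
            y0, y1 = max(y - 1, 0), min(y + 2, shape[0])
            x0, x1 = max(x - 1, 0), min(x + 2, shape[1])
            if np.max(last_ts[y0:y1, x0:x1]) >= t - dt:   # supported by recent neighbour activity
                kept.append((t, x, y, p))
            last_ts[y, x] = t
        return kept

    events = [(0.000, 5, 5, 1), (0.001, 6, 5, 1), (0.500, 100, 100, -1)]
    print(len(background_activity_filter(events)))        # 1: only the supported event survives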
- Published
- 2019
26. EV-IMO: Motion Segmentation Dataset and Learning Pipeline for Event Cameras
- Author
-
Anton Mitrokhin, Tobi Delbruck, Cornelia Fermüller, Yiannis Aloimonos, Chengxi Ye, and University of Zurich
- Subjects
FOS: Computer and information sciences ,Ground truth ,business.industry ,Computer science ,Event (computing) ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Robotics ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Pipeline (software) ,Motion capture ,Depth map ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Segmentation ,Computer vision ,Artificial intelligence ,business ,0105 earth and related environmental sciences ,10194 Institute of Neuroinformatics - Abstract
We present the first event-based learning approach for motion segmentation in indoor scenes and the first event-based dataset - EV-IMO - which includes accurate pixel-wise motion masks, egomotion and ground truth depth. Our approach is based on an efficient implementation of the SfM learning pipeline using a low parameter neural network architecture on event data. In addition to camera egomotion and a dense depth map, the network estimates pixel-wise independently moving object segmentation and computes per-object 3D translational velocities for moving objects. We also train a shallow network with just 40k parameters, which is able to compute depth and egomotion. Our EV-IMO dataset features 32 minutes of indoor recording with up to 3 fast moving objects simultaneously in the camera field of view. The objects and the camera are tracked by the VICON motion capture system. By 3D scanning the room and the objects, accurate depth map ground truth and pixel-wise object masks are obtained, which are reliable even in poor lighting conditions and during fast motion. We then train and evaluate our learning pipeline on EV-IMO and demonstrate that our approach far surpasses its rivals and is well suited for scene-constrained robotics applications. Comment: 8 pages, 6 figures. Submitted to 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019)
- Published
- 2019
27. Biological Goal Seeking
- Author
-
P. Vance, Dermot Kerr, Tobi Delbruck, Emmett Kerr, TM McGinnity, Gautham P. Das, Diederik Paul Moeys, Sonya Coleman, and University of Zurich
- Subjects
0209 industrial biotechnology ,Robot kinematics ,Finite-state machine ,Computer science ,Goal seeking ,020208 electrical & electronic engineering ,1702 Artificial Intelligence ,Mobile robot ,02 engineering and technology ,020901 industrial engineering & automation ,Human–computer interaction ,Obstacle avoidance ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Robot ,Motion planning ,2614 Theoretical Computer Science ,Collision avoidance ,10194 Institute of Neuroinformatics - Abstract
Obstacle avoidance is a critical aspect of control for a mobile robot navigating towards a goal in an unfamiliar environment. Where traditional methods for obstacle avoidance depend heavily on path planning to reach a specific goal location whilst avoiding obstacles, the proposed method uses an adaptation of a potential fields algorithm to successfully avoid obstacles whilst the robot is being guided to a non-specific goal location. Details of a generalised finite state machine based control algorithm that enables the robot to pursue a dynamic goal location, whilst avoiding obstacles and without the need for specific path planning, are presented. We have developed a novel potential fields algorithm for obstacle avoidance for use within the Robot Operating System (ROS) and made it available for download within the open source community. We evaluated the control algorithm in a high-speed predator-prey scenario in which the predator could successfully catch the moving prey whilst avoiding collision with all obstacles within the environment.
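The potential-fields computation has a compact form: an attractive pull toward the goal plus a repulsive push from each nearby obstacle. The sketch below uses the classic formulation with illustrative gains and is not the released ROS node.

    import numpy as np

    def potential_field_force(robot, goal, obstacles, k_att=1.0, k_rep=0.5, influence=1.0):
        """Resultant 2D force on the robot: attraction to the goal plus repulsion from
        every obstacle closer than the influence distance."""
        robot, goal = np.asarray(robot, float), np.asarray(goal, float)
        force = k_att * (goal - robot)                        # attractive component
        for obs in obstacles:
            diff = robot - np.asarray(obs, float)
            d = np.linalg.norm(diff)
            if 0 < d < influence:                             # repulsive component
                force += k_rep * (1.0 / d - 1.0 / influence) * diff / d**3
        return force

    print(potential_field_force(robot=(0, 0), goal=(5, 0), obstacles=[(1, 0.2)]))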
- Published
- 2018
28. Authors Reply to Comment on Temperature and Parasitic Photocurrent Effects in Dynamic Vision Sensors
- Author
-
Yuji Nozaki, Tobi Delbruck, and University of Zurich
- Subjects
010302 applied physics ,Physics ,Photocurrent ,Leak ,business.industry ,Photoconductivity ,2208 Electrical and Electronic Engineering ,2504 Electronic, Optical and Magnetic Materials ,02 engineering and technology ,Spotting ,01 natural sciences ,Electronic, Optical and Magnetic Materials ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Optoelectronics ,Temporal contrast ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Electrical and Electronic Engineering ,business ,10194 Institute of Neuroinformatics - Abstract
We thank the reviewers for their careful analysis of [1], especially for spotting two errors in the formulas for inferring temporal contrast effects of leak and parasitic photocurrent. The revised results increase the values of the inferred parasitic leak currents by a factor of about 11×.
- Published
- 2018
29. DeltaRNN
- Author
-
Chang Gao, Enea Ceolini, Daniel Neil, Tobi Delbruck, Shih-Chii Liu, and University of Zurich
- Subjects
Speedup ,business.industry ,Computer science ,1708 Hardware and Architecture ,2208 Electrical and Electronic Engineering ,Deep learning ,Process (computing) ,02 engineering and technology ,020202 computer hardware & architecture ,Recurrent neural network ,Computer engineering ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Hardware acceleration ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Field-programmable gate array ,Electrical efficiency ,Throughput (business) ,10194 Institute of Neuroinformatics - Abstract
Recurrent Neural Networks (RNNs) are widely used in speech recognition and natural language processing applications because of their capability to process temporal sequences. Because RNNs are fully connected, they require a large number of weight memory accesses, leading to high power consumption. Recent theory has shown that an RNN delta network update approach can reduce memory accesses and computation with negligible accuracy loss. This paper describes the implementation of this theoretical approach in a hardware accelerator called "DeltaRNN" (DRNN). The DRNN updates the output of a neuron only when the neuron's activation changes by more than a delta threshold. It was implemented on a Xilinx Zynq-7100 FPGA. FPGA measurement results from a single-layer RNN of 256 Gated Recurrent Unit (GRU) neurons show that the DRNN achieves 1.2 TOp/s effective throughput and 164 GOp/s/W power efficiency. The delta update leads to a 5.7x speedup compared to a conventional RNN update because of the sparsity created by the delta network algorithm and the zero-skipping ability of DRNN.
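The delta-update rule can be illustrated with the numpy sketch below, which maintains a matrix-vector product incrementally and recomputes only the columns whose input changed by more than the delta threshold; it is a generic delta-network illustration, not the DRNN hardware, and the threshold value is an assumption.

    import numpy as np

    class DeltaMatVec:
        """Incrementally maintains y = W @ x, touching only columns whose input
        changed by more than delta since its last significant change."""
        def __init__(self, W, delta=0.1):
            self.W, self.delta = W, delta
            self.x_prev = np.zeros(W.shape[1])   # last input values that caused an update
            self.y = np.zeros(W.shape[0])

        def step(self, x):
            change = x - self.x_prev
            active = np.abs(change) > self.delta           # sparse set of changed inputs
            self.y += self.W[:, active] @ change[active]   # all other columns are skipped
            self.x_prev[active] = x[active]
            return self.y.copy()

    rng = np.random.default_rng(0)
    dm = DeltaMatVec(rng.standard_normal((3, 5)))
    x = rng.standard_normal(5)
    y1 = dm.step(x)              # first step: most or all columns are computed
    y2 = dm.step(x + 0.01)       # tiny change: every column is skipped, output unchanged
    print(np.allclose(y1, y2))   # True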
- Published
- 2018
30. A Sensitive Dynamic and Active Pixel Vision Sensor for Color or Neural Imaging Applications
- Author
-
Luca Longinotti, Chenghan Li, Diederik Paul Moeys, Stewart Berry, Federico Corradi, Fabian F. Voigt, Tobi Delbruck, Fritjof Helmchen, Simeon A. Bamford, Gemma Taverni, and University of Zurich
- Subjects
Preamplifier ,Biomedical Engineering ,2204 Biomedical Engineering ,Neuroimaging ,610 Medicine & health ,02 engineering and technology ,Signal-To-Noise Ratio ,01 natural sciences ,Cell Line ,010309 optics ,Mice ,0103 physical sciences ,Image Processing, Computer-Assisted ,0202 electrical engineering, electronic engineering, information engineering ,Animals ,Computer vision ,Sensitivity (control systems) ,Electrical and Electronic Engineering ,Image sensor ,10194 Institute of Neuroinformatics ,Neurons ,CMOS sensor ,Pixel ,10242 Brain Research Institute ,business.industry ,Dynamic range ,2208 Electrical and Electronic Engineering ,Optical Imaging ,020208 electrical & electronic engineering ,Fixed-pattern noise ,570 Life sciences ,biology ,Color filter array ,Artificial intelligence ,business ,Color Perception - Abstract
Applications requiring detection of small visual contrast require high sensitivity. Event cameras can provide higher dynamic range (DR) and reduce data rate and latency, but most existing event cameras have limited sensitivity. This paper presents the results of a 180-nm Towerjazz CIS process vision sensor called SDAVIS192. It outputs temporal contrast dynamic vision sensor (DVS) events and conventional active pixel sensor frames. The SDAVIS192 improves on previous DAVIS sensors with higher sensitivity for temporal contrast. The temporal contrast thresholds can be set down to 1% for negative changes in logarithmic intensity (OFF events) and down to 3.5% for positive changes (ON events). The achievement is possible through the adoption of an in-pixel preamplification stage. This preamplifier reduces the effective intrascene DR of the sensor (70 dB for OFF and 50 dB for ON), but an automated operating region control allows up to at least 110-dB DR for OFF events. A second contribution of this paper is the development of a characterization methodology for measuring DVS event detection thresholds by incorporating a measure of signal-to-noise ratio (SNR). At an average SNR of 30 dB, the DVS temporal contrast threshold fixed pattern noise is measured to be 0.3%-0.8% temporal contrast. Results comparing monochrome and RGBW color filter array DVS events are presented. The higher sensitivity of the SDAVIS192 makes this sensor potentially useful for calcium imaging, as shown in a recording from cultured neurons expressing the calcium-sensitive green fluorescent protein GCaMP6f.
- Published
- 2018
31. Feature Representations for Neuromorphic Audio Spike Streams
- Author
-
Jithendar Anumula, Daniel Neil, Tobi Delbruck, Shih-Chii Liu, and University of Zurich
- Subjects
exponential kernels ,Quantitative Biology::Neurons and Cognition ,General Neuroscience ,dynamic audio sensor ,spike feature generation ,recurrent neural network ,audio word classification ,2800 General Neuroscience ,lcsh:RC321-571 ,570 Life sciences ,biology ,lcsh:Neurosciences. Biological psychiatry. Neuropsychiatry ,10194 Institute of Neuroinformatics ,Neuroscience ,Original Research - Abstract
Event-driven neuromorphic spiking sensors such as the silicon retina and the silicon cochlea encode external sensory stimuli as asynchronous streams of spikes across different channels or pixels. Combining state-of-the-art deep neural networks with the asynchronous outputs of these sensors has produced encouraging results on some datasets but remains challenging. One reason is the lack of effective spiking networks to process the spike streams; another is that the pre-processing methods needed to convert the spike streams into the frame-based features required by deep networks still need further investigation. This work investigates the effectiveness of synchronous and asynchronous frame-based features generated using spike count and constant event binning in combination with a recurrent neural network for solving a classification task on the N-TIDIGITS18 dataset. This spike-based dataset consists of recordings from the Dynamic Audio Sensor, a spiking silicon cochlea sensor, in response to the TIDIGITS audio dataset. We also propose a new pre-processing method which applies an exponential kernel on the output cochlea spikes so that the interspike timing information is better preserved. The results from the N-TIDIGITS18 dataset show that the exponential features perform better than the spike count features, with over 91% accuracy on the digit classification task. This accuracy corresponds to an improvement of at least 2.5% over the use of spike count features, establishing a new state of the art for this dataset. Frontiers in Neuroscience, 12, ISSN:1662-453X, ISSN:1662-4548
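As a rough illustration of the exponential-kernel pre-processing described above, the sketch below lets each cochlea spike leave a decaying exponential trace on its channel and samples the traces at frame times. The time constant and sampling scheme are illustrative assumptions, not the paper's exact parameters.

    import numpy as np

    def exp_kernel_features(spike_times, channels, frame_times, n_channels, tau=0.005):
        # Hedged sketch: each spike contributes exp(-(t_frame - t_spike)/tau) to its
        # channel, so interspike timing is preserved in the sampled feature vectors.
        # spike_times, channels and frame_times are expected as NumPy arrays.
        feats = np.zeros((len(frame_times), n_channels))
        for i, t_frame in enumerate(frame_times):
            past = spike_times <= t_frame
            decay = np.exp(-(t_frame - spike_times[past]) / tau)
            np.add.at(feats[i], channels[past], decay)   # sum the traces per channel
        return feats                                      # each row is one feature vector for the RNN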
- Published
- 2018
32. Approaching Retinal Ganglion Cell Modeling and FPGA Implementation for Robotics
- Author
-
Antonio Rios-Navarro, Alejandro Linares-Barranco, Hongjie Liu, Tobi Delbruck, F. Gomez-Rodriguez, Diederik Paul Moeys, Universidad de Sevilla. Departamento de Arquitectura y Tecnología de Computadores, Ministerio de Economía y Competitividad (MINECO). España, European Union (UE). FP7, University of Zurich, and Linares-Barranco, Alejandro
- Subjects
robotic ,Event-based processing ,General Physics and Astronomy ,lcsh:Astrophysics ,02 engineering and technology ,Approach sensitivity cell ,Article ,Retina Ganglion Cell ,Address-Event-Representation ,03 medical and health sciences ,0302 clinical medicine ,Software ,Models of neural computation ,lcsh:QB460-466 ,0202 electrical engineering, electronic engineering, information engineering ,approach sensitivity cell ,Dynamic Vision Sensor ,lcsh:Science ,Field-programmable gate array ,FPGA ,10194 Institute of Neuroinformatics ,business.industry ,neuromorphic engineering ,dynamic vision sensor ,Approach Sensitivity cell ,Robotic ,Robotics ,Mobile robot ,lcsh:QC1-999 ,3100 General Physics and Astronomy ,event-based processing ,Neuromorphic engineering ,Asynchronous communication ,570 Life sciences ,biology ,Robot ,lcsh:Q ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,lcsh:Physics ,030217 neurology & neurosurgery ,Computer hardware - Abstract
Entropy, 20 (6), ISSN:1099-4300
- Published
- 2018
33. Live demonstration: In-vivo imaging of neural activity with dynamic vision sensors
- Author
-
Fabian F. Voigt, Celso Cavaco, Pia Sipila, Diederik Paul Moeys, Fritjof Helmchen, David San Segundo Bello, Stewart Berry, Tobi Delbruck, Vasyl Motsnyi, Chenghan Li, Gemma Taverni, University of Zurich, and Taverni, Gemma
- Subjects
Previous generation ,Light sensitivity ,Pixel ,business.industry ,2208 Electrical and Electronic Engineering ,3105 Instrumentation ,020208 electrical & electronic engineering ,2204 Biomedical Engineering ,Context (language use) ,02 engineering and technology ,Neural activity ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Quantum efficiency ,Computer vision ,Sensitivity (control systems) ,Artificial intelligence ,business ,Preclinical imaging ,10194 Institute of Neuroinformatics - Abstract
The demonstration shows a comparison of two novel Dynamic and Active Pixel Vision Sensors (DAVIS) in the context of a simulated neural imaging experiment. The first sensor, the SDAVIS, offers 10X higher temporal contrast sensitivity than the previous generation of DAVIS sensors, although at a lower resolution (188×192). The second sensor, the BSIDAVIS, combines a higher resolution (346×260) with higher light sensitivity (quantum efficiency) because of its Back Side Illumination (BSI) manufacturing.
- Published
- 2017
34. In-vivo imaging of neural activity with dynamic vision sensors
- Author
-
Gemma Taverni, Chenghan Li, Fritjof Helmchen, Pia Sipila, Tobi Delbruck, David San Segundo Bello, Vasyl Motsnyi, Celso Cavaco, Fabian F. Voigt, Stewart Berry, Diederik Paul Moeys, and University of Zurich
- Subjects
0301 basic medicine ,Light sensitivity ,Pixel ,business.industry ,020208 electrical & electronic engineering ,02 engineering and technology ,03 medical and health sciences ,Light intensity ,030104 developmental biology ,CMOS ,Optical recording ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Computer vision ,Sensitivity (control systems) ,Artificial intelligence ,Image sensor ,business ,High dynamic range ,10194 Institute of Neuroinformatics - Abstract
Optical recording of neural activity using calcium or voltage indicators requires cameras capable of detecting small temporal contrast in light intensity with sample rates of 10 Hz to 1 kHz. Large-pixel scientific CMOS image sensors (sCMOS) are typically used due to their high resolution, high frame rate, and low noise. However, using such sensors for long-term recording is challenging due to their high data rates of up to 1 Gb/s. Here we studied the use of dynamic vision sensor (DVS) event cameras for neural recording. DVS sensors have a high dynamic range and a sparse asynchronous output consisting of brightness change events. Using a DVS for neural recording could avoid transferring and storing redundant information. We compared a Hamamatsu Orca V2 sCMOS with two advanced DVS sensors (the 188×192-pixel SDAVIS, with higher temporal contrast sensitivity, and the 346×260-pixel back-side-illuminated BSIDAVIS, with higher light sensitivity) for neural activity recordings with fluorescent calcium indicators in both brain slices and awake mice. The DVS activity responds to the fast dynamics of neural activity, indicating that a sensor combining SDAVIS and BSIDAVIS technologies would be beneficial for long-term in-vivo neural recording using calcium indicators as well as potentially faster voltage indicators.
- Published
- 2017
35. A Low Power, Fully Event-Based Gesture Recognition System
- Author
-
Michael DeBole, Alexander Andreopoulos, David Berg, Brian Taba, Arnon Amir, Jeffrey L. McKinstry, Marcela Mendoza, Timothy Melano, Dharmendra S. Modha, Tapan K. Nayak, Jeff Kusnitz, Steve K. Esser, Guillaume Garreau, Tobi Delbruck, Myron D. Flickner, Carmelo di Nolfo, and University of Zurich
- Subjects
Pixel ,1707 Computer Vision and Pattern Recognition ,Computer science ,Event (computing) ,business.industry ,020208 electrical & electronic engineering ,Frame (networking) ,Real-time computing ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Latency (audio) ,02 engineering and technology ,Convolutional neural network ,TrueNorth ,Gesture recognition ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer vision ,1711 Signal Processing ,Artificial intelligence ,business ,Gesture ,10194 Institute of Neuroinformatics - Abstract
We present the first gesture recognition system implemented end-to-end on event-based hardware, using a TrueNorth neurosynaptic processor to recognize hand gestures in real-time at low power from events streamed live by a Dynamic Vision Sensor (DVS). The biologically inspired DVS transmits data only when a pixel detects a change, unlike traditional frame-based cameras which sample every pixel at a fixed frame rate. This sparse, asynchronous data representation lets event-based cameras operate at much lower power than frame-based cameras. However, much of the energy efficiency is lost if, as in previous work, the event stream is interpreted by conventional synchronous processors. Here, for the first time, we process a live DVS event stream using TrueNorth, a natively event-based processor with 1 million spiking neurons. Configured here as a convolutional neural network (CNN), the TrueNorth chip identifies the onset of a gesture with a latency of 105 ms while consuming less than 200 mW. The CNN achieves 96.5% out-of-sample accuracy on a newly collected DVS dataset (DvsGesture) comprising 11 hand gesture categories from 29 subjects under 3 illumination conditions.
- Published
- 2017
36. Live demonstration: Convolutional neural network driven by dynamic vision sensor playing RoShamBo
- Author
-
Tobi Delbruck, Iulia-Alexandra Lungu, Federico Corradi, and University of Zurich
- Subjects
Engineering ,Artificial neural network ,business.industry ,2208 Electrical and Electronic Engineering ,Voltage control ,020208 electrical & electronic engineering ,02 engineering and technology ,Convolutional neural network ,Vision sensor ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,10194 Institute of Neuroinformatics - Abstract
This demonstration presents a convolutional neural network (CNN) playing “RoShamBo” (“rock-paper-scissors”) against human opponents in real time. The network is driven by dynamic and active-pixel vision sensor (DAVIS) events, acquired by accumulating events into fixed event-number frames.
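The constant-event-number framing mentioned above can be sketched as a simple histogramming step. The frame dimensions and event count below are illustrative assumptions, not the demo's actual settings.

    import numpy as np

    def event_count_frames(xs, ys, pols, events_per_frame=2000, shape=(180, 240)):
        # Hedged sketch: emit one 2D frame every `events_per_frame` events, so the
        # frame rate tracks scene activity rather than a fixed clock.
        # xs, ys are integer pixel coordinates; pols are signed polarities (+1 ON, -1 OFF).
        frames = []
        for start in range(0, len(xs) - events_per_frame + 1, events_per_frame):
            sl = slice(start, start + events_per_frame)
            frame = np.zeros(shape, dtype=np.int16)
            np.add.at(frame, (ys[sl], xs[sl]), pols[sl])   # accumulate signed polarities per pixel
            frames.append(frame)
        return frames                                      # each frame is fed to the CNN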
- Published
- 2017
37. Live Demonstration: Event-Driven Real-Time Spoken Digit Recognition System
- Author
-
Daniel Neil, Xiaoya Li, Shih-Chii Liu, Tobi Delbruck, Jithendar Anumula, and University of Zurich
- Subjects
Event (computing) ,Computer science ,Speech recognition ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,02 engineering and technology ,Numerical digit ,Task (computing) ,Recurrent neural network ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,Spike (software development) ,Digit recognition ,10194 Institute of Neuroinformatics - Abstract
We previously described a deep network system that reached an accuracy of 82% on a digit recognition task using the spike outputs from a Dynamic Audio Sensor (DAS) in response to audio samples from the TIDIGITS database. The audio samples were played directly to the system, thereby bypassing the microphones. This work presents an interactive real-time demonstration of this digit recognition system. The system classifies a spoken digit based on the output spikes of the DAS in response to digits spoken into the on-board microphones.
- Published
- 2017
38. Block-Matching Optical Flow for Dynamic Vision Sensors: Algorithm and FPGA Implementation
- Author
-
Min Liu, Tobi Delbruck, and University of Zurich
- Subjects
Computer science ,business.industry ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,Real-time computing ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,02 engineering and technology ,High-definition video ,Motion estimation ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Algorithm design ,Field-programmable gate array ,business ,Algorithm ,High dynamic range ,Computer hardware ,Block (data storage) ,Data compression ,10194 Institute of Neuroinformatics - Abstract
Rapid and low power computation of optical flow (OF) is potentially useful in robotics. The dynamic vision sensor (DVS) event camera produces quick and sparse output, and has high dynamic range, but conventional OF algorithms are frame-based and cannot be directly used with event-based cameras. Previous DVS OF methods do not work well with dense textured input and are designed for implementation in logic circuits. This paper proposes a new block-matching-based DVS OF algorithm inspired by motion estimation methods used for MPEG video compression. The algorithm was implemented both in software and on FPGA. For each event, it computes the motion direction as one of 9 directions. The speed of the motion is set by the sample interval. Results show that the Average Angular Error can be improved by 30% compared with previous methods. The OF can be calculated on the FPGA with a 50 MHz clock in 0.2 μs per event (11 clock cycles), 20 times faster than a Java software implementation running on a desktop PC. Sample data show that the method works on scenes dominated by edges, sparse features, and dense texture.
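For each event, the block-matching step compares a patch around the event in the current event-count slice against shifted patches in the previous slice and picks the best of the 9 candidate directions. The sketch below assumes the slices are 2D event-count histograms and that the event lies far enough from the image border; block and search sizes are illustrative, not the paper's parameters.

    import numpy as np

    def block_match_direction(prev_slice, curr_slice, ex, ey, block=9, search=1):
        # Hedged sketch of per-event block matching (not the FPGA datapath):
        # minimize the sum of absolute differences (SAD) over the 9 candidate shifts.
        r = block // 2
        ref = curr_slice[ey - r:ey + r + 1, ex - r:ex + r + 1].astype(int)
        best, best_sad = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                cand = prev_slice[ey - r + dy:ey + r + 1 + dy,
                                  ex - r + dx:ex + r + 1 + dx].astype(int)
                sad = np.abs(ref - cand).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dx, dy)
        return best   # motion direction; the speed follows from the slice sample interval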
- Published
- 2017
39. Color Temporal Contrast Sensitivity in Dynamic Vision Sensors
- Author
-
Chenghan Li, Luca Longinotti, Tobi Delbruck, Julien N. P. Martel, Diederik Paul Moeys, David San Segundo Bello, Vasyl Motsnyi, Simeon A. Bamford, and University of Zurich
- Subjects
Engineering ,Pixel ,business.industry ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,02 engineering and technology ,Photodiode ,law.invention ,law ,0202 electrical engineering, electronic engineering, information engineering ,Temporal contrast ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Color filter array ,Computer vision ,Quantum efficiency ,Sensitivity (control systems) ,Artificial intelligence ,Active vision ,business ,Interpolation ,10194 Institute of Neuroinformatics - Abstract
This paper introduces the first simulations and measurements of event data obtained from the first Dynamic and Active Vision Sensors (DAVIS) with RGBW color filters. The absolute quantum efficiency spectral responses of the RGBW photodiodes were measured, the behavior of the color-sensitive DVS pixels was simulated and measured, and reconstruction through color event interpolation was developed.
- Published
- 2017
40. Neuromorphic Approach Sensitivity Cell Modeling and FPGA Implementation
- Author
-
Alejandro Linares-Barranco, Diederik Paul Moeys, Hongjie Liu, Tobi Delbruck, Antonio Rios-Navarro, Universidad de Sevilla. Departamento de Arquitectura y Tecnología de Computadores, Universidad de Sevilla. TEP-108: Robótica y Tecnología de Computadores Aplicada a la Rehabilitación, University of Zurich, and Linares-Barranco, Alejandro
- Subjects
Computer science ,Event-based processing ,Dynamic vision sensors ,02 engineering and technology ,Retina Ganglion Cell ,Field (computer science) ,Approach Sensitivity cell ,Models of neural computation ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,1700 General Computer Science ,Sensitivity (control systems) ,2614 Theoretical Computer Science ,Field-programmable gate array ,10194 Institute of Neuroinformatics ,Retina ,Event (computing) ,020208 electrical & electronic engineering ,Address event representation (AER) ,Neuromorphic engineering ,medicine.anatomical_structure ,Computer architecture ,Retinal ganglion cell ,Asynchronous communication ,570 Life sciences ,biology ,020201 artificial intelligence & image processing - Abstract
Neuromorphic engineering takes inspiration from biology to solve engineering problems using the organizing principles of biological neural computation. This field has demonstrated success in sensor-based applications (vision and audition) as well as in cognition and actuation. This paper focuses on mimicking an interesting functionality of the retina that is computed by one type of Retinal Ganglion Cell (RGC): the early detection of approaching (expanding) dark objects. The paper presents the software model and the hardware logic FPGA implementation of this approach sensitivity cell, which can be used in later cognition layers as an attention mechanism. The input of this hardware-modeled cell comes from an asynchronous spiking Dynamic Vision Sensor, which leads to an end-to-end event-based processing system. The software model was developed in Java and runs with an average processing time of 370 ns per event on a NUC embedded computer. The output firing rate for an approaching object depends on the cell parameters that represent the number of input events needed to reach the firing threshold. For the hardware implementation on a Spartan6 FPGA, the processing time is reduced to 160 ns/event with the clock running at 50 MHz. Ministerio de Economía y Competitividad TEC2016-77785-P Unión Europea FP7-ICT-600954
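A heavily simplified software model of such an approach sensitivity cell can be sketched as subunits that are excited by OFF (darkening) events and inhibited by ON events, rectified and summed into a leaky state that spikes at a threshold. The constants and the nonlinearity below are illustrative assumptions, not the tuned parameters of the paper's Java or FPGA implementation.

    import numpy as np

    def approach_cell_step(off_counts, on_counts, state, threshold=100.0, decay=0.9):
        # Hedged sketch of one update of an approach-sensitivity cell:
        # off_counts / on_counts are per-subunit DVS event counts in the current time bin.
        drive = np.maximum(off_counts - on_counts, 0.0)   # per-subunit rectification
        state = decay * state + drive.sum()               # leaky integration over subunits
        spike = state > threshold                         # cell fires for expanding dark objects
        if spike:
            state = 0.0                                   # reset after the spike
        return state, spike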
- Published
- 2017
41. Analysis of encoding degradation in spiking sensors due to spike delay variation
- Author
-
Shih-Chii Liu, Tobi Delbruck, Minhao Yang, and University of Zurich
- Subjects
Queueing theory ,Comparator ,Computer science ,Spike train ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Topology ,Analog signal ,Transmission (telecommunications) ,Delta modulation ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,570 Life sciences ,biology ,ComputerSystemsOrganization_SPECIAL-PURPOSEANDAPPLICATION-BASEDSYSTEMS ,020201 artificial intelligence & image processing ,Spike (software development) ,Electrical and Electronic Engineering ,Decoding methods ,10194 Institute of Neuroinformatics - Abstract
Spiking sensors such as the silicon retina and cochlea encode analog signals into massively parallel asynchronous spike train output where the information is contained in the precise spike timing. The variation of the spike timing that arises from spike transmission degrades signal encoding quality. Using the signal-to-distortion ratio (SDR) metric with nonlinear spike train decoding based on frame theory, two particular sources of delay variation, comparison delay $T_{\mathbf{DC}}$ and queueing delay $T_{\mathbf{DQ}}$, are evaluated on two encoding mechanisms which have been used for implementations of silicon array spiking sensors: asynchronous delta modulation and self-timed reset. As specific examples, $T_{\mathbf{DC}}$ is obtained from a 2T current-mode comparator, and $T_{\mathbf{DQ}}$ is obtained from an M/D/1 queue for 1-D sensors like the silicon cochlea and an $\mathrm{M}^{X}$/D/1 queue for 2-D sensors like the silicon retina. Quantitative relations between the SDR and the circuit and system parameters of spiking sensors are established. The analysis method presented in this work will be useful for future specifications-guided designs of spiking sensors.
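For reference, the queueing model mentioned above for 1-D sensors is the standard M/D/1 queue, whose mean waiting time follows from the Pollaczek–Khinchine formula. The expression below is that textbook result, quoted only to make the dependence on event rate explicit; it is not a derivation taken from the paper.

$\mathbb{E}[T_{\mathrm{DQ}}] = \dfrac{\rho}{2\mu(1-\rho)}$, where $\lambda$ is the mean spike arrival rate, $1/\mu$ is the fixed transmission (service) time per spike, and $\rho = \lambda/\mu < 1$ is the output bus utilization.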
- Published
- 2017
42. Training Deep Spiking Neural Networks Using Backpropagation
- Author
-
Jun Haeng Lee, Tobi Delbruck, Michael Pfeiffer, University of Zurich, and Lee, Jun Haeng
- Subjects
FOS: Computer and information sciences ,Computer science ,Computer Science::Neural and Evolutionary Computation ,Context (language use) ,neuromorphic ,02 engineering and technology ,Convolutional neural network ,lcsh:RC321-571 ,spiking neural network ,MNIST ,03 medical and health sciences ,0302 clinical medicine ,0202 electrical engineering, electronic engineering, information engineering ,Neural and Evolutionary Computing (cs.NE) ,lcsh:Neurosciences. Biological psychiatry. Neuropsychiatry ,Original Research ,10194 Institute of Neuroinformatics ,Spiking neural network ,N-MNIST ,Quantitative Biology::Neurons and Cognition ,Artificial neural network ,business.industry ,General Neuroscience ,Deep learning ,deep neural network ,2800 General Neuroscience ,Computer Science - Neural and Evolutionary Computing ,Pattern recognition ,Backpropagation ,Deep neural network ,Neuromorphic ,DVS ,Neuromorphic engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,030217 neurology & neurosurgery ,MNIST database ,Neuroscience ,backpropagation - Abstract
Deep spiking neural networks (SNNs) hold the potential for improving the latency and energy efficiency of deep neural networks through data-driven event-based computation. However, training such networks is difficult due to the non-differentiable nature of spike events. In this paper, we introduce a novel technique, which treats the membrane potentials of spiking neurons as differentiable signals, where discontinuities at spike times are considered as noise. This enables an error backpropagation mechanism for deep SNNs that follows the same principles as in conventional deep networks, but works directly on spike signals and membrane potentials. Compared with previous methods relying on indirect training and conversion, our technique has the potential to capture the statistics of spikes more precisely. We evaluate the proposed framework on artificially generated events from the original MNIST handwritten digit benchmark, and also on the N-MNIST benchmark recorded with an event-based dynamic vision sensor, in which the proposed method reduces the error rate by a factor of more than three compared to the best previous SNN, and also achieves a higher accuracy than a conventional convolutional neural network (CNN) trained and tested on the same data. We demonstrate in the context of the MNIST task that thanks to their event-driven operation, deep SNNs (both fully connected and convolutional) trained with our method achieve accuracy equivalent to conventional neural networks. In the N-MNIST example, equivalent accuracy is achieved with about five times fewer computational operations. Frontiers in Neuroscience, 10, ISSN:1662-453X, ISSN:1662-4548
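The core idea above, treating the membrane potential as the differentiable quantity and the spike discontinuity as noise, can be caricatured in a two-function sketch: the forward pass thresholds the membrane potential, and the backward pass routes gradients through the potential as if the threshold were locally transparent. This is a simplified illustration of the general approach, with invented names and a crude pass-through window; it omits the leak, refractory, and lateral-inhibition terms handled in the paper.

    import numpy as np

    def spiking_layer_forward(x, w, v_th=1.0):
        # One time step of a simplified spiking layer: integrate and threshold.
        v = x @ w                            # membrane potential contribution (continuous)
        s = (v >= v_th).astype(float)        # non-differentiable spike output
        return v, s

    def spiking_layer_backward(grad_s, v, x, w, v_th=1.0):
        # Hedged sketch: propagate gradients through the membrane potential,
        # treating the spike discontinuity as noise near threshold.
        passthrough = (np.abs(v - v_th) < 0.5).astype(float)   # illustrative window
        grad_v = grad_s * passthrough
        grad_w = x.T @ grad_v                # gradient w.r.t. the weights
        grad_x = grad_v @ w.T                # gradient passed to the previous layer
        return grad_w, grad_x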
- Published
- 2016
43. DVS Benchmark Datasets for Object Tracking, Action Recognition, and Object Recognition
- Author
-
Yuhuang Hu, Michael Pfeiffer, Tobi Delbruck, Hongjie Liu, University of Zurich, and Hu, Yuhuang
- Subjects
Computer science ,Computer Vision ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,neuromorphic ,02 engineering and technology ,lcsh:RC321-571 ,03 medical and health sciences ,0302 clinical medicine ,Data Report ,benchmarks ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,Dynamic Vision Sensor ,lcsh:Neurosciences. Biological psychiatry. Neuropsychiatry ,Neuromorphic ,Event-based vision ,AER ,Benchmarks ,DVS ,Action recognition ,Object tracking ,Object recognition ,10194 Institute of Neuroinformatics ,Object Recognition ,Quantitative Biology::Neurons and Cognition ,business.industry ,General Neuroscience ,Deep learning ,Cognitive neuroscience of visual object recognition ,2800 General Neuroscience ,Object Tracking ,Neuromorphic engineering ,Action Recognition ,Neuromorphic Engineering ,Video tracking ,Benchmark (computing) ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Feature learning ,030217 neurology & neurosurgery ,MNIST database ,Neuroscience ,event-based vision - Abstract
Frontiers in Neuroscience, 10, ISSN:1662-453X, ISSN:1662-4548
- Published
- 2016
44. Event-based, 6-DOF Camera Tracking from Photometric Depth Maps
- Author
-
Davide Scaramuzza, Tobi Delbruck, Elias Mueggler, Henri Rebecq, Guillermo Gallego, Jon E. A. Lund, University of Zurich, and Gallego, Guillermo
- Subjects
FOS: Computer and information sciences ,0209 industrial biotechnology ,1707 Computer Vision and Pattern Recognition ,10009 Department of Informatics ,Computer Vision and Pattern Recognition (cs.CV) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer Science - Computer Vision and Pattern Recognition ,1702 Artificial Intelligence ,02 engineering and technology ,Iterative reconstruction ,000 Computer science, knowledge & systems ,Computer Science - Robotics ,020901 industrial engineering & automation ,2604 Applied Mathematics ,Artificial Intelligence ,Depth map ,Computer graphics (images) ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,Image sensor ,Video game ,Pose ,High dynamic range ,10194 Institute of Neuroinformatics ,Event (computing) ,business.industry ,Applied Mathematics ,Motion blur ,1712 Software ,Computational Theory and Mathematics ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Robotics (cs.RO) ,Software ,1703 Computational Theory and Mathematics - Abstract
Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. These features, along with a very low power consumption, make event cameras an ideal complement to standard cameras for VR/AR and video game applications. With these applications in mind, this paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras. 12 pages, 13 figures, 2 tables. (in press)
- Published
- 2016
45. Steering a Predator Robot using a Mixed Frame/Event-Driven Convolutional Neural Network
- Author
-
Diederik Paul Moeys, Philip Vance, Tobi Delbruck, Gautham P. Das, Daniel Neil, Dermot Kerr, Federico Corradi, Emmett Kerr, and University of Zurich
- Subjects
FOS: Computer and information sciences ,2606 Control and Optimization ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,02 engineering and technology ,Convolutional neural network ,Data-driven ,Computer Science - Robotics ,03 medical and health sciences ,0302 clinical medicine ,Histogram ,1705 Computer Networks and Communications ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,10194 Institute of Neuroinformatics ,Artificial neural network ,business.industry ,Deep learning ,Mobile robot ,Robotics ,Robot ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,1711 Signal Processing ,Artificial intelligence ,business ,Robotics (cs.RO) ,030217 neurology & neurosurgery - Abstract
This paper describes the application of a Convolutional Neural Network (CNN) in the context of a predator/prey scenario. The CNN is trained and run on data from a Dynamic and Active Pixel Sensor (DAVIS) mounted on a Summit XL robot (the predator), which follows another one (the prey). The CNN is driven by both conventional image frames and dynamic vision sensor "frames" that consist of a constant number of DAVIS ON and OFF events. The network is thus "data driven" at a sample rate proportional to the scene activity, so the effective sample rate varies from 15 Hz to 240 Hz depending on the robot speeds. The network generates four outputs: steer right, left, center and non-visible. After off-line training on labeled data, the network is imported onto the on-board computer of the Summit XL robot, which runs jAER and receives steering directions in real time. Successful results from closed-loop trials, with accuracies of up to 87% or 92% (depending on the evaluation criteria), are reported. Although the proposed approach discards the precise DAVIS event timing, it offers the significant advantage of compatibility with conventional deep learning technology without giving up the advantage of data-driven computing. Paper presented at the Second International Conference on Event-Based Control, Communication and Signal Processing (EBCCSP) 2016, Krakow, Poland
- Published
- 2016
46. ELiSeD - An Event-Based Line Segment Detector
- Author
-
Christian Brandli, Tobi Delbruck, Susanne Keller, Davide Scaramuzza, Jonas Strubel, and University of Zurich
- Subjects
2606 Control and Optimization ,10009 Department of Informatics ,Computer science ,Real-time computing ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,000 Computer science, knowledge & systems ,Edge detection ,Line segment ,1705 Computer Networks and Communications ,0202 electrical engineering, electronic engineering, information engineering ,Structure from motion ,Computer vision ,Visual odometry ,Correspondence problem ,10194 Institute of Neuroinformatics ,business.industry ,Event (computing) ,020208 electrical & electronic engineering ,Feature (computer vision) ,Temporal resolution ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,1711 Signal Processing ,Artificial intelligence ,business - Abstract
Event-based temporal contrast vision sensors such as the Dynamic Vision Sensor (DVS) have advantages such as high dynamic range, low latency, and low power consumption. Instead of frames, these sensors produce a stream of events that encode discrete amounts of temporal contrast. Surfaces and objects with sufficient spatial contrast trigger events if they are moving relative to the sensor, which thus performs inherent edge detection. These sensors are well-suited for motion capture, but so far suitable event-based, low-level features that allow assigning events to spatial structures have been lacking. A general solution of the so-called event correspondence problem, i.e. inferring which events are caused by the motion of the same spatial feature, would allow applying these sensors in a multitude of tasks such as visual odometry or structure from motion. The proposed Event-based Line Segment Detector (ELiSeD) is a step towards solving this problem by parameterizing the event stream as a set of line segments. The event stream which is used to update these low-level features is continuous in time and has a high temporal resolution; this allows capturing even fast motions without the requirement to solve the conventional frame-to-frame motion correspondence problem. The ELiSeD feature detector and tracker runs in real-time on a laptop computer at image speeds of up to 1300 pix/s and can continuously track rotations of up to 720 deg/s. The algorithm is open-sourced in the jAER project.
- Published
- 2016
47. Temporal Sequence Recognition in a Self-Organizing Recurrent Network
- Author
-
Shih-Chii Liu, Daniel Neil, Tobi Delbruck, Enea Ceolini, and University of Zurich
- Subjects
Normalization (statistics) ,2606 Control and Optimization ,Computer science ,business.industry ,Binary number ,Initialization ,Pattern recognition ,Ranging ,02 engineering and technology ,03 medical and health sciences ,0302 clinical medicine ,Recurrent neural network ,1705 Computer Networks and Communications ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Network performance ,1711 Signal Processing ,Artificial intelligence ,Echo state network ,business ,030217 neurology & neurosurgery ,Sparse matrix ,10194 Institute of Neuroinformatics - Abstract
A big challenge of reservoir-based Recurrent Neural Networks (RNNs) is the optimization of the connection weights within the network so that the network performance is optimal for the intended task of temporal sequence recognition. One particular RNN called the Self-Organizing Recurrent Network (SORN) avoids the mathematical normalization required after each initialization. Instead, three types of cortical plasticity mechanisms optimize the weights within the network during the initial part of the training. The success of this unsupervised training method was demonstrated on temporal sequences that use input symbols with a binary encoding and that activate only one input pool in each time step. This work extends the analysis to different types of symbol encoding, ranging from encoding methods that activate multiple input pools to methods that use encoding levels that are not strictly binary but analog in nature. Preliminary results show that the SORN model classifies temporal sequences well using these encoding methods, and that the advantage of this network over a static network in a classification task is retained.
- Published
- 2016
48. Combined frame- and event-based detection and tracking
- Author
-
Shih-Chii Liu, Diederik Paul Moeys, Hongjie Liu, Gautham P. Das, Tobi Delbruck, Daniel Neil, and University of Zurich
- Subjects
CMOS sensor ,Computer science ,business.industry ,2208 Electrical and Electronic Engineering ,020208 electrical & electronic engineering ,Frame (networking) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Tracking system ,02 engineering and technology ,Tracking (particle physics) ,Convolutional neural network ,Convolution ,0202 electrical engineering, electronic engineering, information engineering ,570 Life sciences ,biology ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Particle filter ,business ,10194 Institute of Neuroinformatics - Abstract
This paper reports an object tracking algorithm for a moving platform using the dynamic and active-pixel vision sensor (DAVIS). It takes advantage of both the active pixel sensor (APS) frame and dynamic vision sensor (DVS) event outputs from the DAVIS. The tracking is performed in a three-step manner: regions of interest (ROIs) are generated by cluster-based tracking of the DVS output, likely target locations are detected by using a convolutional neural network (CNN) on the APS output to classify the ROIs as foreground or background, and finally a particle filter infers the target location from the ROIs. Doing convolution only in the ROIs boosts the speed by a factor of 70 compared with full-frame convolutions on the 240×180 frame input from the DAVIS. The tracking accuracy on a predator and prey robot database reaches 90% with a cost of less than 20 ms/frame in Matlab on a normal PC without using a GPU.
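The three-step loop above can be outlined as follows; the helper objects (cluster_tracker, cnn_classify, particle_filter) are hypothetical placeholders standing in for the components named in the abstract, not the authors' actual API.

    def track_step(dvs_events, aps_frame, cluster_tracker, cnn_classify, particle_filter):
        # Skeleton of the DAVIS frame+event tracking loop (hedged sketch).
        rois = cluster_tracker.update(dvs_events)                       # 1. DVS clusters -> candidate ROIs
        scores = [(roi, cnn_classify(aps_frame, roi)) for roi in rois]  # 2. CNN on APS ROI patches only
        foreground = [roi for roi, p in scores if p > 0.5]              #    keep ROIs classified as target
        return particle_filter.update(foreground)                       # 3. particle filter -> target estimate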
- Published
- 2016
49. The Language of the Brain
- Author
-
Terry Sejnowski, Tobi Delbruck, University of Zurich, and Sejnowski, T
- Subjects
Nerve net ,Timing system ,Cell Communication ,Retina ,Article ,Electronic equipment ,Flash (photography) ,Mental Processes ,Memory ,Human–computer interaction ,Neurolinguistics ,medicine ,Humans ,Attention ,10194 Institute of Neuroinformatics ,Visual Cortex ,Neurons ,1000 Multidisciplinary ,Multidisciplinary ,Brain ,Cognition ,Object (computer science) ,Visual cortex ,medicine.anatomical_structure ,Synapses ,570 Life sciences ,biology ,Nerve Net - Abstract
Three pounds of nerve tissue underneath the skull are capable of perceiving, thinking and acting with a finesse that cannot be matched by any computer. The brain achieves this feat of cognition, in part, by carefully timing the signals that flash across the trillions of connections that link billions of brain cells. Seeing a flower pot causes groups of neurons to fire in a brief time interval to activate a part of the brain that registers that particular object at just that one moment. Understanding how this timing system works will both lead to better understanding of our behavior and enable the building of new computing and electronic equipment that, like the brain, functions more efficiently than conventional digital machines.
- Published
- 2012
50. A tactile luminous floor for an interactive autonomous space
- Author
-
Paul F. M. J. Verschure, Kynan Eng, Klaus Hepp, Tobi Delbruck, Adrian Whatley, Rodney J. Douglas, University of Zurich, and Delbrück, T
- Subjects
Computer science ,business.industry ,General Mathematics ,2207 Control and Systems Engineering ,Automation ,Computer Science Applications ,Rendering (computer graphics) ,1712 Software ,Software ,Control and Systems Engineering ,Computer graphics (images) ,1706 Computer Science Applications ,570 Life sciences ,biology ,business ,10194 Institute of Neuroinformatics ,2600 General Mathematics - Abstract
This paper describes the interactive tactile luminous floor that was constructed and used as the skin of the playful interactive space Ada, which ran as a public exhibit for five months in 2002 and had over 550,000 visitors. Ada's floor was custom-built to provide a means for individual and collective user interaction. It consists of 360 hexagonal 66 cm tiles covering a total area of 136 m^2, each with analogue tactile load sensors based on force-sensitive resistors and dimmable neon red, green and blue (RGB) lamps. The tiles are constructed from extruded aluminum with glass tops. An Interbus factory automation bus senses and controls the tiles. Software is described for rendering fluid, dynamic visual effects on the floor, for signal processing of the load information, for real-time visitor tracking and for a variety of behavioural modes, games and interactions. Data from single tiles and from tracking are shown. This floor offers new modalities of human-computer interaction and human-robot interaction for autonomous robotic spaces.
- Published
- 2007