Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction
- Publication Year :
- 2021
- Publisher :
- arXiv, 2021.
Abstract
- Event cameras are novel vision sensors that report per-pixel brightness changes as a stream of asynchronous "events". They offer significant advantages compared to standard cameras due to their high temporal resolution, high dynamic range and lack of motion blur. However, events only measure the varying component of the visual signal, which limits their ability to encode scene context. By contrast, standard cameras measure absolute intensity frames, which capture a much richer representation of the scene. Both sensors are thus complementary. However, due to the asynchronous nature of events, combining them with synchronous images remains challenging, especially for learning-based methods. This is because traditional recurrent neural networks (RNNs) are not designed for asynchronous and irregular data from additional sensors. To address this challenge, we introduce Recurrent Asynchronous Multimodal (RAM) networks, which generalize traditional RNNs to handle asynchronous and irregular data from multiple sensors. Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction. We apply this novel architecture to monocular depth estimation with events and frames where we show an improvement over state-of-the-art methods by up to 30% in terms of mean absolute depth error. To enable further research on multimodal learning with events, we release EventScape, a new dataset with events, intensity frames, semantic labels, and depth maps recorded in the CARLA simulator.
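The abstract's central idea is a single hidden state that any sensor can update whenever its data arrives and that can be read out at any moment. The toy sketch below illustrates that idea only. It is not the RAM architecture from the paper. The class name `AsyncMultimodalState`, the GRU-like gated update, the vector sizes, the random weights, and the scalar read-out standing in for a depth prediction are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Dict, List


def _matvec(w: List[List[float]], x: List[float]) -> List[float]:
    """Dense matrix-vector product on plain Python lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def _rand_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weights (hypothetical; a real model would learn them)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class AsyncMultimodalState:
    """Hypothetical toy: one recurrent state shared by asynchronous sensors."""

    def __init__(self, input_dims: Dict[str, int], hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.h = [0.0] * hidden_dim
        self.last_t = float("-inf")
        # One input projection and one gate projection per sensor,
        # e.g. "events" and "frames", since their inputs differ in size.
        self.w_in = {s: _rand_matrix(hidden_dim, d, rng) for s, d in input_dims.items()}
        self.w_gate = {s: _rand_matrix(hidden_dim, d, rng) for s, d in input_dims.items()}
        # Recurrent weights and read-out are shared across all sensors.
        self.u = _rand_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = _rand_matrix(1, hidden_dim, rng)

    def update(self, sensor: str, t: float, x: List[float]) -> None:
        """Fold one measurement from `sensor` into the shared state."""
        if t < self.last_t:
            raise ValueError("measurements must arrive in timestamp order")
        self.last_t = t
        uh = _matvec(self.u, self.h)
        # Candidate state mixes the new input with the current state.
        cand = [math.tanh(a + b) for a, b in zip(_matvec(self.w_in[sensor], x), uh)]
        # Simplified gate that depends on the input only (an assumption).
        gate = [1.0 / (1.0 + math.exp(-g)) for g in _matvec(self.w_gate[sensor], x)]
        # Convex blend of old state and candidate, as in a GRU update.
        self.h = [(1.0 - z) * h + z * c for z, h, c in zip(gate, self.h, cand)]

    def query(self) -> float:
        """Read a prediction from the current state without modifying it."""
        return _matvec(self.w_out, self.h)[0]


# Usage: events arrive densely, a frame arrives occasionally, and the
# state can be queried between any two measurements.
net = AsyncMultimodalState({"events": 4, "frames": 8}, hidden_dim=6)
stream = [
    ("events", 0.001, [0.1, -0.2, 0.0, 0.3]),
    ("events", 0.002, [0.0, 0.1, 0.2, -0.1]),
    ("frames", 0.050, [0.5] * 8),
    ("events", 0.051, [0.2, 0.0, -0.3, 0.1]),
]
for sensor, t, x in stream:
    net.update(sensor, t, x)
depth_guess = net.query()
print(depth_guess)
```

The point of this structure is that sensors never wait for one another: each measurement is folded into the state as soon as it arrives, in timestamp order, and `query` reads the current state at any time without changing it.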
- Subjects :
- FOS: Computer and information sciences
- 2606 Control and Optimization
- Control and Optimization
- 1707 Computer Vision and Pattern Recognition
- 10009 Department of Informatics
- Computer science
- Computer Vision and Pattern Recognition (cs.CV)
- 2210 Mechanical Engineering
- Computer Science - Computer Vision and Pattern Recognition
- Biomedical Engineering
- ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
- 2207 Control and Systems Engineering
- 2204 Biomedical Engineering
- 1702 Artificial Intelligence
- Context (language use)
- 02 engineering and technology
- 000 Computer science, knowledge & systems
- 010501 environmental sciences
- 01 natural sciences
- 1709 Human-Computer Interaction
- Artificial Intelligence
- 1706 Computer Science Applications
- 0202 electrical engineering, electronic engineering, information engineering
- Computer vision
- High dynamic range
- 0105 earth and related environmental sciences
- Monocular
- Event (computing)
- business.industry
- Mechanical Engineering
- Motion blur
- Computer Science Applications
- Human-Computer Interaction
- Multimodal learning
- Recurrent neural network
- Control and Systems Engineering
- Asynchronous communication
- 020201 artificial intelligence & image processing
- Computer Vision and Pattern Recognition
- Artificial intelligence
- business
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....9e23e9b215290c8b0c3727d0786fc5b7
- Full Text :
- https://doi.org/10.48550/arxiv.2102.09320