Author: "Scaramuzza, Davide - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Scaramuzza, Davide' showing total 1,548 results

Start Over Author "Scaramuzza, Davide

1,548 results on '"Scaramuzza, Davide'

1. Learning Quadrotor Control From Visual Features Using Differentiable Simulation

Author: Heeg, Johannes, Song, Yunlong, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: The sample inefficiency of reinforcement learning (RL) remains a significant challenge in robotics. RL requires large-scale simulation and, still, can cause long training times, slowing down research and innovation. This issue is particularly pronounced in vision-based control tasks where reliable state estimates are not accessible. Differentiable simulation offers an alternative by enabling gradient back-propagation through the dynamics model, providing low-variance analytical policy gradients and, hence, higher sample efficiency. However, its usage for real-world robotic tasks has yet been limited. This work demonstrates the great potential of differentiable simulation for learning quadrotor control. We show that training in differentiable simulation significantly outperforms model-free RL in terms of both sample efficiency and training time, allowing a policy to learn to recover a quadrotor in seconds when providing vehicle state and in minutes when relying solely on visual features. The key to our success is two-fold. First, the use of a simple surrogate model for gradient computation greatly accelerates training without sacrificing control performance. Second, combining state representation learning with policy learning enhances convergence speed in tasks where only visual features are observable. These findings highlight the potential of differentiable simulation for real-world robotics and offer a compelling alternative to conventional RL approaches., Comment: Under Submission
Published: 2024

2. S7: Selective and Simplified State Space Layers for Sequence Modeling

Author: Soydan, Taylan, Zubić, Nikola, Messikommer, Nico, Mishra, Siddhartha, and Scaramuzza, Davide
Subjects: Computer Science - Machine Learning, Electrical Engineering and Systems Science - Signal Processing, Mathematics - Dynamical Systems
Abstract: A central challenge in sequence modeling is efficiently handling tasks with extended contexts. While recent state-space models (SSMs) have made significant progress in this area, they often lack input-dependent filtering or require substantial increases in model complexity to handle input variability. We address this gap by introducing S7, a simplified yet powerful SSM that can handle input dependence while incorporating stable reparameterization and specific design choices to dynamically adjust state transitions based on input content, maintaining efficiency and performance. We prove that this reparameterization ensures stability in long-sequence modeling by keeping state transitions well-behaved over time. Additionally, it controls the gradient norm, enabling efficient training and preventing issues like exploding or vanishing gradients. S7 significantly outperforms baselines across various sequence modeling tasks, including neuromorphic event-based datasets, Long Range Arena benchmarks, and various physical and biological time series. Overall, S7 offers a more straightforward approach to sequence modeling without relying on complex, domain-specific inductive biases, achieving significant improvements across key benchmarks., Comment: 23 pages, 3 figures, 11 tables. Equal contribution by Taylan Soydan and Nikola Zubi\'c
Published: 2024

3. Residual Policy Learning for Perceptive Quadruped Control Using Differentiable Simulation

Author: Luo, Jing Yuan, Song, Yunlong, Klemm, Victor, Shi, Fan, Scaramuzza, Davide, and Hutter, Marco
Subjects: Computer Science - Robotics
Abstract: First-order Policy Gradient (FoPG) algorithms such as Backpropagation through Time and Analytical Policy Gradients leverage local simulation physics to accelerate policy search, significantly improving sample efficiency in robot control compared to standard model-free reinforcement learning. However, FoPG algorithms can exhibit poor learning dynamics in contact-rich tasks like locomotion. Previous approaches address this issue by alleviating contact dynamics via algorithmic or simulation innovations. In contrast, we propose guiding the policy search by learning a residual over a simple baseline policy. For quadruped locomotion, we find that the role of residual policy learning in FoPG-based training (FoPG RPL) is primarily to improve asymptotic rewards, compared to improving sample efficiency for model-free RL. Additionally, we provide insights on applying FoPG's to pixel-based local navigation, training a point-mass robot to convergence within seconds. Finally, we showcase the versatility of FoPG RPL by using it to train locomotion and perceptive navigation end-to-end on a quadruped in minutes.
Published: 2024

4. FaVoR: Features via Voxel Rendering for Camera Relocalization

Author: Polizzi, Vincenzo, Cannici, Marco, Scaramuzza, Davide, and Kelly, Jonathan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs., Comment: Submitted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, Arizona, US, Feb 28-Mar 4, 2025
Published: 2024

5. Structure-Invariant Range-Visual-Inertial Odometry

Author: Alberico, Ivan, Delaune, Jeff, Cioffi, Giovanni, and Scaramuzza, Davide
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: The Mars Science Helicopter (MSH) mission aims to deploy the next generation of unmanned helicopters on Mars, targeting landing sites in highly irregular terrain such as Valles Marineris, the largest canyons in the Solar system with elevation variances of up to 8000 meters. Unlike its predecessor, the Mars 2020 mission, which relied on a state estimation system assuming planar terrain, MSH requires a novel approach due to the complex topography of the landing site. This work introduces a novel range-visual-inertial odometry system tailored for the unique challenges of the MSH mission. Our system extends the state-of-the-art xVIO framework by fusing consistent range information with visual and inertial measurements, preventing metric scale drift in the absence of visual-inertial excitation (mono camera and constant velocity descent), and enabling landing on any terrain structure, without requiring any planar terrain assumption. Through extensive testing in image-based simulations using actual terrain structure and textures collected in Mars orbit, we demonstrate that our range-VIO approach estimates terrain-relative velocity meeting the stringent mission requirements, and outperforming existing methods., Comment: IEEE/RSJ International Conference on Intelligent Robots (IROS), 2024
Published: 2024

6. Reinforcement Learning Meets Visual Odometry

Author: Messikommer, Nico, Cioffi, Giovanni, Gehrig, Mathias, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Visual Odometry (VO) is essential to downstream mobile robotics and augmented/virtual reality tasks. Despite recent advances, existing VO methods still rely on heuristic design choices that require several weeks of hyperparameter tuning by human experts, hindering generalizability and robustness. We address these challenges by reframing VO as a sequential decision-making task and applying Reinforcement Learning (RL) to adapt the VO process dynamically. Our approach introduces a neural network, operating as an agent within the VO pipeline, to make decisions such as keyframe and grid-size selection based on real-time conditions. Our method minimizes reliance on heuristic choices using a reward function based on pose error, runtime, and other metrics to guide the system. Our RL framework treats the VO system and the image sequence as an environment, with the agent receiving observations from keypoints, map statistics, and prior poses. Experimental results using classical VO methods and public benchmarks demonstrate improvements in accuracy and robustness, validating the generalizability of our RL-enhanced VO approach to different scenarios. We believe this paradigm shift advances VO technology by eliminating the need for time-intensive parameter tuning of heuristics.
Published: 2024

7. Demonstrating Agile Flight from Pixels without State Estimation

Author: Geles, Ismail, Bauersfeld, Leonard, Romero, Angel, Xing, Jiaxu, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Quadrotors are among the most agile flying robots. Despite recent advances in learning-based control and computer vision, autonomous drones still rely on explicit state estimation. On the other hand, human pilots only rely on a first-person-view video stream from the drone onboard camera to push the platform to its limits and fly robustly in unseen environments. To the best of our knowledge, we present the first vision-based quadrotor system that autonomously navigates through a sequence of gates at high speeds while directly mapping pixels to control commands. Like professional drone-racing pilots, our system does not use explicit state estimation and leverages the same control commands humans use (collective thrust and body rates). We demonstrate agile flight at speeds up to 40km/h with accelerations up to 2g. This is achieved by training vision-based policies with reinforcement learning (RL). The training is facilitated using an asymmetric actor-critic with access to privileged information. To overcome the computational complexity during image-based RL training, we use the inner edges of the gates as a sensor abstraction. This simple yet robust, task-relevant representation can be simulated during training without rendering images. During deployment, a Swin-transformer-based gate detector is used. Our approach enables autonomous agile flight with standard, off-the-shelf hardware. Although our demonstration focuses on drone racing, we believe that our method has an impact beyond drone racing and can serve as a foundation for future research into real-world applications in structured environments.
Published: 2024

8. Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Author: Zubić, Nikola, Soldá, Federico, Sulser, Aurelio, and Scaramuzza, Davide
Subjects: Computer Science - Machine Learning, Computer Science - Computational Complexity, Computer Science - Logic in Computer Science
Abstract: Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks. We prove that one-layer SSMs cannot efficiently perform function composition over large domains without impractically large state sizes, and even with Chain-of-Thought prompting, they require a number of steps that scale unfavorably with the complexity of the function composition. Multi-layer SSMs are constrained by log-space computational capacity, limiting their reasoning abilities. Our experiments corroborate these theoretical findings. Evaluating models on tasks including various function composition settings, multi-digit multiplication, dynamic programming, and Einstein's puzzle, we find significant performance degradation even with advanced prompting techniques. Models often resort to shortcuts, leading to compounding errors. These findings highlight fundamental barriers within current deep learning architectures rooted in their computational capacities. We underscore the need for innovative solutions to transcend these constraints and achieve reliable multi-step reasoning and compositional task-solving, which is critical for advancing toward general artificial intelligence., Comment: 29 pages, 4 theorems, 17 figures, 4 tables
Published: 2024

9. Agile Robotics: Optimal Control, Reinforcement Learning, and Differentiable Simulation

Author: Song, Yunlong and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Control systems are at the core of every real-world robot. They are deployed in an ever-increasing number of applications, ranging from autonomous racing and search-and-rescue missions to industrial inspections and space exploration. To achieve peak performance, certain tasks require pushing the robot to its maximum agility. How can we design control algorithms that enhance the agility of autonomous robots and maintain robustness against unforeseen disturbances? This paper addresses this question by leveraging fundamental principles in optimal control, reinforcement learning, and differentiable simulation., Comment: This abstract has been accepted for the Robotics: Science and Systems (RSS) Pioneers Workshop, 2024
Published: 2024

10. Event Cameras Meet SPADs for High-Speed, Low-Bandwidth Imaging

Author: Muglikar, Manasi, Somasundaram, Siddharth, Dave, Akshat, Charbon, Edoardo, Raskar, Ramesh, and Scaramuzza, Davide
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Traditional cameras face a trade-off between low-light performance and high-speed imaging: longer exposure times to capture sufficient light results in motion blur, whereas shorter exposures result in Poisson-corrupted noisy images. While burst photography techniques help mitigate this tradeoff, conventional cameras are fundamentally limited in their sensor noise characteristics. Event cameras and single-photon avalanche diode (SPAD) sensors have emerged as promising alternatives to conventional cameras due to their desirable properties. SPADs are capable of single-photon sensitivity with microsecond temporal resolution, and event cameras can measure brightness changes up to 1 MHz with low bandwidth requirements. We show that these properties are complementary, and can help achieve low-light, high-speed image reconstruction with low bandwidth requirements. We introduce a sensor fusion framework to combine SPADs with event cameras to improves the reconstruction of high-speed, low-light scenes while reducing the high bandwidth cost associated with using every SPAD frame. Our evaluation, on both synthetic and real sensor data, demonstrates significant enhancements ( > 5 dB PSNR) in reconstructing low-light scenes at high temporal resolution (100 kHz) compared to conventional cameras. Event-SPAD fusion shows great promise for real-world applications, such as robotics or medical imaging.
Published: 2024

11. Hilti SLAM Challenge 2023: Benchmarking Single + Multi-session SLAM across Sensor Constellations in Construction

Author: Nair, Ashish Devadas, Kindle, Julien, Levchev, Plamen, and Scaramuzza, Davide
Subjects: Computer Science - Robotics, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Simultaneous Localization and Mapping systems are a key enabler for positioning in both handheld and robotic applications. The Hilti SLAM Challenges organized over the past years have been successful at benchmarking some of the world's best SLAM Systems with high accuracy. However, more capabilities of these systems are yet to be explored, such as platform agnosticism across varying sensor suites and multi-session SLAM. These factors indirectly serve as an indicator of robustness and ease of deployment in real-world applications. There exists no dataset plus benchmark combination publicly available, which considers these factors combined. The Hilti SLAM Challenge 2023 Dataset and Benchmark addresses this issue. Additionally, we propose a novel fiducial marker design for a pre-surveyed point on the ground to be observable from an off-the-shelf LiDAR mounted on a robot, and an algorithm to estimate its position at mm-level accuracy. Results from the challenge show an increase in overall participation, single-session SLAM systems getting increasingly accurate, successfully operating across varying sensor suites, but relatively few participants performing multi-session SLAM. Dataset URL: https://www.hilti-challenge.com/dataset-2023.html
Published: 2024
Full Text: View/download PDF

12. 3D scene generation from scene graphs and self-attention

Author: Bonazzi, Pietro, Wang, Mengqi, Arroyo, Diego Martin, Manhardt, Fabian, Messikomer, Nico, Tombari, Federico, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Synthesizing realistic and diverse indoor 3D scene layouts in a controllable fashion opens up applications in simulated navigation and virtual reality. As concise and robust representations of a scene, scene graphs have proven to be well-suited as the semantic control on the generated layout. We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans. We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene, and use these as the building blocks of our model. Our model, leverages graph transformers to estimate the size, dimension and orientation of the objects in a room while satisfying relationships in the given scene graph. Our experiments shows self-attention layers leads to sparser (7.9x compared to Graphto3D) and more diverse scenes (16%)., Comment: Some authors were not timely informed of the submission
Published: 2024

13. Few-shot point cloud reconstruction and denoising via learned Guassian splats renderings and fine-tuned diffusion features

Author: Bonazzi, Pietro, Rakatosaona, Marie-Julie, Cannici, Marco, Tombari, Federico, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computational Geometry
Abstract: Existing deep learning methods for the reconstruction and denoising of point clouds rely on small datasets of 3D shapes. We circumvent the problem by leveraging deep learning methods trained on billions of images. We propose a method to reconstruct point clouds from few images and to denoise point clouds from their rendering by exploiting prior knowledge distilled from image-based deep learning models. To improve reconstruction in constraint settings, we regularize the training of a differentiable renderer with hybrid surface and appearance by introducing semantic consistency supervision. In addition, we propose a pipeline to finetune Stable Diffusion to denoise renderings of noisy point clouds and we demonstrate how these learned filters can be used to remove point cloud noise coming without 3D supervision. We compare our method with DSS and PointRadiance and achieved higher quality 3D reconstruction on the Sketchfab Testset and SCUT Dataset., Comment: An author was not timely informed before the released submission
Published: 2024

14. An N-Point Linear Solver for Line and Motion Estimation with Event Cameras

Author: Gao, Ling, Gehrig, Daniel, Su, Hang, Scaramuzza, Davide, and Kneip, Laurent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Event cameras respond primarily to edges--formed by strong gradients--and are thus particularly well-suited for line-based motion estimation. Recent work has shown that events generated by a single line each satisfy a polynomial constraint which describes a manifold in the space-time volume. Multiple such constraints can be solved simultaneously to recover the partial linear velocity and line parameters. In this work, we show that, with a suitable line parametrization, this system of constraints is actually linear in the unknowns, which allows us to design a novel linear solver. Unlike existing solvers, our linear solver (i) is fast and numerically stable since it does not rely on expensive root finding, (ii) can solve both minimal and overdetermined systems with more than 5 events, and (iii) admits the characterization of all degenerate cases and multiple solutions. The found line parameters are singularity-free and have a fixed scale, which eliminates the need for auxiliary constraints typically encountered in previous work. To recover the full linear camera velocity we fuse observations from multiple lines with a novel velocity averaging scheme that relies on a geometrically-motivated residual, and thus solves the problem more efficiently than previous schemes which minimize an algebraic residual. Extensive experiments in synthetic and real-world settings demonstrate that our method surpasses the previous work in numerical stability, and operates over 600 times faster.
Published: 2024
Full Text: View/download PDF

15. Mitigating Motion Blur in Neural Radiance Fields with Events and Frames

Author: Cannici, Marco and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural Radiance Fields (NeRFs) have shown great potential in novel view synthesis. However, they struggle to render sharp images when the data used for training is affected by motion blur. On the other hand, event cameras excel in dynamic scenes as they measure brightness changes with microsecond resolution and are thus only marginally affected by blur. Recent methods attempt to enhance NeRF reconstructions under camera motion by fusing frames and events. However, they face challenges in recovering accurate color content or constrain the NeRF to a set of predefined camera poses, harming reconstruction quality in challenging conditions. This paper proposes a novel formulation addressing these issues by leveraging both model- and learning-based modules. We explicitly model the blur formation process, exploiting the event double integral as an additional model-based prior. Additionally, we model the event-pixel response using an end-to-end learnable response function, allowing our method to adapt to non-idealities in the real event-camera sensor. We show, on synthetic and real data, that the proposed approach outperforms existing deblur NeRFs that use only frames as well as those that combine frames and events by +6.13dB and +2.48dB, respectively., Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Published: 2024

16. MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints

Author: Krinner, Maria, Romero, Angel, Bauersfeld, Leonard, Zeilinger, Melanie, Carron, Andrea, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Quadrotor flight is an extremely challenging problem due to the limited control authority encountered at the limit of handling. Model Predictive Contouring Control (MPCC) has emerged as a promising model-based approach for time optimization problems such as drone racing. However, the standard MPCC formulation used in quadrotor racing introduces the notion of the gates directly in the cost function, creating a multi objective optimization that continuously trades off between maximizing progress and tracking the path accurately. This paper introduces three key components that enhance the state-of-the-art MPCC approach for drone racing. First and foremost, we provide safety guarantees in the form of a track constraint and terminal set. The track constraint is designed as a spatial constraint which prevents gate collisions while allowing for time optimization only in the cost function. Second, we augment the existing first principles dynamics with a residual term that captures complex aerodynamic effects and thrust forces learned directly from real-world data. Third, we use Trust Region Bayesian Optimization (TuRBO), a state-of-the-art global Bayesian Optimization algorithm, to tune the hyperparameters of the MPCC controller given a sparse reward based on lap time minimization. The proposed approach achieves similar lap times to the best-performing RL policy and outperforms the best model-based controller while satisfying constraints. In both simulation and real world, our approach consistently prevents gate crashes with 100% success rate, while pushing the quadrotor to its physical limits reaching speeds of more than 80km/h., Comment: 12 pages, 6 figures
Published: 2024

17. Learning Quadruped Locomotion Using Differentiable Simulation

Author: Song, Yunlong, Kim, Sangbae, and Scaramuzza, Davide
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: This work explores the potential of using differentiable simulation for learning quadruped locomotion. Differentiable simulation promises fast convergence and stable training by computing low-variance first-order gradients using robot dynamics. However, its usage for legged robots is still limited to simulation. The main challenge lies in the complex optimization landscape of robotic tasks due to discontinuous dynamics. This work proposes a new differentiable simulation framework to overcome these challenges. Our approach combines a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. This approach maintains simulation accuracy by aligning the robot states from the surrogate model with those of the precise, non-differentiable simulator. Our framework enables learning quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. We demonstrate that differentiable simulation outperforms a reinforcement learning algorithm (PPO) by achieving significantly better sample efficiency while maintaining its effectiveness in handling large-scale environments. Our method represents one of the first successful applications of differentiable simulation to real-world quadruped locomotion, offering a compelling alternative to traditional RL methods., Comment: 8th Annual Conference on Robot Learning (CoRL)
Published: 2024

18. Robotics meets Fluid Dynamics: A Characterization of the Induced Airflow around a Quadrotor

Author: Bauersfeld, Leonard, Muller, Koen, Ziegler, Dominic, Coletti, Filippo, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: The widespread adoption of quadrotors for diverse applications, from agriculture to public safety, necessitates an understanding of the aerodynamic disturbances they create. This paper introduces a computationally lightweight model for estimating the time-averaged magnitude of the induced flow below quadrotors in hover. Unlike related approaches that rely on expensive computational fluid dynamics (CFD) simulations or time-consuming empirical measurements, our method leverages classical theory from turbulent flows. By analyzing over 9 hours of flight data from drones of varying sizes within a large motion capture system, we show that the combined flow from all propellers of the drone is well-approximated by a turbulent jet. Through the use of a novel normalization and scaling, we have developed and experimentally validated a unified model that describes the mean velocity field of the induced flow for different drone sizes. The model accurately describes the far-field airflow in a very large volume below the drone which is difficult to simulate in CFD. Our model, which requires only the drone's mass, propeller size, and drone size for calculations, offers a practical tool for dynamic planning in multi-agent scenarios, ensuring safer operations near humans and optimizing sensor placements., Comment: 7+1 pages
Published: 2024

19. Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight

Author: Xing, Jiaxu, Romero, Angel, Bauersfeld, Leonard, and Scaramuzza, Davide
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Learning visuomotor policies for agile quadrotor flight presents significant difficulties, primarily from inefficient policy exploration caused by high-dimensional visual inputs and the need for precise and low-latency control. To address these challenges, we propose a novel approach that combines the performance of Reinforcement Learning (RL) and the sample efficiency of Imitation Learning (IL) in the task of vision-based autonomous drone racing. While RL provides a framework for learning high-performance controllers through trial and error, it faces challenges with sample efficiency and computational demands due to the high dimensionality of visual inputs. Conversely, IL efficiently learns from visual expert demonstrations, but it remains limited by the expert's performance and state distribution. To overcome these limitations, our policy learning framework integrates the strengths of both approaches. Our framework contains three phases: training a teacher policy using RL with privileged state information, distilling it into a student policy via IL, and adaptive fine-tuning via RL. Testing in both simulated and real-world scenarios shows our approach can not only learn in scenarios where RL from scratch fails but also outperforms existing IL methods in both robustness and performance, successfully navigating a quadrotor through a race course using only visual information., Comment: 8th Annual Conference on Robot Learning (CoRL)
Published: 2024

20. State Space Models for Event Cameras

Author: Zubić, Nikola, Gehrig, Mathias, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Today, state-of-the-art deep neural networks that process event-camera data first convert a temporal window of events into dense, grid-like input representations. As such, they exhibit poor generalizability when deployed at higher inference frequencies (i.e., smaller temporal windows) than the ones they were trained on. We address this challenge by introducing state-space models (SSMs) with learnable timescale parameters to event-based vision. This design adapts to varying frequencies without the need to retrain the network at different frequencies. Additionally, we investigate two strategies to counteract aliasing effects when deploying the model at higher frequencies. We comprehensively evaluate our approach against existing methods based on RNN and Transformer architectures across various benchmarks, including Gen1 and 1 Mpx event camera datasets. Our results demonstrate that SSM-based models train 33% faster and also exhibit minimal performance degradation when tested at higher frequencies than the training input. Traditional RNN and Transformer models exhibit performance drops of more than 20 mAP, with SSMs having a drop of 3.76 mAP, highlighting the effectiveness of SSMs in event-based vision tasks., Comment: 18 pages, 5 figures, 6 tables, CVPR 2024 Camera Ready paper
Published: 2024

21. AERIAL-CORE: AI-Powered Aerial Robots for Inspection and Maintenance of Electrical Power Infrastructures

Author: Ollero, Anibal, Suarez, Alejandro, Papaioannidis, Christos, Pitas, Ioannis, Marredo, Juan M., Duong, Viet, Ebeid, Emad, Kratky, Vit, Saska, Martin, Hanoune, Chloe, Afifi, Amr, Franchi, Antonio, Vourtsis, Charalampos, Floreano, Dario, Vasiljevic, Goran, Bogdan, Stjepan, Caballero, Alvaro, Ruggiero, Fabio, Lippiello, Vincenzo, Matilla, Carlos, Cioffi, Giovanni, Scaramuzza, Davide, Martinez-de-Dios, Jose R., Arrue, Begona C., Martin, Carlos, Zurad, Krzysztof, Gaitan, Carlos, Rodriguez, Jacob, Munoz, Antonio, and Viguria, Antidio
Subjects: Computer Science - Robotics
Abstract: Large-scale infrastructures are prone to deterioration due to age, environmental influences, and heavy usage. Ensuring their safety through regular inspections and maintenance is crucial to prevent incidents that can significantly affect public safety and the environment. This is especially pertinent in the context of electrical power networks, which, while essential for energy provision, can also be sources of forest fires. Intelligent drones have the potential to revolutionize inspection and maintenance, eliminating the risks for human operators, increasing productivity, reducing inspection time, and improving data collection quality. However, most of the current methods and technologies in aerial robotics have been trialed primarily in indoor testbeds or outdoor settings under strictly controlled conditions, always within the line of sight of human operators. Additionally, these methods and technologies have typically been evaluated in isolation, lacking comprehensive integration. This paper introduces the first autonomous system that combines various innovative aerial robots. This system is designed for extended-range inspections beyond the visual line of sight, features aerial manipulators for maintenance tasks, and includes support mechanisms for human operators working at elevated heights. The paper further discusses the successful validation of this system on numerous electrical power lines, with aerial robots executing flights over 10 kilometers away from their ground control stations.
Published: 2024

22. Flymation: Interactive Animation for Flying Robots

Author: Song, Yunlong and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Trajectory visualization and animation play critical roles in robotics research. However, existing data visualization and animation tools often lack flexibility, scalability, and versatility, resulting in limited capability to fully explore and analyze flight data. To address this limitation, we introduce Flymation, a new flight trajectory visualization and animation tool. Built on the Unity3D engine, Flymation is an intuitive and interactive tool that allows users to visualize and analyze flight data in real time. Users can import data from various sources, including flight simulators and real-world data, and create customized visualizations with high-quality rendering. With Flymation, users can choose between trajectory snapshot and animation; both provide valuable insights into the behavior of the underlying autonomous system. Flymation represents an exciting step toward visualizing and interacting with large-scale data in robotics research., Comment: This work was presented at Workshop at ICRA 2023 ( The Role of Robotics Simulators for Unmanned Aerial Vehicles)
Published: 2023

23. Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning

Author: Song, Yunlong, Romero, Angel, Mueller, Matthias, Koltun, Vladlen, and Scaramuzza, Davide
Subjects: Computer Science - Robotics, Computer Science - Machine Learning
Abstract: A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control.
Published: 2023
Full Text: View/download PDF

24. A 5-Point Minimal Solver for Event Camera Relative Motion Estimation

Author: Gao, Ling, Su, Hang, Gehrig, Daniel, Cannici, Marco, Scaramuzza, Davide, and Kneip, Laurent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Event-based cameras are ideal for line-based motion estimation, since they predominantly respond to edges in the scene. However, accurately determining the camera displacement based on events continues to be an open problem. This is because line feature extraction and dynamics estimation are tightly coupled when using event cameras, and no precise model is currently available for describing the complex structures generated by lines in the space-time volume of events. We solve this problem by deriving the correct non-linear parametrization of such manifolds, which we term eventails, and demonstrate its application to event-based linear motion estimation, with known rotation from an Inertial Measurement Unit. Using this parametrization, we introduce a novel minimal 5-point solver that jointly estimates line parameters and linear camera velocity projections, which can be fused into a single, averaged linear velocity when considering multiple lines. We demonstrate on both synthetic and real data that our solver generates more stable relative motion estimates than other methods while capturing more inliers than clustering based on spatio-temporal planes. In particular, our method consistently achieves a 100% success rate in estimating linear velocity where existing closed-form solvers only achieve between 23% and 70%. The proposed eventails contribute to a better understanding of spatio-temporal event-generated geometries and we thus believe it will become a core building block of future event-based motion estimation algorithms.
Published: 2023
Full Text: View/download PDF

25. Learning to Walk and Fly with Adversarial Motion Priors

Author: L'Erario, Giuseppe, Hanover, Drew, Romero, Angel, Song, Yunlong, Nava, Gabriele, Viceconte, Paolo Maria, Pucci, Daniele, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Robot multimodal locomotion encompasses the ability to transition between walking and flying, representing a significant challenge in robotics. This work presents an approach that enables automatic smooth transitions between legged and aerial locomotion. Leveraging the concept of Adversarial Motion Priors, our method allows the robot to imitate motion datasets and accomplish the desired task without the need for complex reward functions. The robot learns walking patterns from human-like gaits and aerial locomotion patterns from motions obtained using trajectory optimization. Through this process, the robot adapts the locomotion scheme based on environmental feedback using reinforcement learning, with the spontaneous emergence of mode-switching behavior. The results highlight the potential for achieving multimodal locomotion in aerial humanoid robotics through automatic control of walking and flying modes, paving the way for applications in diverse domains such as search and rescue, surveillance, and exploration missions. This research contributes to advancing the capabilities of aerial humanoid robots in terms of versatile locomotion in various environments., Comment: This paper has been accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, 2024
Published: 2023

26. Deep Visual Odometry with Events and Frames

Author: Pellerito, Roberto, Cannici, Marco, Gehrig, Daniel, Belhadj, Joris, Dubois-Matra, Olivier, Casasco, Massimo, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual Odometry (VO) is crucial for autonomous robotic navigation, especially in GPS-denied environments like planetary terrains. To improve robustness, recent model-based VO systems have begun combining standard and event-based cameras. While event cameras excel in low-light and high-speed motion, standard cameras provide dense and easier-to-track features. However, the field of image- and event-based VO still predominantly relies on model-based methods and is yet to fully integrate recent image-only advancements leveraging end-to-end learning-based architectures. Seamlessly integrating the two modalities remains challenging due to their different nature, one asynchronous, the other not, limiting the potential for a more effective image- and event-based VO. We introduce RAMP-VO, the first end-to-end learned image- and event-based VO system. It leverages novel Recurrent, Asynchronous, and Massively Parallel (RAMP) encoders capable of fusing asynchronous events with image data, providing 8x faster inference and 33% more accurate predictions than existing solutions. Despite being trained only in simulation, RAMP-VO outperforms previous methods on the newly introduced Apollo and Malapert datasets, and on existing benchmarks, where it improves image- and event-based methods by 58.8% and 30.6%, paving the way for robust and asynchronous VO in space., Comment: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Published: 2023

27. Contrastive Learning for Enhancing Robust Scene Transfer in Vision-based Agile Flight

Author: Xing, Jiaxu, Bauersfeld, Leonard, Song, Yunlong, Xing, Chunwei, and Scaramuzza, Davide
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Scene transfer for vision-based mobile robotics applications is a highly relevant and challenging problem. The utility of a robot greatly depends on its ability to perform a task in the real world, outside of a well-controlled lab environment. Existing scene transfer end-to-end policy learning approaches often suffer from poor sample efficiency or limited generalization capabilities, making them unsuitable for mobile robotics applications. This work proposes an adaptive multi-pair contrastive learning strategy for visual representation learning that enables zero-shot scene transfer and real-world deployment. Control policies relying on the embedding are able to operate in unseen environments without the need for finetuning in the deployment environment. We demonstrate the performance of our approach on the task of agile, vision-based quadrotor flight. Extensive simulation and real-world experiments demonstrate that our approach successfully generalizes beyond the training domain and outperforms all baselines.
Published: 2023

28. Contrastive Initial State Buffer for Reinforcement Learning

Author: Messikommer, Nico, Song, Yunlong, and Scaramuzza, Davide
Subjects: Computer Science - Machine Learning
Abstract: In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples. While recent works have been effective in leveraging past experiences for policy updates, they often overlook the potential of reusing past experiences for data collection. Independent of the underlying RL algorithm, we introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment in order to guide it toward more informative states. We validate our approach on two complex robotic tasks without relying on any prior information about the environment: (i) locomotion of a quadruped robot traversing challenging terrains and (ii) a quadcopter drone racing through a track. The experimental results show that our initial state buffer achieves higher task performance than the nominal baseline while also speeding up training convergence.
Published: 2023

29. Seeing Behind Dynamic Occlusions with Event Cameras

Author: Zou, Rong, Muglikar, Manasi, Messikommer, Nico, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Unwanted camera occlusions, such as debris, dust, rain-drops, and snow, can severely degrade the performance of computer-vision systems. Dynamic occlusions are particularly challenging because of the continuously changing pattern. Existing occlusion-removal methods currently use synthetic aperture imaging or image inpainting. However, they face issues with dynamic occlusions as these require multiple viewpoints or user-generated masks to hallucinate the background intensity. We propose a novel approach to reconstruct the background from a single viewpoint in the presence of dynamic occlusions. Our solution relies for the first time on the combination of a traditional camera with an event camera. When an occlusion moves across a background image, it causes intensity changes that trigger events. These events provide additional information on the relative intensity changes between foreground and background at a high temporal resolution, enabling a truer reconstruction of the background content. We present the first large-scale dataset consisting of synchronized images and event sequences to evaluate our approach. We show that our method outperforms image inpainting methods by 3dB in terms of PSNR on our dataset.
Published: 2023

30. Agilicious: Open-Source and Open-Hardware Agile Quadrotor for Vision-Based Flight

Author: Foehn, Philipp, Kaufmann, Elia, Romero, Angel, Penicka, Robert, Sun, Sihao, Bauersfeld, Leonard, Laengle, Thomas, Cioffi, Giovanni, Song, Yunlong, Loquercio, Antonio, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Autonomous, agile quadrotor flight raises fundamental challenges for robotics research in terms of perception, planning, learning, and control. A versatile and standardized platform is needed to accelerate research and let practitioners focus on the core problems. To this end, we present Agilicious, a co-designed hardware and software framework tailored to autonomous, agile quadrotor flight. It is completely open-source and open-hardware and supports both model-based and neural-network--based controllers. Also, it provides high thrust-to-weight and torque-to-inertia ratios for agility, onboard vision sensors, GPU-accelerated compute hardware for real-time perception and neural-network inference, a real-time flight controller, and a versatile software stack. In contrast to existing frameworks, Agilicious offers a unique combination of flexible software stack and high-performance hardware. We compare Agilicious with prior works and demonstrate it on different agile tasks, using both model-based and neural-network--based controllers. Our demonstrators include trajectory tracking at up to 5g and 70 km/h in a motion-capture system, and vision-based acrobatic flight and obstacle avoidance in both structured and unstructured environments using solely onboard perception. Finally, we demonstrate its use for hardware-in-the-loop simulation in virtual-reality environments. Thanks to its versatility, we believe that Agilicious supports the next generation of scientific and industrial quadrotor research., Comment: 14 pages, 5 figures, 2 tables
Published: 2023
Full Text: View/download PDF

31. HDVIO: Improving Localization and Disturbance Estimation with Hybrid Dynamics VIO

Author: Cioffi, Giovanni, Bauersfeld, Leonard, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Visual-inertial odometry (VIO) is the most common approach for estimating the state of autonomous micro aerial vehicles using only onboard sensors. Existing methods improve VIO performance by including a dynamics model in the estimation pipeline. However, such methods degrade in the presence of low-fidelity vehicle models and continuous external disturbances, such as wind. Our proposed method, HDVIO, overcomes these limitations by using a hybrid dynamics model that combines a point-mass vehicle model with a learning-based component that captures complex aerodynamic effects. HDVIO estimates the external force and the full robot state by leveraging the discrepancy between the actual motion and the predicted motion of the hybrid dynamics model. Our hybrid dynamics model uses a history of thrust and IMU measurements to predict the vehicle dynamics. To demonstrate the performance of our method, we present results on both public and novel drone dynamics datasets and show real-world experiments of a quadrotor flying in strong winds up to 25 km/h. The results show that our approach improves the motion and external force estimation compared to the state-of-the-art by up to 33% and 40%, respectively. Furthermore, differently from existing methods, we show that it is possible to predict the vehicle dynamics accurately while having no explicit knowledge of its full state.
Published: 2023

32. Actor-Critic Model Predictive Control

Author: Romero, Angel, Song, Yunlong, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: An open research question in robotics is how to combine the benefits of model-free reinforcement learning (RL) - known for its strong task performance and flexibility in optimizing general reward formulations - with the robustness and online replanning capabilities of model predictive control (MPC). This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in both simulation and the real world with a quadcopter platform across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out of distribution behaviour., Comment: 6 pages, 5 figures
Published: 2023

33. E-Calib: A Fast, Robust and Accurate Calibration Toolbox for Event Cameras

Author: Salah, Mohammed, Ayyad, Abdulla, Humais, Muhammad, Gehrig, Daniel, Abusafieh, Abdelqader, Seneviratne, Lakmal, Scaramuzza, Davide, and Zweiri, Yahya
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Event cameras triggered a paradigm shift in the computer vision community delineated by their asynchronous nature, low latency, and high dynamic range. Calibration of event cameras is always essential to account for the sensor intrinsic parameters and for 3D perception. However, conventional image-based calibration techniques are not applicable due to the asynchronous, binary output of the sensor. The current standard for calibrating event cameras relies on either blinking patterns or event-based image reconstruction algorithms. These approaches are difficult to deploy in factory settings and are affected by noise and artifacts degrading the calibration performance. To bridge these limitations, we present E-Calib, a novel, fast, robust, and accurate calibration toolbox for event cameras utilizing the asymmetric circle grid, for its robustness to out-of-focus scenes. The proposed method is tested in a variety of rigorous experiments for different event camera models, on circle grids with different geometric properties, and under challenging illumination conditions. The results show that our approach outperforms the state-of-the-art in detection success rate, reprojection error, and estimation accuracy of extrinsic parameters., Comment: 13 pages, 6 tables, 15 figures
Published: 2023

34. Revisiting Token Pruning for Object Detection and Instance Segmentation

Author: Liu, Yifei, Gehrig, Mathias, Messikommer, Nico, Cannici, Marco, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision Transformers (ViTs) have shown impressive performance in computer vision, but their high computational cost, quadratic in the number of tokens, limits their adoption in computation-constrained applications. However, this large number of tokens may not be necessary, as not all tokens are equally important. In this paper, we investigate token pruning to accelerate inference for object detection and instance segmentation, extending prior works from image classification. Through extensive experiments, we offer four insights for dense tasks: (i) tokens should not be completely pruned and discarded, but rather preserved in the feature maps for later use. (ii) reactivating previously pruned tokens can further enhance model performance. (iii) a dynamic pruning rate based on images is better than a fixed pruning rate. (iv) a lightweight, 2-layer MLP can effectively prune tokens, achieving accuracy comparable with complex gating networks with a simpler design. We assess the effects of these design decisions on the COCO dataset and introduce an approach that incorporates these findings, showing a reduction in performance decline from ~1.5 mAP to ~0.3 mAP in both boxes and masks, compared to existing token pruning methods. In relation to the dense counterpart that utilizes all tokens, our method realizes an increase in inference speed, achieving up to 34% faster performance for the entire network and 46% for the backbone.
Published: 2023

35. Low-latency automotive vision with event cameras

Author: Gehrig, Daniel and Scaramuzza, Davide
Published: 2024
Full Text: View/download PDF

36. From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection

Author: Zubić, Nikola, Gehrig, Daniel, Gehrig, Mathias, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Today, state-of-the-art deep neural networks that process events first convert them into dense, grid-like input representations before using an off-the-shelf network. However, selecting the appropriate representation for the task traditionally requires training a neural network for each representation and selecting the best one based on the validation score, which is very time-consuming. This work eliminates this bottleneck by selecting representations based on the Gromov-Wasserstein Discrepancy (GWD) between raw events and their representation. It is about 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations across multiple representations, network backbones, datasets, and tasks. Thus finding representations with high task scores is equivalent to finding representations with a low GWD. We use this insight to, for the first time, perform a hyperparameter search on a large family of event representations, revealing new and powerful representations that exceed the state-of-the-art. Our optimized representations outperform existing representations by 1.7 mAP on the 1 Mpx dataset and 0.3 mAP on the Gen1 dataset, two established object detection benchmarks, and reach a 3.8% higher classification score on the mini N-ImageNet benchmark. Moreover, we outperform state-of-the-art by 2.1 mAP on Gen1 and state-of-the-art feed-forward methods by 6.0 mAP on the 1 Mpx datasets. This work opens a new unexplored field of explicit representation optimization for event-based learning., Comment: 15 pages, 11 figures, 2 tables, ICCV 2023 Camera Ready paper
Published: 2023

37. Microgravity induces overconfidence in perceptual decision-making

Author: Loued-Khenissi, Leyla, Pfeiffer, Christian, Saxena, Rupal, Adarsh, Shivam, and Scaramuzza, Davide
Subjects: Computer Science - Robotics, Computer Science - Human-Computer Interaction
Abstract: Does gravity affect decision-making? This question comes into sharp focus as plans for interplanetary human space missions solidify. In the framework of Bayesian brain theories, gravity encapsulates a strong prior, anchoring agents to a reference frame via the vestibular system, informing their decisions and possibly their integration of uncertainty. What happens when such a strong prior is altered? We address this question using a self-motion estimation task in a space analog environment under conditions of altered gravity. Two participants were cast as remote drone operators orbiting Mars in a virtual reality environment on board a parabolic flight, where both hyper- and microgravity conditions were induced. From a first-person perspective, participants viewed a drone exiting a cave and had to first predict a collision and then provide a confidence estimate of their response. We evoked uncertainty in the task by manipulating the motion's trajectory angle. Post-decision subjective confidence reports were negatively predicted by stimulus uncertainty, as expected. Uncertainty alone did not impact overt behavioral responses (performance, choice) differentially across gravity conditions. However microgravity predicted higher subjective confidence, especially in interaction with stimulus uncertainty. These results suggest that variables relating to uncertainty affect decision-making distinctly in microgravity, highlighting the possible need for automatized, compensatory mechanisms when considering human factors in space research., Comment: 12 pages, 10 figures
Published: 2023
Full Text: View/download PDF

38. Neuromorphic Optical Flow and Real-time Implementation with Event Cameras

Author: Schnider, Yannick, Wozniak, Stanislaw, Gehrig, Mathias, Lecomte, Jules, von Arnim, Axel, Benini, Luca, Scaramuzza, Davide, and Pantazi, Angeliki
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing
Abstract: Optical flow provides information on relative motion that is an important component in many computer vision pipelines. Neural networks provide high accuracy optical flow, yet their complexity is often prohibitive for application at the edge or in robots, where efficiency and latency play crucial role. To address this challenge, we build on the latest developments in event-based vision and spiking neural networks. We propose a new network architecture, inspired by Timelens, that improves the state-of-the-art self-supervised optical flow accuracy when operated both in spiking and non-spiking mode. To implement a real-time pipeline with a physical event camera, we propose a methodology for principled model simplification based on activity and latency analysis. We demonstrate high speed optical flow prediction with almost two orders of magnitude reduced complexity while maintaining the accuracy, opening the path for real-time deployments., Comment: Accepted for IEEE CVPRW, Vancouver 2023. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media. Copyright 2023 IEEE
Published: 2023

39. Learning Agile, Vision-based Drone Flight: from Simulation to Reality

Author: Scaramuzza, Davide and Kaufmann, Elia
Subjects: Computer Science - Robotics
Abstract: We present our latest research in learning deep sensorimotor policies for agile, vision-based quadrotor flight. We show methodologies for the successful transfer of such policies from simulation to the real world. In addition, we discuss the open research questions that still need to be answered to improve the agility and robustness of autonomous drones toward human-pilot performance.
Published: 2023

40. Autonomous Power Line Inspection with Drones via Perception-Aware MPC

Author: Xing, Jiaxu, Cioffi, Giovanni, Hidalgo-Carrió, Javier, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Drones have the potential to revolutionize power line inspection by increasing productivity, reducing inspection time, improving data quality, and eliminating the risks for human operators. Current state-of-the-art systems for power line inspection have two shortcomings: (i) control is decoupled from perception and needs accurate information about the location of the power lines and masts; (ii) obstacle avoidance is decoupled from the power line tracking, which results in poor tracking in the vicinity of the power masts, and, consequently, in decreased data quality for visual inspection. In this work, we propose a model predictive controller (MPC) that overcomes these limitations by tightly coupling perception and action. Our controller generates commands that maximize the visibility of the power lines while, at the same time, safely avoiding the power masts. For power line detection, we propose a lightweight learning-based detector that is trained only on synthetic data and is able to transfer zero-shot to real-world power line images. We validate our system in simulation and real-world experiments on a mock-up power line infrastructure. We release our code and datasets to the public.
Published: 2023

41. Event-based Agile Object Catching with a Quadrupedal Robot

Author: Forrai, Benedek, Miki, Takahiro, Gehrig, Daniel, Hutter, Marco, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information to quickly and precisely react in a highly dynamic environment since they suffer from a bandwidth-latency tradeoff. They require significant bandwidth at high frame rates while featuring significant perceptual latency at lower frame rates, thereby limiting their versatility on resource-constrained platforms. In this work, we tackle this problem by equipping our quadruped with an event camera, which does not suffer from this tradeoff due to its asynchronous and sparse operation. In leveraging the low latency of the events, we push the limits of quadruped agility and demonstrate high-speed ball catching for the first time. We show that our quadruped equipped with an event camera can catch objects with speeds up to 15 m/s from 4 meters, with a success rate of 83%. Using a VGA event camera, our method runs at 100 Hz on an NVIDIA Jetson Orin.
Published: 2023

42. A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception

Author: Aydin, Asude, Gehrig, Mathias, Gehrig, Daniel, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Spiking Neural Networks (SNN) are a class of bio-inspired neural networks that promise to bring low-power and low-latency inference to edge devices through asynchronous and sparse processing. However, being temporal models, SNNs depend heavily on expressive states to generate predictions on par with classical artificial neural networks (ANNs). These states converge only after long transient periods, and quickly decay without input data, leading to higher latency, power consumption, and lower accuracy. This work addresses this issue by initializing the state with an auxiliary ANN running at a low rate. The SNN then uses the state to generate predictions with high temporal resolution until the next initialization phase. Our hybrid ANN-SNN model thus combines the best of both worlds: It does not suffer from long state transients and state decay thanks to the ANN, and can generate predictions with high temporal resolution, low latency, and low power thanks to the SNN. We show for the task of event-based 2D and 3D human pose estimation that our method consumes 88% less power with only a 4% decrease in performance compared to its fully ANN counterparts when run at the same inference rate. Moreover, when compared to SNNs, our method achieves a 74% lower error. This research thus provides a new understanding of how ANNs and SNNs can be used to maximize their respective benefits.
Published: 2023

43. COVERED, CollabOratiVE Robot Environment Dataset for 3D Semantic segmentation

Author: Munasinghe, Charith, Amin, Fatemeh Mohammadi, Scaramuzza, Davide, and van de Venn, Hans Wernher
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Safe human-robot collaboration (HRC) has recently gained a lot of interest with the emerging Industry 5.0 paradigm. Conventional robots are being replaced with more intelligent and flexible collaborative robots (cobots). Safe and efficient collaboration between cobots and humans largely relies on the cobot's comprehensive semantic understanding of the dynamic surrounding of industrial environments. Despite the importance of semantic understanding for such applications, 3D semantic segmentation of collaborative robot workspaces lacks sufficient research and dedicated datasets. The performance limitation caused by insufficient datasets is called 'data hunger' problem. To overcome this current limitation, this work develops a new dataset specifically designed for this use case, named "COVERED", which includes point-wise annotated point clouds of a robotic cell. Lastly, we also provide a benchmark of current state-of-the-art (SOTA) algorithm performance on the dataset and demonstrate a real-time semantic segmentation of a collaborative robot workspace using a multi-LiDAR system. The promising results from using the trained Deep Networks on a real-time dynamically changing situation shows that we are on the right track. Our perception pipeline achieves 20Hz throughput with a prediction point accuracy of $>$96\% and $>$92\% mean intersection over union (mIOU) while maintaining an 8Hz throughput.
Published: 2023
Full Text: View/download PDF

44. Improving safety in physical human-robot collaboration via deep metric learning

Author: Rezayati, Maryam, Zanni, Grammatiki, Zaoshi, Ying, Scaramuzza, Davide, and van de Venn, Hans Wernher
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Direct physical interaction with robots is becoming increasingly important in flexible production scenarios, but robots without protective fences also pose a greater risk to the operator. In order to keep the risk potential low, relatively simple measures are prescribed for operation, such as stopping the robot if there is physical contact or if a safety distance is violated. Although human injuries can be largely avoided in this way, all such solutions have in common that real cooperation between humans and robots is hardly possible and therefore the advantages of working with such systems cannot develop its full potential. In human-robot collaboration scenarios, more sophisticated solutions are required that make it possible to adapt the robot's behavior to the operator and/or the current situation. Most importantly, during free robot movement, physical contact must be allowed for meaningful interaction and not recognized as a collision. However, here lies a key challenge for future systems: detecting human contact by using robot proprioception and machine learning algorithms. This work uses the Deep Metric Learning (DML) approach to distinguish between non-contact robot movement, intentional contact aimed at physical human-robot interaction, and collision situations. The achieved results are promising and show show that DML achieves 98.6\% accuracy, which is 4\% higher than the existing standards (i.e. a deep learning network trained without DML). It also indicates a promising generalization capability for easy portability to other robots (target robots) by detecting contact (distinguishing between contactless and intentional or accidental contact) without having to retrain the model with target robot data.
Published: 2023
Full Text: View/download PDF

45. Event-based Shape from Polarization

Author: Muglikar, Manasi, Bauersfeld, Leonard, Moeys, Diederik Paul, and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: State-of-the-art solutions for Shape-from-Polarization (SfP) suffer from a speed-resolution tradeoff: they either sacrifice the number of polarization angles measured or necessitate lengthy acquisition times due to framerate constraints, thus compromising either accuracy or latency. We tackle this tradeoff using event cameras. Event cameras operate at microseconds resolution with negligible motion blur, and output a continuous stream of events that precisely measures how light changes over time asynchronously. We propose a setup that consists of a linear polarizer rotating at high-speeds in front of an event camera. Our method uses the continuous event stream caused by the rotation to reconstruct relative intensities at multiple polarizer angles. Experiments demonstrate that our method outperforms physics-based baselines using frames, reducing the MAE by 25% in synthetic and real-world dataset. In the real world, we observe, however, that the challenging conditions (i.e., when few events are generated) harm the performance of physics-based solutions. To overcome this, we propose a learning-based approach that learns to estimate surface normals even at low event-rates, improving the physics-based approach by 52% on the real world dataset. The proposed system achieves an acquisition speed equivalent to 50 fps (>twice the framerate of the commercial polarization sensor) while retaining the spatial resolution of 1MP. Our evaluation is based on the first large-scale dataset for event-based SfP, Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 2023
Published: 2023

46. Autonomous Drone Racing: A Survey

Author: Hanover, Drew, Loquercio, Antonio, Bauersfeld, Leonard, Romero, Angel, Penicka, Robert, Song, Yunlong, Cioffi, Giovanni, Kaufmann, Elia, and Scaramuzza, Davide
Subjects: Computer Science - Robotics
Abstract: Over the last decade, the use of autonomous drone systems for surveying, search and rescue, or last-mile delivery has increased exponentially. With the rise of these applications comes the need for highly robust, safety-critical algorithms which can operate drones in complex and uncertain environments. Additionally, flying fast enables drones to cover more ground which in turn increases productivity and further strengthens their use case. One proxy for developing algorithms used in high-speed navigation is the task of autonomous drone racing, where researchers program drones to fly through a sequence of gates and avoid obstacles as quickly as possible using onboard sensors and limited computational power. Speeds and accelerations exceed over 80 kph and 4 g respectively, raising significant challenges across perception, planning, control, and state estimation. To achieve maximum performance, systems require real-time algorithms that are robust to motion blur, high dynamic range, model uncertainties, aerodynamic disturbances, and often unpredictable opponents. This survey covers the progression of autonomous drone racing across model-based and learning-based approaches. We provide an overview of the field, its evolution over the years, and conclude with the biggest challenges and open questions to be faced in the future., Comment: 26 pages
Published: 2023
Full Text: View/download PDF

47. Recurrent Vision Transformers for Object Detection with Event Cameras

Author: Gehrig, Mathias and Scaramuzza, Davide
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras. Event cameras provide visual information with sub-millisecond latency at a high-dynamic range and with strong robustness against motion blur. These unique properties offer great potential for low-latency object detection and tracking in time-critical scenarios. Prior work in event-based vision has achieved outstanding detection performance but at the cost of substantial inference time, typically beyond 40 milliseconds. By revisiting the high-level design of recurrent vision backbones, we reduce inference time by a factor of 6 while retaining similar performance. To achieve this, we explore a multi-stage design that utilizes three key concepts in each stage: First, a convolutional prior that can be regarded as a conditional positional embedding. Second, local and dilated global self-attention for spatial feature interaction. Third, recurrent temporal feature aggregation to minimize latency while retaining temporal information. RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection - achieving an mAP of 47.2% on the Gen1 automotive dataset. At the same time, RVTs offer fast inference (<12 ms on a T4 GPU) and favorable parameter efficiency (5 times fewer than prior art). Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.
Published: 2022

48. SLAM for Visually Impaired People: a Survey

Author: Bamdad, Marziyeh, Scaramuzza, Davide, and Darvishy, Alireza
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent decades, several assistive technologies have been developed to improve the ability of blind and visually impaired (BVI) individuals to navigate independently and safely. At the same time, simultaneous localization and mapping (SLAM) techniques have become sufficiently robust and efficient to be adopted in developing these assistive technologies. We present the first systematic literature review of 54 recent studies on SLAM-based solutions for blind and visually impaired people, focusing on literature published from 2017 onward. This review explores various localization and mapping techniques employed in this context. We systematically identified and categorized diverse SLAM approaches and analyzed their localization and mapping techniques, sensor types, computing resources, and machine-learning methods. We discuss the advantages and limitations of these techniques for blind and visually impaired navigation. Moreover, we examine the major challenges described across studies, including practical challenges and considerations that affect usability and adoption. Our analysis also evaluates the effectiveness of these SLAM-based solutions in real-world scenarios and user satisfaction, providing insights into their practical impact on BVI mobility. The insights derived from this review identify critical gaps and opportunities for future research activities, particularly in addressing the challenges presented by dynamic and complex environments. We explain how SLAM technology offers the potential to improve the ability of visually impaired individuals to navigate effectively. Finally, we present future opportunities and challenges in this domain., Comment: 47 pages, 42 tables, 6 figures
Published: 2022

49. A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception.

Author: Asude Aydin, Mathias Gehrig, Daniel Gehrig, and Davide Scaramuzza 0001
Published: 2024
Full Text: View/download PDF

50. State Space Models for Event Cameras.

Author: Nikola Zubic, Mathias Gehrig, and Davide Scaramuzza 0001
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

1,548 results on '"Scaramuzza, Davide'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources