148 results for "Moreno-Noguer, Francesc"
Search Results
2. Simultaneous completion and spatiotemporal grouping of corrupted motion tracks
- Author
-
Agudo, Antonio, Lepetit, Vincent, and Moreno-Noguer, Francesc
- Published
- 2022
- Full Text
- View/download PDF
3. GANimation: One-Shot Anatomically Consistent Facial Animation
- Author
-
Pumarola, Albert, Agudo, Antonio, Martinez, Aleix M., Sanfeliu, Alberto, and Moreno-Noguer, Francesc
- Published
- 2020
- Full Text
- View/download PDF
4. A scalable, efficient, and accurate solution to non-rigid structure from motion
- Author
-
Agudo, Antonio and Moreno-Noguer, Francesc
- Published
- 2018
- Full Text
- View/download PDF
5. Online learning and detection of faces with low human supervision
- Author
-
Villamizar, Michael, Sanfeliu, Alberto, and Moreno-Noguer, Francesc
- Published
- 2019
- Full Text
- View/download PDF
6. Real-time 3D reconstruction of non-rigid shapes with a single moving camera
- Author
-
Agudo, Antonio, Moreno-Noguer, Francesc, Calvo, Begoña, and Montiel, J.M.M.
- Published
- 2016
- Full Text
- View/download PDF
7. A 3D descriptor to detect task-oriented grasping points in clothing
- Author
-
Ramisa, Arnau, Alenyà, Guillem, Moreno-Noguer, Francesc, and Torras, Carme
- Published
- 2016
- Full Text
- View/download PDF
8. Interactive multiple object learning with scanty human supervision
- Author
-
Villamizar, Michael, Garrell, Anaís, Sanfeliu, Alberto, and Moreno-Noguer, Francesc
- Published
- 2016
- Full Text
- View/download PDF
9. A Bayesian approach to simultaneously recover camera pose and non-rigid shape from monocular images
- Author
-
Moreno-Noguer, Francesc and Porta, Josep M.
- Published
- 2016
- Full Text
- View/download PDF
10. On discrete symmetries of robotics systems: A group-theoretic and data-driven analysis
- Author
-
Ordonez-Apraez, Daniel, Martin, Mario, Agudo, Antonio, and Moreno-Noguer, Francesc
- Subjects
FOS: Computer and information sciences, Computer Science - Robotics, Computer Science - Machine Learning, J.2, FOS: Electrical engineering, electronic engineering, information engineering, Systems and Control (eess.SY), 37J15, Robotics (cs.RO), Electrical Engineering and Systems Science - Systems and Control, Machine Learning (cs.LG)
- Abstract
We present a comprehensive study on discrete morphological symmetries of dynamical systems, which are commonly observed in biological and artificial locomoting systems, such as legged, swimming, and flying animals/robots/virtual characters. These symmetries arise from the presence of one or more planes/axes of symmetry in the system's morphology, resulting in harmonious duplication and distribution of body parts. Significantly, we characterize how morphological symmetries extend to symmetries in the system's dynamics, optimal control policies, and in all proprioceptive and exteroceptive measurements related to the system's dynamics evolution. In the context of data-driven methods, symmetry represents an inductive bias that justifies the use of data augmentation or symmetric function approximators. To tackle this, we present a theoretical and practical framework for identifying the system's morphological symmetry group $\mathcal{G}$ and characterizing the symmetries in proprioceptive and exteroceptive data measurements. We then exploit these symmetries using data augmentation and $\mathcal{G}$-equivariant neural networks. Our experiments on both synthetic and real-world applications provide empirical evidence of the advantageous outcomes resulting from the exploitation of these symmetries, including improved sample efficiency, enhanced generalization, and reduction of trainable parameters.
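A minimal sketch of the data-augmentation side of this idea, assuming a toy quadruped with sagittal (left-right) symmetry; the joint ordering, permutation, and sign conventions below are invented for illustration and are not the paper's:

```python
import numpy as np

# Hypothetical quadruped with 12 joints laid out as
# [front-left, front-right, hind-left, hind-right] x 3 joints each.
# The symmetry action permutes left/right legs and flips lateral angles.
PERM = np.array([3, 4, 5, 0, 1, 2, 9, 10, 11, 6, 7, 8])
SIGNS = np.array([-1, 1, 1] * 4)  # flip hip abduction/adduction only (assumed)

def reflect(q):
    """Apply the discrete symmetry action g to a 12-dim joint vector q."""
    return SIGNS * q[PERM]

def augment(states, actions):
    """Duplicate a (state, action) dataset with its symmetric counterpart,
    assuming actions live in the same joint space as states."""
    return (np.concatenate([states, np.apply_along_axis(reflect, 1, states)]),
            np.concatenate([actions, np.apply_along_axis(reflect, 1, actions)]))
```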
- Published
- 2023
11. Morphological symmetries in robot learning
- Author
-
Ordoñez Apraez, Daniel Felipe, Martín Muñoz, Mario, Agudo Martínez, Antonio, Moreno-Noguer, Francesc, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group, and Universitat Politècnica de Catalunya. ROBiri - Grup de Percepció i Manipulació Robotitzada de l'IRI
- Subjects
Symmetries in robotics, Symmetric robotic systems, Machine learning, Aprenentatge automàtic, Morphological symmetries, Finite symmetry groups, Informàtica::Robòtica [Àrees temàtiques de la UPC], Simetria (Biologia), Robots, Symmetry (Biology), Locomotion, Symmetric dynamical systems
- Abstract
This work studies the impact of morphological symmetries in learning applications in robotics. Morphological symmetries are a predominant feature in both biological and robotic systems, arising from the presence of planes/axes of symmetry in the system's morphology. This results in harmonious duplication and distribution of body parts (e.g., humans' sagittal/left-right symmetry). Morphological symmetries become a significant learning prior as they extend to symmetries in the system's dynamics, optimal control policies, and in all proprioceptive and exteroceptive measurements related to the system's dynamics evolution [Ordonez-Apraez et al., 2023]. Exploiting these symmetries in learning applications offers several advantageous outcomes, such as the use of data augmentation to mitigate the cost and challenges of data collection, or the use of equivariant/invariant function approximation models (e.g., neural networks) to improve sample efficiency and generalization, while reducing the number of trainable parameters. Lastly, we provide a video presentation and an open-access repository reproducing our experiments and allowing for rapid prototyping in robot learning applications exploiting morphological symmetries. This work is supported by the Spanish government with the project MoHuCo PID2020-120049RB-I00 and the ERA-Net Chistera project IPALM PCI2019-103386.
- Published
- 2023
12. Combining Local-Physical and Global-Statistical Models for Sequential Deformable Shape from Motion
- Author
-
Agudo, Antonio and Moreno-Noguer, Francesc
- Published
- 2017
- Full Text
- View/download PDF
13. 3D Human Pose Tracking Priors using Geodesic Mixture Models
- Author
-
Simo-Serra, Edgar, Torras, Carme, and Moreno-Noguer, Francesc
- Published
- 2017
- Full Text
- View/download PDF
14. PoseScript: 3D Human Poses from Natural Language
- Author
-
Delmas, Ginger, Weinzaepfel, Philippe, Lucas, Thomas, Moreno-Noguer, Francesc, and Rogez, Grégory
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Natural language is leveraged in many computer vision tasks such as image captioning, cross-modal retrieval or visual question answering, to provide fine-grained semantic information. While human pose is key to human understanding, current 3D human pose datasets lack detailed language descriptions. In this work, we introduce the PoseScript dataset, which pairs a few thousand 3D human poses from AMASS with rich human-annotated descriptions of the body parts and their spatial relationships. To increase the size of this dataset to a scale compatible with typical data-hungry learning algorithms, we propose an elaborate captioning process that generates automatic synthetic descriptions in natural language from given 3D keypoints. This process extracts low-level pose information -- the posecodes -- using a set of simple but generic rules on the 3D keypoints. The posecodes are then combined into higher-level textual descriptions using syntactic rules. Automatic annotations substantially increase the amount of available data, and make it possible to effectively pretrain deep models for finetuning on human captions. To demonstrate the potential of annotated poses, we show applications of the PoseScript dataset to retrieval of relevant poses from large-scale datasets and to synthetic pose generation, both based on a textual pose description. Published in ECCV 2022.
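As a toy illustration of the posecode idea (not the paper's actual rule set), a binary rule on 3D keypoints combined into a textual fragment could look like this; the keypoint layout, axis convention (y up, meters), and threshold are assumptions:

```python
import numpy as np

def posecode_hand_above_head(keypoints, margin=0.05):
    """Toy posecode: is the left hand above the head? (y axis up, meters)."""
    return keypoints["left_hand"][1] > keypoints["head"][1] + margin

def describe(keypoints):
    """Combine binary posecodes into a textual fragment via a syntactic rule."""
    parts = []
    if posecode_hand_above_head(keypoints):
        parts.append("the left hand is raised above the head")
    return "; ".join(parts) or "neutral pose"

pose = {"head": np.array([0.0, 1.7, 0.0]), "left_hand": np.array([0.2, 1.9, 0.1])}
print(describe(pose))  # -> the left hand is raised above the head
```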
- Published
- 2022
15. Topic Detection in Continuous Sign Language Videos
- Author
-
Budria, Alvaro, Tarres, Laia, Gallego, Gerard I., Moreno-Noguer, Francesc, Torres, Jordi, and Giro-i-Nieto, Xavier
- Subjects
FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Significant progress has been made recently on challenging tasks in automatic sign language understanding, such as sign language recognition, translation and production. However, these works have focused on datasets with relatively few samples, short recordings and limited vocabulary and signing space. In this work, we introduce the novel task of sign language topic detection. We base our experiments on How2Sign, a large-scale video dataset spanning multiple semantic domains. We provide strong baselines for the task of topic detection and present a comparison between different visual features commonly used in the domain of sign language. Presented as an extended abstract in the "AVA: Accessibility, Vision, and Autonomy Meet" CVPR 2022 Workshop.
- Published
- 2022
16. DaLI: Deformation and Light Invariant Descriptor
- Author
-
Simo-Serra, Edgar, Torras, Carme, and Moreno-Noguer, Francesc
- Published
- 2015
- Full Text
- View/download PDF
17. PhysXNet: A Customizable Approach for Learning Cloth Dynamics on Dressed People
- Author
-
Sanchez-Riera, Jordi, Pumarola, Albert, and Moreno-Noguer, Francesc
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, GeneralLiterature_MISCELLANEOUS, ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
We introduce PhysXNet, a learning-based approach to predict the dynamics of deformable clothes given 3D skeleton motion sequences of humans wearing these clothes. The proposed model is adaptable to a large variety of garments and changing topologies, without needing to be retrained. Such simulations are typically carried out by physics engines that require manual human expertise and are computationally expensive. PhysXNet, by contrast, is a fully differentiable deep network that at inference is able to estimate the geometry of dense cloth meshes in a matter of milliseconds, and thus can be readily deployed as a layer of a larger deep learning architecture. This efficiency is achieved thanks to the specific parameterization of the clothes we consider, based on 3D UV maps encoding spatial garment displacements. The problem is then formulated as a mapping between the human kinematics space (also represented by 3D UV maps of the undressed body mesh) and the clothes displacement UV maps, which we learn using a conditional GAN with a discriminator that enforces feasible deformations. We train our model simultaneously for three garment templates (tops, bottoms, and dresses), for which we simulate deformations under 50 different human actions. Nevertheless, the UV map representation we consider allows encapsulating many different cloth topologies, and at test time we can simulate garments even if we did not specifically train for them. A thorough evaluation demonstrates that PhysXNet delivers cloth deformations very close to those computed with the physics engine, opening the door to effective integration within deep learning pipelines.
- Published
- 2021
18. Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision
- Author
-
Ren, Ruijie, Rajesh, Mohit Gurnani, Sanchez-Riera, Jordi, Zhang, Fan, Tian, Yurun, Agudo, Antonio, Demiris, Yiannis, Mikolajczyk, Krystian, and Moreno-Noguer, Francesc
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Automatically detecting graspable regions from a single depth image is a key ingredient in cloth manipulation. The large variability of cloth deformations has motivated most of the current approaches to focus on identifying specific grasping points rather than semantic parts, as the appearance and depth variations of local regions are smaller and easier to model than the larger ones. However, tasks like cloth folding or assisted dressing require recognising larger segments, such as semantic edges that carry more information than points. The first goal of this paper is therefore to tackle the problem of fine-grained region detection in deformed clothes using only a depth image. As a proof of concept, we implement an approach for T-shirts, and define up to six semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points. We introduce a U-net based network to segment and label these parts. The second contribution of our work is concerned with the level of supervision that we require to train the proposed network. While most approaches learn to detect grasping points by combining real and synthetic annotations, in this work we defy the limitations of the synthetic data, and propose a multilayered domain adaptation (DA) strategy that does not use real annotations at all. We thoroughly evaluate our approach on real depth images of a T-shirt annotated with fine-grained labels. We show that training our network solely with synthetic data and the proposed DA yields results competitive with models trained on real data. Submitted to the International Conference on Robotics and Automation (ICRA).
- Published
- 2021
19. Multi-Person Extreme Motion Prediction with Cross-Interaction Attention
- Author
-
Guo, Wen, Bie, Xiaoyu, Alameda-Pineda, Xavier, Moreno-Noguer, Francesc, Vers des robots à l’intelligence sociale au travers de l’apprentissage, de la perception et de la commande (ROBOTLEARN), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Grenoble Alpes (UGA), Institut de Robòtica i Informàtica Industrial (IRI), and Consejo Superior de Investigaciones Científicas [Madrid] (CSIC)-Universitat Politècnica de Catalunya [Barcelona] (UPC)
- Subjects
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]
- Abstract
Human motion prediction aims to forecast future human poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper we explore this problem from a novel perspective, involving humans performing collaborative tasks. We assume that the input of our system is two sequences of past skeletons for two interacting persons, and we aim to predict the future motion for each of them. For this purpose, we devise a novel cross-interaction attention mechanism that exploits historical information of both persons and learns to predict cross dependencies between self poses and the poses of the other person in spite of their spatial or temporal distance. Since no dataset is available to train for such interactive situations, we have captured ExPI (Extreme Pose Interaction), a new lab-based person interaction dataset of professional dancers performing acrobatics. ExPI contains 115 sequences with 30k frames and 60k instances with annotated 3D body poses and shapes. We thoroughly evaluate our cross-interaction network on this dataset and show that in both short-term and long-term predictions, it consistently outperforms baselines that reason independently for each person. We plan to release our code jointly with the dataset and the train/test splits to spur future research on the topic.
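A minimal sketch of what a cross-interaction attention layer could look like, built on a standard PyTorch multi-head attention in which one person's features attend to the other's history; dimensions and the residual fusion are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class CrossInteractionAttention(nn.Module):
    """Minimal sketch: person A's motion features attend to person B's history."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats_a, feats_b):
        # feats_*: (batch, time, dim) encodings of each person's past skeletons
        cross, _ = self.attn(query=feats_a, key=feats_b, value=feats_b)
        return self.norm(feats_a + cross)  # residual fusion of cross dependencies

a, b = torch.randn(2, 50, 64), torch.randn(2, 50, 64)
fused = CrossInteractionAttention()(a, b)  # (2, 50, 64)
```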
- Published
- 2021
20. 3D Human Pose, Shape and Texture From Low-Resolution Images and Videos.
- Author
-
Xu, Xiangyu, Chen, Hao, Moreno-Noguer, Francesc, Jeni, Laszlo A., and De la Torre, Fernando
- Subjects
POSE estimation (Computer vision), VIDEO surveillance, TELEVISED sports, COMPUTER vision, DEEP learning, SPORTS films
- Abstract
3D human pose and shape estimation from monocular images has been an active research area in computer vision. Existing deep learning methods for this task rely on high-resolution input, which, however, is not always available in many scenarios such as video surveillance and sports broadcasting. Two common approaches to deal with low-resolution images are applying super-resolution techniques to the input, which may result in unpleasant artifacts, or simply training one model for each resolution, which is impractical in many realistic applications. To address the above issues, this paper proposes a novel algorithm called RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme. The proposed method is able to learn 3D body pose and shape across different resolutions with a single model. The self-supervision loss enforces scale-consistency of the output, and the contrastive learning scheme enforces scale-consistency of the deep features. We show that both these new losses provide robustness when learning in a weakly-supervised manner. Moreover, we extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input. Extensive experiments demonstrate that the RSC-Net can achieve consistently better results than the state-of-the-art methods for challenging low-resolution images.
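A hedged sketch of a scale-consistency self-supervision loss of the kind described, assuming a resolution-aware model that accepts variable input sizes and returns a fixed-size pose/shape vector; the exact loss form in the paper may differ:

```python
import torch
import torch.nn.functional as F

def scale_consistency_loss(model, image, scales=(1.0, 0.5, 0.25)):
    """Sketch: penalize disagreement between predictions for the same image
    rendered at several resolutions; the highest resolution is the reference."""
    ref, loss = None, 0.0
    for s in scales:
        size = [max(1, int(d * s)) for d in image.shape[-2:]]
        pred = model(F.interpolate(image, size=size, mode="bilinear",
                                   align_corners=False))
        if ref is None:
            ref = pred.detach()  # don't drag the reference toward low-res outputs
        else:
            loss = loss + F.mse_loss(pred, ref)
    return loss
```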
- Published
- 2022
- Full Text
- View/download PDF
21. Challenge 4: Intelligent robotics
- Author
-
Alenyà Ribas, Guillem, Villagrá Serrano, Jorge, Fernández Saavedra, Maria Belén, González de Santos, Pablo, Haber Guerra, Rodolfo E., Jiménez Ruiz, Antonio Ramón, Ribeiro, Angela, Rocón de Lima, Eduardo, Borràs Sol, Júlia, Moreno-Noguer, Francesc, Torras, Carme, Institut de Robòtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI
- Subjects
Artificial intelligence, Robòtica, Social aspects of automation, Robotics, Artificial intelligence--Engineering applications, Intel·ligència artificial--Aplicacions a l'enginyeria, Informàtica::Robòtica [Àrees temàtiques de la UPC], Intelligent robots
- Abstract
Intelligent robotics is poised to be the next revolution, providing AI with the capability of interacting with the physical world. Robots are moving beyond their cages in industry to become intelligent machines that can live among us, helping in the service sector, serving as tools in rehabilitation and assistive tasks, and acting as companions. Robotics poses special problems, and AI research must be reshaped and redefined to meet robotics' special needs in areas like perception and scene understanding, decision making and learning, and actuation. Beyond these classical robotics areas, modern robots need to take into account the central role of human-robot interaction: unstructured environments, unforeseen situations, user preferences, and safety. The challenges to frame this revolution are multiple. We highlight the seven where we identify that CSIC has a strategic advantage and can thus have a greater impact. Modern robotics implies robots in human environments, what we call here robots for everyone: easy reprogramming and continuous learning. Deployment can include large-scale mobile robots and cars for autonomous navigation in cities, or small-scale robots for intelligent manipulation in new applications, possibly making use of effective and adaptive coordination of robot fleets. Robots in human environments require safe and ethical human-robot interaction, which can take advantage of seamless cooperative and everywhere localization solutions, and of dexterity and efficiency through bio-inspired and parallel mechanisms. Advances in intelligent robotics will have a great impact on science, industry, and society in general. Robots have the potential to change people's lifestyle and thus require special attention from regulatory bodies and policymakers. However, robotics is highly experimental and requires special effort in physically building the prototypes. To make this possible, we believe a new joint lab or infrastructure must be established to facilitate research and testing, foster collaboration, and involve industry and policymakers.
- Published
- 2021
22. PhysXNet: a customizable approach for learning cloth dynamics on dressed people
- Author
-
Sánchez-Riera, Jordi, Pumarola, Albert, Moreno-Noguer, Francesc, Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. Doctorat en Automàtica, Robòtica i Visió, Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI, Universitat Politècnica de Catalunya. VIS - Visió Artificial i Sistemes Intel·ligents, Ministerio de Ciencia, Innovación y Universidades (España), Agencia Estatal de Investigación (España), and Ministerio de Economía y Competitividad (España)
- Subjects
Pipelines, Informàtica::Automàtica i control [Àrees temàtiques de la UPC], Three-dimensional imaging, Deep learning, Solid modeling, Computational modeling, Object recognition, Topology, Imatgeria tridimensional, ComputingMethodologies_COMPUTERGRAPHICS, GAN, Clothing
- Abstract
Work presented at the International Conference on Computer Vision (ICCV), held virtually from 11 to 17 October 2021. We introduce PhysXNet, a learning-based approach to predict the dynamics of deformable clothes given 3D skeleton motion sequences of humans wearing these clothes. The proposed model is adaptable to a large variety of garments and changing topologies, without needing to be retrained. Such simulations are typically carried out by physics engines that require manual human expertise and are computationally expensive. PhysXNet, by contrast, is a fully differentiable deep network that at inference is able to estimate the geometry of dense cloth meshes in a matter of milliseconds, and thus can be readily deployed as a layer of a larger deep learning architecture. This efficiency is achieved thanks to the specific parameterization of the clothes we consider, based on 3D UV maps encoding spatial garment displacements. This work is supported partly by the Spanish government under project MoHuCo PID2020-120049RB-I00, the ERA-Net Chistera project IPALM PCI2019-103386, and the María de Maeztu Seal of Excellence MDM-2016-0656.
- Published
- 2021
23. FaceDet3D: Facial Expressions with 3D Geometric Detail Prediction
- Author
-
Athar, ShahRukh, Pumarola, Albert, Moreno-Noguer, Francesc, and Samaras, Dimitris
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Facial expressions induce a variety of high-level details on the 3D face geometry. For example, a smile causes the wrinkling of cheeks or the formation of dimples, while being angry often causes wrinkling of the forehead. Morphable Models (3DMMs) of the human face fail to capture such fine details in their PCA-based representations and consequently cannot generate such details when used to edit expressions. In this work, we introduce FaceDet3D, a first-of-its-kind method that generates - from a single image - geometric facial details that are consistent with any desired target expression. The facial details are represented as a vertex displacement map and are then used by a Neural Renderer to photo-realistically render novel images of the subject of any single image in any desired expression and view. The project website is: http://shahrukhathar.github.io/2020/12/14/FaceDet3D.html
- Published
- 2020
24. EPnP: An Accurate O(n) Solution to the PnP Problem
- Author
-
Lepetit, Vincent, Moreno-Noguer, Francesc, and Fua, Pascal
- Published
- 2009
- Full Text
- View/download PDF
25. Dependent multiple cue integration for robust tracking
- Author
-
Moreno-Noguer, Francesc, Sanfeliu, Alberto, and Samaras, Dimitris
- Subjects
Bayesian statistical decision theory -- Usage, Estimation theory -- Methods, Image processing -- Methods
- Abstract
We propose a new technique for fusing multiple cues to robustly segment an object from its background in video sequences that suffer from abrupt changes of both illumination and position of the target. Robustness is achieved by the integration of appearance and geometric object features and by their estimation using Bayesian filters, such as Kalman or particle filters. In particular, each filter estimates the state of a specific object feature, conditionally dependent on another feature estimated by a distinct filter. This dependence provides improved target representations, permitting us to segment it out from the background even in nonstationary sequences. Considering that the procedure of the Bayesian filters may be described by a "hypotheses generation-hypotheses correction" strategy, the major novelty of our methodology compared to previous approaches is that the mutual dependence between filters is considered during the feature observation, that is, in the "hypotheses-correction" stage, instead of considering it when generating the hypotheses. This proves to be much more effective in terms of accuracy and reliability. The proposed method is analytically justified and applied to develop a robust tracking system that adapts online and simultaneously the color space where the image points are represented, the color distributions, the contour of the object, and its bounding box. Results with synthetic data and real video sequences demonstrate the robustness and versatility of our method.
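A schematic toy of the key idea, that the dependence between filters enters in the hypotheses-correction stage: the second feature's observation likelihood is evaluated conditioned on the first filter's freshly corrected estimate. The Gaussian likelihoods, variances, and the form of the dependence below are placeholders, not the paper's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

def correct(particles, weights, likelihood):
    """Generic correction step: reweight hypotheses by an observation likelihood."""
    w = weights * likelihood(particles)
    return w / w.sum()

# Two dependent filters: a colour feature is corrected first, then a contour one.
colour = rng.normal(0.5, 0.2, size=200)   # hypotheses for a colour feature
contour = rng.normal(0.0, 1.0, size=200)  # hypotheses for a contour feature
w_colour = np.full(200, 1 / 200)
w_contour = np.full(200, 1 / 200)

obs_colour = 0.55
w_colour = correct(colour, w_colour, lambda c: np.exp(-(c - obs_colour)**2 / 0.01))
colour_mean = (w_colour * colour).sum()

# The contour likelihood is *conditioned* on the corrected colour estimate:
# the mutual dependence enters in the hypotheses-correction stage.
obs_contour = 0.1
w_contour = correct(contour, w_contour,
                    lambda s: np.exp(-(s - obs_contour)**2 / (0.1 + colour_mean)))
```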
- Published
- 2008
26. Integrating human body mocaps into Blender using RGB images
- Author
-
Sánchez-Riera, Jordi, Moreno-Noguer, Francesc, Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI, Ministerio de Ciencia, Innovación y Universidades (España), Agencia Estatal de Investigación (España), and Ministerio de Economía y Competitividad (España)
- Subjects
Informàtica::Automàtica i control [Àrees temàtiques de la UPC], ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Synthetic human model, Computer vision, MoCap, Action mimic, 3D human pose estimation, Pattern recognition::Computer vision [Classificació INSPEC], 2D
- Abstract
Work presented at the 13th International Conference on Advances in Computer-Human Interactions, held in Valencia (Spain), 21-25 November 2020. Reducing the complexity and cost of a Motion Capture (MoCap) system has been of great interest in recent years. Unlike other systems that use depth range cameras, we present an algorithm that is capable of working as a MoCap system with a single Red-Green-Blue (RGB) camera, and it is completely integrated in an off-the-shelf rendering software. This makes our system easily deployable in outdoor and unconstrained scenarios. Our approach builds upon three main modules. First, given solely one input RGB image, we estimate 2D body pose; the second module estimates the 3D human pose from the previously calculated 2D coordinates, and the last module calculates the necessary rotations of the joints given the goal 3D point coordinates and the 3D virtual human model. We quantitatively evaluate the first two modules using synthetic images, and provide qualitative results of the overall system with real images recorded from a webcam. This work is supported by the Spanish MINECO under projects HuMoUR TIN2017-90086-R and María de Maeztu Seal of Excellence MDM-2016-0656. We also thank Marta Altarriba Fatsini for her support deriving the formulas to compute skeleton angle rotations.
- Published
- 2020
27. Fast video object segmentation with Spatio-Temporal GANs
- Author
-
Caelles, Sergi, Pumarola, Albert, Moreno-Noguer, Francesc, Sanfeliu, Alberto, and Van Gool, Luc
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Learning descriptive spatio-temporal object models from data is paramount for the task of semi-supervised video object segmentation. Most existing approaches mainly rely on models that estimate the segmentation mask based on a reference mask at the first frame (aided sometimes by optical flow or the previous mask). These models, however, are prone to fail under rapid appearance changes or occlusions due to their limitations in modelling the temporal component. On the other hand, very recently, other approaches learned long-term features using a convolutional LSTM to leverage the information from all previous video frames. Even though these models achieve better temporal representations, they still have to be fine-tuned for every new video sequence. In this paper, we present an intermediate solution and devise a novel GAN architecture, FaSTGAN, to learn spatio-temporal object models over finite temporal windows. To achieve this, we concentrate all the heavy computational load in the training phase with two critics that enforce spatial and temporal mask consistency over the last K frames. Then at test time, we only use a relatively light regressor, which reduces the inference time considerably. As a result, our approach combines a high resiliency to sudden geometric and photometric object changes with efficiency at test time (no need for fine-tuning or post-processing). We demonstrate that the accuracy of our method is on par with state-of-the-art techniques on the challenging YouTube-VOS and DAVIS datasets, while running at 32 fps, about 4x faster than the closest competitor.
- Published
- 2019
28. Visual reranking with natural language understanding for text spotting
- Author
-
Sabir, Ahmed, Moreno-Noguer, Francesc, Padró, Lluís, King Abdullah University of Science and Technology, and Ministerio de Economía y Competitividad (España)
- Abstract
Work presented at the 14th Asian Conference on Computer Vision, held in Perth (Australia), 4-6 December 2018. Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from a natural language processing perspective to fill the gap between language and vision. We propose a post-processing approach to improve scene text recognition accuracy by using occurrence probabilities of words (a unigram language model) and the semantic correlation between scene and text. For this, we initially rely on an off-the-shelf deep neural network, already trained with a large amount of data, which provides a series of text hypotheses per input image. These hypotheses are then re-ranked using word frequencies and semantic relatedness with objects or scenes in the image. As a result of this combination, the performance of the original network is boosted with almost no additional cost. We validate our approach on the ICDAR'17 dataset. This work was supported by the KASP Scholarship Program and by the MINECO project HuMoUR TIN2017-90086-R.
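A minimal sketch of this kind of post-processing re-ranking, combining the network score with a unigram language model and a semantic-relatedness term; the weights and toy scores are assumptions:

```python
import math

def rerank(hypotheses, unigram_freq, relatedness, alpha=0.5, beta=0.3):
    """Re-rank text hypotheses (word, network_score) with a unigram language
    model and a scene/text semantic-relatedness score in [0, 1]."""
    def score(h):
        word, net = h
        lm = math.log(unigram_freq.get(word.lower(), 1e-9))
        return net + alpha * lm + beta * relatedness(word)
    return sorted(hypotheses, key=score, reverse=True)

hyps = [("dog", 0.8), ("bog", 0.9)]
freq = {"dog": 1e-4, "bog": 1e-7}
best = rerank(hyps, freq, relatedness=lambda w: 0.9 if w == "dog" else 0.1)
print(best[0][0])  # -> dog: language and scene context override the raw score
```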
- Published
- 2018
29. Dual-Branch CNNs for Vehicle Detection and Tracking on LiDAR Data.
- Author
-
Vaquero, Victor, del Pino, Ivan, Moreno-Noguer, Francesc, Sola, Joan, Sanfeliu, Alberto, and Andrade-Cetto, Juan
- Abstract
We present a novel vehicle detection and tracking system that works solely on 3D LiDAR information. Our approach segments vehicles using a dual-view representation of the 3D LiDAR point cloud on two independently trained convolutional neural networks, one for each view. A bounding box growing algorithm is applied to the fused output of the networks to properly enclose the segmented vehicles. Bounding boxes are grown using a probabilistic method that also takes occluded areas into account. The final vehicle bounding boxes act as observations for a multi-hypothesis tracking system which allows us to estimate the position and velocity of the observed vehicles. We thoroughly evaluate our system on the KITTI benchmarks, both for detection and tracking separately, and show that our dual-branch classifier consistently outperforms previous single-branch approaches, improving on or directly competing with other state-of-the-art LiDAR-based methods.
- Published
- 2021
- Full Text
- View/download PDF
30. EPnP: An Accurate O(n) Solution to the PnP Problem
- Author
-
Lepetit, Vincent, Moreno-Noguer, Francesc, and Fua, Pascal
- Abstract
We propose a non-iterative solution to the PnP problem—the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences—whose computational complexity grows linearly with n. This is in contrast to state-of-the-art methods that are O(n^5) or even O(n^8), without being more accurate. Our method is applicable for all n≥4 and handles properly both planar and non-planar configurations. Our central idea is to express the n 3D points as a weighted sum of four virtual control points. The problem then reduces to estimating the coordinates of these control points in the camera referential, which can be done in O(n) time by expressing these coordinates as a weighted sum of the eigenvectors of a 12×12 matrix and solving a small constant number of quadratic equations to pick the right weights. Furthermore, if maximal precision is required, the output of the closed-form solution can be used to initialize a Gauss-Newton scheme, which improves accuracy with a negligible amount of additional time. The advantages of our method are demonstrated by thorough testing on both synthetic and real data.
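EPnP is widely available in practice; for instance, OpenCV exposes it through cv2.solvePnP with the SOLVEPNP_EPNP flag. A minimal usage sketch, where the correspondences and intrinsics below are invented for illustration:

```python
import cv2
import numpy as np

# n >= 4 known 3D points (object frame) and their 2D projections (pixels).
pts_3d = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]],
                  dtype=np.float64)
pts_2d = np.array([[320, 240], [410, 245], [318, 150], [300, 230], [395, 140]],
                  dtype=np.float64)  # illustrative correspondences
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, distCoeffs=None,
                              flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)  # camera pose: x_cam = R @ x_obj + tvec
```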
- Published
- 2018
31. Enhancing text spotting with a language model and visual context information
- Author
-
Sabir, Ahmed, Moreno-Noguer, Francesc, Padró, Lluís, Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Informàtica::Automàtica i control [Àrees temàtiques de la UPC], Computer vision, Pattern recognition::Computer vision [Classificació INSPEC]
- Abstract
This paper addresses the problem of detecting and recognizing text in images acquired 'in the wild'. This is a severely under-constrained problem which needs to tackle a number of challenges, including large occlusions, changing lighting conditions, cluttered backgrounds, and different font types and sizes. In order to address it, we leverage recent and successful developments at the intersection of machine learning and natural language understanding. In particular, we initially rely on off-the-shelf deep networks, already trained with large amounts of data, that provide a series of text hypotheses per input image. The outputs of this network are then combined with different priors obtained from both the semantic interpretation of the image and from a scene-based language model. As a result of this combination, the performance of the original network is consistently boosted. We validate our approach on the ICDAR'17 shared task dataset.
- Published
- 2018
32. Vehicle pose estimation using G-Net: multi-class localization and depth estimation
- Author
-
García López, Javier, Agudo Martínez, Antonio, Moreno-Noguer, Francesc, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI, Generalitat de Catalunya, and Ficosa Internacional
- Subjects
Informàtica::Automàtica i control [Àrees temàtiques de la UPC], Pattern recognition, Deep learning, Vehicle detection, Pose estimation, Pattern recognition [Classificació INSPEC]
- Abstract
In this paper we present a new network architecture, called G-Net, for 3D pose estimation on RGB images, which is trained in a weakly supervised manner. We introduce a two-step pipeline based on region-based convolutional neural networks (CNNs) for feature localization, bounding box refinement based on non-maximum suppression, and depth estimation. G-Net is able to estimate the depth from single monocular images with a self-tuned loss function. The combination of this predicted depth and the presented two-step localization allows the extraction of the 3D pose of the object. We show in experiments that our method achieves good results compared to other state-of-the-art approaches which are trained in a fully supervised manner. This work was supported by the Catalan Government through the program "Doctorats Industrials" and by the company FICOSA ADAS S.L.U. J. García López is supported by the industrial doctorate of the AGAUR.
- Published
- 2018
33. ROS wrapper for real-time multi-person pose estimation with a single camera
- Author
-
Arduengo García, Miguel, Jorgensen, Steven Jens, Hambuchen, Kimberly, Sentis, Luis, Moreno-Noguer, Francesc, Alenyà Ribas, Guillem, Institut de Robòtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI
- Subjects
Automation::Robots [Classificació INSPEC], ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, robots, Informàtica::Robòtica [Àrees temàtiques de la UPC]
- Abstract
For robots to be deployable in human-occupied environments, they must have human-awareness and generate human-aware behaviors and policies. OpenPose is a library for real-time multi-person keypoint detection. We implement a ROS package that allows the estimation of 2D pose from plain RGB images, introducing a ROS wrapper that automatically recovers the pose of several people from a single camera using OpenPose. Additionally, we develop a ROS node that obtains a 3D pose estimate from the initial 2D pose estimation when a depth image is synchronized with the RGB image (an RGB-D image, such as from a Kinect camera). This is attained by projecting the 2D pose estimation onto the point cloud of the depth image.
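A minimal sketch of the 2D-to-3D step described, back-projecting a 2D keypoint onto the point cloud of a registered depth image via the pinhole camera model; the intrinsics below are typical Kinect-like values, used here only for illustration:

```python
import numpy as np

def keypoint_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project a 2D keypoint (u, v) into 3D using a registered depth
    image and the pinhole model (depth in meters, camera frame)."""
    z = depth[int(round(v)), int(round(u))]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

depth = np.full((480, 640), 2.0)  # toy depth map: flat wall at 2 m
p3d = keypoint_to_3d(400, 300, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```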
- Published
- 2017
34. Database for 3D human pose estimation from single depth images
- Author
-
Arduengo García, Miguel, Alenyà Ribas, Guillem, Moreno-Noguer, Francesc, Institut de Robòtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI
- Subjects
image recognition, Informàtica::Automàtica i control [Àrees temàtiques de la UPC], ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, object detection, ComputingMethodologies_COMPUTERGRAPHICS, Pattern recognition [Classificació INSPEC]
- Abstract
This work is part of the project I-DRESS (Assistive interactive robotic system for support in dressing). The specific objective is the detection of human body postures and the tracking of their movements. To this end, this work aims to create the image database needed for training the pose-estimation algorithms for the artificial vision of the robotic system, based on depth images obtained by a Time-of-Flight (ToF) depth sensor, such as the one incorporated in the Kinect One (Kinect v2) device.
- Published
- 2016
35. Deep Convolutional Feature Point Descriptors
- Author
-
Simo-Serra, Edgar, Trulls, Eduard, Ferraz, Luis, Kokkinos, Iasonas, Fua, Pascal, Moreno-Noguer, Francesc, Waseda University, Institut de Robòtica i Informàtica Industrial (IRI), Consejo Superior de Investigaciones Científicas [Madrid] (CSIC)-Universitat Politècnica de Catalunya [Barcelona] (UPC), Catchoom Technologies [Barcelona], Organ Modeling through Extraction, Representation and Understanding of Medical Image Content (GALEN), Ecole Centrale Paris-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Artificial Intelligence Center, SRI International, Waseda University [Tokyo, Japan], Universitat Politècnica de Catalunya [Barcelona] (UPC)-Consejo Superior de Investigaciones Científicas [Madrid] (CSIC), Inria Saclay - Ile de France, and Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Ecole Centrale Paris
- Subjects
[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], ComputingMilieux_MISCELLANEOUS
- Published
- 2015
36. Semantic tuples for evaluation of image sentence generation
- Author
-
Ellebracht, Lily Delores, Ramisa Ayats, Arnau, Shantharam Madhyastha, Pranava Swaroop, Cordero Rama, Jose Alejandro, Moreno-Noguer, Francesc, Quattoni, Ariadna Julieta, Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI, and Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
- Subjects
Informàtica::Automàtica i control [Àrees temàtiques de la UPC], natural language processing, Pattern recognition::Computer vision [Classificació INSPEC], computer vision
- Abstract
The automatic generation of image captions has received considerable attention. The problem of evaluating caption generation systems, though, has been far less explored. We propose a novel evaluation approach based on comparing the underlying visual semantics of the candidate and ground-truth captions. With this goal in mind, we have defined a semantic representation for visually descriptive language and have augmented a subset of the Flickr-8K dataset with semantic annotations. Our evaluation metric (BAST) can be used not only to compare systems but also to do error analysis and get a better understanding of the type of mistakes a system makes. To compute BAST we need to predict the semantic representation for the automatically generated captions. We use the Flickr-ST dataset to train classifiers that predict semantic tuples (STs) so that evaluation can be fully automated.
- Published
- 2015
37. Fracking Deep Convolutional Image Descriptors
- Author
-
Simo-Serra, Edgar, Trulls, Eduard, Ferraz, Luis, Kokkinos, Iasonas, and Moreno-Noguer, Francesc
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this paper we propose a novel framework for learning local image descriptors in a discriminative manner. For this purpose we explore a siamese architecture of Deep Convolutional Neural Networks (CNNs), with a hinge embedding loss on the L2 distance between descriptors. Since a siamese architecture is trained on pairs rather than single image patches, there exists a large number of positive samples and an exponential number of negative samples. We propose to explore this space with a stochastic sampling of the training set, in combination with an aggressive mining strategy over both the positive and negative samples, which we denote as "fracking". We perform a thorough evaluation of the architecture hyper-parameters, and demonstrate large performance gains compared to both standard CNN learning strategies, hand-crafted image descriptors like SIFT, and the state-of-the-art on learned descriptors: up to 2.5x vs SIFT and 1.5x vs the state-of-the-art in terms of the area under the curve (AUC) of the Precision-Recall curve.
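A hedged PyTorch sketch of the training objective described: a hinge embedding loss on L2 distances between descriptor pairs, with aggressive mining that keeps only the hardest fraction of a batch; the margin and mining ratio are placeholders, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def siamese_hinge_loss(desc_a, desc_b, labels, margin=1.0, mine_ratio=0.5):
    """Hinge embedding loss on L2 distances between descriptor pairs.
    labels: 1 for matching patch pairs, -1 for non-matching.
    'Fracking'-style mining: keep only the hardest fraction of samples."""
    d = F.pairwise_distance(desc_a, desc_b)
    loss = torch.where(labels == 1, d, F.relu(margin - d))
    k = max(1, int(mine_ratio * loss.numel()))
    hardest, _ = torch.topk(loss, k)  # aggressive mining of hard pairs
    return hardest.mean()

a, b = torch.randn(32, 128), torch.randn(32, 128)
y = torch.randint(0, 2, (32,)) * 2 - 1  # random ±1 labels for illustration
print(siamese_hinge_loss(a, b, y))
```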
- Published
- 2014
38. Robust Spatio-Temporal Clustering and Reconstruction of Multiple Deformable Bodies.
- Author
-
Agudo, Antonio and Moreno-Noguer, Francesc
- Subjects
ROBUST control, AUTOMATIC control systems, CONTROL theory (Engineering), CLUSTERING of particles, COAGULATION
- Abstract
In this paper we present an approach to reconstruct the 3D shape of multiple deforming objects from a collection of sparse, noisy and possibly incomplete 2D point tracks acquired by a single monocular camera. Additionally, the proposed solution estimates the camera motion and reasons about the spatial segmentation (i.e., identifies each of the deforming objects in every frame) and temporal clustering (i.e., splits the sequence into motion primitive actions). This advances competing work, which mainly tackled the problem for a single object and non-occluded tracks. In order to handle several objects at a time from partial observations, we model point trajectories as a union of spatial and temporal subspaces, and optimize the parameters of both modalities, the non-observed point tracks, the camera motion, and the time-varying 3D shape via augmented Lagrange multipliers. The algorithm is fully unsupervised and does not require any training data at all. We thoroughly validate the method on challenging scenarios with several human subjects performing different activities which involve complex motions and close interaction. We show our approach achieves state-of-the-art 3D reconstruction results, while it also provides spatial and temporal segmentation.
- Published
- 2019
- Full Text
- View/download PDF
39. Shape Basis Interpretation for Monocular Deformable 3-D Reconstruction.
- Author
-
Agudo, Antonio and Moreno-Noguer, Francesc
- Abstract
In this paper, we propose a novel interpretable shape model to encode object nonrigidity. We first use the initial frames of a monocular video to recover a rest shape, used later to compute a dissimilarity measure based on a distance matrix measurement. Spectral analysis is then applied to this matrix to obtain a reduced shape basis that, in contrast to existing approaches, can be physically interpreted. In turn, these precomputed shape bases are used to linearly span the deformation of a wide variety of objects. We introduce the low-rank basis into a sequential approach to recover both camera motion and nonrigid shape from the monocular video, by simply optimizing the weights of the linear combination using bundle adjustment. Since the number of parameters to optimize per frame is relatively small, especially when physical priors are considered, our approach is fast and can potentially run in real time. Validation is done on a wide variety of real-world objects, undergoing both inextensible and extensible deformations. Our approach achieves remarkable robustness to artifacts such as noisy and missing measurements and shows improved performance over competing methods.
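A minimal numpy sketch of the basis construction described, using a plain Euclidean distance matrix as the dissimilarity measure; the paper's actual measure and normalization may differ:

```python
import numpy as np

def shape_basis(rest_shape, rank=5):
    """Sketch: spectral analysis of a pairwise-distance dissimilarity matrix
    on the rest shape, keeping the leading modes as a low-rank shape basis."""
    diff = rest_shape[:, None, :] - rest_shape[None, :, :]
    D = np.linalg.norm(diff, axis=-1)     # (n, n) distance matrix
    eigvals, eigvecs = np.linalg.eigh(D)  # symmetric -> real spectrum
    order = np.argsort(-np.abs(eigvals))  # strongest modes first
    return eigvecs[:, order[:rank]]       # (n, rank) basis

rest = np.random.rand(100, 3)  # toy rest shape with 100 points
B = shape_basis(rest)          # deformations ≈ B @ weights (per coordinate axis)
```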
- Published
- 2019
- Full Text
- View/download PDF
40. Learning RGB-D descriptors of garment parts for informed robot grasping
- Author
-
Ramisa, Arnau, Alenyà, Guillem, Moreno-Noguer, Francesc, Torras, Carme, Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI, Universitat Politècnica de Catalunya. VIS - Visió Artificial i Sistemes Intel·ligents, Universitat Politècnica de Catalunya. VIS - Visió Artificial i Sistemes Intel.ligents, Ministerio de Ciencia e Innovación (España), European Commission, and Consejo Superior de Investigaciones Científicas (España)
- Subjects
Garment part detection, computer vision, object detection, pattern recognition, machine learning, classification, segmentation, clothing, manipulators, robot vision, bag-of-visual-words, Artificial Intelligence, Control and Systems Engineering, Electrical and Electronic Engineering, Informàtica::Robòtica [Àrees temàtiques de la UPC], Pattern recognition::Computer vision [Classificació INSPEC]
- Abstract
Robotic handling of textile objects in household environments is an emerging application that has recently received considerable attention thanks to the development of domestic robots. Most current approaches follow a multiple re-grasp strategy for this purpose, in which clothes are sequentially grasped from different points until one of them yields a desired configuration. In this work we propose a vision-based method, built on the Bag of Visual Words approach, that combines appearance and 3D information to detect parts suitable for grasping in clothes, even when they are highly wrinkled. We also contribute a new, annotated, garment part dataset that can be used for benchmarking classification, part detection, and segmentation algorithms. The dataset is used to evaluate our approach and several state-of-the-art 3D descriptors for the task of garment part detection. Results indicate that appearance is a reliable source of information, but that augmenting it with 3D information can help the method perform better with new clothing items. This research is partially funded by the Spanish Ministry of Science and Innovation under Project PAU+ DPI2011-27510, the EU Project IntellAct FP7-ICT2009-6-269959 and the ERA-Net Chistera Project ViSen PCIN-2013-047. A. Ramisa worked under the JAE-Doc grant from CSIC and FSE.
- Published
- 2014
41. Force-Based Representation for Non-Rigid Shape and Elastic Model Estimation.
- Author
-
Agudo, Antonio and Moreno-Noguer, Francesc
- Subjects
VISUAL perception, IMAGE processing, COMPUTER graphics, IMAGE recognition (Computer vision), MULTIPLE correspondence analysis (Statistics)
- Abstract
This paper addresses the problem of simultaneously recovering 3D shape, pose and the elastic model of a deformable object from only 2D point tracks in a monocular video. This is a severely under-constrained problem that has been typically addressed by enforcing the shape or the point trajectories to lie on low-rank dimensional spaces. We show that formulating the problem in terms of a low-rank force space that induces the deformation, and introducing the elastic model as an additional unknown, allows for a better physical interpretation of the resulting priors and a more accurate representation of the actual object's behavior. In order to simultaneously estimate force, pose, and the elastic model of the object we use an expectation maximization strategy, where each of these parameters is successively learned by partial M-steps. Once the elastic model is learned, it can be transferred to similar objects to code their 3D deformation. Moreover, our approach can robustly deal with missing data, and encodes both rigid and non-rigid points under the same formalism. We thoroughly validate the approach on Mocap and real sequences, showing more accurate 3D reconstructions than the state of the art, and additionally providing an estimate of the full elastic model with no a priori information.
- Published
- 2018
- Full Text
- View/download PDF
42. BreakingNews: Article Annotation by Image and Text Processing.
- Author
-
Ramisa, Arnau, Yan, Fei, Moreno-Noguer, Francesc, and Mikolajczyk, Krystian
- Subjects
TEXT processing (Computer science), ARTIFICIAL neural networks, DEEP learning, NATURAL language processing, DATA visualization, GLOBAL Positioning System
- Abstract
Building upon recent Deep Neural Network architectures, current approaches lying at the intersection of Computer Vision and Natural Language Processing have achieved unprecedented breakthroughs in tasks like automatic captioning or image retrieval. Most of these learning methods, though, rely on large training sets of images associated with human annotations that specifically describe the visual content. In this paper we propose to go a step further and explore the more complex cases where textual descriptions are loosely related to the images. We focus on the particular domain of news articles, in which the textual content often expresses connotative and ambiguous relations that are only suggested but not directly inferred from images. We introduce an adaptive CNN architecture that shares most of the structure for multiple tasks including source detection, article illustration and geolocation of articles. Deep Canonical Correlation Analysis is deployed for article illustration, and a new loss function based on Great Circle Distance is proposed for geolocation. Furthermore, we present BreakingNews, a novel dataset with approximately 100K news articles including images, text and captions, and enriched with heterogeneous meta-data (such as GPS coordinates and user comments). We show this dataset to be appropriate to explore all the aforementioned problems, for which we provide a baseline performance using various Deep Learning architectures and different representations of the textual and visual features. We report very promising results and bring to light several limitations of the current state-of-the-art in this kind of domain, which we hope will help spur progress in the field.
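The Great Circle Distance underlying the proposed geolocation objective is the standard haversine formula; a self-contained sketch of the quantity such a loss would penalize:

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_distance(lat1, lon1, lat2, lon2):
    """Haversine formula: great-circle distance in km between two
    (latitude, longitude) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Barcelona vs. London: roughly 1,140 km apart.
print(round(great_circle_distance(41.39, 2.17, 51.51, -0.13)))
```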
- Published
- 2018
- Full Text
- View/download PDF
43. Boosted Random Ferns for Object Detection.
- Author
-
Villamizar, Michael, Andrade-Cetto, Juan, Sanfeliu, Alberto, and Moreno-Noguer, Francesc
- Subjects
OBJECT recognition (Computer vision), HISTOGRAMS, BOOSTING algorithms, THREE-dimensional imaging, FEATURE extraction
- Abstract
In this paper we introduce Boosted Random Ferns (BRFs) to rapidly build discriminative classifiers for learning and detecting object categories. At the core of our approach we use standard random ferns, but we introduce four main innovations that let us bring ferns from an instance to a category level, and still retain efficiency. First, we define binary features in the histogram of oriented gradients domain (as opposed to the intensity domain), allowing for a better representation of intra-class variability. Second, both the positions where ferns are evaluated within the sliding window and the locations of the binary features for each fern are not chosen completely at random; instead, we use a boosting strategy to pick the most discriminative combination of them. This is further enhanced by our third contribution, which is to adapt the boosting strategy to enable sharing of binary features among different ferns, yielding high recognition rates at a low computational cost. And finally, we show that training can be performed online, for sequentially arriving images. Overall, the resulting classifier can be very efficiently trained, densely evaluated for all image locations in about 0.1 seconds, and provides detection rates similar to competing approaches that require expensive and significantly slower processing times. We demonstrate the effectiveness of our approach by thorough experimentation on publicly available datasets, in which we compare against the state-of-the-art, for tasks of both 2D detection and 3D multi-view estimation.
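A minimal sketch of a single random fern, without the HOG-domain features or the boosted feature selection the paper adds on top: m binary comparisons index into a 2^m-bin per-class histogram; all parameters here are illustrative:

```python
import numpy as np

class RandomFern:
    """One fern: m binary comparisons over feature pairs -> 2^m-bin histogram."""
    def __init__(self, n_features, m=8, n_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        self.pairs = rng.integers(0, n_features, size=(m, 2))
        self.counts = np.ones((n_classes, 2 ** m))  # Laplace-smoothed counts

    def index(self, x):
        bits = (x[self.pairs[:, 0]] > x[self.pairs[:, 1]]).astype(int)
        return int("".join(map(str, bits)), 2)  # fern output as an integer code

    def train(self, x, label):
        self.counts[label, self.index(x)] += 1

    def log_prob(self, x):
        p = self.counts[:, self.index(x)] / self.counts.sum(axis=1)
        return np.log(p)  # summed across ferns in a full (boosted) ensemble

fern = RandomFern(n_features=36)         # e.g. a 36-bin HOG block (assumed)
fern.train(np.random.rand(36), label=1)
```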
- Published
- 2018
- Full Text
- View/download PDF
44. Characterization of textile grasping experiments
- Author
-
Alenyà Ribas, Guillem, Ramisa Ayats, Arnau, Moreno-Noguer, Francesc, Torras, Carme, Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI, Universitat Politècnica de Catalunya. VIS - Visió Artificial i Sistemes Intel·ligents, and Universitat Politècnica de Catalunya. VIS - Visió Artificial i Sistemes Intel.ligents
- Subjects
Robot vision, feature extraction, manipulators, system comparison, Pattern recognition systems, textile manipulation, repeatable experiments, Pattern recognition::Feature extraction [Classificació INSPEC], Enginyeria de la telecomunicació::Processament del senyal::Reconeixement de formes [Àrees temàtiques de la UPC], Reconeixement de formes (Informàtica)
- Abstract
Presented at the International Conference on Robotics and Automation, held in the USA, 14-18 May 2012. Grasping highly deformable objects, like textiles, is an emerging area of research that involves both perception and manipulation abilities. As new techniques appear, it becomes essential to design strategies to compare them. However, this is not an easy task, since the large state-space of textile objects explodes when coupled with the variability of grippers, robotic hands and robot arms performing the manipulation task. This high variability makes it very difficult to design experiments to evaluate the performance of a system in a repeatable way and compare it to others. We propose a framework that allows the comparison of different grasping methods for textile objects. Instead of measuring each component separately, we propose a methodology to explicitly measure the vision-manipulation correlation by taking into account the throughput of the actions. Perceptions of deformable objects should be grouped into different clusters, and the different grasping actions available should be tested for each perception type to obtain the action-perception success ratio. This characterization potentially allows comparing very different systems in terms of specialized actions, perceptions or widely useful actions, along with the cost of performing each action. We also show that this categorization is useful in manipulation planning for deformable objects. This work was supported by the Spanish Ministry of Science and Innovation under projects PAU+ DPI2011-27510, by the EU project INTELLACT 247947 FP7-269959 and by the Catalan Research Commission through SGR-00155. A. Ramisa worked under the JAE-DOC grant from the CSIC and the FSE.
- Published
- 2012
45. Accurate and Linear Time Pose Estimation from Points and Lines.
- Author
-
Vakhitov, Alexander, Funke, Jan, and Moreno-Noguer, Francesc
- Published
- 2016
- Full Text
- View/download PDF
46. Mode-shape interpretation: Re-thinking modal space for recovering deformable shapes.
- Author
-
Agudo, Antonio, Montiel, J. M. M., Calvo, Begona, and Moreno-Noguer, Francesc
- Published
- 2016
- Full Text
- View/download PDF
47. Dense Segmentation-Aware Descriptors.
- Author
-
Trulls, Eduard, Kokkinos, Iasonas, Sanfeliu, Alberto, and Moreno-Noguer, Francesc
- Published
- 2016
- Full Text
- View/download PDF
48. Multiple cue integration for robust tracking in dynamic environments: application to video relighting
- Author
-
Moreno-Noguer, Francesc, Belhumeur, Peter N., Sanfeliu, Alberto, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, and Sanfeliu Cortés, Alberto
- Subjects
video relighting ,Bayesian filters ,computer vision ,Computer science [UPC subject areas] ,Computer vision ,computer graphics ,data fusion ,image segmentation ,Motion -- Analysis ,automatic object tracking - Abstract
Motion analysis and object tracking has been one of the principal focuses of attention within the computer vision community over the past two decades. The interest in this research area lies in its wide range of applicability, extending from autonomous vehicle and robot navigation tasks to entertainment and virtual reality applications. Even though impressive results have been obtained in specific problems, object tracking remains an open problem, since available methods are prone to be sensitive to several artifacts and non-stationary environment conditions, such as unpredictable target movements, gradual or abrupt changes of illumination, proximity of similar objects, or cluttered backgrounds. Multiple-cue integration has been proved to enhance the robustness of tracking algorithms against such disturbances. In recent years, due to the increasing power of computers, there has been significant interest in building complex tracking systems that simultaneously consider multiple cues. However, most of these algorithms are based on heuristics and ad-hoc rules formulated for specific applications, making it impossible to extrapolate them to new environment conditions. In this dissertation we propose a general probabilistic framework to integrate as many object features as necessary, permitting them to mutually interact in order to obtain a precise estimation of the object's state, and thus a precise estimate of the target position. This framework is used to design a tracking algorithm, which is validated on several video sequences involving abrupt position and illumination changes, target camouflaging, and non-rigid deformations. Among the features used to represent the target, it is important to point out the use of a robust parameterization of the target color in an object-dependent colorspace, which allows the object to be distinguished from the background more clearly than with other colorspaces commonly used in the literature. In the last part of the dissertation, we design an approach for relighting static and moving scenes with unknown geometry. The relighting is performed through an image-based methodology, where the rendering under new lighting conditions is achieved by linear combinations of a set of pre-acquired reference images of the scene illuminated by known light patterns. Since the placement and brightness of the light sources composing such light patterns can be controlled, it is natural to ask: what is the optimal way to illuminate the scene so as to reduce the number of reference images that are needed? We show that the best way to light the scene (i.e., the way that minimizes the number of reference images) is not to use a sequence of single, compact light sources, as is most commonly done, but rather a sequence of lighting patterns given by an object-dependent lighting basis. It is important to note that when relighting video sequences, consecutive images need to be aligned with respect to a common coordinate frame. However, since each frame is generated by a different light pattern illuminating the scene, abrupt illumination changes between consecutive reference images are produced.
Under these circumstances, the tracking framework designed in this dissertation plays a central role. Finally, we present several relighting results on real video sequences of moving objects, moving faces, and scenes containing both. In each case, although a single video clip was captured, we are able to relight it again and again, controlling the lighting direction, extent, and color.
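The core of the image-based relighting step, rendering a new lighting condition as a linear combination of reference images captured under known light patterns, can be sketched in Python as below. The shapes, the least-squares solve for the combination coefficients, and all names are illustrative assumptions rather than the dissertation's exact procedure.

import numpy as np

def relight(reference_images, basis_patterns, target_pattern):
    """reference_images: (k, h, w) images captured under the k lighting
    patterns in basis_patterns (k, n_lights). Express target_pattern
    (n_lights,) in the pattern basis via least squares, then combine
    the reference images with those coefficients."""
    coeffs, *_ = np.linalg.lstsq(basis_patterns.T, target_pattern, rcond=None)
    return np.tensordot(coeffs, reference_images, axes=1)

# Toy usage: 4 reference images of a 2x2 scene lit by 6 point lights.
rng = np.random.default_rng(1)
refs = rng.random((4, 2, 2))
patterns = rng.random((4, 6))
print(relight(refs, patterns, rng.random(6)))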
- Published
- 2005
49. Research at the learning and vision mobile robotics group 2004-2005
- Author
-
Scandaliaris, Jorge, Alquézar Mancho, Renato, Andrade-Cetto, Juan, Aranda López, Juan, Climent Vilaro, Juan, Grau Saldes, Antoni, Mirats-Tur, Josep M., Moreno-Noguer, Francesc, Vergés Llahí, Jaume, Vidal-Calleja, Teresa A., and Sanfeliu, Alberto
- Subjects
Pattern recognition: Computer vision ,Automation: Robots ,Computer vision ,Robotics ,Computer vision [Pattern recognition] ,Robots ,Robots [Automation] - Abstract
Spanish Congress on Informatics (CEDI), 2005, Granada (Spain), This article presents the current lines of research in wheeled mobile robotics being pursued at the Learning and Vision Mobile Robotics Group (IRI). It includes an overview of recent results produced by our group in a wide range of areas, including robot localization, color invariance, segmentation, tracking, audio processing, and object learning and recognition., This work was supported by projects: 'Supervised learning of industrial scenes by means of an active vision equipped mobile robot.' (J-00063), 'Integration of robust perception, learning, and navigation systems in mobile robotics' (J-0929).
- Published
- 2005
50. Integration of conditionally dependent object features for robust figure-background segmentation
- Author
-
Moreno-Noguer, Francesc, Sanfeliu, Alberto, and Samaras, Dimitris
- Subjects
Object detection [Pattern recognition] ,Bayesian methods ,Feature extraction ,Pattern recognition: Computer vision ,Probabilistic logic ,Pattern recognition ,Pattern recognition systems ,Image segmentation ,Computer vision [Pattern recognition] ,Color space ,Robustness (computer science) ,Computer vision ,Segmentation ,Artificial intelligence ,Particle filter ,Pattern recognition: Object detection - Abstract
IEEE International Conference on Computer Vision (ICCV) 2005, Beijing (China), We propose a new technique for integrating multiple cues to robustly segment an object from its background in video sequences that suffer from abrupt changes of both illumination and position of the target. Robustness is achieved by the integration of appearance and geometric object features and by their description using particle filters. Previous approaches assume independence of the object cues or apply the particle filter formulation to only one of the features, assuming a smooth change in the rest, which can prove very limiting, especially when the state of some features needs to be updated using other cues or when their dynamics follow non-linear and unpredictable paths. Our technique offers a general framework to model the probabilistic relationship between features. The proposed method is analytically justified and applied to develop a robust tracking system that simultaneously adapts online the color space in which the image points are represented, the color distributions, and the contour of the object. Results with synthetic data and real video sequences demonstrate the robustness and versatility of our method., This work was supported by projects: 'Navegación autónoma de robots guiados por objetivos visuales' (070-720), 'Supervised learning of industrial scenes by means of an active vision equipped mobile robot.' (J-00063).
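The key departure from independent-cue trackers, sampling one feature's particles conditioned on another's posterior, can be illustrated with a schematic particle-filter step in Python. The dynamics, likelihoods, and coupling below are placeholder assumptions, not the paper's actual observation models.

import numpy as np

rng = np.random.default_rng(2)

def pf_step(particles, dynamics, likelihood):
    """One predict-weight-resample cycle of a basic particle filter."""
    particles = dynamics(particles)
    w = likelihood(particles)
    w /= w.sum()
    return particles[rng.choice(len(particles), len(particles), p=w)]

def dependent_step(part_a, lik_a, cond_sample_b, lik_b):
    # Feature A (e.g., target contour position) is filtered first.
    part_a = pf_step(part_a, lambda p: p + rng.normal(0, .1, p.shape), lik_a)
    # Feature B (e.g., a color-space state) is sampled from p(b | a)
    # using A's posterior particles, then reweighted and resampled,
    # rather than being filtered independently.
    part_b = cond_sample_b(part_a)
    w = lik_b(part_b)
    w /= w.sum()
    return part_a, part_b[rng.choice(len(part_b), len(part_b), p=w)]

# Toy usage with Gaussian likelihoods around hypothetical observations.
a = rng.normal(size=200)
a, b = dependent_step(a,
                      lik_a=lambda p: np.exp(-(p - 1.0) ** 2),
                      cond_sample_b=lambda pa: pa + rng.normal(0, .05, pa.shape),
                      lik_b=lambda p: np.exp(-(p - 1.2) ** 2))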
- Published
- 2005