Author: "Sanchez-Riera, Jordi" / Search Limiters: Available in Library Collection - Searchworks@Jio Institute Digital Library Search Results

Your search for "Sanchez-Riera, Jordi" returned 9 results

1. MultiPhys: Multi-Person Physics-aware 3D Motion Estimation

Author: Ugrinovic, Nicolas, Pan, Boxiao, Pavlakos, Georgios, Paschalidou, Despoina, Shen, Bokui, Sanchez-Riera, Jordi, Moreno-Noguer, Francesc, and Guibas, Leonidas
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce MultiPhys, a method designed for recovering multi-person motion from monocular videos. Our focus lies in capturing coherent spatial placement between pairs of individuals across varying degrees of engagement. MultiPhys, being physically aware, exhibits robustness to jittering and occlusions, and effectively eliminates penetration issues between the two individuals. We devise a pipeline in which the motion estimated by a kinematic-based method is fed into a physics simulator in an autoregressive manner. We introduce distinct components that enable our model to harness the simulator's properties without compromising the accuracy of the kinematic estimates. This results in final motion estimates that are both kinematically coherent and physically compliant. Extensive evaluations on three challenging datasets characterized by substantial inter-person interaction show that our method significantly reduces errors associated with penetration and foot skating, while performing competitively with the state-of-the-art on motion accuracy and smoothness. Results and code can be found on our project page (http://www.iri.upc.edu/people/nugrinovic/multiphys/).
Published: 2024

2. InstantAvatar: Efficient 3D Head Reconstruction via Surface Rendering

Author: Canela, Antonio, Caselles, Pol, Malik, Ibrar, Ramon, Eduard, García, Jaime, Sánchez-Riera, Jordi, Triginer, Gil, and Moreno-Noguer, Francesc
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advances in full-head reconstruction have been obtained by optimizing a neural field through differentiable surface or volume rendering to represent a single scene. While these techniques achieve an unprecedented accuracy, they take several minutes, or even hours, due to the expensive optimization process required. In this work, we introduce InstantAvatar, a method that recovers full-head avatars from few images (down to just one) in a few seconds on commodity hardware. In order to speed up the reconstruction process, we propose a system that combines, for the first time, a voxel-grid neural field representation with a surface renderer. Notably, a naive combination of these two techniques leads to unstable optimizations that do not converge to valid solutions. In order to overcome this limitation, we present a novel statistical model that learns a prior distribution over 3D head signed distance functions using a voxel-grid based architecture. The use of this prior model, in combination with other design choices, results in a system that achieves 3D head reconstructions with accuracy comparable to the state of the art with a 100x speed-up.
Published: 2023

3. PhysXNet: A Customizable Approach for Learning Cloth Dynamics on Dressed People

Author: Sanchez-Riera, Jordi, Pumarola, Albert, and Moreno-Noguer, Francesc
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce PhysXNet, a learning-based approach to predict the dynamics of deformable clothes given 3D skeleton motion sequences of humans wearing these clothes. The proposed model is adaptable to a large variety of garments and changing topologies, without needing to be retrained. Such simulations are typically carried out by physics engines that require manual human expertise and are subject to computationally intensive computations. PhysXNet, by contrast, is a fully differentiable deep network that at inference is able to estimate the geometry of dense cloth meshes in a matter of milliseconds, and thus can be readily deployed as a layer of a larger deep learning architecture. This efficiency is achieved thanks to the specific parameterization of the clothes we consider, based on 3D UV maps encoding spatial garment displacements. The problem is then formulated as a mapping from the human kinematics space (also represented by 3D UV maps of the undressed body mesh) to the clothes displacement UV maps, which we learn using a conditional GAN with a discriminator that enforces feasible deformations. We simultaneously train our model for three garment templates (tops, bottoms and dresses), for which we simulate deformations under 50 different human actions. Nevertheless, the UV map representation we consider allows encapsulating many different cloth topologies, and at test time we can simulate garments even if we did not specifically train for them. A thorough evaluation demonstrates that PhysXNet delivers cloth deformations very close to those computed with the physical engine, opening the door to its effective integration within deep learning pipelines.
Published: 2021

4. Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision

Author: Ren, Ruijie, Rajesh, Mohit Gurnani, Sanchez-Riera, Jordi, Zhang, Fan, Tian, Yurun, Agudo, Antonio, Demiris, Yiannis, Mikolajczyk, Krystian, and Moreno-Noguer, Francesc
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Automatically detecting graspable regions from a single depth image is a key ingredient in cloth manipulation. The large variability of cloth deformations has motivated most of the current approaches to focus on identifying specific grasping points rather than semantic parts, as the appearance and depth variations of local regions are smaller and easier to model than the larger ones. However, tasks like cloth folding or assisted dressing require recognising larger segments, such as semantic edges that carry more information than points. The first goal of this paper is therefore to tackle the problem of fine-grained region detection in deformed clothes using only a depth image. As a proof of concept, we implement an approach for T-shirts, and define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points. We introduce a U-net based network to segment and label these parts. The second contribution of our work is concerned with the level of supervision that we require to train the proposed network. While most approaches learn to detect grasping points by combining real and synthetic annotations, in this work we defy the limitations of the synthetic data, and propose a multilayered domain adaptation (DA) strategy that does not use real annotations at all. We thoroughly evaluate our approach on real depth images of a T-shirt annotated with fine-grained labels. We show that training our network solely with synthetic data and the proposed DA yields results competitive with models trained on real data.
Comment: 6 pages, 4 figures. Submitted to International Conference on Robotics and Automation (ICRA)
Published: 2021

5. AVATAR: Blender add-on for fast creation of 3D human models

Author: Sanchez-Riera, Jordi, Civit, Aniol, Altarriba, Marta, and Moreno-Noguer, Francesc
Subjects: Computer Science - Graphics
Abstract: Creating an articulated and realistic 3D human model is a complicated task: it requires not only obtaining a model with the right body proportions but also rigging the model with correct articulation points and vertex weights. A tool that can create such a model with just a few clicks would be very advantageous for amateur developers to use in their projects, for researchers to easily generate datasets to train neural networks, and for industry in game development. We present software, integrated in Blender in the form of an add-on, that allows us to design and animate dressed 3D human models based on Makehuman with just a few clicks. Moreover, as it is already integrated in Blender, Python scripts can be created to animate, render, and further customize the currently available options.
Comment: 7 pages, 2 figures, software description
Published: 2021

6. Robust RGB-D Hand Tracking Using Deep Learning Priors

Author: Sanchez-Riera, Jordi, Srinivasan, Kathiravan, Hua, Kai-Lung, Cheng, Wen-Huang, Hossain, M. Anwar, and Alhamid, Mohammed F.
Published: 2018

7. Capacités audiovisuelles en robot humanoïde NAO

Author: Sanchez-Riera, Jordi, Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS), Université de Grenoble, Radu Horaud, and STAR, ABES
Subjects: [INFO.INFO-OH] Computer Science [cs]/Other [cs.OH], Audiovisual fusion, Stereo vision, Action recognition
Abstract: In this thesis we plan to investigate the complementarity of auditory and visual sensory data for building a high-level interpretation of a scene. The audiovisual (AV) input received by the robot is a function of both the external environment and of the robot's actual localization, which is closely related to its actions. Current research in AV scene analysis has tended to focus on fixed perceivers. However, psychophysical evidence suggests that humans use small head and body movements in order to optimize the location of their ears with respect to the source. Similarly, by walking or turning, the robot may be able to improve the incoming visual data. For example, in binocular perception, it is desirable to reduce the viewing distance to an object of interest. This allows the 3D structure of the object to be analyzed at a higher depth-resolution.
Published: 2013

8. Developing Audio-Visual capabilities of humanoid robot NAO

Author: Sanchez-Riera, Jordi, Perception team, Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS), Université de Grenoble, and Radu Horaud (radu.horaud@inria.fr)
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-RB] Computer Science [cs]/Robotics [cs.RO], [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], multimodal fusion, audition, robot hearing, computer vision, human-robot interaction, humanoid robotics
Abstract: Humanoid robots are becoming more and more important in our daily lives due to the high potential they have to help people in different situations. To be able to help, human-robot interaction is essential, and to this end it is important to use, as well as possible, the external information collected by the robot's different sensors. Usually the most relevant sensors for perception are cameras and microphones, which provide very rich information about the world. In this thesis, we plan to develop applications towards human-robot interaction and to achieve more natural communication when interacting with the robot. Taking advantage of the information provided by the cameras and microphones of the NAO humanoid robot, we present new algorithms and applications using these sensors. With the visual information we introduce two different stereo algorithms that will serve as a basis to design other applications. The first stereo algorithm is designed to avoid problems with textureless regions using information from images at different temporal instances. The second stereo algorithm, sceneflow, is designed to provide a more complete understanding of a scene, adding optical flow information in the computation of disparity. As a result, a position and velocity vector is available for each pixel. This provides a basis to start developing more high-level applications involving some degree of interaction. Using the sceneflow algorithm, a descriptor is designed for action recognition. As a result, action recognition benefits from richer information compared with traditional monocular approaches, giving robustness to background clutter and disambiguating depth actions like 'punch'. To complement and improve the performance in action recognition, auditory information is added. It is well known that auditory data is complementary to the visual data and can be helpful in situations where objects are occluded or simply are not there.
Finally, a last application developed towards a better human-robot interaction is a speaker detector. This can be used, for example, to center camera images on the speaking person (person of interest) and collect more reliable information. Here data from video and audio is also used, but the principle is completely different: from the visual and auditory features used to the way that these features are combined. Almost all of the applications are implemented and run on the NAO humanoid robot.
Published: 2013

9. Robust Spatiotemporal Stereo for Dynamic Scenes

Author: Sanchez-Riera, Jordi, Cech, Jan, Horaud, Radu, Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS), and Perception team
Subjects: [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Abstract: Stereo matching is a challenging problem, especially in the presence of noise or of weakly textured objects. Using temporal information in a binocular video sequence to increase the discriminability for matching has been introduced in the recent past, but all the proposed methods assume either constant disparity over time, or small object motions, which is not always true. We introduce a novel stereo algorithm that exploits temporal information by robustly aggregating a similarity statistic over time, in order to improve the matching accuracy for weak data, while preserving regions undergoing large motions without introducing artifacts.
Published: 2012
