151 results for "Patrick Le Callet"
Search Results
2. SMGEA: A New Ensemble Adversarial Attack Powered by Long-Term Gradient Memories
- Author
-
Guodong Guo, Ali Borji, Xiongkuo Min, Suiyi Ling, Zhaohui Che, Patrick Le Callet, Jing Li, and Guangtao Zhai
- Subjects
Long-Term Memory, Computer Networks and Communications, Computer Science, Transferability, Machine Learning, Computer Science Applications, Adversarial Systems, Artificial Intelligence, Transfer (computing), Neural Networks, Algorithms, Software - Abstract
Deep neural networks are vulnerable to adversarial attacks. More importantly, some adversarial examples crafted against an ensemble of source models transfer to other target models and thus pose a security threat to black-box applications (where attackers have no access to the target models). Current transfer-based ensemble attacks, however, consider only a limited number of source models when crafting an adversarial example and thus obtain poor transferability. Besides, recent query-based black-box attacks, which require numerous queries to the target model, not only arouse suspicion from the target model but also incur high query costs. In this article, we propose a novel transfer-based black-box attack, dubbed serial-minigroup-ensemble-attack (SMGEA). Concretely, SMGEA first divides a large number of pretrained white-box source models into several "minigroups." For each minigroup, we design three new ensemble strategies to improve the intragroup transferability. Moreover, we propose a new algorithm that recursively accumulates the "long-term" gradient memories of the previous minigroup into the subsequent minigroup. This way, the learned adversarial information can be preserved and the intergroup transferability improved. Experiments indicate that SMGEA not only achieves state-of-the-art black-box attack ability over several data sets but also deceives two online black-box saliency prediction systems in the real world, i.e., DeepGaze-II (https://deepgaze.bethgelab.org/) and SALICON (http://salicon.net/demo/). Finally, we contribute a new code repository to promote research on adversarial attack and defense over ubiquitous pixel-to-pixel computer vision tasks. We share our code together with the pretrained substitute model zoo at https://github.com/CZHQuality/AAA-Pix2pix.
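The recursive gradient-memory accumulation described in the abstract can be sketched in a few lines (a toy one-dimensional illustration, not the authors' released implementation; `smgea_sketch`, the decay factor, and the stand-in gradient callables are all hypothetical):

```python
def sign(v):
    # scalar sign: -1, 0, or +1
    return (v > 0) - (v < 0)

def smgea_sketch(minigroups, x, steps=5, alpha=0.1, decay=0.9):
    """Toy 1-D sketch of serial minigroup attacks with a 'long-term'
    gradient memory carried from one minigroup to the next."""
    memory = 0.0  # long-term gradient memory shared across minigroups
    x_adv = x
    for group in minigroups:
        for _ in range(steps):
            # ensemble gradient of the current minigroup (simple average)
            g = sum(grad(x_adv) for grad in group) / len(group)
            # recursively fold the previous groups' memory into the update
            memory = decay * memory + g
            x_adv += alpha * sign(memory)
    return x_adv
```

With all-positive toy gradients the memory stays positive, so every step moves the example in the same direction; the repository linked above contains the full image-space version.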
- Published
- 2022
3. UIF: An Objective Quality Assessment for Underwater Image Enhancement
- Author
-
Yannan Zheng, Weiling Chen, Rongfu Lin, Tiesong Zhao, and Patrick Le Callet (Fuzhou University; Image Perception Interaction team (IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), Nantes Université / École Centrale de Nantes / CNRS / Inria / IMT Atlantique)
- Subjects
Computer and Information Sciences, Computer Vision and Pattern Recognition (cs.CV), Image Processing [eess.IV], Image and Video Processing (eess.IV), Image Quality Assessment (IQA), Underwater Image Processing, Underwater Image Enhancement (UIE), Computer Graphics and Computer-Aided Design, Software - Abstract
Due to complex and volatile lighting environments, underwater imaging can be readily impaired by light scattering, warping, and noise. To improve visual quality, Underwater Image Enhancement (UIE) techniques have been widely studied. Recent efforts have also evaluated and compared UIE performance with subjective and objective methods. However, subjective evaluation is time-consuming and uneconomical for all images, while existing objective methods have limited capability for newly developed UIE approaches based on deep learning. To fill this gap, we propose an Underwater Image Fidelity (UIF) metric for objective evaluation of enhanced underwater images. By exploiting the statistical features of these images, we extract naturalness-related, sharpness-related, and structure-related features. Among them, the naturalness-related and sharpness-related features evaluate the visual improvement of enhanced images, while the structure-related feature indicates the structural similarity between images before and after UIE. Then, we employ support vector regression to fuse the above three features into a final UIF metric. In addition, we have established a large-scale UIE database with subjective scores, namely the Underwater Image Enhancement Database (UIED), which is utilized as a benchmark to compare all objective metrics. Experimental results confirm that the proposed UIF outperforms a variety of underwater and general-purpose image quality metrics. (Comment: This paper was submitted to ACMMM 2021.)
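The feature-then-fuse pipeline can be illustrated on one-dimensional toy signals (a sketch of ours, not the paper's model: the paper fuses its features with support vector regression, for which the fixed weighted sum below is only a stand-in, and the helper names are hypothetical):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sharpness(img):
    # mean absolute difference of neighbouring samples (toy sharpness proxy)
    return mean([abs(b - a) for a, b in zip(img, img[1:])])

def structure(ref, enh):
    # normalised cross-correlation between pre- and post-enhancement signals
    mr, me = mean(ref), mean(enh)
    num = sum((r - mr) * (e - me) for r, e in zip(ref, enh))
    den = math.sqrt(sum((r - mr) ** 2 for r in ref) *
                    sum((e - me) ** 2 for e in enh))
    return num / den if den else 0.0

def uif_style_score(ref, enh, weights=(0.5, 0.5)):
    # fixed weighted sum standing in for the learned SVR fusion
    w_sharp, w_struct = weights
    return w_sharp * sharpness(enh) + w_struct * structure(ref, enh)
```

A real implementation would also include a naturalness term and learn the fusion weights from subjective scores, as the abstract describes.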
- Published
- 2022
4. Quality Assessment of DIBR-Synthesized Views Based on Sparsity of Difference of Closings and Difference of Gaussians
- Author
-
Dragana D. Sandic-Stankovic, Dragan D. Kukolj, and Patrick Le Callet
- Subjects
Imaging, Three-Dimensional ,Primary Visual Cortex ,Normal Distribution ,Neural Networks, Computer ,Computer Graphics and Computer-Aided Design ,Algorithms ,Software - Abstract
Images synthesized using depth-image-based rendering (DIBR) techniques may suffer from complex structural distortions. The goal of the primary visual cortex and other parts of the brain is to reduce redundancies in the input visual signal in order to discover the intrinsic image structure, and thus create a sparse image representation. The human visual system (HVS) processes images at several scales and several levels of resolution when perceiving a visual scene. In an attempt to emulate these properties of the HVS, we designed a no-reference model for the quality assessment of DIBR-synthesized views. To extract the higher-order structure of high curvature, which corresponds to distortions of shape to which the HVS is highly sensitive, we define a morphological oriented Difference of Closings (DoC) operator and use it at multiple scales and resolutions. The DoC operator nonlinearly removes redundancies and extracts fine-grained details, the texture of an image's local structure, and contrast, to which the HVS is highly sensitive. We introduce a new feature based on the sparsity of the DoC band. To extract perceptually important low-order structural information (edges), we use the non-oriented Difference of Gaussians (DoG) operator at different scales and resolutions. A measure of sparsity is calculated for the DoG bands to obtain scalar features. To model the relationship between the extracted features and subjective scores, a general regression neural network (GRNN) is used. Quality predictions by the proposed DoC-DoG-GRNN model show higher agreement with perceptual quality scores than the tested state-of-the-art metrics when evaluated on four benchmark datasets with synthesized views: the IRCCyN/IVC image/video datasets, the MCL-3D stereoscopic image dataset, and the IST image dataset.
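A DoG band and a sparsity measure over it can be sketched on a one-dimensional signal (a minimal illustration of the general technique, not the paper's multi-scale, multi-resolution model; the function names and the L1/L2 sparsity ratio are our assumptions):

```python
import math

def gaussian_kernel(sigma):
    # normalised 1-D Gaussian kernel with radius ~3 sigma
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    # 1-D convolution with replicated borders
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), len(signal) - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def dog_band(signal, sigma1=1.0, sigma2=2.0):
    # Difference of Gaussians: fine blur minus coarse blur
    g1 = convolve(signal, gaussian_kernel(sigma1))
    g2 = convolve(signal, gaussian_kernel(sigma2))
    return [a - b for a, b in zip(g1, g2)]

def sparsity(band):
    # L1/L2 ratio: lower values indicate a sparser band
    l1 = sum(abs(v) for v in band)
    l2 = math.sqrt(sum(v * v for v in band))
    return l1 / l2 if l2 else 0.0
```

A flat signal yields an all-zero DoG band, while an impulse is maximally sparse under the L1/L2 ratio; the paper computes analogous sparsity features over 2-D DoC and DoG bands and regresses them to subjective scores with a GRNN.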
- Published
- 2022
5. List of contributors
- Author
-
Ali Ak, Martin Alain, Faouzi Alaya Cheikh, Evangelos Alexiou, Federica Battisti, Michel Bätz, Marco Cagnazzo, Pablo Cesar, Fang-Yi Chao, Kelvin Chelli, Siheng Chen, Simone Croci, Frédéric Dufaux, Peter Eisert, Ingo Feldmann, Laura Fink, Siegfried Fößel, Søren Forchhammer, Stephan Fremerey, Patrick Garus, Florian Goldmann, Danillo Graziosi, Alan Guedes, Muhammad Shahzeb Khan Gul, Cornelius Hellge, Volker Helzle, Thorsten Herfet, Anna Hilsmann, Tobias Jaschke, Joël Jung, Joachim Keinert, Maja Krivokuća, Guillaume Lavoué, Pierre Lebreton, Patrick Le Callet, Mikael Le Pendu, Jie Li, Jingyu Liu, Claire Mantel, Rafał K. Mantiuk, Jean-Eudes Marvie, Thomas Maugey, Marta Milovanović, Yana Nehmé, Néill O'Dwyer, Cagri Ozcinar, Rafael Palomar, Jiahao Pang, Egidijus Pelanis, Nico Prappacher, Rahul Prasanna Kumar, Maurice Quach, Alexander Raake, Silvia Rossi, Oliver Schreer, Ashutosh Singla, Aljosa Smolic, Milan Stepanov, Dong Tian, Laura Toni, Giuseppe Valenzise, Irene Viola, Congcong Wang, Gareth W. Young, Jin Zeng, Emin Zerman, Fangcheng Zhong, and Matthias Ziegler
- Published
- 2023
6. Behavioral phenotype features of autism
- Author
-
Huiyu Duan, Jesús Gutiérrez, Zhaohui Che, Patrick Le Callet, and Guangtao Zhai
- Published
- 2023
7. List of contributors
- Author
-
Sara A. Abdulla, Usama I. Abdulrazak, Francisco Alcantud-Marín, Yurena Alonso-Esteban, Jennifer R. Bertollo, Romuald Carette, Manuel F. Casanova, Nadire Cavus, Zhaohui Che, Federica Cilia, Rosane Meire Munhak da Silva, Angela V. Dahiya, Y. De Diego-Otero, Kaushik Deb, Gilles Dequen, Huiyu Duan, Mahmoud Elbattah, Ayman S. El-Baz, Ahmed K. Elsayed, Sukru Eraslan, Zoya Farooqui, M.R. Gómez-Soler, Jean-Luc Guérin, Jesús Gutiérrez, Le An Ha, Michael Helde, Danielle (Hyun Jung) Kim, Haruhide Kimura, Abdulmalik A. Lawan, Patrick Le Callet, Emily Li, Elisa Maria Bezerra Maia, Satoru Matsuda, Christina G. McDonnell, Ruslan Mitkov, Soraia Mayane Souza Mota, Srushti Nerkar, Alexandra Ramirez-Celis, M.L. Ríos-Rodríguez, Liliana P. Rojas-Torres, J.M. Salgado-Cacho, Salam Salloum-Asfar, Branden Sattler, Angela Scarpa, Lonnie Sears, Mohamed Shaban, Reinaldo Antonio Silva-Sobrinho, Arjun Singh, Estate M. Sokhadze, Faria Zarin Subah, Sadiya Tahir, Allan Tasman, Unyime Usua, Judy Van de Water, Victoria Yaneva, Yeliz Yesilada, Rufa'i Yunusa, Guangtao Zhai, and Adriana Zilly
- Published
- 2023
8. Quality evaluation of light fields
- Author
-
Ali Ak and Patrick Le Callet
- Published
- 2023
9. Contributors
- Author
-
Xavier Amatriain, Yogesh Balaji, Stefan Bekiranov, Vasileios Belagiannis, Anas-Alexis Benyoussef, Gustavo Carneiro, Manish Chablani, Cheng Chen, Hyun Jae Cho, Jingyuan Chou, Béatrice Cochener, Pierre-Henri Conze, Youssef Dawoud, Thanh-Toan Do, Qi Dou, Azade Farshad, Chi-Wing Fu, Abhijit Guha Roy, Pengfei Guo, Pheng-Ann Heng, Hieu Hoang, Shanshan Jiang, Yueming Jin, Anitha Kannan, Jieum Kim, Mathieu Lamard, Ngan Le, Patrick Le Callet, Alexandre Le Guilcher, Xiaomeng Li, Suiyi Ling, Quande Liu, Pascale Massin, Sarah Matta, Aryan Mobiny, Jacinto C. Nascimento, Nassir Navab, Cuong C. Nguyen, Hien Van Nguyen, Andreas Pastor, Vishal M. Patel, Angshuman Paul, Sebastian Pölsterl, Viraj Prabhu, Gwenolé Quellec, Murali Ravuri, Vincent Ricquebourg, Jean-Bernard Rottier, Swami Sankaranarayanan, Thomas C. Shen, Shayan Siddiqui, David Sontag, Ronald M. Summers, Qiuling Suo, Yu-Xing Tang, Minh-Triet Tran, Viet-Khoa Vo-Ho, Christian Wachinger, Puyang Wang, Lei Xing, Kashu Yamazaki, Yousef Yeganeh, Lequan Yu, Pengyu Yuan, Chongzhi Zang, Aidong Zhang, and Jinyuan Zhou
- Published
- 2023
10. Case study: few-shot pill recognition
- Author
-
Andreas Pastor, Suiyi Ling, Jieum Kim, and Patrick Le Callet
- Published
- 2023
11. Imitation from Observation using RL and Graph-based Representation of Demonstrations
- Author
-
Yassine El Manyari, Patrick Le Callet, and Laurent Dollé
- Published
- 2022
12. When is the Cleaning of Subjective Data Relevant to Train UGC Video Quality Metrics?
- Author
-
Anne-Flore Perrin, Charles Dormeval, Yilin Wang, Neil Birkbeck, Balu Adsumilli, and Patrick Le Callet (CAPACITÉS SAS; Nantes Université; Google Inc., Mountain View; Laboratoire des Sciences du Numérique de Nantes (LS2N), Nantes Université / École Centrale de Nantes / CNRS / Inria / IMT Atlantique). Open-access publication of research financed by Google Inc. in collaboration with the YouTube team.
- Subjects
User Generated Content, Training Metrics, Signal and Image Processing, Outlier, Cleaning, Multimedia, Video Quality Metrics - Abstract
International audience; Outlier analysis and spammer detection have recently gained momentum as ways to reduce the uncertainty of subjective ratings in image and video quality assessment tasks. The large proportion of unreliable ratings from online crowdsourcing experiments and the need for qualitative and quantitative large-scale studies in the deep-learning ecosystem have driven this trend. We study the effect that data cleaning has on trainable models predicting the visual quality of videos, and present results demonstrating when cleaning is necessary to reach higher efficiency. To this end, we present and analyze a benchmark on clean and noisy large-scale User Generated Content (UGC) datasets on which we retrained models, followed by an empirical exploration of the constraints of data removal. Our results show that a dataset containing between 7% and 30% outliers benefits from cleaning before training.
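A minimal version of such rater screening can be sketched as follows (our illustration of the general idea of outlier removal, not the paper's cleaning procedure; the correlation threshold and data layout are assumptions):

```python
import math

def pearson(xs, ys):
    # Pearson correlation between two equal-length score vectors
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def clean_ratings(ratings, threshold=0.7):
    """ratings: dict rater -> list of scores over the same stimuli.
    Keep raters whose scores correlate with the mean opinion scores (MOS)."""
    n = len(next(iter(ratings.values())))
    mos = [sum(r[i] for r in ratings.values()) / len(ratings) for i in range(n)]
    return {name: r for name, r in ratings.items() if pearson(r, mos) >= threshold}
```

A rater whose scores anti-correlate with the panel (e.g., an inverted or random "spammer") falls below the threshold and is discarded before training.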
- Published
- 2022
13. QoEVMA'22: 2nd Workshop on Quality of Experience (QoE) in Visual Multimedia Applications
- Author
-
Jing Li, Patrick Le Callet, Xinbo Gao, Zhi Li, Wen Lu, Jiachen Yang, and Junle Wang
- Published
- 2022
14. From Just Noticeable Differences to Image Quality
- Author
-
Ali Ak, Andreas Pastor, and Patrick Le Callet
- Published
- 2022
15. Perceptual Quality Assessment for Asymmetrically Distorted Stereoscopic Video by Temporal Binocular Rivalry
- Author
-
Patrick Le Callet, Yuming Fang, Jiheng Wang, Jiebin Yan, Jianjun Lei, and Xiangjie Sui
- Subjects
Binocular Rivalry, Computer Science, Image Quality, Pattern Recognition, Weighting, Visualization, Perception, Distortion, Media Technology, Artificial Intelligence, Electrical and Electronic Engineering, Energy (signal processing) - Abstract
In this paper, we propose a two-stage weighting-based perceptual quality assessment framework for asymmetrically distorted stereoscopic video (SV) sequences, based on temporal binocular rivalry. First, a traditional 2D image quality assessment (IQA) method is employed to measure spatial distortion, and temporal distortion is evaluated by the magnitude differences between motion vectors of distorted and reference video frames. Second, the structural strength (SS), computed from the gradient map, and the motion energy (ME), computed from the frame difference map, are used to estimate the intensity of the visual stimulus in the spatial and temporal domains, respectively. Then, SS and ME are used as importance indexes to combine the quality scores of spatial and temporal distortion into an estimate of the perceived distortion of single-view video sequences; this is the first-stage weighting. Finally, considering that a difference in the intensity of the visual stimulus between the two eyes results in binocular rivalry, a novel temporal-binocular-rivalry-inspired weighting method is designed to integrate the quality scores of the left and right views into the final visual quality prediction of SV sequences; this is the second-stage weighting. Experimental results on the Waterloo-IVC SV quality databases show that several specific 2D-IQA methods within the proposed framework obtain highly competitive performance over existing alternatives.
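Both weighting stages share the same stimulus-weighted pooling form and can be sketched generically (our simplified reading of the two-stage scheme; `weighted_fuse` and all numeric values are illustrative, not the paper's exact pooling):

```python
def weighted_fuse(scores, weights):
    """Combine quality scores so that the stronger stimulus dominates:
    (spatial, temporal) per view in stage one, then (left, right) across
    views in stage two, mirroring binocular rivalry."""
    total = sum(weights)
    if total == 0:
        return sum(scores) / len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / total

# stage one: fuse spatial/temporal scores with SS and ME as importance indexes
q_left = weighted_fuse([4.0, 2.0], [3.0, 1.0])   # hypothetical SS=3, ME=1
q_right = weighted_fuse([3.0, 3.0], [2.0, 2.0])
# stage two: fuse the two views with each eye's stimulus energy
q_stereo = weighted_fuse([q_left, q_right], [1.0, 3.0])
```

With a stronger stimulus energy assigned to the right eye, the final score leans toward the right view's quality, which is the rivalry intuition in the abstract.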
- Published
- 2021
16. Just noticeable difference (JND) and satisfied user ratio (SUR) prediction for compressed video
- Author
-
Jingwen Zhu and Patrick Le Callet
- Published
- 2022
17. Perception of video quality at a local spatio-temporal horizon
- Author
-
Andreas Pastor and Patrick Le Callet
- Published
- 2022
18. Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming
- Author
-
Yao Zhao, Chunyu Lin, Xue Zhang, Gene Cheung, Jack Z. G. Tan, and Patrick Le Callet
- Subjects
Linear Programming, Computer Science, Video Recording, Initialization, Markov Models, Data Modeling, Deep Learning, Image Processing (Computer-Assisted), Maximum A Posteriori Estimation, Computer Vision, Statistical Models, Bandwidth (signal processing), Directed Graphs, Computer Graphics and Computer-Aided Design, Markov Chains, Head Movements, Graphs (abstract data type), Artificial Intelligence, Algorithms, Software - Abstract
Ultra-high-definition (UHD) 360 videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction foretelling a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information (collected viewers' head movement traces, a 360 image saliency map, and a biological human head model) are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least squares (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm-start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360 video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperforms existing proposals, and our optimized tile-based streaming scheme outperforms competitors in rate-distortion performance.
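The view-transition Markov model at the heart of this pipeline can be sketched from raw traces (a plain first-order count-based estimate under our assumptions; the paper instead learns the model jointly with saliency and head-model priors via the MAP formulation, which is not reproduced here):

```python
def learn_transitions(traces, n_tiles):
    """Estimate a first-order view-transition Markov model from
    head-movement traces (lists of visited tile indices).
    Returns a row-stochastic transition matrix."""
    counts = [[0.0] * n_tiles for _ in range(n_tiles)]
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    for row in counts:
        total = sum(row)
        if total:
            for j in range(n_tiles):
                row[j] /= total  # normalise counts into probabilities
    return counts

def predict_next(matrix, current_tile):
    # most likely next tile given the current FoV tile
    row = matrix[current_tile]
    return max(range(len(row)), key=row.__getitem__)
```

A tile-based streamer could then prefetch, at high quality, the tiles with the largest predicted transition probabilities.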
- Published
- 2021
19. Adversarial Attack Against Deep Saliency Models Powered by Non-Redundant Priors
- Author
-
Yuan Tian, Jing Li, Guangtao Zhai, Suiyi Ling, Guodong Guo, Ali Borji, Patrick Le Callet, and Zhaohui Che
- Subjects
Exploits, Artificial Neural Networks, Computer Science, Computer Graphics and Computer-Aided Design, Feature Extraction, Robustness (computer science), Prior Probabilities, Redundancy (engineering), Benchmarking, Algorithms, Software - Abstract
Saliency detection is an effective front-end process for many security-related tasks, e.g., autonomous driving and tracking. Adversarial attack serves as an efficient surrogate to evaluate the robustness of deep saliency models before they are deployed in the real world. However, most current adversarial attacks exploit gradients spanning the entire image space to craft adversarial examples, ignoring the fact that natural images are high-dimensional and spatially over-redundant, thus incurring high attack cost and poor perceptibility. To circumvent these issues, this paper builds an efficient bridge between accessible partially-white-box source models and unknown black-box target models. The proposed method comprises two steps: 1) We design a new partially-white-box attack, which defines the cost function in the compact hidden space to punish a fraction of feature activations corresponding to the salient regions, instead of punishing every pixel spanning the entire dense output space. This partially-white-box attack reduces the redundancy of the adversarial perturbation. 2) We exploit the non-redundant perturbations from some source models as prior cues, and use an iterative zeroth-order optimizer to compute directional derivatives along the non-redundant prior directions, in order to estimate the actual gradient of the black-box target model. The non-redundant priors boost the update of the "critical" pixels located at the non-zero coordinates of the prior cues, while leaving the redundant pixels at the zero coordinates unaffected. Our method achieves the best tradeoff between attack ability and perturbation redundancy. Finally, we conduct a comprehensive experiment testing the robustness of 18 state-of-the-art deep saliency models against 16 malicious attacks, under both white-box and black-box settings, contributing a new robustness benchmark to the saliency community for the first time.
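Step 2's zeroth-order gradient estimate along prior directions can be sketched as follows (a generic finite-difference illustration, not the authors' optimizer; `estimate_gradient` and the toy black-box function are hypothetical):

```python
def directional_derivative(f, x, direction, eps=1e-4):
    """Zeroth-order estimate of the derivative of f at x along a direction,
    via symmetric finite differences (two queries to the black-box f)."""
    x_plus = [xi + eps * di for xi, di in zip(x, direction)]
    x_minus = [xi - eps * di for xi, di in zip(x, direction)]
    return (f(x_plus) - f(x_minus)) / (2 * eps)

def estimate_gradient(f, x, prior_directions):
    """Combine directional derivatives along non-redundant prior directions
    into a sparse gradient estimate: only coordinates touched by the
    priors are updated, the rest stay at zero."""
    grad = [0.0] * len(x)
    for d in prior_directions:
        dd = directional_derivative(f, x, d)
        for i, di in enumerate(d):
            grad[i] += dd * di
    return grad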
- Published
- 2021
20. Re-Visiting Discriminator for Blind Free-Viewpoint Image Quality Assessment
- Author
-
Wei Zhou, Patrick Le Callet, Zhaohui Che, Suiyi Ling, Wang Junle, and Jing Li
- Subjects
Discriminator, Computer Science, Image Quality, Feature Extraction, Codebook, Context (language use), Synthetic Data, Computer Science Applications, Rendering (computer graphics), Signal Processing, Media Technology, Quality, Computer Vision, Artificial Intelligence, Electrical and Electronic Engineering - Abstract
Accurate measurement of perceptual quality is important for various immersive multimedia applications, which demand real-time quality control or quality-based benchmarking of the relevant algorithms. For instance, virtual view rendering in Free-Viewpoint (FV) navigation scenarios is a typical case that introduces challenging distortions, particularly around dis-occluded regions. Existing quality metrics, most of which target impairments caused by compression or network conditions, fail to quantify such non-uniform structure-related distortions. Moreover, the lack of quality databases for such distortions makes it even more challenging to develop robust quality metrics. In this work, a Generative Adversarial Networks based No-Reference (NR) quality Metric, namely GANs-NRM, is proposed. We first present an approach to create masks mimicking dis-occlusions/textureless regions, which is applicable to the large-scale 2D image databases publicly available in the computer vision domain. Using these synthetic data, we then train a GANs-based context renderer capable of rendering those masked regions. Since the naturalness of the rendered dis-occluded regions strongly relates to perceptual quality, we assume that the discriminator of the trained GANs has an intrinsic ability for quality assessment. We thus use the features extracted from the discriminator to learn a Bag-of-Distortion-Words (BDW) codebook. We show that a quality predictor can then be well trained using only a small amount of subjective quality data for FV view rendering. Moreover, in the proposed framework, the discriminator is also adapted as a distortion detector to locate possibly distorted regions. According to the experimental results, the proposed model significantly outperforms state-of-the-art quality metrics. The corresponding context renderer also shows appealing visual results compared with other rendering algorithms.
- Published
- 2021
21. Specialised Video Quality Model For Enhanced User Generated Content (UGC) With Special Effects
- Author
-
Anne-Flore Perrin, Yejing Xie, Tao Zhang, Yiting Liao, Junlin Li, and Patrick Le Callet (CAPACITÉS SAS; Laboratoire des Sciences du Numérique de Nantes (LS2N), Nantes Université / École Centrale de Nantes / CNRS / Inria / IMT Atlantique)
- Subjects
User Generated Content, Signal and Image Processing, Information Systems Applications, Multimedia, Crowdsourcing, Video Quality Metric, Enhancement Filters, Fine-Tuning - Abstract
International audience; User Generated Content (UGC) refers to media generated by users for end-consumers; it represents most of the media exchanged on social media. UGC is subject to acquisition and transmission limitations that prevent access to the pristine, i.e., perfect, source content. Evaluating its quality, especially with current pre- and post-processing algorithms or filters, is a major issue for most off-the-shelf full-reference quality metrics. We conduct a benchmark of existing full-reference, no-reference, and aesthetic quality metrics for UGC with special effects. We aim to identify the challenges posed by both UGC and filtering. We then propose a new combination of metrics tailored to enhanced and filtered UGC, which reaches a trade-off between complexity and accuracy.
- Published
- 2022
22. Noisy Localization of Objects in Reinforcement Learning Framework: An Experimental Case Study on a Pushing Task
- Author
-
Yassine El Manyari, Laurent Dollé, and Patrick Le Callet
- Published
- 2022
23. What are the visuo-motor tendencies of omnidirectional scene free-viewing in virtual reality?
- Author
-
Erwan Joël David, Pierre Lebranchu, Matthieu Perreira Da Silva, and Patrick Le Callet (Goethe-Universität Frankfurt am Main; Laboratoire des Sciences du Numérique de Nantes (LS2N), Nantes Université / École Centrale de Nantes / CNRS / Inria / IMT Atlantique; Centre hospitalier universitaire de Nantes (CHU Nantes))
- Subjects
Ophthalmology, Eye Movements, Saccades, Virtual Reality, Visual Perception, Cognitive Science / Computer Science, Humans, Ocular Fixation, Sensory Systems - Abstract
International audience; Central and peripheral vision during visual tasks have been extensively studied on two-dimensional screens, highlighting their perceptual and functional disparities. This study has two objectives: replicating on-screen gaze-contingent experiments that remove the central or peripheral field of view in virtual reality, and identifying visuo-motor biases specific to the exploration of 360° scenes with a wide field of view. Our results are useful for vision modelling, with applications in gaze position prediction (e.g., content compression and streaming). We ask how previous on-screen findings translate to conditions where observers can use their head to explore stimuli. We implemented a gaze-contingent paradigm to simulate loss of vision in virtual reality; participants could freely view omnidirectional natural scenes. This protocol allows the simulation of vision loss with an extended field of view (>80°) and the study of the head's contribution to visual attention. The time-course of visuo-motor variables in our pure free-viewing task reveals long fixations and short saccades during the first seconds of exploration, contrary to the literature on instruction-guided visual tasks. We show that the effect of vision loss is reflected primarily in eye movements, in a manner consistent with the two-dimensional-screen literature. We hypothesize that head movements mainly serve to explore the scenes during free viewing; the presence of masks did not significantly impact head-scanning behaviours. We present new fixational and saccadic visuo-motor tendencies in a 360° context that we hope will help in the creation of gaze prediction models dedicated to virtual reality.
- Published
- 2022
24. State-of-the-Art in 360° Video/Image Processing: Perception, Assessment and Compression
- Author
-
Patrick Le Callet, Mai Xu, Shanyi Zhang, and Chen Li (Beihang University (BUAA); University of Chinese Academy of Sciences; Image Perception Interaction team (IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), Université de Nantes / École Centrale de Nantes / CNRS / IMT Atlantique)
- Subjects
Computer and Information Sciences, Computer Science, Image Processing, Perception, Bottleneck, Compression, Electrical and Electronic Engineering, Multimedia, Image and Video Processing (eess.IV), Multimedia (cs.MM), Signal Processing - Abstract
Nowadays, 360° video/image has become increasingly popular and drawn great attention. The spherical viewing range of 360° video/image accounts for huge data volumes, which pose challenges to 360° video/image processing, such as the bottlenecks of storage and transmission. Accordingly, recent years have witnessed the explosive emergence of works on 360° video/image processing. In this paper, we review the state-of-the-art works on 360° video/image processing from the aspects of perception, assessment, and compression. First, this paper reviews both datasets and visual attention modelling approaches for 360° video/image. Second, we survey the related works on both subjective and objective visual quality assessment (VQA) of 360° video/image. Third, we overview the compression approaches for 360° video/image, which utilize either the spherical characteristics or visual attention models. Finally, we summarize this overview paper and outline future research trends in 360° video/image processing. (Submitted to the IEEE J-STSP special issue on perception-driven 360-degree video processing as an invited overview paper.)
- Published
- 2020
25. Improving Maximum Likelihood Difference Scaling method to measure inter content scale
- Author
-
Andreas Pastor, Lukas Krasula, Xiaoqing Zhu, Zhi Li, Patrick Le Callet, Laboratoire des Sciences du Numérique de Nantes (LS2N), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-École Centrale de Nantes (Nantes Univ - ECN), Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes université - UFR des Sciences et des Techniques (Nantes univ - UFR ST), Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ), Image Perception Interaction (LS2N - équipe IPI), Nantes Université (Nantes Univ)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), and Netflix
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Quantitative Biology - Neurons and Cognition ,FOS: Biological sciences ,[INFO]Computer Science [cs] ,Neurons and Cognition (q-bio.NC) ,ComputingMilieux_MISCELLANEOUS ,Machine Learning (cs.LG) - Abstract
The goal of most subjective studies is to place a set of stimuli on a perceptual scale. This is mostly done directly by rating, e.g. using single or double stimulus methodologies, or indirectly by ranking or pairwise comparison. All these methods estimate the perceptual magnitudes of the stimuli on a scale. However, procedures such as Maximum Likelihood Difference Scaling (MLDS) have shown that considering perceptual distances can bring benefits in terms of discriminatory power, observers' cognitive load, and the number of trials required. One of the disadvantages of the MLDS method is that the perceptual scales obtained for stimuli created from different source contents are generally not comparable. In this paper, we propose an extension of the MLDS method that ensures inter-content comparability of the results and show its usefulness, especially in the presence of observer errors., Comment: Difference scaling, supra-threshold estimation, human perception, subjective experiment
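For readers unfamiliar with MLDS, the standard (single-content) model can be sketched as follows: the observer compares two stimulus pairs and judges which pair differs more; the scale values are those maximizing the likelihood of the observed judgments under Gaussian decision noise. This is a minimal illustrative sketch of that baseline likelihood, not the authors' inter-content extension; function and variable names are hypothetical.

```python
import math

def phi(x):
    # Standard normal CDF, via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mlds_nll(psi, trials, sigma=1.0):
    """Negative log-likelihood of MLDS quadruple judgments.

    psi    : list of perceptual scale values, one per stimulus
    trials : list of (a, b, c, d, resp) where resp == 1 means the
             observer judged the (a, b) difference larger than (c, d)
    The fitted scale is the psi minimizing this quantity.
    """
    nll = 0.0
    for a, b, c, d, resp in trials:
        delta = (psi[b] - psi[a]) - (psi[d] - psi[c])
        p = phi(delta / sigma)              # P(respond "first pair larger")
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= math.log(p if resp == 1 else 1.0 - p)
    return nll
```

A judgment consistent with the scale (e.g. the pair with the larger scale difference chosen) yields a lower negative log-likelihood than the inconsistent response, which is what drives the fit.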
- Published
- 2022
- Full Text
- View/download PDF
26. RV-TMO: Large-Scale Dataset for Subjective Quality Assessment of Tone Mapped Images
- Author
-
Ali Ak, Abhishek Goswami, Wolf Hauser, Patrick Le Callet, and Frederic Dufaux
- Subjects
Signal Processing ,Media Technology ,Electrical and Electronic Engineering ,Computer Science Applications
- Published
- 2022
- Full Text
- View/download PDF
27. Seeing By Haptic Glance: Reinforcement Learning Based 3d Object Recognition
- Author
-
Suiyi Ling, Guillaume Gallot, Kevin Riou, and Patrick Le Callet
- Subjects
Computer science ,business.industry ,Representation (systemics) ,Cognitive neuroscience of visual object recognition ,Reinforcement learning ,Robot ,Computer vision ,Use case ,Artificial intelligence ,Focus (optics) ,Object (computer science) ,business ,Haptic technology - Abstract
Humans are able to conduct 3D recognition through a limited number of haptic contacts between the target object and their fingers, without seeing the object. This capability is defined as `haptic glance' in cognitive neuroscience. Most of the existing 3D recognition models were developed based on dense 3D data. Nonetheless, in many real-life use cases where robots are used to collect 3D data by haptic exploration, only a limited number of 3D points can be collected. In this study, we thus focus on solving the intractable problem of how to obtain cognitively representative 3D key-points of a target object with limited interactions between the robot and the object. A novel reinforcement learning based framework is proposed, where the haptic exploration procedure (the agent iteratively predicts the next position for the robot to explore) is optimized simultaneously with the objective 3D recognition based on the actively collected 3D points. As the model is rewarded only when the 3D object is accurately recognized, it is driven to find a sparse yet efficient haptic-perceptual 3D representation of the object. Experimental results show that our proposed model outperforms state-of-the-art models.
- Published
- 2021
28. Combining Video Quality Metrics To Select Perceptually Accurate Resolution In A Wide Quality Range: A Case Study
- Author
-
Jean-Marc Thiesse, Patrick Le Callet, and Madhukar Bhat
- Subjects
Quality (physics) ,Computer science ,Resolution (electron density) ,Range (statistics) ,Video quality ,Remote sensing
- Published
- 2021
29. Evaluation of the Bubble view Metaphor for the Crowdsourcing Study of Visual Attention Deployment in Tone-Mapped Images
- Author
-
Waqas Ellahi, Toinon Vigier, Patrick Le Callet, Image Perception Interaction (IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), and Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)
- Subjects
business.industry ,Metaphor ,Computer science ,media_common.quotation_subject ,Bubble view ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,[SCCO.COMP]Cognitive science/Computer science ,Crowdsourcing ,eye tracking ,Visualization ,eye movements ,Tone (musical instrument) ,visual attention ,Software deployment ,Human–computer interaction ,[SCCO.PSYC]Cognitive science/Psychology ,Natural (music) ,Eye tracking ,crowdsourcing ,Quality of experience ,Artificial intelligence ,business ,media_common - Abstract
International audience; Attention is an important attribute of human vision for the study of users' quality of experience (QoE). Collecting attention information with eye tracking is impossible in the current Covid-19 scenario. Different mouse metaphors have been proposed to study visual attention without eye tracking equipment. These methods have shown promising results on different types of images (visualizations, natural images and websites) with well-identified regions of interest. However, they have not been precisely tested for QoE applications, where natural images are processed with different algorithms (compression, tone-mapping, etc.) and visual content can induce more exploratory behavior. This paper studies and compares different configurations of the bubble view metaphor for the study of visual attention in tone-mapped images.
- Published
- 2021
30. Visualizing navigation difficulties in video game experiences
- Author
-
Patrick Le Callet, Hippolyte Dubois, Antoine Coutrot, Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), and Université de Lyon-Université Lumière - Lyon 2 (UL2)
- Subjects
Computer Science::Computer Science and Game Theory ,Computer science ,[SCCO.NEUR]Cognitive science/Neuroscience ,ComputingMilieux_PERSONALCOMPUTING ,video-game ,player experience ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Player experience ,Game design ,Human–computer interaction ,trajectory ,Trajectory ,Entropy (information theory) ,game design ,entropy ,Game Developer ,environment ,Video game - Abstract
International audience; When developing video games, gameplay metrics allow developers to track and analyze the behaviors of users interacting with the game. Here, we propose to harness players' spatial trajectories to objectively quantify their gaming experience. Spatial trajectories are complex signals that are determined both by the players and by the topology of the virtual spaces they evolve in. In this paper, we propose a new methodology to measure and visualize how the entropy of trajectories is distributed in virtual spaces, and explain how it can inform game developers on the design of game levels. We apply our method to the Sea Hero Quest dataset, consisting of the trajectories of over 4 million players finding their way in water mazes.
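The core quantity in the abstract, the entropy of a trajectory over a virtual space, can be sketched as the Shannon entropy of the trajectory's occupancy distribution over spatial grid cells. This is an illustrative minimal version under that assumption, not the authors' exact methodology.

```python
import math
from collections import Counter

def trajectory_entropy(points, cell_size=1.0):
    """Shannon entropy (bits) of a trajectory's occupancy over grid cells.

    points: iterable of (x, y) positions sampled along the trajectory.
    Higher entropy = visits spread over more cells, i.e. a more
    exploratory / less direct path; 0 = the whole path stays in one cell.
    """
    cells = Counter((int(x // cell_size), int(y // cell_size))
                    for x, y in points)
    n = sum(cells.values())
    return -sum((c / n) * math.log2(c / n) for c in cells.values())
```

For example, a path confined to a single cell scores 0 bits, while a path spending equal time in four distinct cells scores 2 bits.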
- Published
- 2021
31. Reference-Free Quality Assessment of Sonar Images via Contour Degradation Measurement
- Author
-
Zhifang Xia, Weiling Chen, Ke Gu, En Cheng, Weisi Lin, Patrick Le Callet, Beijing University of Technology, School of Computer Engineering [Singapore] (NTU), School of Computer Engineering, Nanyang Technological University, The State Information Center of P.R, Laboratoire des Sciences du Numérique de Nantes (LS2N), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Image Perception Interaction (IPI), and Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST)
- Subjects
Channel (digital image) ,business.industry ,Image quality ,Computer science ,02 engineering and technology ,Filter (signal processing) ,Computer Graphics and Computer-Aided Design ,Sonar ,Transmission (telecommunications) ,[INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV] ,Metric (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,14. Life underwater ,Artificial intelligence ,Underwater ,business ,ComputingMilieux_MISCELLANEOUS ,Software - Abstract
Sonar imagery plays a significant role in oceanic applications, since there is little natural light underwater and light is irrelevant to sonar imaging. Sonar images are very likely to be affected by various distortions during transmission over the underwater acoustic channel before further analysis. At the receiving end, the reference image is unavailable due to the complex and changing underwater environment and our unfamiliarity with it. To the best of our knowledge, one of the important usages of sonar images is target recognition on the basis of contour information, and the contour degradation degree of a sonar image depends on the distortions contained in it. To this end, we developed a new no-reference contour degradation measurement for perceiving the quality of sonar images. The sparsities of a series of transform coefficient matrices, which are descriptive of contour information, are first extracted as features from the frequency and spatial domains. The contour degradation degree of a sonar image is then measured by calculating the ratios of the extracted features before and after filtering the image. Finally, a bootstrap aggregating (bagging)-based support vector regression module is learned to capture the relationship between the contour degradation degree and sonar image quality. The results of experiments validate that the proposed metric is competitive with state-of-the-art reference-based quality metrics and outperforms the latest reference-free competitors.
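The sparsity-of-coefficients feature at the heart of this metric can be sketched with the common Hoyer sparsity measure (one standard choice; the paper's exact definition may differ). On this reading of the abstract, the degradation feature would be the ratio of this sparsity computed on the transform coefficients before and after filtering the image.

```python
import math

def hoyer_sparsity(coeffs):
    """l1/l2-based Hoyer sparsity in [0, 1].

    1 = maximally sparse (a single nonzero coefficient),
    0 = maximally dense (all coefficients equal in magnitude).
    coeffs: flat list of transform coefficients.
    """
    n = len(coeffs)
    l1 = sum(abs(c) for c in coeffs)
    l2 = math.sqrt(sum(c * c for c in coeffs))
    if l2 == 0.0:
        return 1.0  # all-zero vector: treat as fully sparse by convention
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)
```

Contour-carrying transforms of clean images tend to be sparse; distortion spreads energy across coefficients and lowers this value, so the before/after-filtering ratio tracks contour degradation.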
- Published
- 2019
32. Fast Blind Quality Assessment of DIBR-Synthesized Video Based on High-High Wavelet Subband
- Author
-
Dragana Sandic-Stankovic, Patrick Le Callet, Dragan Kukolj, University of Novi Sad, Laboratoire des Sciences du Numérique de Nantes (LS2N), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Image Perception Interaction (IPI), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), and Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Quality assessment ,business.industry ,Computer science ,Flicker ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020206 networking & telecommunications ,02 engineering and technology ,Computer Graphics and Computer-Aided Design ,Rendering (computer graphics) ,Wavelet ,Nonlinear distortion ,[INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV] ,Distortion ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,Video based ,ComputingMilieux_MISCELLANEOUS ,Software - Abstract
Free-viewpoint video, as the development direction of next-generation video technologies, uses the depth-image-based rendering (DIBR) technique to synthesize video sequences at viewpoints where real captured videos are missing. As reference videos at multiple viewpoints are not available, a blind, reliable, real-time quality metric of the synthesized video is needed. Although no-reference quality metrics dedicated to synthesized views successfully evaluate synthesized images, they are not as effective when evaluating synthesized video, due to the additional temporal flicker distortion typical only of video. In this paper, a new fast no-reference quality metric of synthesized video with synthesis distortions is proposed. It is guided by the fact that DIBR-synthesized images are characterized by increased high-frequency content. The metric is designed under the assumption that the perceived quality of DIBR-synthesized video can be estimated by quantifying selected areas in the high-high wavelet subband. A threshold is used to select the most important distortion-sensitive regions. The proposed No-Reference Morphological Wavelet with Threshold (NR_MWT) metric is computationally extremely efficient, comparable to PSNR, as the morphological wavelet transformation uses very short filters and only integer arithmetic. It is completely blind, without using machine learning techniques. Tested on the publicly available dataset of synthesized video sequences characterized by synthesis distortions, the metric achieves better performance and higher computational efficiency than the state-of-the-art metrics dedicated to DIBR-synthesized images and videos.
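The high-high (HH, diagonal-detail) subband idea can be illustrated with a one-level 2D Haar transform rather than the paper's morphological wavelet: a flat region produces no HH energy, while sharp diagonal structure (of the kind DIBR synthesis artifacts inflate) does. The function name and the thresholded-fraction feature below are illustrative, not the exact NR_MWT computation.

```python
def hh_strong_ratio(img, thr):
    """Fraction of one-level 2D Haar HH coefficients above a threshold.

    img: 2D list of grayscale values with even height and width.
    Each non-overlapping 2x2 block yields one diagonal-detail (HH)
    coefficient; thr selects the distortion-sensitive coefficients.
    """
    h, w = len(img), len(img[0])
    strong = total = 0
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            hh = (a - b - c + d) / 4.0  # diagonal detail of the 2x2 block
            total += 1
            if abs(hh) > thr:
                strong += 1
    return strong / total
```

A smooth image scores near 0, while checkerboard-like high-frequency content pushes the ratio toward 1, matching the intuition that synthesis distortions concentrate in the HH subband.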
- Published
- 2019
33. Prediction of the Influence of Navigation Scan-Path on Perceived Quality of Free-Viewpoint Videos
- Author
-
Suiyi Ling, Jesus Gutierrez, Patrick Le Callet, Ke Gu, Laboratoire des Sciences du Numérique de Nantes (LS2N), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Beijing University of Technology, Image Perception Interaction (IPI), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), and Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
FOS: Computer and information sciences ,Computer science ,Quality assessment ,business.industry ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,020206 networking & telecommunications ,02 engineering and technology ,Benchmarking ,Viewpoints ,Machine learning ,computer.software_genre ,Video quality ,Rendering (computer graphics) ,View synthesis ,Perceived quality ,[INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV] ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,computer ,ComputingMilieux_MISCELLANEOUS - Abstract
Free-Viewpoint Video (FVV) systems allow viewers to freely change the viewpoints of the scene. In such systems, view synthesis and compression are the two main sources of artifacts influencing perceived quality. To assess this influence, quality evaluation studies are often carried out using conventional displays and generating predefined navigation trajectories mimicking the possible movement of viewers when exploring the content. Nevertheless, as different trajectories may lead to different conclusions in terms of visual quality when benchmarking the performance of the systems, methods to identify critical trajectories are needed. This paper aims at exploring the impact of exploration trajectories (defined as Hypothetical Rendering Trajectories: HRT) on the perceived quality of FVV, subjectively and objectively, providing two main contributions. Firstly, a subjective assessment test including different HRTs was carried out and analyzed. The results demonstrate and quantify the influence of HRT on perceived quality. Secondly, we propose a new objective video quality assessment measure to predict the impact of HRT. This measure, based on the Sketch-Token representation, models how the categories of contours change spatially and temporally at a higher semantic level. Performance comparisons with existing quality metrics for FVV highlight promising results for automatic detection of the most critical HRTs for the benchmarking of immersive systems., 11 pages, 7 figures
- Published
- 2019
34. Learning a No-Reference Quality Predictor of Stereoscopic Images by Visual Binocular Properties
- Author
-
Jiebin Yan, Jiheng Wang, Yuming Fang, Patrick Le Callet, Xuelin Liu, Guangtao Zhai, Jiangxi University of Science and Technology, Shanghai Jiao Tong University [Shanghai], Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Image Perception Interaction (IPI), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), and Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)
- Subjects
Binocular rivalry ,symmetric distortion ,Visual perception ,genetic structures ,General Computer Science ,Computer science ,Image quality ,Local binary patterns ,image quality assessment ,Stereoscopy ,02 engineering and technology ,law.invention ,law ,Histogram ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Computer vision ,ComputingMilieux_MISCELLANEOUS ,Stereoscopic images ,Monocular ,business.industry ,General Engineering ,020206 networking & telecommunications ,eye diseases ,no reference ,asymmetric distortion ,[INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV] ,Human visual system model ,020201 artificial intelligence & image processing ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Artificial intelligence ,business ,lcsh:TK1-9971 - Abstract
In this work, we develop a novel no-reference (NR) quality assessment metric for stereoscopic images based on monocular and binocular features, motivated by visual perception properties of the human visual system (HVS) known as binocular rivalry and binocular integration. To be more specific, we first calculate the normalized intensity feature maps of the right- and left-view images through local contrast normalization, where statistical intensity features are extracted via the histogram of the normalized intensity feature map to represent monocular features. Then, we compute the disparity map of the stereoscopic image, from which we extract the structure feature map of the stereoscopic image based on local binary patterns (LBP). We further extract statistical structure features and statistical depth features from the structure feature map and the disparity map by histogram to represent binocular features. Finally, we adopt support vector regression (SVR) to train the mapping function from the extracted monocular and binocular features to subjective quality scores. Comparison experiments are conducted on four large-scale stereoscopic image databases, and the results demonstrate the promising performance of the proposed method in stereoscopic image quality assessment.
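The LBP-based structure features mentioned above can be sketched with the classic 8-neighbour local binary pattern and its normalized 256-bin histogram. This is the textbook LBP formulation, shown as a minimal sketch of the feature-extraction step rather than the authors' full pipeline (which applies it to the disparity-derived structure map).

```python
def lbp_8(img, i, j):
    """Classic 8-neighbour local binary pattern code at pixel (i, j).

    Each neighbour >= the center pixel sets one bit, giving a code
    in [0, 255] that encodes the local structure around the pixel.
    """
    center = img[i][j]
    neigh = [img[i-1][j-1], img[i-1][j], img[i-1][j+1], img[i][j+1],
             img[i+1][j+1], img[i+1][j], img[i+1][j-1], img[i][j-1]]
    return sum(1 << k for k, v in enumerate(neigh) if v >= center)

def lbp_histogram(img):
    """Normalized 256-bin LBP histogram over interior pixels of a 2D list."""
    hist = [0] * 256
    h, w = len(img), len(img[0])
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            hist[lbp_8(img, i, j)] += 1
    n = (h - 2) * (w - 2)
    return [c / n for c in hist]
```

The histogram itself serves as the statistical feature vector fed to a regressor such as SVR.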
- Published
- 2019
35. Wide Color Gamut Image Content Characterization: Method, Evaluation, and Applications
- Author
-
Toinon Vigier, Junghyuk Lee, Patrick Le Callet, Jong-Seok Lee, Yonsei University, Image Perception Interaction (IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Institut de Recherche en Communications et en Cybernétique de Nantes (IRCCyN), Mines Nantes (Mines Nantes)-École Centrale de Nantes (ECN)-Ecole Polytechnique de l'Université de Nantes (EPUN), and Université de Nantes (UN)-Université de Nantes (UN)-PRES Université Nantes Angers Le Mans (UNAM)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
FOS: Computer and information sciences ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,content selection ,02 engineering and technology ,quality of experience ,computer.software_genre ,Gamut ,color gamut mapping ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,FOS: Electrical engineering, electronic engineering, information engineering ,Use case ,Quality of experience ,content characterization ,Electrical and Electronic Engineering ,Reliability (statistics) ,ComputingMethodologies_COMPUTERGRAPHICS ,Wide color gamut ,Image and Video Processing (eess.IV) ,[INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM] ,Image content ,Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science Applications ,Method evaluation ,Characterization (materials science) ,Multimedia (cs.MM) ,[INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV] ,Signal Processing ,020201 artificial intelligence & image processing ,Data mining ,computer ,Computer Science - Multimedia - Abstract
International audience; In this paper, we propose a novel framework to characterize wide color gamut image content based on the perceived quality impact of processes that change the color gamut, and demonstrate two practical use cases where the framework can be applied. We first introduce the main framework and implementation details. Then, we provide an analysis of existing wide color gamut datasets with quantitative characterization criteria, where four criteria, i.e., coverage, total coverage, uniformity, and total uniformity, are proposed. Finally, the framework is applied to content selection in a gamut mapping evaluation scenario in order to enhance the reliability and robustness of the evaluation results. As a result, the framework enables content characterization for studies involving the quality of experience of wide color gamut stimuli.
- Published
- 2021
- Full Text
- View/download PDF
36. Considering user agreement in learning to predict the aesthetic quality
- Author
-
Suiyi Ling, Andreas Pastor, Junle Wang, Patrick Le Callet, Image Perception Interaction (LS2N - équipe IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-École Centrale de Nantes (Nantes Univ - ECN), Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes université - UFR des Sciences et des Techniques (Nantes univ - UFR ST), Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Nantes Université (Nantes Univ), and Tencent [Shenzhen]
- Subjects
FOS: Computer and information sciences ,I.4.0 ,68T07 ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,[INFO]Computer Science [cs] ,ComputingMilieux_MISCELLANEOUS - Abstract
How to robustly rank the aesthetic quality of given images has been a long-standing ill-posed topic. This challenge stems mainly from the diverse subjective opinions of different observers about varied types of content. There is a growing interest in estimating user agreement by considering the standard deviation of the scores, instead of only predicting the mean aesthetic opinion score. Nevertheless, when comparing a pair of contents, few studies consider how confident we are regarding the difference in their aesthetic scores. In this paper, we thus propose (1) a re-adapted multi-task attention network to predict both the mean opinion score and the standard deviation in an end-to-end manner; (2) a brand-new confidence interval ranking loss that encourages the model to focus on image pairs whose difference in aesthetic scores is less certain. With such a loss, the model is encouraged to learn the uncertainty of the content that is relevant to the diversity of observers' opinions, i.e., user disagreement. Extensive experiments have demonstrated that the proposed multi-task aesthetic model achieves state-of-the-art performance on two different types of aesthetic datasets, i.e., AVA and TMGA., Comment: 5 pages
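The idea of focusing training on uncertain pairs can be illustrated as follows: treat each image's predicted (mean, std) as a Gaussian, compute the probability that one score exceeds the other, and weight a pairwise hinge loss by how unresolved that comparison is. This is an illustrative re-derivation of the concept, not the paper's exact confidence interval ranking loss; all names are hypothetical.

```python
import math

def confidence_weighted_rank_loss(m1, s1, m2, s2, label, margin=0.5):
    """Pairwise ranking loss that down-weights confidently resolved pairs.

    (m, s) : predicted mean / std of the aesthetic score for each image.
    label  : +1 if image 1 is annotated as more aesthetic, else -1.
    """
    sd = math.sqrt(s1 * s1 + s2 * s2)
    # P(score1 > score2) under independent Gaussian score predictions.
    p = 0.5 * (1.0 + math.erf((m1 - m2) / (sd * math.sqrt(2.0))))
    uncertainty = 1.0 - abs(2.0 * p - 1.0)  # 1 when p = 0.5, 0 when certain
    hinge = max(0.0, margin - label * (m1 - m2))
    return uncertainty * hinge
```

A pair whose mean gap already exceeds the margin contributes nothing, while a near-tied, high-variance pair keeps a large weighted loss, steering the model toward the pairs where user disagreement matters.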
- Published
- 2021
- Full Text
- View/download PDF
37. Implicitly using Human Skeleton in Self-supervised Learning: Influence on Spatio-temporal Puzzle Solving and on Video Action Recognition
- Author
-
Patrick Le Callet, Laurent Dollé, Mathieu Riand, Image Perception Interaction (LS2N - équipe IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-École Centrale de Nantes (Nantes Univ - ECN), Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes université - UFR des Sciences et des Techniques (Nantes univ - UFR ST), Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Nantes Université (Nantes Univ), CEA Tech Pays-de-la-Loire (DP2L), CEA Tech en régions (CEA-TECH-Reg), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Technologique (CEA) (DRT (CEA)), and Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)
- Subjects
Self supervised learning ,Computer science ,business.industry ,Skeleton Keypoints ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Siamese Network ,Human skeleton ,medicine.anatomical_structure ,Action Recognition ,medicine ,Action recognition ,Artificial intelligence ,Few-shot Learning ,business ,Self-supervised Learning - Abstract
In this paper we studied the influence of adding skeleton data on top of human action videos when performing self-supervised learning and action recognition. We show that adding this information without additional constraints actually hurts the accuracy of the network; we argue that the added skeleton is not considered by the network and is instead seen as noise masking part of the natural image. We present first results on puzzle solving and video action recognition to support this hypothesis.
- Published
- 2021
38. Quality Assessment of Free-Viewpoint Videos by Quantifying the Elastic Changes of Multi-Scale Motion Trajectories
- Author
-
Suiyi, Ling, Jing, Li, Zhaohui, Che, Xiongkuo, Min, Guangtao, Zhai, and Patrick, Le Callet
- Abstract
Virtual viewpoint synthesis is an essential process for many immersive applications, including Free-viewpoint TV (FTV). A widely used technique for viewpoint synthesis is the Depth-Image-Based Rendering (DIBR) technique. However, this technique may introduce challenging non-uniform spatial-temporal structure-related distortions. Most of the existing state-of-the-art quality metrics fail to handle these distortions, especially the temporal structure inconsistencies observed during the switch between different viewpoints. To tackle this problem, an elastic metric and multi-scale trajectory based video quality metric (EM-VQM) is proposed in this paper. Dense motion trajectories are first used as a proxy for selecting temporally sensitive regions, where local geometric distortions might significantly diminish the perceived quality. Afterwards, the amount of temporal structure inconsistency and unsmooth viewpoint transitions is quantified by calculating 1) the amount of motion trajectory deformation with an elastic metric and 2) the spatial-temporal structural dissimilarity. According to comprehensive experimental results on two FTV video datasets, the proposed metric significantly outperforms the state-of-the-art metrics designed for free-viewpoint videos, achieving gains of 12.86% and 16.75% in terms of median Pearson linear correlation coefficient values on the two datasets compared to the best competitor, respectively.
- Published
- 2020
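The elastic-metric idea in the EM-VQM abstract above can be illustrated with a minimal sketch: a common elastic representation of curves is the square-root velocity function (SRVF), under which an L2 distance measures shape deformation. This is only a simplified proxy for the paper's method; the function names and the omission of the reparametrization search are my assumptions, not the authors' implementation.

```python
import numpy as np

def srvf(traj):
    """Square-root velocity function of a (T, 2) trajectory: q(t) = v / sqrt(|v|)."""
    v = np.gradient(traj, axis=0)                        # discrete velocity
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.sqrt(np.maximum(speed, 1e-12))

def elastic_distance(t1, t2):
    """L2 distance between SRVF representations (no reparametrization search)."""
    q1, q2 = srvf(t1), srvf(t2)
    return float(np.sqrt(np.mean(np.sum((q1 - q2) ** 2, axis=1))))

# Two motion trajectories: a straight line and a slightly deformed copy
t = np.linspace(0.0, 1.0, 100)
ref = np.stack([t, t], axis=1)
deg = np.stack([t, t + 0.05 * np.sin(4 * np.pi * t)], axis=1)

assert elastic_distance(ref, ref) < 1e-9   # identical shapes -> zero deformation
assert elastic_distance(ref, deg) > 0.0    # deformation is detected
```

A full elastic metric would additionally optimize over time reparametrizations of one trajectory; the fixed-parametrization distance above is an upper bound on it.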
39. Visual Quality of 3D Meshes With Diffuse Colors in Virtual Reality: Subjective and Objective Evaluation
- Author
-
Yana Nehme, Patrick Le Callet, Florent Dupont, Jean-Philippe Farrugia, Guillaume Lavoué, Origami (Origami), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), and Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Image quality ,Computer science ,02 engineering and technology ,Solid modeling ,Virtual reality ,Perceptual Metric ,Rendering (computer graphics) ,Computer graphics ,3D Mesh ,Image texture ,Distortion ,Subjective Quality Evaluation ,Computer Graphics ,0202 electrical engineering, electronic engineering, information engineering ,Polygon mesh ,Computer vision ,Computer animation ,ComputingMethodologies_COMPUTERGRAPHICS ,business.industry ,020207 software engineering ,Visual Quality Assessment ,Computer Graphics and Computer-Aided Design ,[INFO.INFO-GR]Computer Science [cs]/Graphics [cs.GR] ,Diffuse Color ,[INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV] ,Signal Processing ,Objective Quality Evaluation ,Perception ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,Dataset - Abstract
Surface meshes associated with diffuse texture or color attributes are becoming popular multimedia contents. They provide a high degree of realism and allow six degrees of freedom (6DoF) interactions in immersive virtual reality environments. Just like other types of multimedia, 3D meshes are subject to a wide range of processing, e.g., simplification and compression, which results in a loss of quality of the final rendered scene. Thus, both subjective studies and objective metrics are needed to understand and predict this visual loss. In this work, we introduce a large dataset of 480 animated meshes with diffuse color information, associated with perceived quality judgments. The stimuli were generated from 5 source models subjected to geometry and color distortions. Each stimulus was associated with 6 hypothetical rendering trajectories (HRTs): combinations of 3 viewpoints and 2 animations. A total of 11520 quality judgments (24 per stimulus) were acquired in a subjective experiment conducted in virtual reality. The results allowed us to explore the influence of source models, animations and viewpoints on both the quality scores and their confidence intervals. Based on these findings, we propose the first metric for quality assessment of 3D meshes with diffuse colors that works entirely in the mesh domain. This metric incorporates perceptually relevant curvature-based and color-based features. We evaluate its performance, as well as that of a number of Image Quality Metrics (IQMs), on two datasets: ours and a dataset of distorted textured meshes. Our metric demonstrates good results and better stability than the IQMs. Finally, we investigate how knowledge of the viewpoint (i.e., the visible parts of the 3D model) may improve the results of objective metrics.
- Published
- 2020
40. Towards Better Quality Assessment of High-Quality Videos
- Author
-
Sriram Sethuraman, Deepthi Nandakumar, Suiyi Ling, Yoann Baveye, and Patrick Le Callet
- Subjects
Protocol (science) ,Multimedia ,Computer science ,Quality assessment ,media_common.quotation_subject ,020206 networking & telecommunications ,02 engineering and technology ,Internet traffic ,Video quality ,Data rate ,computer.software_genre ,Category rating ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Quality (business) ,computer ,media_common ,Data compression - Abstract
In recent times, video content encoded at High-Definition (HD) and Ultra-High-Definition (UHD) resolution dominates internet traffic. The significantly increased data rate and growing expectations of video quality from users create great challenges in video compression and quality assessment, especially for higher-resolution, higher-quality content. The development of robust video quality assessment metrics relies on the collection of subjective ground truths. As high-quality video content is more ambiguous and difficult for a human observer to rate, a more distinguishable subjective protocol/methodology should be considered. In this study, towards better quality assessment of high-quality videos, a subjective study was conducted focusing on high-quality HD and UHD content with the Degradation Category Rating (DCR) protocol. Commonly used video quality metrics were benchmarked in two quality ranges.
- Published
- 2020
41. QoEVMA'20
- Author
-
Wen Lu, Jiachen Yang, Patrick Le Callet, Jing Li, Xinbo Gao, and Zhi Li
- Subjects
Multimedia ,Computer science ,Stereoscopy ,Service provider ,Virtual reality ,computer.software_genre ,Live streaming ,law.invention ,Variety (cybernetics) ,law ,Augmented reality ,Performance indicator ,Quality of experience ,computer - Abstract
Nowadays, people spend dramatically more time watching videos on different devices. Advances in hardware technology and networks allow for the increasing demands on users' viewing experience. Thus, enhancing the Quality of Experience (QoE) of end-users in advanced multimedia is the ultimate goal of service providers, as good services attract more consumers. Quality assessment is therefore important. The first workshop on "Quality of Experience (QoE) in visual multimedia applications" (QoEVMA'20) focuses on the QoE assessment of visual multimedia applications, both subjectively and objectively. The topics include 1) QoE assessment of different visual multimedia applications, including VoD for movies, dramas and variety shows, UGC on social networks, live streaming videos for gaming/shopping/social, etc.; 2) QoE assessment for different video formats in multimedia services, including 2D, stereoscopic 3D, High Dynamic Range (HDR), Augmented Reality (AR), Virtual Reality (VR), 360, Free-Viewpoint Video (FVV), etc.; 3) Key performance indicator (KPI) analysis for QoE. This summary gives a brief overview of the workshop, which took place on October 16, 2020 in Seattle (U.S.) as a half-day workshop.
- Published
- 2020
42. A Probabilistic Graphical Model for Analyzing the Subjective Visual Quality Assessment Data from Crowdsourcing
- Author
-
Jing Li, Patrick Le Callet, Wang Junle, and Suiyi Ling
- Subjects
Ground truth ,business.industry ,Computer science ,media_common.quotation_subject ,Probabilistic logic ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,Crowdsourcing ,01 natural sciences ,Robustness (computer science) ,Categorical distribution ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Quality (business) ,Graphical model ,Quality of experience ,Artificial intelligence ,business ,computer ,0105 earth and related environmental sciences ,media_common - Abstract
The swift development of multimedia technology has dramatically raised users' expectations on the quality of experience. To obtain the ground-truth perceptual quality for model training, subjective assessment is necessary. Crowdsourcing platforms provide a convenient and feasible way to run large-scale experiments. However, the perceptual quality labels obtained this way are generally noisy. In this paper, we propose a probabilistic graphical annotation model to infer the underlying ground truth and discover each annotator's behavior. In the proposed model, the ground-truth quality label is considered to follow a categorical distribution rather than to be a unique number, i.e., different reliable opinions on the perceptual quality are allowed. In addition, different annotator behaviors in crowdsourcing are modeled, which allows us to identify the possibility that an annotator produces noisy labels during the test. The proposed model has been tested on both simulated data and real-world data, where it consistently outperforms the other state-of-the-art models in terms of accuracy and robustness.
- Published
- 2020
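The core idea of the abstract above, a categorical ground-truth distribution per stimulus plus a per-annotator reliability, can be sketched with a small EM loop in the spirit of Dawid-Skene-style annotation models. This is a simplified stand-in, not the paper's model; the function name, the reliable-vs-uniform noise assumption, and all toy numbers are mine.

```python
import numpy as np

def fit_annotator_model(labels, n_items, n_annot, K, iters=50):
    """labels: list of (item, annotator, category) votes.
    Latent: per-item categorical ground truth theta[i], per-annotator
    reliability r[a]. A reliable vote follows theta[i], otherwise uniform."""
    theta = np.full((n_items, K), 1.0 / K)
    r = np.full(n_annot, 0.8)
    for _ in range(iters):
        # E-step: probability that each vote was produced reliably
        rho = np.empty(len(labels))
        for n, (i, a, y) in enumerate(labels):
            p_rel = r[a] * theta[i, y]
            rho[n] = p_rel / (p_rel + (1.0 - r[a]) / K)
        # M-step: re-estimate ground-truth distributions and reliabilities
        theta = np.full((n_items, K), 1e-3)        # small prior for stability
        votes_r, votes_n = np.zeros(n_annot), np.zeros(n_annot)
        for n, (i, a, y) in enumerate(labels):
            theta[i, y] += rho[n]
            votes_r[a] += rho[n]
            votes_n[a] += 1
        theta /= theta.sum(axis=1, keepdims=True)
        r = np.clip(votes_r / np.maximum(votes_n, 1), 1e-3, 1 - 1e-3)
    return theta, r

# Toy data: annotators 0-2 agree on each item, annotator 3 votes off
labels = [(0, 0, 1), (0, 1, 1), (0, 2, 1), (0, 3, 0),
          (1, 0, 2), (1, 1, 2), (1, 2, 2), (1, 3, 0)]
theta, r = fit_annotator_model(labels, n_items=2, n_annot=4, K=3)
assert theta[0].argmax() == 1 and theta[1].argmax() == 2   # consensus recovered
assert r[3] < r[0]                                         # outlier annotator flagged
```

Keeping `theta` as a full distribution, rather than collapsing it to a single label, is what allows several reliable but differing opinions to coexist, as the abstract describes.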
43. Rate-distortion video coding and uncertainties: to be blindly chasing marginal improvement or to be greener
- Author
-
Patrick Le Callet, Suiyi Ling, and Yoann Baveye
- Subjects
Ground truth ,Computational complexity theory ,Exploit ,Computer science ,business.industry ,Code rate ,Machine learning ,computer.software_genre ,Video quality ,Rule of thumb ,Codec ,Artificial intelligence ,business ,computer ,Coding (social sciences) - Abstract
The last decade has witnessed the rapid development of video encoding and video quality assessment. However, each new generation of codecs has come with a significant increase in computational complexity, especially to exploit their full potential. To ease this exponential growth in computational load, a greener video encoding scheme that consumes less power should be considered. The improvement of encoding efficiency is driven by Rate-Distortion Optimization (RDO), where the goal is to minimize the coding distortion under a target coding rate. As distortions are quantified by quality metrics, whether the applied quality metric can accurately judge perceived quality is vital for selecting encoding recipes. In most cases, more complex codecs are developed seeking any enhancement of the quality scores predicted by an ad hoc quality metric, e.g., 1 dB in PSNR or 1/100 of the SSIM scale. Despite some rules of thumb, whether such improvement is worth a significant increase in power consumption is questionable, as the uncertainty of most quality metrics with respect to human judgement is often larger than the measured improvement. In this work, we propose a simple model to quantify the uncertainty/resolution of a quality metric, where confidence intervals (CI) of the quality scores predicted by the metric are computed with respect to existing observed ground truth (e.g., human observer opinions). As a possible use case, if the CIs of two encoding recipes overlap, the greener one could be selected. Extensive experiments have been conducted on several datasets tailored to different purposes, and the uncertainties of mainstream quality metrics are reported. Perspectives on the trade-off between complexity and efficiency of encoding techniques are provided.
- Published
- 2020
44. HMM-Based Framework to Measure the Visual Fidelity of Tone Mapping Operators
- Author
-
Patrick Le Callet, Toinon Vigier, Waqas Ellahi, University of Nantes, 44007 Nantes, France, Image Perception Interaction (IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), and Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Brightness ,Computer science ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Index Terms-scanpath ,[SCCO.COMP]Cognitive science/Computer science ,02 engineering and technology ,Tone mapping ,Similarity measure ,050105 experimental psychology ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,0202 electrical engineering, electronic engineering, information engineering ,Contrast (vision) ,0501 psychology and cognitive sciences ,Computer vision ,Hidden Markov model ,Spatial analysis ,High dynamic range ,media_common ,hidden markov model ,business.industry ,[SCCO.NEUR]Cognitive science/Neuroscience ,tone mapping operators ,05 social sciences ,visual atten- tion ,eye movements ,Metric (mathematics) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
International audience; Recent advancements in image capturing techniques and post-processing software generate High Dynamic Range (HDR) images. Such images retain maximum information about the scene by capturing more realistic visual content, which is often missed by traditional capturing techniques. In this regard, tone mapping operators (TMOs) play a significant role in displaying HDR image content on a traditional Low Dynamic Range (LDR) display. These operators tend to introduce artifacts in the original HDR image, changing its brightness and contrast in a way that can destroy important textures and information in the image. Assessing TMOs so as to select the best technique across different perceptual and quality dimensions is a challenging topic. In this paper, we propose to compare TMOs through their impact on visual behavior relative to the HDR viewing condition. This study is the first of its kind to utilize a hidden Markov model (HMM) as a similarity measure to evaluate the perceived quality of a TMO. The findings suggest that the proposed HMM-based method, which emphasizes temporal information, produces a better evaluation metric than traditional approaches based only on spatial visual information.
- Published
- 2020
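A heavily simplified version of the gaze-model comparison described above can be sketched by quantizing fixations into screen regions and scoring a tone-mapped viewing's scanpath under a transition model fitted on the HDR scanpaths. This first-order Markov chain is a stand-in for the paper's HMM (no hidden states, no Baum-Welch); all function names and toy sequences are assumptions for illustration.

```python
import numpy as np

def transition_matrix(scanpath, n_regions, alpha=1.0):
    """Maximum-likelihood transition matrix (add-alpha smoothing) of a
    region-quantized scanpath."""
    T = np.full((n_regions, n_regions), alpha)
    for s, t in zip(scanpath[:-1], scanpath[1:]):
        T[s, t] += 1
    return T / T.sum(axis=1, keepdims=True)

def scanpath_score(model_T, scanpath):
    """Average transition log-likelihood of a scanpath under a fitted gaze model:
    higher means viewing behavior closer to the reference condition."""
    lls = [np.log(model_T[s, t]) for s, t in zip(scanpath[:-1], scanpath[1:])]
    return float(np.mean(lls))

hdr_gaze = [0, 1, 1, 2, 1, 1, 2, 0, 1, 2, 1, 1]  # reference (HDR) viewing pattern
tmo_a    = [0, 1, 1, 2, 1, 2, 1, 1, 2, 0]        # similar gaze pattern
tmo_b    = [2, 2, 0, 0, 2, 2, 0, 0, 2, 2]        # very different gaze pattern

T = transition_matrix(hdr_gaze, n_regions=3)
assert scanpath_score(T, tmo_a) > scanpath_score(T, tmo_b)
```

An actual HMM adds hidden attentional states with emission distributions over fixation locations, which is what lets the paper capture temporal structure beyond raw region transitions.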
45. Towards Perceptually-Optimized Compression Of User Generated Content (UGC): Prediction Of UGC Rate-Distortion Category
- Author
-
Jim Skinner, Ioannis Katsavounidis, Yoann Baveye, Patrick Le Callet, and Suiyi Ling
- Subjects
Computer science ,Feature extraction ,User-generated content ,Feature selection ,030229 sport sciences ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,Rate–distortion theory ,03 medical and health sciences ,Identification (information) ,0302 clinical medicine ,Data mining ,Cluster analysis ,computer ,0105 earth and related environmental sciences ,Data compression - Abstract
How to best evaluate the perceptual quality of User Generated Content (UGC), and how to efficiently optimize its compression within an adaptive streaming system, is becoming one of the most intractable challenges in the community. Rate-Distortion (R-D) characteristic-based content analyses, which can be applied to non-pristine originals, are indispensable for guiding the development of quality metrics and efficient compression systems. To this end, we present a novel, complete R-D category prediction system built on the identification of discriminative features. To better understand the R-D behaviors of UGC, we first propose a Bjontegaard Delta (BD)-Rate and BD-Quality based algorithm to categorize UGC. Using the predicted R-D categories as ground-truth labels, we then identify features that characterize the R-D behaviors of UGC via a hierarchical feature selection framework. Finally, the selected features are employed to predict the R-D category of unseen UGC. Comprehensive observations and results are summarized through extensive experiments.
- Published
- 2020
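The BD-Rate quantity that drives the categorization above is a standard computation: fit log-rate as a cubic polynomial of quality for each R-D curve, then integrate the difference over the overlapping quality range. The sketch below follows the classic Bjontegaard procedure; the toy curves and function name are illustrative, not the paper's data.

```python
import numpy as np

def bd_rate(rates_ref, q_ref, rates_test, q_test):
    """Bjontegaard delta rate (%): average bitrate difference at equal quality,
    via cubic fits of log-rate over the overlapping quality range."""
    p_ref = np.polyfit(q_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(q_test, np.log(rates_test), 3)
    lo = max(min(q_ref), min(q_test))          # overlapping quality interval
    hi = min(max(q_ref), max(q_test))
    int_ref = np.polyval(np.polyint(p_ref), [lo, hi])
    int_test = np.polyval(np.polyint(p_test), [lo, hi])
    avg_log_diff = ((int_test[1] - int_test[0]) - (int_ref[1] - int_ref[0])) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Toy curves: the test codec needs ~20% less rate at every quality level
q = np.array([30.0, 34.0, 38.0, 42.0])
r_ref = np.array([1000.0, 2000.0, 4000.0, 8000.0])   # kbps
assert abs(bd_rate(r_ref, q, 0.8 * r_ref, q) - (-20.0)) < 0.5
```

A negative BD-Rate means the test condition saves bitrate; thresholding such values per clip is one plausible way to form the R-D categories the abstract uses as labels.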
46. Few-Shot Pill Recognition
- Author
-
Suiyi Ling, Andreas Pastor, Jieun Kim, Jing Li, Zhaohui Che, Wang Junle, Patrick Le Callet, Image Perception Interaction (LS2N - équipe IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-École Centrale de Nantes (Nantes Univ - ECN), Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes université - UFR des Sciences et des Techniques (Nantes univ - UFR ST), Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Nantes Université (Nantes Univ), Alibaba Group [Hangzhou], Shanghai Jiao Tong University [Shanghai], Tencent [Shenzhen], and Hanyang University
- Subjects
Training set ,Artificial neural network ,Noise measurement ,Computer science ,business.industry ,Shot (filmmaking) ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Pill ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,[INFO]Computer Science [cs] ,020201 artificial intelligence & image processing ,Segmentation ,Artificial intelligence ,business ,computer ,ComputingMilieux_MISCELLANEOUS ,0105 earth and related environmental sciences - Abstract
Pill image recognition is vital for many personal/public health-care applications and should be robust to diverse, unconstrained real-world conditions. Most existing pill recognition models are limited in tackling this challenging few-shot learning problem due to the insufficient number of instances per category. With limited training data, neural network-based models have difficulty discovering the most discriminative features or going deeper. In particular, existing models fail to handle hard samples taken under less controlled imaging conditions. In this study, a new pill image database, namely CURE, is first developed with more varied imaging conditions and more instances per pill category. Secondly, a W2-net is proposed for better pill segmentation. Thirdly, a Multi-Stream (MS) deep network that captures task-related features, along with a novel two-stage training methodology, is proposed. Within the proposed framework, a Batch All strategy that considers all samples is first employed for the sub-streams; then a Batch Hard strategy that considers only the hard samples mined in the first stage is used for the fusion network. In this way, complex samples that cannot be represented by one type of feature receive more focus, and the model is forced to exploit other domain-related information more effectively. Experimental results show that the proposed model outperforms state-of-the-art models on both the National Institutes of Health (NIH) database and our CURE database.
- Published
- 2020
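The Batch Hard strategy mentioned above is a standard triplet-mining scheme: for each anchor in a batch, take its farthest positive and closest negative. A minimal numpy sketch of the resulting loss (not the paper's network or training loop; embeddings and names are toy assumptions):

```python
import numpy as np

def batch_hard_triplet_loss(emb, labels, margin=0.2):
    """For each anchor, hinge loss on (hardest positive distance -
    hardest negative distance + margin), averaged over the batch."""
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)  # pairwise dists
    same = labels[:, None] == labels[None, :]
    n = len(emb)
    losses = []
    for i in range(n):
        pos = d[i][same[i] & (np.arange(n) != i)]   # same class, not the anchor
        neg = d[i][~same[i]]                         # different class
        if len(pos) and len(neg):
            losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.mean(losses))

labels = np.array([0, 0, 1, 1])
tight = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])  # separated classes
mixed = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 0.1], [5.0, 5.1]])  # entangled classes
assert batch_hard_triplet_loss(tight, labels) < batch_hard_triplet_loss(mixed, labels)
```

A Batch All variant would instead average the hinge over every valid (anchor, positive, negative) triple in the batch, which is what the abstract describes for the first training stage.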
47. Can Visual Scanpath Reveal Personal Image Memorability? Investigation of HMM Tools for Gaze Patterns Analysis
- Author
-
Patrick Le Callet, Waqas Ellahi, Toinon Vigier, University of Nantes, 44007 Nantes, France, Image Perception Interaction (IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Institut de Recherche en Communications et en Cybernétique de Nantes (IRCCyN), Mines Nantes (Mines Nantes)-École Centrale de Nantes (ECN)-Ecole Polytechnique de l'Université de Nantes (EPUN), Université de Nantes (UN)-Université de Nantes (UN)-PRES Université Nantes Angers Le Mans (UNAM)-Centre National de la Recherche Scientifique (CNRS), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Ecole Polytechnique de l'Université de Nantes (EPUN), and Université de Nantes (UN)
- Subjects
scanpath ,Computer science ,[SCCO.COMP]Cognitive science/Computer science ,050105 experimental psychology ,Personalization ,Visual behavior ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,03 medical and health sciences ,0302 clinical medicine ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,Human–computer interaction ,Visual attention ,0501 psychology and cognitive sciences ,Hidden Markov model ,memorability ,[SCCO.NEUR]Cognitive science/Neuroscience ,05 social sciences ,[INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM] ,Eye movement ,Gaze ,eye movements ,visual attention ,Hidden markov model ,[SCCO.PSYC]Cognitive science/Psychology ,Visual patterns ,030217 neurology & neurosurgery - Abstract
International audience; Visual attention has been shown to be a good proxy for QoE, revealing specific visual patterns tied to the content, system and contextual aspects of multimedia applications. In this paper, we propose a novel approach based on hidden Markov models to analyze visual scanpaths in an image memorability task. This new method accounts for both the temporal and idiosyncratic aspects of visual behavior. The study shows promising results for the use of indirect measures in the personalization of QoE assessment and prediction.
- Published
- 2020
- Full Text
- View/download PDF
48. No-Reference Quality Evaluation of Light Field Content Based on Structural Representation of The Epipolar Plane Image
- Author
-
Suiyi Ling, Patrick Le Callet, Ali Ak, Ecole Polytechnique de l'Université de Nantes (EPUN), Université de Nantes (UN), Image Perception Interaction (IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Institut de Recherche en Communications et en Cybernétique de Nantes (IRCCyN), Mines Nantes (Mines Nantes)-École Centrale de Nantes (ECN)-Ecole Polytechnique de l'Université de Nantes (EPUN), and Université de Nantes (UN)-Université de Nantes (UN)-PRES Université Nantes Angers Le Mans (UNAM)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Image quality ,Computer science ,Epipolar geometry ,Histogram of oriented gradients ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,No-reference ,02 engineering and technology ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Image quality assessment ,Histogram ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,Light field ,Epipolar Plane Image ,business.industry ,[INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM] ,Convolutional sparse coding ,020206 networking & telecommunications ,Pattern recognition ,Kernel (image processing) ,[INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV] ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
International audience; As an emerging technology, Light Field (LF) imaging has gained ever-increasing importance in the multimedia domain. To provide guidance for the development of perceptually accurate LF processing tools and to supervise the entire streaming system, robust perceptual quality assessment metrics are required. In particular, No-Reference (NR) metrics are preferable for comparing LF content with different angular resolutions. Some metrics have been developed by extending commonly used 2D image quality metrics to the 4D LF domain with angular consistency terms. Although these models consistently show slightly improved performance, most of them are limited to evaluating LF quality from the sub-aperture views with additional terms on the angular domain. There is an evident lack of reliable quality metrics tailored to LF content. To remedy this lack, we propose an NR quality metric for LF content based on representing the Epipolar Plane Image (EPI) with structural descriptors, including Histogram of Oriented Gradients and Convolutional Sparse Coding based descriptors. The primary motivation resides in our observations that (1) LF-related distortions in the angular domain are highly noticeable in EPI representations; (2) most distortions in the EPI are structure-related. Extensive experiments on the MPI-LFA [1] LF image quality dataset demonstrate that our method provides competitive performance with state-of-the-art NR image quality metrics.
- Published
- 2020
- Full Text
- View/download PDF
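The observation above, that LF distortions show up as structural damage in the EPI, can be illustrated with a crude gradient-orientation descriptor: an undistorted EPI of a simple scene consists of straight slanted lines, so its orientation histogram is concentrated, while distortion spreads it. This global histogram is only a toy stand-in for the paper's HOG/sparse-coding descriptors; all names and the synthetic EPI are assumptions.

```python
import numpy as np

def epi_hog(epi, n_bins=8):
    """Global, magnitude-weighted histogram of gradient orientations of an
    EPI slice (a crude structural descriptor)."""
    gy, gx = np.gradient(epi.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientations in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    return hist / max(hist.sum(), 1e-12)

def entropy(h):
    """Shannon entropy of a normalized histogram: low = concentrated structure."""
    h = h[h > 0]
    return float(-(h * np.log(h)).sum())

# Synthetic EPI: straight slanted lines, then a noise-corrupted version
u, v = np.meshgrid(np.arange(64), np.arange(64))
clean = np.sin(0.3 * (u + 2 * v))
rng = np.random.default_rng(1)
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

assert entropy(epi_hog(noisy)) > entropy(epi_hog(clean))  # structure disrupted
```

In an NR metric, statistics of such descriptors over many EPI slices would be pooled and mapped to a quality score, with no reference LF required.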
49. MapStack: Exploring Multilayered Geospatial Data in Virtual Reality
- Author
-
Guillaume Moreau, Erwan J. David, Patrick Le Callet, Maxim Spur, Vincent Tourre, Ambiances, Architectures, Urbanités (AAU), École Centrale de Nantes (ECN)-École nationale supérieure d'architecture de Nantes (ENSA Nantes)-Ministère de la Culture et de la Communication (MCC)-Centre National de la Recherche Scientifique (CNRS)-École nationale supérieure d'architecture de Grenoble (ENSAG ), Université Grenoble Alpes (UGA)-Université Grenoble Alpes (UGA), Centre de recherche nantais Architectures Urbanités (CRENAU ), Université Grenoble Alpes (UGA)-Université Grenoble Alpes (UGA)-École Centrale de Nantes (ECN)-École nationale supérieure d'architecture de Nantes (ENSA Nantes)-Ministère de la Culture et de la Communication (MCC)-Centre National de la Recherche Scientifique (CNRS)-École nationale supérieure d'architecture de Grenoble (ENSAG ), École Centrale de Nantes (ECN), Image Perception Interaction (IPI), Laboratoire des Sciences du Numérique de Nantes (LS2N), Centre National de la Recherche Scientifique (CNRS)-École Centrale de Nantes (ECN)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS)-École Centrale de Nantes (ECN)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), and Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST)
- Subjects
[SHS.ARCHI]Humanities and Social Sciences/Architecture, space management ,Geospatial analysis ,Immersive Analytics ,Geospatial Data Visualization ,Computer science ,Virtual Reality ,Workspace ,Virtual reality ,Grid ,computer.software_genre ,[INFO.INFO-GR]Computer Science [cs]/Graphics [cs.GR] ,Stack (abstract data type) ,Virtual machine ,Human–computer interaction ,Immersive analytics ,[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] ,Coordinated And Multiple Views ,Urbanism ,computer - Abstract
International audience; Virtual reality (VR) headsets offer a large and immersive workspace for displaying visualizations with stereoscopic vision, compared to traditional environments with monitors or printouts. The controllers of these devices further allow direct three-dimensional interaction with the virtual environment. In this paper, we make use of these advantages to implement a novel multiple and coordinated view (MCV) system in the form of a vertical stack, showing tilted layers of geospatial data to facilitate the understanding of multi-layered maps. A formal study based on a use case from urbanism that requires cross-referencing four layers of geospatial urban data supports our arguments by comparing the stack to more conventional systems similarly implemented in VR: a simpler grid of layers, and switching (blitting) layers on a single map. Performance and oculometric analyses showed an advantage of the two spatial-multiplexing methods (the grid and the stack) over the temporal multiplexing of blitting. Overall, users tended to prefer the stack, were ambivalent toward the grid, and disliked the blitting map. Perhaps more interestingly, we were also able to associate system preferences with user characteristics and behavior.
- Published
- 2020
50. Capturing and Explaining Trajectory Singularities using Composite Signal Neural Networks
- Author
-
Hippolyte Dubois, Michael Hornberger, Antoine Coutrot, Hugo J. Spiers, Patrick Le Callet, Université de Nantes (UN), Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Ecole Polytechnique de l'Université de Nantes (EPUN), University of East Anglia [Norwich] (UEA), University College of London [London] (UCL), and Centre National de la Recherche Scientifique (CNRS)
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Index Terms-graph signal processing ,neural network ,Computer science ,Machine Learning (stat.ML) ,02 engineering and technology ,Signal ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Machine Learning (cs.LG) ,Set (abstract data type) ,Statistics - Machine Learning ,explainability ,pattern analysis ,0202 electrical engineering, electronic engineering, information engineering ,Isolation (database systems) ,Time series ,cnn ,Artificial neural network ,business.industry ,[SCCO.NEUR]Cognitive science/Neuroscience ,gcnn ,020206 networking & telecommunications ,Pattern recognition ,Visualization ,trajectory ,Trajectory ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
Spatial trajectories are ubiquitous and complex signals. Their analysis is crucial in many research fields, from urban planning to neuroscience. Several approaches have been proposed to cluster trajectories. They rely on hand-crafted features, which struggle to capture the spatio-temporal complexity of the signal, or on Artificial Neural Networks (ANNs), which can be more efficient but less interpretable. In this paper we present a novel ANN architecture designed to capture the spatio-temporal patterns characteristic of a set of trajectories, while taking into account the demographics of the navigators. Hence, our model extracts markers linked to both behaviour and demographics. We propose a composite signal analyser (CompSNN) combining three simple ANN modules. Each module uses a different signal representation of the trajectory while remaining interpretable. Our CompSNN performs significantly better than its modules taken in isolation and allows us to visualise which parts of the signal were most useful for discriminating the trajectories. (Comment: 5 pages, 9 figures; submitted to the EUSIPCO 2020 conference.)
- Published
- 2020
- Full Text
- View/download PDF