Author: "Toderici, George" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Toderici, George" returned 117 results

1. Towards flexible perception with visual memory

Author: Geirhos, Robert, Jaini, Priyank, Stone, Austin, Medapati, Sourabh, Yi, Xi, Toderici, George, Ogale, Abhijit, and Shlens, Jonathon
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Training a neural network is a monolithic endeavor, akin to carving knowledge into stone: once the process is completed, editing the knowledge in a network is nearly impossible, since all information is distributed across the network's weights. We here explore a simple, compelling alternative by marrying the representational power of deep neural networks with the flexibility of a database. Decomposing the task of image classification into image similarity (from a pre-trained embedding) and search (via fast nearest neighbor retrieval from a knowledge database), we build a simple and flexible visual memory that has the following key capabilities: (1.) The ability to flexibly add data across scales: from individual samples all the way to entire classes and billion-scale data; (2.) The ability to remove data through unlearning and memory pruning; (3.) An interpretable decision-mechanism on which we can intervene to control its behavior. Taken together, these capabilities comprehensively demonstrate the benefits of an explicit visual memory. We hope that it might contribute to a conversation on how knowledge should be represented in deep vision models -- beyond carving it in "stone" weights.
Comment: Adding link to code at https://github.com/google-deepmind/visual-memory
Published: 2024

2. High-Fidelity Image Compression with Score-based Generative Models

Author: Hoogeboom, Emiel, Agustsson, Eirikur, Mentzer, Fabian, Versari, Luca, Toderici, George, and Theis, Lucas
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is achieved using a simple but theoretically motivated two-stage approach combining an autoencoder targeting MSE followed by a further score-based decoder. However, as we will show, implementation details matter and the optimal design decisions can differ greatly from typical text-to-image models.
Published: 2023

3. Multi-Realism Image Compression with a Conditional Generator

Author: Agustsson, Eirikur, Minnen, David, Toderici, George, and Mentzer, Fabian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either reconstruct a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.
Comment: CVPR'23 Camera Ready
Published: 2022

4. VCT: A Video Compression Transformer

Author: Mentzer, Fabian, Toderici, George, Minnen, David, Hwang, Sung-Jin, Caelles, Sergi, Lucic, Mario, and Agustsson, Eirikur
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: We show how transformers can be used to vastly simplify neural video compression. Previous methods have been relying on an increasing number of architectural biases and priors, including motion prediction and warping operations, resulting in complex models. Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distribution of future representations given the past. The resulting video compression transformer outperforms previous methods on standard video compression data sets. Experiments on synthetic data show that our model learns to handle complex motion patterns such as panning, blurring and fading purely from data. Our approach is easy to implement, and we release code to facilitate future research.
Comment: NeurIPS'22 Camera Ready Version. Code: https://goo.gle/vct-paper
Published: 2022

5. LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks

Author: Isik, Berivan, Chou, Philip A., Hwang, Sung Jin, Johnston, Nick, and Toderici, George
Subjects: Computer Science - Graphics, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing, Electrical Engineering and Systems Science - Signal Processing
Abstract: We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordinate-based, or implicit, neural network. Inputs to the network include both spatial coordinates and a latent vector per block. We represent the latent vectors using coefficients of the region-adaptive hierarchical transform (RAHT) used in the MPEG geometry-based point cloud codec G-PCC. The coefficients, which are highly compressible, are rate-distortion optimized by back-propagation through a rate-distortion Lagrangian loss in an auto-decoder configuration. The result outperforms RAHT by 2-4 dB. This is the first work to compress volumetric functions represented by local coordinate-based neural networks. As such, we expect it to be applicable beyond point clouds, for example to compression of high-resolution neural radiance fields.
Comment: 30 pages, 29 figures
Published: 2021

6. Neural Video Compression using GANs for Detail Synthesis and Propagation

Author: Mentzer, Fabian, Agustsson, Eirikur, Ballé, Johannes, Minnen, David, Johnston, Nick, and Toderici, George
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective: we i) synthesize detail by conditioning the generator on a latent extracted from the warped previous reconstruction to then ii) propagate this detail with high-quality flow. We find that user studies are required to compare methods, i.e., none of our quantitative metrics were able to predict all studies. We present the network design choices in detail, and ablate them with user studies.
Comment: First two authors contributed equally. ECCV Camera ready version
Published: 2021

7. End-to-end Learning of Compressible Features

Author: Singh, Saurabh, Abu-El-Haija, Sami, Johnston, Nick, Ballé, Johannes, Shrivastava, Abhinav, and Toderici, George
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy-based lossless compression methods are of little help as they do not yield the desired level of compression, while general-purpose lossy compression methods based on energy compaction (e.g. PCA followed by quantization and entropy coding) are sub-optimal, as they are not tuned to the task-specific objective. We propose a learned method that jointly optimizes for compressibility along with the task objective for learning the features. The plug-in nature of our method makes it straightforward to integrate with any target objective and trade-off against compressibility. We present results on multiple benchmarks and demonstrate that our method produces features that are an order of magnitude more compressible, while having a regularization effect that leads to a consistent improvement in accuracy.
Comment: Accepted at ICIP 2020
Published: 2020

8. Nonlinear Transform Coding

Author: Ballé, Johannes, Chou, Philip A., Minnen, David, Singh, Saurabh, Johnston, Nick, Agustsson, Eirikur, Hwang, Sung Jin, and Toderici, George
Subjects: Computer Science - Information Theory, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate-distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate-distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate-distortion trade-off of nonlinear transforms, introducing a simplified one.
Comment: 17 pages, 14 figures. Accepted for publication in IEEE Journal of Selected Topics in Signal Processing
Published: 2020

9. High-Fidelity Generative Image Compression

Author: Mentzer, Fabian, Toderici, George, Tschannen, Michael, and Agustsson, Eirikur
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We extensively study how to combine Generative Adversarial Networks and learned compression to obtain a state-of-the-art generative lossy compression system. In particular, we investigate normalization layers, generator and discriminator architectures, training strategies, as well as perceptual losses. In contrast to previous work, i) we obtain visually pleasing reconstructions that are perceptually similar to the input, ii) we operate in a broad range of bitrates, and iii) our approach can be applied to high-resolution images. We bridge the gap between rate-distortion-perception theory and practice by evaluating our approach both quantitatively with various perceptual metrics, and with a user study. The study shows that our method is preferred to previous approaches even if they use more than 2x the bitrate.
Comment: This is the Camera Ready version for NeurIPS 2020. Project page: https://hific.github.io
Published: 2020

10. Neural Video Compression Using GANs for Detail Synthesis and Propagation

Author: Mentzer, Fabian, Agustsson, Eirikur, Ballé, Johannes, Minnen, David, Johnston, Nick, and Toderici, George
Editor: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), Yung, Moti (Editorial Board Member), Avidan, Shai, Brostow, Gabriel, Cissé, Moustapha, Farinella, Giovanni Maria, and Hassner, Tal
Published: 2022

11. Joint Autoregressive and Hierarchical Priors for Learned Image Compression

Author: Minnen, David, Ballé, Johannes, and Toderici, George
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, as well as combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate-distortion performance, providing a 15.8% average reduction in file size over the previous state-of-the-art method based on deep learning, which corresponds to a 59.8% size reduction over JPEG, more than 35% reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. To the best of our knowledge, our model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM distortion metrics.
Comment: Accepted at the 32nd Conference on Neural Information Processing Systems (NIPS 2018)
Published: 2018

12. Towards a Semantic Perceptual Image Metric

Author: Chinen, Troy, Ballé, Johannes, Gu, Chunhui, Hwang, Sung Jin, Ioffe, Sergey, Johnston, Nick, Leung, Thomas, Minnen, David, O'Malley, Sean, Rosenberg, Charles, and Toderici, George
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a full reference, perceptual image metric based on VGG-16, an artificial neural network trained on object classification. We fit the metric to a new database based on 140k unique images annotated with ground truth by human raters who received minimal instruction. The resulting metric shows competitive performance on TID 2013, a database widely used to assess image quality assessment methods. More interestingly, it shows strong responses to objects potentially carrying semantic relevance such as faces and text, which we demonstrate using a visualization technique and ablation experiments. In effect, the metric appears to model a higher influence of semantic context on judgments, which we observe particularly in untrained raters. As the vast majority of users of image processing systems are unfamiliar with Image Quality Assessment (IQA) tasks, these findings may have significant impact on real-world applications of perceptual metrics.
Published: 2018

13. Image-Dependent Local Entropy Models for Learned Image Compression

Author: Minnen, David, Toderici, George, Singh, Saurabh, Hwang, Sung Jin, and Covell, Michele
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The leading approach for image compression with artificial neural networks (ANNs) is to learn a nonlinear transform and a fixed entropy model that are optimized for rate-distortion performance. We show that this approach can be significantly improved by incorporating spatially local, image-dependent entropy models. The key insight is that existing ANN-based methods learn an entropy model that is shared between the encoder and decoder, but they do not transmit any side information that would allow the model to adapt to the structure of a specific image. We present a method for augmenting ANN-based image coders with image-dependent side information that leads to a 17.8% rate reduction over a state-of-the-art ANN-based baseline model on a standard evaluation set, and 70-98% reductions on images with low visual complexity that are poorly captured by a fixed, global entropy model.
Published: 2018

14. Spatially adaptive image compression using a tiled deep network

Author: Minnen, David, Toderici, George, Covell, Michele, Chinen, Troy, Johnston, Nick, Shor, Joel, Hwang, Sung Jin, Vincent, Damien, and Singh, Saurabh
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep neural networks represent a powerful class of function approximators that can learn to compress and reconstruct images. Existing image compression algorithms based on neural networks learn quantized representations with a constant spatial bit rate across each image. While entropy coding introduces some spatial variation, traditional codecs have benefited significantly by explicitly adapting the bit rate based on local image complexity and visual saliency. This paper introduces an algorithm that combines deep neural networks with quality-sensitive bit rate adaptation using a tiled network. We demonstrate the importance of spatial context prediction and show improved quantitative (PSNR) and qualitative (subjective rater assessment) results compared to a non-adaptive baseline and a recently published image compression model based on fully-convolutional neural networks.
Published: 2018

15. AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

Author: Gu, Chunhui, Sun, Chen, Ross, David A., Vondrick, Carl, Pantofaru, Caroline, Li, Yeqing, Vijayanarasimhan, Sudheendra, Toderici, George, Ricco, Susanna, Sukthankar, Rahul, Schmid, Cordelia, and Malik, Jitendra
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. We will release the dataset publicly. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.6% mAP, underscoring the need for developing new approaches for video understanding.
Comment: To appear in CVPR 2018. Check dataset page https://research.google.com/ava/ for details
Published: 2017

16. Target-Quality Image Compression with Recurrent, Convolutional Neural Networks

Author: Covell, Michele, Johnston, Nick, Minnen, David, Hwang, Sung Jin, Shor, Joel, Singh, Saurabh, Vincent, Damien, and Toderici, George
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a stop-code tolerant (SCT) approach to training recurrent convolutional neural networks for lossy image compression. Our methods introduce a multi-pass training method to combine the training goals of high-quality reconstructions in areas around stop-code masking as well as in highly-detailed areas. These methods lead to lower true bitrates for a given recursion count, both pre- and post-entropy coding, even using unstructured LZ77 code compression. The pre-LZ77 gains are achieved by trimming stop codes. The post-LZ77 gains are due to the highly unequal distributions of 0/1 codes from the SCT architectures. With these code compressions, the SCT architecture maintains or exceeds the image quality at all compression rates compared to JPEG and to RNN auto-encoders across the Kodak dataset. In addition, the SCT coding results in lower variance in image quality across the extent of the image, a characteristic that has been shown to be important in human ratings of image quality.
Published: 2017

17. Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

Author: Johnston, Nick, Vincent, Damien, Minnen, David, Covell, Michele, Singh, Saurabh, Chinen, Troy, Hwang, Sung Jin, Shor, Joel, and Toderici, George
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM. We introduce three improvements over previous research that lead to this state-of-the-art result. First, we show that training with a pixel-wise loss weighted by SSIM increases reconstruction quality according to several metrics. Second, we modify the recurrent architecture to improve spatial diffusion, which allows the network to more effectively capture and propagate image information through the network's hidden state. Finally, in addition to lossless entropy coding, we use a spatially adaptive bit allocation algorithm to more efficiently use the limited number of bits to encode visually complex image regions. We evaluate our method on the Kodak and Tecnick image sets and compare against standard codecs as well as recently published methods based on deep neural networks.
Published: 2017

18. YouTube-8M: A Large-Scale Video Classification Benchmark

Author: Abu-El-Haija, Sami, Kothari, Nisarg, Lee, Joonseok, Natsev, Paul, Toderici, George, Varadarajan, Balakrishnan, and Vijayanarasimhan, Sudheendra
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Many recent advancements in Computer Vision are attributed to large datasets. Open-source software packages for Machine Learning and inexpensive commodity hardware have reduced the barrier of entry for exploring novel approaches at scale. It is possible to train models over millions of examples within a few days. Although large-scale datasets exist for image understanding, such as ImageNet, there are no comparable size video classification datasets. In this paper, we introduce YouTube-8M, the largest multi-label video classification dataset, composed of ~8 million videos (500K hours of video), annotated with a vocabulary of 4800 visual entities. To get the videos and their labels, we used a YouTube video annotation system, which labels videos with their main topics. While the labels are machine-generated, they have high precision and are derived from a variety of human-based signals including metadata and query click signals. We filtered the video labels (Knowledge Graph entities) using both automated and manual curation strategies, including asking human raters if the labels are visually recognizable. Then, we decoded each video at one-frame-per-second, and used a Deep CNN pre-trained on ImageNet to extract the hidden representation immediately prior to the classification layer. Finally, we compressed the frame features and make both the features and video-level labels available for download. We trained various (modest) classification models on the dataset, evaluated them using popular evaluation metrics, and report them as baselines. Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow. We plan to release code for training a TensorFlow model and for computing metrics.
Comment: 10 pages
Published: 2016

19. Full Resolution Image Compression with Recurrent Neural Networks

Author: Toderici, George, Vincent, Damien, Johnston, Nick, Hwang, Sung Jin, Minnen, David, Shor, Joel, and Covell, Michele
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding. We compare RNN types (LSTM, associative LSTM) and introduce a new hybrid of GRU and ResNet. We also study "one-shot" versus additive reconstruction architectures and introduce a new scaled-additive framework. We compare to previous work, showing improvements of 4.3%-8.8% AUC (area under the rate-distortion curve), depending on the perceptual metric used. As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding.
Comment: Updated with content for CVPR and removed supplemental material to an external link for size limitations
Published: 2016

20. Variable Rate Image Compression with Recurrent Neural Networks

Author: Toderici, George, O'Malley, Sean M., Hwang, Sung Jin, Vincent, Damien, Minnen, David, Baluja, Shumeet, Covell, Michele, and Sukthankar, Rahul
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning, Computer Science - Neural and Evolutionary Computing
Abstract: A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore a current research focus, as any byte savings will significantly enhance the experience of mobile device users. Toward this end, we propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks. Our models address the main issues that have prevented autoencoder neural networks from competing with existing image compression algorithms: (1) our networks only need to be trained once (not per-image), regardless of input image dimensions and the desired compression rate; (2) our networks are progressive, meaning that the more bits are sent, the more accurate the image reconstruction; and (3) the proposed architecture is at least as efficient as a standard purpose-trained autoencoder for a given number of bits. On a large-scale benchmark of 32×32 thumbnails, our LSTM-based approaches provide better visual quality than (headerless) JPEG, JPEG2000 and WebP, with a storage size that is reduced by 10% or more.
Comment: Under review as a conference paper at ICLR 2016
Published: 2015

21. Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

Author: Mori, Greg, Pantofaru, Caroline, Kothari, Nisarg, Leung, Thomas, Toderici, George, Toshev, Alexander, and Yang, Weilong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a method for learning an embedding that places images of humans in similar poses nearby. This embedding can be used as a direct method of comparing images based on human pose, avoiding potential challenges of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used to allow learning of a representation capable of making distinctions between different poses. Experiments on human pose matching and retrieval from video data demonstrate the potential of the method.
Published: 2015

22. Efficient Large Scale Video Classification

Author: Varadarajan, Balakrishnan, Toderici, George, Vijayanarasimhan, Sudheendra, and Natsev, Apostol
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Computer Science - Neural and Evolutionary Computing
Abstract: Video classification has advanced tremendously over the recent years. A large part of the improvements in video classification had to do with the work done by the image classification community and the use of deep convolutional networks (CNNs) which produce competitive results with hand-crafted motion features. These networks were adapted to use video frames in various ways and have yielded state of the art classification results. We present two methods that build on this work, and scale it up to work with millions of videos and hundreds of thousands of classes while maintaining a low computational cost. In the context of large scale video processing, training CNNs on video frames is extremely time consuming, due to the large number of frames involved. We propose to avoid this problem by training CNNs on either YouTube thumbnails or Flickr images, and then using these networks' outputs as features for other higher level classifiers. We discuss the challenges of achieving this and propose two models for frame-level and video-level classification. The first is a highly efficient mixture of experts while the latter is based on long short term memory neural networks. We present results on the Sports-1M video dataset (1 million videos, 487 classes) and on a new dataset which has 12 million videos and 150,000 labels.
Published: 2015

23. Beyond Short Snippets: Deep Networks for Video Classification

Author: Ng, Joe Yue-Hei, Hausknecht, Matthew, Vijayanarasimhan, Sudheendra, Vinyals, Oriol, Monga, Rajat, and Toderici, George
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep neural network architectures to combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handling full length videos. The first method explores various convolutional temporal feature pooling architectures, examining the various design choices which need to be made when adapting a CNN for this task. The second proposed method explicitly models the video as an ordered sequence of frames. For this purpose we employ a recurrent neural network that uses Long Short-Term Memory (LSTM) cells which are connected to the output of the underlying CNN. Our best networks exhibit significant performance improvements over previously published results on the Sports 1 million dataset (73.1% vs. 60.9%) and the UCF-101 datasets with (88.6% vs. 88.0%) and without additional optical flow information (82.6% vs. 72.8%).
Published: 2015

24. The 2nd YouTube-8M Large-Scale Video Understanding Challenge

Author: Lee, Joonseok, Natsev, Apostol (Paul), Reade, Walter, Sukthankar, Rahul, Toderici, George, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Leal-Taixé, Laura, editor, and Roth, Stefan, editor
Published: 2019
Full Text: View/download PDF

25. 3D-2D face recognition with pose and illumination normalization

Author: Kakadiaris, Ioannis A., Toderici, George, Evangelopoulos, Georgios, Passalis, Georgios, Chu, Dat, Zhao, Xi, Shah, Shishir K., and Theoharis, Theoharis
Published: 2017
Full Text: View/download PDF

26. Multi-Realism Image Compression with a Conditional Generator

Author: Agustsson, Eirikur, primary, Minnen, David, additional, Toderici, George, additional, and Mentzer, Fabian, additional
Published: 2023
Full Text: View/download PDF

27. Face Recognition, 3D-Based

Author: Kakadiaris, Ioannis A., Passalis, Georgios, Toderici, George, Perakis, Takis, Theoharis, Theoharis, Li, Stan Z., editor, and Jain, Anil K., editor
Published: 2015
Full Text: View/download PDF

28. UHDB11 Database for 3D-2D Face Recognition

Author: Toderici, George, Evangelopoulos, Georgios, Fang, Tianhong, Theoharis, Theoharis, Kakadiaris, Ioannis A., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Klette, Reinhard, editor, Rivera, Mariano, editor, and Satoh, Shin’ichi, editor
Published: 2014
Full Text: View/download PDF

29. Face Recognition, 3D-Based

Author: Kakadiaris, Ioannis A., Passalis, Georgios, Toderici, George, Perakis, Takis, Theoharis, Theoharis, Li, Stan Z., editor, and Jain, Anil, editor
Published: 2009
Full Text: View/download PDF

30. Quo Vadis: 3D Face and Ear Recognition?

Author: Kakadiaris, Ioannis A., Passalis, Georgios, Toderici, George, Murtuza, Mohammed N., Theoharis, Theoharis, Hammoud, Riad I., editor, Abidi, Besma R., editor, and Abidi, Mongi A., editor
Published: 2007
Full Text: View/download PDF

31. LVAC: Learned volumetric attribute compression for point clouds using coordinate based networks

Author: Isik, Berivan, Chou, Philip A., Hwang, Sung Jin, Johnston, Nick, and Toderici, George
Subjects: Signal Processing (eess.SP), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Graphics, Image and Video Processing (eess.IV), FOS: Electrical engineering, electronic engineering, information engineering, Data_CODINGANDINFORMATIONTHEORY, Electrical Engineering and Systems Science - Image and Video Processing, Electrical Engineering and Systems Science - Signal Processing, Graphics (cs.GR), Machine Learning (cs.LG)
Abstract: We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordinate-based, or implicit, neural network. Inputs to the network include both spatial coordinates and a latent vector per block. We represent the latent vectors using coefficients of the region-adaptive hierarchical transform (RAHT) used in the MPEG geometry-based point cloud codec G-PCC. The coefficients, which are highly compressible, are rate-distortion optimized by back-propagation through a rate-distortion Lagrangian loss in an auto-decoder configuration. The result outperforms RAHT by 2-4 dB. This is the first work to compress volumetric functions represented by local coordinate-based neural networks. As such, we expect it to be applicable beyond point clouds, for example to compression of high-resolution neural radiance fields., Comment: 30 pages, 29 figures
Published: 2022
Full Text: View/download PDF

32. Unified 3D face and ear recognition using wavelets on geometry images

Author: Theoharis, Theoharis, Passalis, Georgios, Toderici, George, and Kakadiaris, Ioannis A.
Published: 2008
Full Text: View/download PDF

33. Ethnicity- and Gender-based Subject Retrieval Using 3-D Face-Recognition Techniques

Author: Toderici, George, O’Malley, Sean M., Passalis, George, Theoharis, Theoharis, and Kakadiaris, Ioannis A.
Published: 2010
Full Text: View/download PDF

34. UHDB11 Database for 3D-2D Face Recognition

Author: Toderici, George, primary, Evangelopoulos, Georgios, additional, Fang, Tianhong, additional, Theoharis, Theoharis, additional, and Kakadiaris, Ioannis A., additional
Published: 2014
Full Text: View/download PDF

35. Three-dimensional face recognition in the presence of facial expressions: an annotated deformable model approach

Author: Kakadiaris, Ioannis A., Passalis, Georgios, Toderici, George, Murtuza, Mohammed N., Lu, Yunliang, Karampatziakis, Nikos, and Theoharis, Theoharis
Subjects: Biometric technology, Biometry -- Usage, Information storage and retrieval -- Methods
Abstract: In this paper, we present the computational tools and a hardware prototype for 3D face recognition. Full automation is provided through the use of advanced multistage alignment algorithms, resilience to facial expressions by employing a deformable model framework, and invariance to 3D capture devices through suitable preprocessing steps. In addition, scalability in both time and space is achieved by converting 3D facial scans into compact metadata. We present our results on the largest known, and now publicly available, Face Recognition Grand Challenge 3D facial database consisting of several thousand scans. To the best of our knowledge, this is the highest performance reported on the FRGC v2 database for the 3D modality. Index Terms: Face and gesture recognition, information search and retrieval.
Published: 2007

36. Nonlinear Transform Coding

Author: Balle, Johannes, primary, Chou, Philip A., additional, Minnen, David, additional, Singh, Saurabh, additional, Johnston, Nick, additional, Agustsson, Eirikur, additional, Hwang, Sung Jin, additional, and Toderici, George, additional
Published: 2021
Full Text: View/download PDF

37. End-to-End Learning of Compressible Features

Author: Singh, Saurabh, primary, Abu-El-Haija, Sami, additional, Johnston, Nick, additional, Balle, Johannes, additional, Shrivastava, Abhinav, additional, and Toderici, George, additional
Published: 2020
Full Text: View/download PDF

38. Scale-Space Flow for End-to-End Optimized Video Compression

Author: Agustsson, Eirikur, primary, Minnen, David, additional, Johnston, Nick, additional, Balle, Johannes, additional, Hwang, Sung Jin, additional, and Toderici, George, additional
Published: 2020
Full Text: View/download PDF

39. Image-Dependent Local Entropy Models for Learned Image Compression

Author: Minnen, David, primary, Toderici, George, additional, Singh, Saurabh, additional, Hwang, Sung Jin, additional, and Covell, Michele, additional
Published: 2018
Full Text: View/download PDF

40. Towards A Semantic Perceptual Image Metric

Author: Chinen, Troy, primary, Balle, Johannes, additional, Gu, Chunhui, additional, Hwang, Sung Jin, additional, Ioffe, Sergey, additional, Johnston, Nick, additional, Leung, Thomas, additional, Minnen, David, additional, O'Malley, Sean, additional, Rosenberg, Charles, additional, and Toderici, George, additional
Published: 2018
Full Text: View/download PDF

41. Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

Author: Johnston, Nick, primary, Vincent, Damien, additional, Minnen, David, additional, Covell, Michele, additional, Singh, Saurabh, additional, Chinen, Troy, additional, Jin Hwang, Sung, additional, Shor, Joel, additional, and Toderici, George, additional
Published: 2018
Full Text: View/download PDF

42. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions

Author: Gu, Chunhui, primary, Sun, Chen, additional, Ross, David A., additional, Vondrick, Carl, additional, Pantofaru, Caroline, additional, Li, Yeqing, additional, Vijayanarasimhan, Sudheendra, additional, Toderici, George, additional, Ricco, Susanna, additional, Sukthankar, Rahul, additional, Schmid, Cordelia, additional, and Malik, Jitendra, additional
Published: 2018
Full Text: View/download PDF

43. Full Resolution Image Compression with Recurrent Neural Networks

Author: Toderici, George, primary, Vincent, Damien, additional, Johnston, Nick, additional, Hwang, Sung Jin, additional, Minnen, David, additional, Shor, Joel, additional, and Covell, Michele, additional
Published: 2017
Full Text: View/download PDF

44. An Automated Method for Human Face Modeling and Relighting with Application to Face Recognition

Author: Toderici, George, Passalis, Georgios, Theoharis, Theoharis, Kakadiaris, Ioannis, Documentation, Inria Rhône-Alpes, and Peter Belhumeur and Katsushi Ikeuchi and Emmanuel Prados and Stefano Soatto and Peter Sturm
Subjects: [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV], ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, [INFO.INFO-GR] Computer Science [cs]/Graphics [cs.GR], ComputingMethodologies_COMPUTERGRAPHICS
Abstract: In this paper, we present a novel method for human face modeling and its application to face relighting and recognition. An annotated face model is fitted onto the raw 3D data using a subdivision-based deformable model framework. The fitted face model is subsequently converted to a geometry image representation. This results in regularly sampled, registered and annotated geometry data. The albedo of the skin is retrieved by using an analytical skin reflectance model that removes the lighting (shadows, diffuse and specular) from the texture. Additional provisions are made such that if the input contains over-saturated specular highlights, an inpainting method with texture synthesis is used as a post-processing step in order to estimate the texture. The method is fully automatic and uses as input only the 3D geometry and texture data of the face, as acquired by commercial 3D scanners. No measurement or calibration of the lighting environment is required. The method's fully automatic nature and its minimum input requirements make it applicable to both computer vision applications (e.g., face recognition) and computer graphics applications (i.e., relighting, face synthesis and facial expressions transfer). Moreover, it allows the utilization of existing 3D facial databases. We present very encouraging results on a challenging dataset.
Published: 2007

45. Quo Vadis: 3D Face and Ear Recognition?

Author: Kakadiaris, Ioannis A., primary, Passalis, Georgios, additional, Toderici, George, additional, Murtuza, Mohammed N., additional, and Theoharis, Theoharis, additional
Full Text: View/download PDF

46. Large-Scale Video Classification with Convolutional Neural Networks

Author: Karpathy, Andrej, primary, Toderici, George, additional, Shetty, Sanketh, additional, Leung, Thomas, additional, Sukthankar, Rahul, additional, and Fei-Fei, Li, additional
Published: 2014
Full Text: View/download PDF

47. Beyond short snippets: Deep networks for video classification.

Author: Ng, Joe Yue-Hei, Hausknecht, Matthew, Vijayanarasimhan, Sudheendra, Vinyals, Oriol, Monga, Rajat, and Toderici, George
Published: 2015
Full Text: View/download PDF

48. Discriminative tag learning on YouTube videos with latent sub-tags

Author: Yang, Weilong, primary and Toderici, George, additional
Published: 2011
Full Text: View/download PDF

49. Finding meaning on YouTube: Tag recommendation and category discovery

Author: Toderici, George, primary, Aradhye, Hrishikesh, additional, Pasca, Marius, additional, Sbaiz, Luciano, additional, and Yagnik, Jay, additional
Published: 2010
Full Text: View/download PDF

50. Video2Text: Learning to Annotate Video Content

Author: Aradhye, Hrishikesh, primary, Toderici, George, additional, and Yagnik, Jay, additional
Published: 2009
Full Text: View/download PDF
