34 results on '"Gavves E"'
Search Results
2. Learning Lie Group Symmetry Transformations with Neural Networks
- Author
-
Gabel, A, Klein, V, Valperga, R, Lamb, JSW, Webster, K, Quax, R, and Gavves, E
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
- Abstract
The problem of detecting and quantifying the presence of symmetries in datasets is useful for model selection, generative modeling, and data analysis, amongst others. While existing methods for hard-coding transformations in neural networks require prior knowledge of the symmetries of the task at hand, this work focuses on discovering and characterizing unknown symmetries present in the dataset, namely, Lie group symmetry transformations beyond the traditional ones usually considered in the field (rotation, scaling, and translation). Specifically, we consider a scenario in which a dataset has been transformed by a one-parameter subgroup of transformations with different parameter values for each data point. Our goal is to characterize the transformation group and the distribution of the parameter values. The results showcase the effectiveness of the approach in both these settings. Comment: 9 pages, 5 figures, Proceedings of the 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML) at the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. 2023
- Published
- 2023
3. WeakSTIL: weak whole-slide image level stromal tumor infiltrating lymphocyte scores are all you need
- Author
-
Schirris, Y., Engelaer, M., Panteli, A., Horlings, H.M., Gavves, E., Teuwen, J., Tomaszewski, J.E., Ward, A.D., Levenson, R.M., Video & Image Sense Lab (IvI, FNWI), and IvI Research (FNWI)
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
We present WeakSTIL, an interpretable two-stage weak label deep learning pipeline for scoring the percentage of stromal tumor infiltrating lymphocytes (sTIL%) in H&E-stained whole-slide images (WSIs) of breast cancer tissue. The sTIL% score is a prognostic and predictive biomarker for many solid tumor types. However, due to the high labeling efforts and high intra- and interobserver variability within and between expert annotators, this biomarker is currently not used in routine clinical decision making. WeakSTIL compresses tiles of a WSI using a feature extractor pre-trained with self-supervised learning on unlabeled histopathology data and learns to predict precise sTIL% scores for each tile in the tumor bed by using a multiple instance learning regressor that only requires a weak WSI-level label. By requiring only a weak label, we overcome the large annotation efforts required to train currently existing TIL detection methods. We show that WeakSTIL is at least as good as other TIL detection methods when predicting the WSI-level sTIL% score, reaching a coefficient of determination of $0.45\pm0.15$ when compared to scores generated by an expert pathologist, and an AUC of $0.89\pm0.05$ when treating it as the clinically interesting sTIL-high vs sTIL-low classification task. Additionally, we show that the intermediate tile-level predictions of WeakSTIL are highly interpretable, which suggests that WeakSTIL pays attention to latent features related to the number of TILs and the tissue type. In the future, WeakSTIL may be used to provide consistent and interpretable sTIL% predictions to stratify breast cancer patients into targeted therapy arms. Comment: 8 pages, 8 figures, 1 table, 4 pages supplementary
- Published
- 2022
- Full Text
- View/download PDF
4. Fully Automated Thrombus Segmentation on CT Images of Patients with Acute Ischemic Stroke
- Author
-
Mojtahedi, M., Kappelhof, M., Ponomareva, E., Tolhuisen, M., Jansen, I., Bruggeman, A. A. E., Dutra, B. G., Yo, L., Lecouffe, N., Hoving, J. W., van Voorst, H., Brouwer, J., Terreros, N. A., Konduri, P., Meijer, F. J. A., Appelman, A., Treurniet, K. M., Coutinho, J. M., Roos, Y., van Zwam, W., Dippel, D., Gavves, E., Emmer, B. J., Majoie, C., and Marquering, H.
- Abstract
Thrombus imaging characteristics are associated with treatment success and functional outcomes in stroke patients. However, assessing these characteristics based on manual annotations is labor intensive and subject to observer bias. Therefore, we aimed to create an automated pipeline for consistent and fast full thrombus segmentation. We used multicenter, multi-scanner datasets of anterior circulation stroke patients with baseline NCCT and CTA for training (n = 228) and testing (n = 100). We first found the occlusion location using StrokeViewer LVO and created a bounding box around it. Subsequently, we trained dual modality U-Net based convolutional neural networks (CNNs) to segment the thrombus inside this bounding box. We experimented with: (1) U-Net with two input channels for NCCT and CTA, and U-Nets with two encoders where (2) concatenate, (3) add, and (4) weighted-sum operators were used for feature fusion. Furthermore, we proposed a dynamic bounding box algorithm to adjust the bounding box. The dynamic bounding box algorithm reduces the missed cases but does not improve Dice. The two-encoder U-Net with a weighted-sum feature fusion shows the best performance (surface Dice 0.78, Dice 0.62, and 4% missed cases). Final segmentation results have high spatial accuracies and can therefore be used to determine thrombus characteristics and potentially benefit radiologists in clinical practice.
- Published
- 2022
5. Continual Learning of Dynamical Systems with Competitive Federated Reservoir Computing
- Author
-
Bereska, L., Gavves, E., and Video & Image Sense Lab (IvI, FNWI)
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
- Abstract
Machine learning recently proved efficient in learning differential equations and dynamical systems from data. However, the data is commonly assumed to originate from a single never-changing system. In contrast, when modeling real-world dynamical processes, the data distribution often shifts due to changes in the underlying system dynamics. Continual learning of these processes aims to rapidly adapt to abrupt system changes without forgetting previous dynamical regimes. This work proposes an approach to continual learning based on reservoir computing, a state-of-the-art method for training recurrent neural networks on complex spatiotemporal dynamical systems. Reservoir computing fixes the recurrent network weights - hence these cannot be forgotten - and only updates linear projection heads to the output. We propose to train multiple competitive prediction heads concurrently. Inspired by neuroscience's predictive coding, only the most predictive heads activate, laterally inhibiting and thus protecting the inactive heads from forgetting induced by interfering parameter updates. We show that this multi-head reservoir minimizes interference and catastrophic forgetting on several dynamical systems, including the Van-der-Pol oscillator, the chaotic Lorenz attractor, and the high-dimensional Lorenz-96 weather model. Our results suggest that reservoir computing is a promising candidate framework for the continual learning of dynamical systems. We provide our code for data generation, method, and comparisons at https://github.com/leonardbereska/multiheadreservoir. Comment: CoLLAs 2022
- Published
- 2022
- Full Text
- View/download PDF
6. 3D Equivariant Graph Implicit Functions
- Author
-
Chen, Y., Fernando, B., Bilen, H., Nießner, M., Gavves, E., Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., and Video & Image Sense Lab (IvI, FNWI)
- Subjects
FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
In recent years, neural implicit representations have made remarkable progress in modeling of 3D shapes with arbitrary topology. In this work, we address two key limitations of such representations, in failing to capture local 3D geometric fine details, and to learn from and generalize to shapes with unseen 3D transformations. To this end, we introduce a novel family of graph implicit functions with equivariant layers that facilitates modeling fine local details and guaranteed robustness to various groups of geometric transformations, through local $k$-NN graph embeddings with sparse point set observations at multiple resolutions. Our method improves over the existing rotation-equivariant implicit function from 0.69 to 0.89 (IoU) on the ShapeNet reconstruction task. We also show that our equivariant implicit function can be extended to other types of similarity transformations and generalizes to unseen translations and scaling. Comment: Video: https://youtu.be/W7goOzZP2Kc
- Published
- 2022
- Full Text
- View/download PDF
7. Roto-translated Local Coordinate Frames For Interacting Dynamical Systems
- Author
-
Gavves, E., Kofinas, M., Nagaraja, N., Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P.S., Wortman Vaughan, J., and Video & Image Sense Lab (IvI, FNWI)
- Abstract
Modelling interactions is critical in learning complex dynamical systems, namely systems of interacting objects with highly non-linear and time-dependent behaviour. A large class of such systems can be formalized as geometric graphs, i.e. graphs with nodes positioned in the Euclidean space given an arbitrarily chosen global coordinate system, for instance vehicles in a traffic scene. Notwithstanding the arbitrary global coordinate system, the governing dynamics of the respective dynamical systems are invariant to rotations and translations, also known as Galilean invariance. As ignoring these invariances leads to worse generalization, in this work we propose local coordinate systems per node-object to induce roto-translation invariance to the geometric graph of the interacting dynamical system. Further, the local coordinate systems allow for a natural definition of anisotropic filtering in graph neural networks. Experiments in traffic scenes, 3D motion capture, and colliding particles demonstrate the proposed approach comfortably outperforms the recent state-of-the-art.
- Published
- 2022
8. Spectral Smoothing Unveils Phase Transitions in Hierarchical Variational Autoencoders
- Author
-
Pervez, A., Gavves, E., and Video & Image Sense Lab (IvI, FNWI)
- Abstract
Variational autoencoders with deep hierarchies of stochastic layers have been known to suffer from the problem of posterior collapse, where the top layers fall back to the prior and become independent of input. We suggest that the hierarchical VAE objective explicitly includes the variance of the function parameterizing the mean and variance of the latent Gaussian distribution which itself is often a high variance function. Building on this we generalize VAE neural networks by incorporating a smoothing parameter motivated by Gaussian analysis to reduce higher frequency components and consequently the variance in parameterizing functions and show that this can help to solve the problem of posterior collapse. We further show that under such smoothing the VAE loss exhibits a phase transition, where the top layer KL divergence sharply drops to zero at a critical value of the smoothing parameter that is similar for the same model across datasets. We validate the phenomenon across model configurations and datasets.
- Published
- 2021
9. Neural Feature Matching in Implicit 3D Representations
- Author
-
Chen, Y., Fernando, B., Bilen, H., Mensink, T., Gavves, E., Video & Image Sense Lab (IvI, FNWI), and IvI Research (FNWI)
- Abstract
Recently, neural implicit functions have achieved impressive results for encoding 3D shapes. Conditioning on low-dimensional latent codes generalises a single implicit function to learn shared representation space for a variety of shapes, with the advantage of smooth interpolation. While the benefits from the global latent space do not correspond to explicit points at local level, we propose to track the continuous point trajectory by matching implicit features with the latent code interpolating between shapes, from which we corroborate the hierarchical functionality of the deep implicit functions, where early layers map the latent code to fitting the coarse shape structure, and deeper layers further refine the shape details. Furthermore, the structured representation space of implicit functions enables to apply feature matching for shape deformation, with the benefits to handle topology and semantics inconsistency, such as from an armchair to a chair with no arms, without explicit flow functions or manual annotations.
- Published
- 2021
10. 1240P Multi-centric validation of an AI-based sTIL% scoring model for breast cancer H&E whole-slide images
- Author
-
Schirris, Y., Voorthuis, R.A.B., Opdam, M., Gavves, E., de Menezes, R., Linn, S., and Horlings, H.
- Published
- 2023
- Full Text
- View/download PDF
11. Automated Final Lesion Segmentation in Posterior Circulation Acute Ischemic Stroke Using Deep Learning
- Author
-
Zoetmulder, R, Konduri, PR, Obdeijn, I, Gavves, E, Isgum, I, Majoie, CBLM, Dippel, DWJ, Roos, YBWEM, Goyal, M, Mitchell, PJ, Campbell, BC, Lopes, DK, Reimann, G, Jovin, TG, Saver, JL, Muir, KW, White, P, Bracard, S, Chen, B, Brown, S, Schonewille, WJ, van der Hoeven, E, Puetz, V, and Marquering, HA
- Abstract
Final lesion volume (FLV) is a surrogate outcome measure in anterior circulation stroke (ACS). In posterior circulation stroke (PCS), this relation is plausibly understudied due to a lack of methods that automatically quantify FLV. The applicability of deep learning approaches to PCS is limited due to its lower incidence compared to ACS. We evaluated strategies to develop a convolutional neural network (CNN) for PCS lesion segmentation by using image data from both ACS and PCS patients. We included follow-up non-contrast computed tomography scans of 1018 patients with ACS and 107 patients with PCS. To assess whether an ACS lesion segmentation generalizes to PCS, a CNN was trained on ACS data (ACS-CNN). Second, to evaluate the performance of only including PCS patients, a CNN was trained on PCS data. Third, to evaluate the performance when combining the datasets, a CNN was trained on both datasets. Finally, to evaluate the performance of transfer learning, the ACS-CNN was fine-tuned using PCS patients. The transfer learning strategy outperformed the other strategies in volume agreement with an intra-class correlation of 0.88 (95% CI: 0.83-0.92) vs. 0.55 to 0.83 and a lesion detection rate of 87% vs. 41-77 for the other strategies. Hence, transfer learning improved the FLV quantification and detection rate of PCS lesions compared to the other strategies.
- Published
- 2021
12. Siamese Tracking of Cell Behaviour Patterns
- Author
-
Panteli, A., Gupta, D.K., de Bruijn, N., Gavves, E., and Intelligent Sensory Information Systems (IVI, FNWI)
- Abstract
Tracking and segmentation of biological cells in video sequences is a challenging problem, especially due to the similarity of the cells and high levels of inherent noise. Most machine learning-based approaches lack robustness and suffer from sensitivity to less prominent events such as mitosis, apoptosis and cell collisions. Due to the large variance in medical image characteristics, most approaches are dataset-specific and do not generalise well on other datasets. In this paper, we propose a simple end-to-end cascade neural architecture that can effectively model the movement behaviour of biological cells and predict collision and mitosis events. Our approach uses U-Net for an initial segmentation which is then improved through processing by a siamese tracker capable of matching each cell along the temporal axis. By facilitating the re-segmentation of collided and mitotic cells, our method demonstrates its capability to handle volatile trajectories and unpredictable cell locations while being invariant to cell morphology. We demonstrate that our tracking approach achieves state-of-the-art results on PhC-C2DL-PSC and Fluo-N2DH-SIM+ datasets and ranks second on the DIC-C2DH-HeLa dataset of the cell tracking challenge benchmarks.
- Published
- 2020
13. Low Bias Low Variance Gradient Estimates for Boolean Stochastic Networks
- Author
-
Pervez, A., Cohen, T., Gavves, E., Intelligent Sensory Information Systems (IVI, FNWI), and Amsterdam Machine Learning lab (IVI, FNWI)
- Abstract
Stochastic neural networks with discrete random variables are an important class of models for their expressiveness and interpretability. Since direct differentiation and backpropagation is not possible, Monte Carlo gradient estimation techniques are a popular alternative. Efficient stochastic gradient estimators, such as Straight-Through and Gumbel-Softmax, work well for shallow stochastic models. Their performance, however, suffers with hierarchical, more complex models. We focus on stochastic networks with Boolean latent variables. To analyze such networks, we introduce the framework of harmonic analysis for Boolean functions to derive an analytic formulation for the bias and variance in the Straight-Through estimator. Exploiting these formulations, we propose FouST, a low-bias and low-variance gradient estimation algorithm that is just as efficient. Extensive experiments show that FouST performs favorably compared to state-of-the-art biased estimators and is much faster than unbiased ones.
- Published
- 2020
14. The Seventh Visual Object Tracking VOT2019 Challenge Results
- Author
-
Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Pflugfelder, R., Kämäräinen, J.-K., Čehovin Zajc, L., Drbohlav, O., Lukežič, A., Berg, A., Eldesokey, A., Käpylä, J., Fernández, G., Gonzalez-Garcia, A., Memarmoghadam, A., Lu, A., He, A., Varfolomieiev, A., Chan, A., Shekhar Tripathi, A., Smeulders, A., Suraj Pedasingu, B., Chen, B.X., Zhang, B., Wu, B., Li, B., He, B., Yan, B., Bai, B., Kim, B.H., Ma, C., Fang, C., Qian, C., Chen, C., Li, C., Zhang, C., Tsai, C.-Y., Luo, C., Micheloni, C., Tao, D., Gupta, D., Song, D., Wang, D., Gavves, E., Yi, E., Khan, F.S., Zhang, F., Wang, F., Zhao, F., De Ath, G., Bhat, G., Chen, G., Wang, G., Li, G., Cevikalp, H., Du, H., Zhao, H., Saribas, H., Jung, H.M., Bai, H., Hu, H., Peng, H., Lu, H., Li, H., Li, J., Fu, J., Chen, J., Gao, J., Zhao, J., Tang, J., Wu, J., Liu, J., Wang, J., Qi, J., Zhang, J., Tsotsos, J.K., Lee, J.H., van de Weijer, J., Kittler, J., Zhuang, J., Zhang, K., Wang, K., Dai, K., Chen, L., Liu, L., Guo, L., Zhang, L., Wang, L., Zhou, L., Zheng, L., Rout, L., Van Gool, L., Bertinetto, L., Danelljan, M., Dunnhofer, M., Ni, M., Kim, M.Y., Tang, M., Yang, M.-H., Paluru, N., Martinel, N., Xu, P., Zhang, P., Zheng, P., Torr, P.H.S., Zhang, Q., Wang, Q., Guo, Q., Timofte, R., Gorthi, R.K., Everson, R., Han, R., Zhang, R., You, S., Zhao, S.-C., Zhao, S., Li, S., Ge, S., Bai, S., Guan, S., Xing, T., Xu, T., Yang, T., Zhang, T., Vojíř, T., Feng, W., Hu, W., Wang, W., Tang, W., Zeng, W., Liu, W., Chen, X., Qiu, X., Bai, X., Wu, X.-J., Yang, X., Li, X., Sun, X., Tian, X., Tang, X., Zhu, X.-F., Huang, Y., Chen, Y., Lian, Y., Gu, Y., Liu, Y., Zhang, Y., Xu, Y., Wang, Y., Li, Y., Zhou, Y., Dong, Y., Wang, Z., Luo, Z., Zhang, Z., Feng, Z.-H., He, Z., Song, Z., Chen, Z., Wu, Z., Xiong, Z., Huang, Z., Teng, Z., Ni, Z., and Intelligent Sensory Information Systems (IVI, FNWI)
- Subjects
Source code, Computer science, business.industry, media_common.quotation_subject, Object tracking, Performance evaluation, VOT challenge, 020206 networking & telecommunications, 02 engineering and technology, Visualization, Datorseende och robotik (autonoma system), Robustness (computer science), Video tracking, 0202 electrical engineering, electronic engineering, information engineering, RGB color model, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Computer Vision and Robotics (Autonomous Systems), media_common
- Abstract
The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2019 focused on long-term tracking namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support both standard short-term, long-term tracking and tracking with multi-channel imagery. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website. Funding agencies: Slovenian Research Agency [J2-8175, P2-0214, P2-0094]; Czech Science Foundation Project GACR [P103/12/G084]; MURI project - MoD/Dstl; Engineering & Physical Sciences Research Council (EPSRC) [EP/N019415/1]; WASP; VR (ELLIIT, LAST, and NCNN); SSF (SymbiCloud); AIT Strategic Research Programme; Faculty of Computer Science, University of Ljubljana, Slovenia
- Published
- 2019
- Full Text
- View/download PDF
15. Relaxed Quantization for Discretized Neural Networks
- Author
-
Louizos, C., Reisser, M., Blankevoort, T., Gavves, E., Welling, M., Amsterdam Machine Learning lab (IVI, FNWI), and Intelligent Sensory Information Systems (IVI, FNWI)
- Subjects
Optimization, FOS: Computer and information sciences, Computer Science - Machine Learning, Stochastic systems, Gradient descent, Differentiability, Large models, Machine Learning (stat.ML), ComputerApplications_COMPUTERSINOTHERSYSTEMS, Machine Learning (cs.LG), Gradient methods, Loss of performance, Statistics - Machine Learning, Continuous distribution, Resourceconstrained devices, Computer Science::Databases, Gradient-based optimization
- Abstract
Neural network quantization has become an important research area due to its great impact on deployment of large models on resource constrained devices. In order to train networks that can be effectively discretized without loss of performance, we introduce a differentiable quantization procedure. Differentiability can be achieved by transforming continuous distributions over the weights and activations of the network to categorical distributions over the quantization grid. These are subsequently relaxed to continuous surrogates that can allow for efficient gradient-based optimization. We further show that stochastic rounding can be seen as a special case of the proposed approach and that under this formulation the quantization grid itself can also be optimized with gradient descent. We experimentally validate the performance of our method on MNIST, CIFAR 10 and Imagenet classification. © 7th International Conference on Learning Representations, ICLR 2019. All Rights Reserved.
- Published
- 2019
16. Initialized Equilibrium Propagation for Backprop-Free Training
- Author
-
O'Connor, P., Gavves, E., Welling, M., Amsterdam Machine Learning lab (IVI, FNWI), and Intelligent Sensory Information Systems (IVI, FNWI)
- Abstract
Deep neural networks are almost universally trained with reverse-mode automatic differentiation (a.k.a. backpropagation). Biological networks, on the other hand, appear to lack any mechanism for sending gradients back to their input neurons, and thus cannot be learning in this way. In response to this, Scellier & Bengio (2017) proposed Equilibrium Propagation - a method for gradient-based training of neural networks which uses only local learning rules and, crucially, does not rely on neurons having a mechanism for back-propagating an error gradient. Equilibrium propagation, however, has a major practical limitation: inference involves doing an iterative optimization of neural activations to find a fixed-point, and the number of steps required to closely approximate this fixed point scales poorly with the depth of the network. In response to this problem, we propose Initialized Equilibrium Propagation, which trains a feedforward network to initialize the iterative inference procedure for Equilibrium propagation. This feed-forward network learns to approximate the state of the fixed-point using a local learning rule. After training, we can simply use this initializing network for inference, resulting in a learned feedforward network. Our experiments show that this network appears to work as well or better than the original version of Equilibrium propagation. This shows how we might go about training deep networks without using backpropagation.
- Published
- 2019
17. Training a Spiking Neural Network with Equilibrium Propagation
- Author
-
O'Connor, P., Gavves, E., Welling, M., Amsterdam Machine Learning lab (IVI, FNWI), and Intelligent Sensory Information Systems (IVI, FNWI)
- Subjects
Quantitative Biology::Neurons and Cognition, Computer Science::Neural and Evolutionary Computation
- Abstract
Backpropagation is almost universally used to train artificial neural networks. However, there are several reasons that backpropagation could not be plausibly implemented by biological neurons. Among these are the facts that (1) biological neurons appear to lack any mechanism for sending gradients backwards across synapses, and (2) biological “spiking” neurons emit binary signals, whereas back-propagation requires that neurons communicate continuous values between one another. Recently, Scellier and Bengio [2017] demonstrated an alternative to backpropagation, called Equilibrium Propagation, wherein gradients are implicitly computed by the dynamics of the neural network, so that neurons do not need an internal mechanism for backpropagation of gradients. This provides an interesting solution to problem (1). In this paper, we address problem (2) by proposing a way in which Equilibrium Propagation can be implemented with neurons which are constrained to just communicate binary values at each time step. We show that with appropriate step-size annealing, we can converge to the same fixed-point as a real-valued neural network, and that with predictive coding, we can make this convergence much faster. We demonstrate that the resulting model can be used to train a spiking neural network using the update scheme from Equilibrium propagation.
- Published
- 2019
18. Long-Term Tracking in the Wild: A Benchmark
- Author
-
Valmadre, J., Bertinetto, L., Henriques, J.F., Tao, R., Vedaldi, A., Smeulders, A.W.M., Torr, P.H.S., Gavves, E., Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., and Intelligent Sensory Information Systems (IVI, FNWI)
- Subjects
FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Tracking (particle physics), Object (computer science), Machine learning, computer.software_genre, Field (computer science), Term (time), Video tracking, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
- Abstract
We introduce the OxUvA dataset and benchmark for evaluating single-object tracking algorithms. Benchmarks have enabled great strides in the field of object tracking by defining standardized evaluations on large sets of diverse videos. However, these works have focused exclusively on sequences that are just tens of seconds in length and in which the target is always visible. Consequently, most researchers have designed methods tailored to this "short-term" scenario, which is poorly representative of practitioners' needs. Aiming to address this disparity, we compile a long-term, large-scale tracking dataset of sequences with average length greater than two minutes and with frequent target object disappearance. The OxUvA dataset is much larger than the object tracking datasets of recent years: it comprises 366 sequences spanning 14 hours of video. We assess the performance of several algorithms, considering both the ability to locate the target and to determine whether it is present or absent. Our goal is to offer the community a large and diverse benchmark to enable the design and evaluation of tracking methods ready to be used "in the wild". The project website is http://oxuva.net. Comment: To appear at ECCV 2018
- Published
- 2018
- Full Text
- View/download PDF
19. Online Action Detection
- Author
-
De Geest, R., Gavves, E., Ghodrati, A., Li, Z., Snoek, C., Tuytelaars, T., Leibe, B., Matas, J., Sebe, N., Welling, M., and Intelligent Sensory Information Systems (IVI, FNWI)
- Subjects
business.industry, Computer science, 020207 software engineering, 02 engineering and technology, Variation (game tree), Machine learning, computer.software_genre, Action (philosophy), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Truncation (statistics), business, Protocol (object-oriented programming), computer
- Abstract
In online action detection, the goal is to detect the start of an action in a video stream as soon as it happens. For instance, if a child is chasing a ball, an autonomous car should recognize what is going on and respond immediately. This is a very challenging problem for four reasons. First, only partial actions are observed. Second, there is a large variability in negative data. Third, the start of the action is unknown, so it is unclear over what time window the information should be integrated. Finally, in real world data, large within-class variability exists. This problem has been addressed before, but only to some extent. Our contributions to online action detection are threefold. First, we introduce a realistic dataset composed of 27 episodes from 6 popular TV series. The dataset spans over 16 h of footage annotated with 30 action classes, totaling 6,231 action instances. Second, we analyze and compare various baseline methods, showing this is a challenging problem for which none of the methods provides a good solution. Third, we analyze the change in performance when there is a variation in viewpoint, occlusion, truncation, etc. We introduce an evaluation protocol for fair comparison. The dataset, the baselines and the models will all be made publicly available to encourage (much needed) further research on online action detection on realistic data.
- Published
- 2016
20. Attributes Make Sense on Segmented Objects
- Author
-
Li, Z., Gavves, E., Mensink, T., Snoek, C.G.M., Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., and Intelligent Sensory Information Systems (IVI, FNWI)
- Subjects
Basis (linear algebra), business.industry, Computer science, Novelty, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Object (computer science), Machine learning, computer.software_genre, Oracle, Image (mathematics), Ranking, Segmentation, Artificial intelligence, business, Scale (map), computer
- Abstract
In this paper we aim for object classification and segmentation by attributes. Where existing work considers attributes either for the global image or for the parts of the object, we propose, as our first novelty, to learn and extract attributes on segments containing the entire object. Object-level attributes suffer less from accidental content around the object and accidental image conditions such as partial occlusions, scale changes and viewpoint changes. As our second novelty, we propose joint learning for simultaneous object classification and segment proposal ranking, solely on the basis of attributes. This naturally brings us to our third novelty: object-level attributes for zero-shot, where we use attribute descriptions of unseen classes for localizing their instances in new images and classifying them accordingly. Results on the Caltech UCSD Birds, Leeds Butterflies, and an a-Pascal subset demonstrate that i) extracting attributes on oracle object-level brings substantial benefits, ii) our joint learning model leads to accurate attribute-based classification and segmentation, approaching the oracle results, and iii) object-level attributes also allow for zero-shot classification and segmentation. We conclude that attributes make sense on segmented objects.
- Published
- 2014
21. Nuances in visual recognition
- Author
-
Gavves, E., Smeulders, Arnold, Snoek, Cees, and Intelligent Sensory Information Systems (IVI, FNWI)
- Published
- 2014
22. The MediaMill TRECVID 2010 semantic video search engine
- Author
-
Snoek, C., Sande, K.E.A., Rooij, O., Huurnink, B., Gavves, E., Odijk, D., Rijke, Maarten, Gevers, T., Worring, Marcel, Koelma, D.C., Smeulders, Arnold, Information and Language Processing Syst (IVI, FNWI), and Intelligent Sensory Information Systems (IVI, FNWI)
- Subjects
InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
- Abstract
In this paper we describe our TRECVID 2010 video retrieval experiments. The MediaMill team participated in three tasks: semantic indexing, known-item search, and instance search. The starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of TRECVID 2009, which uses multiple color SIFT descriptors, sparse codebooks with spatial pyramids, kernel-based machine learning, and multi-frame video processing. We improve upon this baseline system by further speeding up its execution times for both training and classification using GPU-optimized algorithms, approximated histogram intersection kernels, and several multi-frame combination methods. Being more efficient allowed us to supplement the Internet video training collection with positively labeled examples from international news broadcasts and Dutch documentary video from the TRECVID 2005-2009 benchmarks. Our experimental setup covered a huge training set of 170 thousand keyframes and a test set of 600 thousand keyframes in total, ultimately leading to 130 robust concept detectors for video retrieval. For retrieval, a robust but limited set of concept detectors justifies the need to rely on as many auxiliary information channels as possible. For automatic known-item search we therefore explore how we can learn to rank various information channels simultaneously to maximize video search results for a given topic. To further improve the video retrieval results, our interactive known-item search experiments investigate how to combine metadata search and visualization into a single interface. The 2010 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the top ranking for concept detection in the semantic indexing task. Again a lot has been learned during this year’s TRECVID campaign; we highlight the most important lessons at the end of this paper.
- Published
- 2010
23. Visual Synonyms for Landmark Image Retrieval
- Author
-
Gavves, E., Snoek, C., and Smeulders, A.W.M. (Arnold)
- Published
- 2012
24. Convex Reduction of High-Dimensional Kernels for Visual Classification
- Author
-
Gavves, E., Snoek, C., and Smeulders, A.W.M. (Arnold)
- Published
- 2012
25. Personalizing Automated Image Annotation Using Cross-Entropy
- Author
-
Li, X. (Xirong), Gavves, E., Snoek, C., Worring, M. (Marcel), and Smeulders, A.W.M. (Arnold)
- Published
- 2011
26. Development and external evaluation of a self-learning auto-segmentation model for Colorectal Cancer Liver Metastases Assessment (COALA).
- Author
-
Bereska JI, Zeeuw M, Wagenaar L, Jenssen HB, Wesdorp NJ, van der Meulen D, Bereska LF, Gavves E, Janssen BV, Besselink MG, Marquering HA, van Waesberghe JTM, Aghayan DL, Pelanis E, van den Bergh J, Nota IIM, Moos S, Kemmerich G, Syversveen T, Kolrud FK, Huiskens J, Swijnenburg RJ, Punt CJA, Stoker J, Edwin B, Fretland ÅA, Kazemier G, and Verpalen IM
- Abstract
Objectives: Total tumor volume (TTV) is associated with overall and recurrence-free survival in patients with colorectal cancer liver metastases (CRLM). However, the labor-intensive nature of such manual assessments has hampered the clinical adoption of TTV as an imaging biomarker. This study aimed to develop and externally evaluate a CRLM auto-segmentation model on CT scans, to facilitate the clinical adoption of TTV. Methods: We developed an auto-segmentation model to segment CRLM using 783 contrast-enhanced portal venous phase CTs (CT-PVP) of 373 patients. We used a self-learning setup whereby we first trained a teacher model on 99 manually segmented CT-PVPs from three radiologists. The teacher model was then used to segment CRLM in the remaining 663 CT-PVPs for training the student model. We used the DICE score and the intraclass correlation coefficient (ICC) to compare the student model's segmentations and the TTV obtained from these segmentations to those obtained from the merged segmentations. We evaluated the student model in an external test set of 50 CT-PVPs from 35 patients from the Oslo University Hospital and an internal test set of 21 CT-PVPs from 10 patients from the Amsterdam University Medical Centers. Results: The model reached a mean DICE score of 0.85 (IQR: 0.05) and 0.83 (IQR: 0.10) on the internal and external test sets, respectively. The ICC between the segmented volumes from the student model and from the merged segmentations was 0.97 on both test sets. Conclusion: The developed colorectal cancer liver metastases auto-segmentation model achieved a high DICE score and near-perfect agreement for assessing TTV. Critical Relevance Statement: AI model segments colorectal liver metastases on CT with high performance on two test sets. Accurate segmentation of colorectal liver metastases could facilitate the clinical adoption of total tumor volume as an imaging biomarker for prognosis and treatment response monitoring. Key Points: Developed colorectal liver metastases segmentation model to facilitate total tumor volume assessment. Model achieved high performance on internal and external test sets. Model can improve prognostic stratification and treatment planning for colorectal liver metastases. Competing Interests: Declarations. Ethics approval and consent to participate: The Medical Ethics Review Committee of the Amsterdam UMC, the Regional Ethical Committee of Norway, and the Data Protection Officer of Oslo University Hospital approved this study protocol. All patients were managed per institutional practices. All patients signed a written informed consent form permitting the use of their data for studies. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests. (© 2024. The Author(s).)
- Published
- 2024
27. Deep-Learning-Based Thrombus Localization and Segmentation in Patients with Posterior Circulation Stroke.
- Author
-
Zoetmulder R, Bruggeman AAE, Išgum I, Gavves E, Majoie CBLM, Beenen LFM, Dippel DWJ, Boodt N, den Hartog SJ, van Doormaal PJ, Cornelissen SAP, Roos YBWEM, Brouwer J, Schonewille WJ, Pirson AFV, van Zwam WH, van der Leij C, Brans RJB, van Es ACGM, and Marquering HA
- Abstract
Thrombus volume in posterior circulation stroke (PCS) has been associated with outcome, through recanalization. Manual thrombus segmentation is impractical for large scale analysis of image characteristics. Hence, in this study we develop the first automatic method for thrombus localization and segmentation on CT in patients with PCS. In this multi-center retrospective study, 187 patients with PCS from the MR CLEAN Registry were included. We developed a convolutional neural network (CNN) that segments thrombi and restricts the volume-of-interest (VOI) to the brainstem (Polar-UNet). Furthermore, we reduced false positive localization by removing small-volume objects, referred to as volume-based removal (VBR). Polar-UNet is benchmarked against a CNN that does not restrict the VOI (BL-UNet). Performance metrics included the intra-class correlation coefficient (ICC) between automated and manually segmented thrombus volumes, the thrombus localization precision and recall, and the Dice coefficient. The majority of the thrombi were localized. Without VBR, Polar-UNet achieved a thrombus localization recall of 0.82, versus 0.78 achieved by BL-UNet. This high recall was accompanied by a low precision of 0.14 and 0.09. VBR improved precision to 0.65 and 0.56 for Polar-UNet and BL-UNet, respectively, with a small reduction in recall to 0.75 and 0.69. The Dice coefficient achieved by Polar-UNet was 0.44, versus 0.38 achieved by BL-UNet with VBR. Both methods achieved ICCs of 0.41 (95% CI: 0.27-0.54). Restricting the VOI to the brainstem improved the thrombus localization precision, recall, and segmentation overlap compared to the benchmark. VBR improved thrombus localization precision but lowered recall.
- Published
- 2022
28. Fully Automated Thrombus Segmentation on CT Images of Patients with Acute Ischemic Stroke.
- Author
-
Mojtahedi M, Kappelhof M, Ponomareva E, Tolhuisen M, Jansen I, Bruggeman AAE, Dutra BG, Yo L, LeCouffe N, Hoving JW, van Voorst H, Brouwer J, Terreros NA, Konduri P, Meijer FJA, Appelman A, Treurniet KM, Coutinho JM, Roos Y, van Zwam W, Dippel D, Gavves E, Emmer BJ, Majoie C, and Marquering H
- Abstract
Thrombus imaging characteristics are associated with treatment success and functional outcomes in stroke patients. However, assessing these characteristics based on manual annotations is labor intensive and subject to observer bias. Therefore, we aimed to create an automated pipeline for consistent and fast full thrombus segmentation. We used multi-center, multi-scanner datasets of anterior circulation stroke patients with baseline NCCT and CTA for training (n = 228) and testing (n = 100). We first found the occlusion location using StrokeViewer LVO and created a bounding box around it. Subsequently, we trained dual modality U-Net based convolutional neural networks (CNNs) to segment the thrombus inside this bounding box. We experimented with: (1) U-Net with two input channels for NCCT and CTA, and U-Nets with two encoders where (2) concatenate, (3) add, and (4) weighted-sum operators were used for feature fusion. Furthermore, we proposed a dynamic bounding box algorithm to adjust the bounding box. The dynamic bounding box algorithm reduces the missed cases but does not improve Dice. The two-encoder U-Net with a weighted-sum feature fusion shows the best performance (surface Dice 0.78, Dice 0.62, and 4% missed cases). Final segmentation results have high spatial accuracies and can therefore be used to determine thrombus characteristics and potentially benefit radiologists in clinical practice.
- Published
- 2022
29. Automated Final Lesion Segmentation in Posterior Circulation Acute Ischemic Stroke Using Deep Learning.
- Author
-
Zoetmulder R, Konduri PR, Obdeijn IV, Gavves E, Išgum I, Majoie CBLM, Dippel DWJ, Roos YBWEM, Goyal M, Mitchell PJ, Campbell BCV, Lopes DK, Reimann G, Jovin TG, Saver JL, Muir KW, White P, Bracard S, Chen B, Brown S, Schonewille WJ, van der Hoeven E, Puetz V, and Marquering HA
- Abstract
Final lesion volume (FLV) is a surrogate outcome measure in anterior circulation stroke (ACS). In posterior circulation stroke (PCS), this relation is plausibly understudied due to a lack of methods that automatically quantify FLV. The applicability of deep learning approaches to PCS is limited due to its lower incidence compared to ACS. We evaluated strategies to develop a convolutional neural network (CNN) for PCS lesion segmentation by using image data from both ACS and PCS patients. We included follow-up non-contrast computed tomography scans of 1018 patients with ACS and 107 patients with PCS. To assess whether an ACS lesion segmentation generalizes to PCS, a CNN was trained on ACS data (ACS-CNN). Second, to evaluate the performance of only including PCS patients, a CNN was trained on PCS data. Third, to evaluate the performance when combining the datasets, a CNN was trained on both datasets. Finally, to evaluate the performance of transfer learning, the ACS-CNN was fine-tuned using PCS patients. The transfer learning strategy outperformed the other strategies in volume agreement, with an intra-class correlation of 0.88 (95% CI: 0.83-0.92) vs. 0.55 to 0.83, and a lesion detection rate of 87% vs. 41-77% for the other strategies. Hence, transfer learning improved the FLV quantification and detection rate of PCS lesions compared to the other strategies.
- Published
- 2021
30. Unsharp Mask Guided Filtering.
- Author
-
Shi Z, Chen Y, Gavves E, Mettes P, and Snoek CGM
- Abstract
The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering by means of an additional guidance image. Where classical guided filters transfer structures using hand-designed functions, recent guided filters have been considerably advanced through parametric learning of deep networks. The state-of-the-art leverages deep networks to estimate the two core coefficients of the guided filter. In this work, we posit that simultaneously estimating both coefficients is suboptimal, resulting in halo artifacts and structure inconsistencies. Inspired by unsharp masking, a classical technique for edge enhancement that requires only a single coefficient, we propose a new and simplified formulation of the guided filter. Our formulation enjoys a filtering prior from a low-pass filter and enables explicit structure transfer by estimating a single coefficient. Based on our proposed formulation, we introduce a successive guided filtering network, which provides multiple filtering results from a single network, allowing for a trade-off between accuracy and efficiency. Extensive ablations, comparisons and analysis show the effectiveness and efficiency of our formulation and network, resulting in state-of-the-art results across filtering tasks like upsampling, denoising, and cross-modality filtering. Code is available at https://github.com/shizenglin/Unsharp-Mask-Guided-Filtering.
- Published
- 2021
31. Automatic Triage of 12-Lead ECGs Using Deep Convolutional Neural Networks.
- Author
-
van de Leur RR, Blom LJ, Gavves E, Hof IE, van der Heijden JF, Clappers NC, Doevendans PA, Hassink RJ, and van Es R
- Subjects
- Adolescent, Adult, Aged, Aged, 80 and over, Automation, Clinical Decision-Making, Female, Heart Diseases physiopathology, Heart Diseases therapy, Humans, Male, Middle Aged, Predictive Value of Tests, Reproducibility of Results, Young Adult, Deep Learning, Electrocardiography, Heart Diseases diagnosis, Signal Processing, Computer-Assisted, Triage
- Abstract
BACKGROUND The correct interpretation of the ECG is pivotal for the accurate diagnosis of many cardiac abnormalities, and conventional computerized interpretation has not been able to reach physician-level accuracy in detecting (acute) cardiac abnormalities. This study aims to develop and validate a deep neural network for comprehensive automated ECG triage in daily practice. METHODS AND RESULTS We developed a 37-layer convolutional residual deep neural network on a data set of free-text physician-annotated 12-lead ECGs. The deep neural network was trained on a data set with 336,835 recordings from 142,040 patients and validated on an independent validation data set (n=984), annotated by a panel of 5 cardiologist-electrophysiologists. The 12-lead ECGs were acquired in all noncardiology departments of the University Medical Center Utrecht. The algorithm learned to classify these ECGs into the following 4 triage categories: normal, abnormal not acute, subacute, and acute. Discriminative performance is presented with overall and category-specific concordance statistics, polytomous discrimination indexes, sensitivities, specificities, and positive and negative predictive values. The patients in the validation data set had a mean age of 60.4 years and 54.3% were men. The deep neural network showed excellent overall discrimination with an overall concordance statistic of 0.93 (95% CI, 0.92-0.95) and a polytomous discrimination index of 0.83 (95% CI, 0.79-0.87). CONCLUSIONS This study demonstrates that an end-to-end deep neural network can be accurately trained on unstructured free-text physician annotations and used to consistently triage 12-lead ECGs. When further fine-tuned with other clinical outcomes and externally validated in clinical practice, the demonstrated deep learning-based ECG interpretation can potentially improve time to treatment and decrease healthcare burden.
- Published
- 2020
32. Action Recognition with Dynamic Image Networks.
- Author
-
Bilen H, Fernando B, Gavves E, and Vedaldi A
- Abstract
We introduce the concept of dynamic image, a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or optical flow videos by using the concept of 'rank pooling'. The idea is to learn a ranking machine that captures the temporal evolution of the data and to use the parameters of the latter as a representation. We call the resulting representation dynamic image because it summarizes the video dynamics in addition to appearance. This powerful idea allows any video to be converted to an image so that existing CNN models pre-trained with still images can be immediately extended to videos. We also present an efficient approximate rank pooling operator that runs two orders of magnitude faster than the standard ones without any loss in ranking performance and can be formulated as a CNN layer. To demonstrate the power of the representation, we introduce a novel four-stream CNN architecture which can learn from RGB and optical flow frames as well as from their dynamic image representations. We show that the proposed network achieves state-of-the-art performance, 95.5 and 72.5 percent accuracy, on UCF101 and HMDB51, respectively.
- Published
- 2018
33. Reflectance and Natural Illumination from Single-Material Specular Objects Using Deep Learning.
- Author
-
Georgoulis S, Rematas K, Ritschel T, Gavves E, Fritz M, Van Gool L, and Tuytelaars T
- Abstract
In this paper, we present a method that estimates reflectance and illumination information from a single image depicting a single-material specular object from a given class under natural illumination. We follow a data-driven, learning-based approach trained on a very large dataset, but in contrast to earlier work we do not assume one or more components (shape, reflectance, or illumination) to be known. We propose a two-step approach, where we first estimate the object's reflectance map, and then further decompose it into reflectance and illumination. For the first step, we introduce a Convolutional Neural Network (CNN) that directly predicts a reflectance map from the input image itself, as well as an indirect scheme that uses additional supervision, first estimating surface orientation and afterwards inferring the reflectance map using a learning-based sparse data interpolation technique. For the second step, we suggest a CNN architecture to reconstruct both Phong reflectance parameters and high-resolution spherical illumination maps from the reflectance map. We also propose new datasets to train these CNNs. We demonstrate the effectiveness of our approach for both steps by extensive quantitative and qualitative evaluation in both synthetic and real data as well as through numerous applications, that show improvements over the state-of-the-art.
- Published
- 2018
34. Rank Pooling for Action Recognition.
- Author
-
Fernando B, Gavves E, Oramas M JO, Ghodrati A, and Tuytelaars T
- Abstract
We propose a function-based temporal pooling method that captures the latent structure of the video sequence data - e.g., how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation. As a specific example, we learn a pooling function via ranking machines. By learning to rank the frame-level features of a video in chronological order, we obtain a new representation that captures the video-wide temporal dynamics of a video, suitable for action recognition. Other than ranking functions, we explore different parametric models that could also explain the temporal changes in videos. The proposed functional pooling methods, and rank pooling in particular, are easy to interpret and implement, fast to compute and effective in recognizing a wide variety of actions. We evaluate our method on various benchmarks for generic action, fine-grained action and gesture recognition. Results show that rank pooling brings an absolute improvement of 7-10% over the average pooling baseline. At the same time, rank pooling is compatible with and complementary to several appearance and local motion based methods and features, such as improved trajectories and deep learning features.
- Published
- 2017