Author: "Siddharth, N." - Searchworks@Jio Institute Digital Library Search Results

Your search for "Siddharth, N." returned 405 results.

1. Banyan: Improved Representation Learning with Explicit Structure

Author: Opper, Mattia and Siddharth, N.
Subjects: Computer Science - Computation and Language
Abstract: We present Banyan, an improved model to learn semantic representations by inducing explicit structure over data. In contrast to prior approaches using structure spanning single sentences, Banyan learns by resolving multiple constituent structures into a shared one explicitly incorporating global context. Combined with an improved message-passing scheme inspired by Griffin, Banyan learns significantly better representations, avoids spurious false negatives with contrastive learning, and drastically improves memory efficiency in such explicit-structured models. Using the Self-StrAE framework, we show that Banyan (a) outperforms baselines using sentential structure across various settings, (b) matches or outperforms unstructured baselines like GloVe (+augmentations) and a RoBERTa medium (+simcse) pre-trained on 100M tokens, despite having just a handful of (non-embedding) parameters, and (c) also learns effective representations across several low-resource (Asian and African) languages as measured on SemRel tasks.
Comment: First Draft
Published: 2024

2. Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

Author: Sun, Chuanhao, Yuan, Zhihang, Xu, Kai, Mai, Luo, Siddharth, N., Chen, Shuo, and Marina, Mahesh K.
Subjects: Computer Science - Machine Learning
Abstract: Fourier features based positional encoding (PE) is commonly used in machine learning tasks that involve learning high-frequency features from low-dimensional inputs, such as 3D view synthesis and time series regression with neural tangent kernels. Despite their effectiveness, existing PEs require manual, empirical adjustment of crucial hyperparameters, specifically the Fourier features, tailored to each unique task. Further, PEs face challenges in efficiently learning high-frequency functions, particularly in tasks with limited data. In this paper, we introduce sinusoidal PE (SPE), designed to efficiently learn adaptive frequency features closely aligned with the true underlying function. Our experiments demonstrate that SPE, without hyperparameter tuning, consistently achieves enhanced fidelity and faster training across various tasks, including 3D view synthesis, Text-to-Speech generation, and 1D regression. SPE is implemented as a direct replacement for existing PEs. Its plug-and-play nature lets numerous tasks easily adopt and benefit from SPE.
Comment: 16 pages, Conference, Accepted by ICML 2024
Published: 2024
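
For context, the fixed Fourier-feature encoding that the abstract above describes as the existing approach can be sketched roughly as below. This is the conventional fixed-frequency construction, not SPE itself. The function name `fourier_features`, the `num_bands` parameter, and the choice of frequencies 2^k * pi are assumptions made for illustration and are not taken from the record.

```python
import math


def fourier_features(x, num_bands=4):
    """Encode a scalar x as [sin(f_k * x), cos(f_k * x)] with f_k = 2**k * pi.

    The frequency schedule is an assumed, commonly used choice; it is the
    kind of hand-set hyperparameter the abstract says needs manual tuning.
    """
    features = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * x))
        features.append(math.cos(freq * x))
    return features


# A 1-D input becomes a 2 * num_bands dimensional feature vector.
encoded = fourier_features(0.5, num_bands=3)
print(len(encoded))
```

The number of bands and their spacing are exactly the settings that would have to be retuned per task under this fixed scheme.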

3. Multi-Label Classification for Implicit Discourse Relation Recognition

Author: Long, Wanqiu, Siddharth, N., and Webber, Bonnie
Subjects: Computer Science - Computation and Language
Abstract: Discourse relations play a pivotal role in establishing coherence within textual content, uniting sentences and clauses into a cohesive narrative. The Penn Discourse Treebank (PDTB) stands as one of the most extensively utilized datasets in this domain. In PDTB-3, the annotators can assign multiple labels to an example, when they believe that multiple relations are present. Prior research in discourse relation recognition has treated these instances as separate examples during training, and only one example needs to have its label predicted correctly for the instance to be judged as correct. However, this approach is inadequate, as it fails to account for the interdependence of labels in real-world contexts and to distinguish between cases where only one sense relation holds and cases where multiple relations hold simultaneously. In our work, we address this challenge by exploring various multi-label classification frameworks to handle implicit discourse relation recognition. We show that multi-label classification methods don't depress performance for single-label prediction. Additionally, we give a comprehensive analysis of results and data. Our work contributes to advancing the understanding and application of discourse relations and provides a foundation for future study.
Comment: ACL2024 Finding
Published: 2024

4. Self-StrAE at SemEval-2024 Task 1: Making Self-Structuring AutoEncoders Learn More With Less

Author: Opper, Mattia and Siddharth, N.
Subjects: Computer Science - Computation and Language
Abstract: This paper presents two simple improvements to the Self-Structuring AutoEncoder (Self-StrAE). Firstly, we show that including reconstruction to the vocabulary as an auxiliary objective improves representation quality. Secondly, we demonstrate that increasing the number of independent channels leads to significant improvements in embedding quality, while simultaneously reducing the number of parameters. Surprisingly, we demonstrate that this trend can be followed to the extreme, even to the point of reducing the total number of non-embedding parameters to seven. Our system can be pre-trained from scratch with as little as 10M tokens of input data, and proves effective across English, Spanish and Afrikaans.
Comment: SemEval 2024
Published: 2024

5. On the effect of curriculum learning with developmental data for grammar acquisition

Author: Opper, Mattia, Morrison, J., and Siddharth, N.
Subjects: Computer Science - Computation and Language
Abstract: This work explores the degree to which grammar acquisition is driven by language 'simplicity' and the source modality (speech vs. text) of data. Using BabyBERTa as a probe, we find that grammar acquisition is largely driven by exposure to speech data, and in particular through exposure to two of the BabyLM training corpora: AO-Childes and Open Subtitles. We arrive at this finding by examining various ways of presenting input data to our model. First, we assess the impact of various sequence-level complexity-based curricula. We then examine the impact of learning over 'blocks' -- covering spans of text that are balanced for the number of tokens in each of the source corpora (rather than number of lines). Finally, we explore curricula that vary the degree to which the model is exposed to different corpora. In all cases, we find that over-exposure to AO-Childes and Open Subtitles significantly drives performance. We verify these findings through a comparable control dataset in which exposure to these corpora, and speech more generally, is limited by design. Our findings indicate that it is not the proportion of tokens occupied by high-utility data that aids acquisition, but rather the proportion of training steps assigned to such data. We hope this encourages future research into the use of more developmentally plausible linguistic data (which tends to be more scarce) to augment general purpose pre-training regimes.
Comment: CoNLL-CMCL Shared Task BabyLM Challenge 2023
Published: 2023

6. Bayesian Program Learning by Decompiling Amortized Knowledge

Author: Palmarini, Alessandro B., Lucas, Christopher G., and Siddharth, N.
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Software Engineering
Abstract: DreamCoder is an inductive program synthesis system that, whilst solving problems, learns to simplify search in an iterative wake-sleep procedure. The cost of search is amortized by training a neural search policy, reducing search breadth and effectively "compiling" useful information to compose program solutions across tasks. Additionally, a library of program components is learnt to compress and express discovered solutions in fewer components, reducing search depth. We present a novel approach for library learning that directly leverages the neural search policy, effectively "decompiling" its amortized knowledge to extract relevant program components. This provides stronger amortized inference: the amortized knowledge learnt to reduce search breadth is now also used to reduce search depth. We integrate our approach with DreamCoder and demonstrate faster domain proficiency with improved generalization on a range of domains, particularly when fewer example solutions are available.
Published: 2023

7. Autoencoding Conditional Neural Processes for Representation Learning

Author: Prokhorov, Victor, Titov, Ivan, and Siddharth, N.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Conditional neural processes (CNPs) are a flexible and efficient family of models that learn to learn a stochastic process from data. They have seen particular application in contextual image completion - observing pixel values at some locations to predict a distribution over values at other unobserved locations. However, the choice of pixels in learning CNPs is typically either random or derived from a simple statistical measure (e.g. pixel variance). Here, we turn the problem on its head and ask: which pixels would a CNP like to observe - do they facilitate fitting better CNPs, and do such pixels tell us something meaningful about the underlying image? To this end we develop the Partial Pixel Space Variational Autoencoder (PPS-VAE), an amortised variational framework that casts CNP context as latent variables learnt simultaneously with the CNP. We evaluate PPS-VAE over a number of tasks across different visual data, and find that not only can it facilitate better-fit CNPs, but also that the spatial arrangement and values meaningfully characterise image information - evaluated through the lens of classification on both within and out-of-data distributions. Our model additionally allows for dynamic adaption of context-set size and the ability to scale-up to larger images, providing a promising avenue to explore learning meaningful and effective visual representations.
Published: 2023

8. StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure

Author: Opper, Mattia, Prokhorov, Victor, and Siddharth, N.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: This work presents StrAE: a Structured Autoencoder framework that, through strict adherence to explicit structure and use of a novel contrastive objective over tree-structured representations, enables effective learning of multi-level representations. Through comparison over different forms of structure, we verify that our results are directly attributable to the informativeness of the structure provided as input, and show that this is not the case for existing tree models. We then further extend StrAE to allow the model to define its own compositions using a simple localised-merge algorithm. This variant, called Self-StrAE, outperforms baselines that don't involve explicit hierarchical compositions, and is comparable to models given informative structure (e.g. constituency parses). Our experiments are conducted in a data-constrained (circa 10M tokens) setting to help tease apart the contribution of the inductive bias to effective learning. However, we find that this framework can be robust to scale, and when extended to a much larger dataset (circa 100M tokens), our 430-parameter model performs comparably to a 6-layer RoBERTa many orders of magnitude larger in size. Our findings support the utility of incorporating explicit composition as an inductive bias for effective representation learning.
Comment: EMNLP 2023 Main
Published: 2023

9. Drawing out of Distribution with Neuro-Symbolic Generative Models

Author: Liang, Yichao, Tenenbaum, Joshua B., Le, Tuan Anh, and Siddharth, N.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing, Computer Science - Symbolic Computation
Abstract: Learning general-purpose representations from perceptual inputs is a hallmark of human intelligence. For example, people can write out numbers or characters, or even draw doodles, by characterizing these tasks as different instantiations of the same generic underlying process -- compositional arrangements of different forms of pen strokes. Crucially, learning to do one task, say writing, implies reasonable competence at another, say drawing, on account of this shared process. We present Drawing out of Distribution (DooD), a neuro-symbolic generative model of stroke-based drawing that can learn such general-purpose representations. In contrast to prior work, DooD operates directly on images, requires no supervision or expensive test-time inference, and performs unsupervised amortised inference with a symbolic stroke model that better enables both interpretability and generalization. We evaluate DooD on its ability to generalise across both data and tasks. We first perform zero-shot transfer from one dataset (e.g. MNIST) to another (e.g. Quickdraw), across five different datasets, and show that DooD clearly outperforms different baselines. An analysis of the learnt representations further highlights the benefits of adopting a symbolic stroke model. We then adopt a subset of the Omniglot challenge tasks, and evaluate its ability to generate new exemplars (both unconditionally and conditionally), and perform one-shot classification, showing that DooD matches the state of the art. Taken together, we demonstrate that DooD does indeed capture general-purpose representations across both data and task, and takes a further step towards building general and robust concept-learning systems.
Comment: Preprint. Under review. 25 pages
Published: 2022

10. Novel N100 area reliably captures aberrant sensory processing and is associated with neurocognition in early psychosis

Author: Machiraju, Siddharth N., Wyss, Jeffrey, Light, Gregory, Braff, David L., and Cadenhead, Kristin S.
Published: 2024

11. Adversarial Masking for Self-Supervised Learning

Author: Shi, Yuge, Siddharth, N., Torr, Philip H. S., and Kosiorek, Adam R.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We propose ADIOS, a masked image model (MIM) framework for self-supervised learning, which simultaneously learns a masking function and an image encoder using an adversarial objective. The image encoder is trained to minimise the distance between the representation of the original image and that of a masked image. The masking function, conversely, aims at maximising this distance. ADIOS consistently improves on state-of-the-art self-supervised learning (SSL) methods on a variety of tasks and datasets -- including classification on ImageNet100 and STL10, transfer learning on CIFAR10/100, Flowers102 and iNaturalist, as well as robustness evaluated on the backgrounds challenge (Xiao et al., 2021) -- while generating semantically meaningful masks. Unlike modern MIM models such as MAE, BEiT and iBOT, ADIOS does not rely on the image-patch tokenisation construction of Vision Transformers, and can be implemented with convolutional backbones. We further demonstrate that the masks learned by ADIOS are more effective in improving representation learning of SSL methods than masking schemes used in popular MIM models. Code is available at https://github.com/YugeTen/adios.
Published: 2022

12. Cervical Schwannoma camouflaged by cervical intervertebral disc prolapse—A case report

Author: Hadgaonkar, Shailesh R., Situt, Nishad V., Marya, Shivan, Aiyer, Siddharth N., and Sancheti, Parag K.
Published: 2023

13. Hybrid Memoised Wake-Sleep: Approximate Inference at the Discrete-Continuous Interface

Author: Le, Tuan Anh, Collins, Katherine M., Hewitt, Luke, Ellis, Kevin, Siddharth, N., Gershman, Samuel J., and Tenenbaum, Joshua B.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Modeling complex phenomena typically involves the use of both discrete and continuous variables. Such a setting applies across a wide range of problems, from identifying trends in time-series data to performing effective compositional scene understanding in images. Here, we propose Hybrid Memoised Wake-Sleep (HMWS), an algorithm for effective inference in such hybrid discrete-continuous models. Prior approaches to learning suffer as they need to perform repeated expensive inner-loop discrete inference. We build on a recent approach, Memoised Wake-Sleep (MWS), which alleviates part of the problem by memoising discrete variables, and extend it to allow for a principled and effective way to handle continuous variables by learning a separate recognition model used for importance-sampling based approximate inference and marginalization. We evaluate HMWS in the GP-kernel learning and 3D scene understanding domains, and show that it outperforms current state-of-the-art inference methods.
Published: 2021

14. On Incorporating Inductive Biases into VAEs

Author: Miao, Ning, Mathieu, Emile, Siddharth, N., Teh, Yee Whye, and Rainforth, Tom
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We explain why directly changing the prior can be a surprisingly ineffective mechanism for incorporating inductive biases into VAEs, and introduce a simple and effective alternative approach: Intermediary Latent Space VAEs (InteL-VAEs). InteL-VAEs use an intermediary set of latent variables to control the stochasticity of the encoding process, before mapping these in turn to the latent representation using a parametric function that encapsulates our desired inductive bias(es). This allows us to impose properties like sparsity or clustering on learned representations, and incorporate human knowledge into the generative model. Whereas changing the prior only indirectly encourages behavior through regularizing the encoder, InteL-VAEs are able to directly enforce desired characteristics. Moreover, they bypass the computation and encoder design issues caused by non-Gaussian priors, while allowing for additional flexibility through training of the parametric mapping function. We show that these advantages, in turn, lead to both better generative models and better representations being learned.
Published: 2021

15. Learning Multimodal VAEs through Mutual Supervision

Author: Joy, Tom, Shi, Yuge, Torr, Philip H. S., Rainforth, Tom, Schmon, Sebastian M., and Siddharth, N.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing -- something that most existing approaches either cannot handle, or do so to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image-image) and CUB (image-text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.
Published: 2021

16. Gradient Matching for Domain Generalization

Author: Shi, Yuge, Seely, Jeffrey, Torr, Philip H. S., Siddharth, N., Hannun, Awni, Usunier, Nicolas, and Synnaeve, Gabriel
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Machine learning systems typically assume that the distributions of training and test sets match closely. However, a critical requirement of such systems in the real world is their ability to generalize to unseen domains. Here, we propose an inter-domain gradient matching objective that targets domain generalization by maximizing the inner product between gradients from different domains. Since direct optimization of the gradient inner product can be computationally prohibitive -- it requires computation of second-order derivatives -- we derive a simpler first-order algorithm named Fish that approximates its optimization. We demonstrate the efficacy of Fish on 6 datasets from the Wilds benchmark, which captures distribution shift across a diverse range of modalities. Our method produces competitive results on these datasets and surpasses all baselines on 4 of them. We perform experiments on both the Wilds benchmark, which captures distribution shift in the real world, and datasets in the DomainBed benchmark, which focuses more on synthetic-to-real transfer. Our method produces competitive results on both benchmarks, demonstrating its effectiveness across a wide range of domain generalization tasks.
Published: 2021
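
The quantity that the abstract above says is maximized, the inner product between gradients computed on different domains, can be illustrated with a toy sketch. This is not the Fish algorithm, only the alignment measure it targets. The two-parameter linear model, the two toy "domains", and all function names here are hypothetical and chosen for illustration.

```python
def mse_gradient(params, data):
    """Gradient of mean squared error for the toy model y_hat = w * x + b."""
    w, b = params
    n = len(data)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
    return [grad_w, grad_b]


def inner_product(u, v):
    """Dot product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


# Two hypothetical domains with slightly different input-output relationships.
domain_a = [(1.0, 2.0), (2.0, 4.0)]  # generated by y = 2x
domain_b = [(1.0, 3.0), (2.0, 5.0)]  # generated by y = 2x + 1

params = [0.0, 0.0]
grad_a = mse_gradient(params, domain_a)
grad_b = mse_gradient(params, domain_b)

# A positive value means a step that helps one domain also helps the other.
alignment = inner_product(grad_a, grad_b)
print(alignment)
```

Differentiating this inner product with respect to the parameters would involve second derivatives of the loss, which is the cost the abstract says motivates a first-order approximation.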

17. Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models

Author: Shi, Yuge, Paige, Brooks, Torr, Philip H. S., and Siddharth, N.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Multimodal learning for generative models often refers to the learning of abstract concepts from the commonality of information in multiple modalities, such as vision and language. While it has proven effective for learning generalisable representations, the training of such models often requires a large amount of "related" multimodal data that shares commonality, which can be expensive to come by. To mitigate this, we develop a novel contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data. We show in experiments that our method enables data-efficient multimodal learning on challenging datasets for various multimodal VAE models. We also show that under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
Published: 2020

18. Capturing Label Characteristics in VAEs

Author: Joy, Tom, Schmon, Sebastian M., Torr, Philip H. S., Siddharth, N., and Rainforth, Tom
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels. While prior work has typically conflated these by learning latent variables that directly correspond to label values, we argue this is contrary to the intended effect of supervision in VAEs -- capturing rich label characteristics with the latents. For example, we may want to capture the characteristics of a face that make it look young, rather than just the age of the person. To this end, we develop the CCVAE, a novel VAE model and concomitant variational objective which captures label characteristics explicitly in the latent space, eschewing direct correspondences between label values and latents. Through judicious structuring of mappings between such characteristic latents and labels, we show that the CCVAE can effectively learn meaningful representations of the characteristics of interest across a variety of supervision schemes. In particular, we show that the CCVAE allows for more effective and more general interventions to be performed, such as smooth traversals within the characteristics for a given label, diverse conditional generation, and transferring characteristics across datapoints.
Comment: Accepted to ICLR 2021
Published: 2020

19. Simulation-Based Inference for Global Health Decisions

Author: de Witt, Christian Schroeder, Gram-Hansen, Bradley, Nardelli, Nantas, Gambardella, Andrew, Zinkov, Rob, Dokania, Puneet, Siddharth, N., Espinosa-Gonzalez, Ana Belen, Darzi, Ara, Torr, Philip, and Baydin, Atılım Güneş
Subjects: Computer Science - Machine Learning, Statistics - Applications, Statistics - Machine Learning
Abstract: The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel venue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling efficient interpretable Bayesian inference within those simulators.
Published: 2020

20. A Revised Generative Evaluation of Visual Dialogue

Author: Massiceti, Daniela, Kulharia, Viveka, Dokania, Puneet K., Siddharth, N., and Torr, Philip H. S.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Evaluating Visual Dialogue, the task of answering a sequence of questions relating to a visual input, remains an open research challenge. The current evaluation scheme of the VisDial dataset computes the ranks of ground-truth answers in predefined candidate sets, which Massiceti et al. (2018) show can be susceptible to the exploitation of dataset biases. This scheme also does little to account for the different ways of expressing the same answer -- an aspect of language that has been well studied in NLP. We propose a revised evaluation scheme for the VisDial dataset leveraging metrics from the NLP literature to measure consensus between answers generated by the model and a set of relevant answers. We construct these relevant answer sets using a simple and effective semi-supervised method based on correlation, which allows us to automatically extend and scale sparse relevance annotations from humans to the entire dataset. We release these sets and code for the revised evaluation scheme as DenseVisDial, and intend them to be an improvement to the dataset in the face of its existing constraints and design choices.
Comment: 16 pages, 5 figures
Published: 2020

21. Lessons from reinforcement learning for biological representations of space

Author: Muryy, Alex, Siddharth, N., Nardelli, Nantas, Torr, Philip H. S., and Glennerster, Andrew
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: Neuroscientists postulate 3D representations in the brain in a variety of different coordinate frames (e.g. 'head-centred', 'hand-centred' and 'world-based'). Recent advances in reinforcement learning demonstrate a quite different approach that may provide a more promising model for biological representations underlying spatial perception and navigation. In this paper, we focus on reinforcement learning methods that reward an agent for arriving at a target image without any attempt to build up a 3D 'map'. We test the ability of this type of representation to support geometrically consistent spatial tasks such as interpolating between learned locations using decoding of feature vectors. We introduce a hand-crafted representation that has, by design, a high degree of geometric consistency and demonstrate that, in this case, information about the persistence of features as the camera translates (e.g. distant features persist) can improve performance on the geometric tasks. These examples avoid Cartesian (in this case, 2D) representations of space. Non-Cartesian, learned representations provide an important stimulus in neuroscience to the search for alternatives to a 'cognitive map'.
Comment: 40 pages including Appendix, 6 figures plus 3 figures in Appendix. Accepted for publication in Vision Research
Published: 2019

22. Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models

Author: Shi, Yuge, Siddharth, N., Paige, Brooks, and Torr, Philip H. S.
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Learning generative models that span multiple data modalities, such as vision and language, is often motivated by the desire to learn more useful, generalisable representations that faithfully capture common underlying factors between the modalities. In this work, we characterise successful learning of such models as the fulfillment of four criteria: i) implicit latent decomposition into shared and private subspaces, ii) coherent joint generation over all modalities, iii) coherent cross-generation across individual modalities, and iv) improved model learning for individual modalities through multi-modal integration. Here, we propose a mixture-of-experts multimodal variational autoencoder (MMVAE) to learn generative models on different sets of modalities, including a challenging image-language dataset, and demonstrate its ability to satisfy all four criteria, both qualitatively and quantitatively.
Published: 2019

23. Multitask Soft Option Learning

Author: Igl, Maximilian, Gambardella, Andrew, He, Jinke, Nardelli, Nantas, Siddharth, N., Böhmer, Wendelin, and Whiteson, Shimon
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This "soft" version of options avoids several instabilities during training in a multitask setting, and provides a natural way to learn both intra-option policies and their terminations. Furthermore, it allows fine-tuning of options for new tasks without forgetting their learned policies, leading to faster training without reducing the expressiveness of the hierarchical policy. We demonstrate empirically that MSOL significantly outperforms both hierarchical and flat transfer-learning baselines.
Comment: Published at UAI 2020
Published: 2019

24. Visual Dialogue without Vision or Dialogue

Author: Massiceti, Daniela, Dokania, Puneet K., Siddharth, N., and Torr, Philip H. S.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We characterise some of the quirks and shortcomings in the exploration of Visual Dialogue - a sequential question-answering task where the questions and corresponding answers are related through given visual stimuli. To do so, we develop an embarrassingly simple method based on Canonical Correlation Analysis (CCA) that, on the standard dataset, achieves near state-of-the-art performance on mean rank (MR). In direct contrast to current complex and over-parametrised architectures that are both compute and time intensive, our method ignores the visual stimuli, ignores the sequencing of dialogue, does not need gradients, uses off-the-shelf feature extractors, has at least an order of magnitude fewer parameters, and learns in practically no time. We argue that these results are indicative of issues in current approaches to Visual Dialogue and conduct analyses to highlight implicit dataset biases and effects of over-constrained evaluation metrics. Our code is publicly available.
Comment: 2018 NeurIPS Workshop on Critiquing and Correcting Trends in Machine Learning
Published: 2018

25. Disentangling Disentanglement in Variational Autoencoders

Author: Mathieu, Emile, Rainforth, Tom, Siddharth, N., and Teh, Yee Whye
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We develop a generalisation of disentanglement in VAEs---decomposition of the latent representation---characterising it as the fulfilment of two factors: a) the latent encodings of the data having an appropriate level of overlap, and b) the aggregate encoding of the data conforming to a desired structure, represented through the prior. Decomposition permits disentanglement, i.e. explicit independence between latents, as a special case, but also allows for a much richer class of properties to be imposed on the learnt representation, such as sparsity, clustering, independent subspaces, or even intricate hierarchical dependency relationships. We show that the $\beta$-VAE varies from the standard VAE predominantly in its control of latent overlap and that for the standard choice of an isotropic Gaussian prior, its objective is invariant to rotations of the latent representation. Viewed from the decomposition perspective, breaking this invariance with simple manipulations of the prior can yield better disentanglement with little or no detriment to reconstructions. We further demonstrate how other choices of prior can assist in producing different decompositions and introduce an alternative training objective that allows the control of both decomposition factors in a principled manner., Comment: Accepted for publication at ICML 2019
Published: 2018

26. Process optimization for acid-amine coupling: a catalytic approach

Author: Ranjitsinh C. Dabhi, Unnati P. Patel, Vaibhavi B. Rathod, Siddharth N. Shah, and Jayesh J. Maru
Subjects: Chemistry, QD1-999
Abstract: Proficient routes were devised for coupling different aromatic/aliphatic acids with amines to form amide linkage using various catalysts. Under the optimized reaction conditions, highest conversion was possible without formation of any by-products. All synthesized compounds were purified using column chromatography and characterized by mass spectrometry, nuclear magnetic resonance spectrometry and liquid chromatography-mass spectrometric analysis.
Published: 2023
Full Text: View/download PDF

27. Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow

Author: Le, Tuan Anh, Kosiorek, Adam R., Siddharth, N., Teh, Yee Whye, and Wood, Frank
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables. Amortized gradient-based learning of SCFMs is challenging as most approaches targeting discrete variables rely on their continuous relaxations---which can be intractable in SCFMs, as branching on relaxations requires evaluating all (exponentially many) branching paths. Tractable alternatives mainly combine REINFORCE with complex control-variate schemes to improve the variance of naive estimators. Here, we revisit the reweighted wake-sleep (RWS) (Bornschein and Bengio, 2015) algorithm, and through extensive evaluations, show that it outperforms current state-of-the-art methods in learning SCFMs. Further, in contrast to the importance weighted autoencoder, we observe that RWS learns better models and inference networks with increasing numbers of particles. Our results suggest that RWS is a competitive, often preferable, alternative for learning SCFMs., Comment: Tuan Anh Le and Adam R. Kosiorek contributed equally; accepted to Uncertainty in Artificial Intelligence 2019
Published: 2018

28. DGPose: Deep Generative Models for Human Body Analysis

Author: de Bem, Rodrigo, Ghosh, Arnab, Ajanthan, Thalaiyasingam, Miksik, Ondrej, Boukhayma, Adnane, Siddharth, N., and Torr, Philip
Subjects: Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Deep generative modelling for human body analysis is an emerging problem with many interesting applications. However, the latent space learned by such approaches is typically not interpretable, resulting in less flexibility. In this work, we present deep generative models for human body analysis in which the body pose and the visual appearance are disentangled. Such a disentanglement allows independent manipulation of pose and appearance, and hence enables applications such as pose-transfer without specific training for such a task. Our proposed models, the Conditional-DGPose and the Semi-DGPose, have different characteristics. In the first, body pose labels are taken as conditioners, from a fully-supervised training set. In the second, our structured semi-supervised approach allows for pose estimation to be performed by the model itself and relaxes the need for labelled data. Therefore, the Semi-DGPose aims for the joint understanding and generation of people in images. It is not only capable of mapping images to interpretable latent representations but also able to map these representations back to the image space. We compare our models with relevant baselines, the ClothNet-Body and the Pose Guided Person Generation networks, demonstrating their merits on the Human3.6M, ChictopiaPlus and DeepFashion benchmarks., Comment: IJCV 2020 special issue on 'Generating Realistic Visual Data of Human Behavior' preprint. Keywords: deep generative models, semi-supervised learning, human pose estimation, variational autoencoders, generative adversarial networks
Published: 2018

29. Structured Disentangled Representations

Author: Esmaeili, Babak, Wu, Hao, Jain, Sarthak, Bozkurt, Alican, Siddharth, N., Paige, Brooks, Brooks, Dana H., Dy, Jennifer, and van de Meent, Jan-Willem
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks. We derive this objective as a generalization of the evidence lower bound, which allows us to explicitly represent the trade-offs between mutual information between data and representation, KL divergence between representation and prior, and coverage of the support of the empirical data distribution. Experiments on a variety of datasets demonstrate that our objective can not only disentangle discrete variables, but that doing so also improves disentanglement of other variables and, importantly, generalization even to unseen combinations of factors.
Published: 2018

30. FlipDial: A Generative Model for Two-Way Visual Dialogue

Author: Massiceti, Daniela, Siddharth, N., Dokania, Puneet K., and Torr, Philip H. S.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present FlipDial, a generative model for visual dialogue that simultaneously plays the role of both participants in a visually-grounded dialogue. Given context in the form of an image and an associated caption summarising the contents of the image, FlipDial learns both to answer questions and put forward questions, capable of generating entire sequences of dialogue (question-answer pairs) which are diverse and relevant to the image. To do this, FlipDial relies on a simple but surprisingly powerful idea: it uses convolutional neural networks (CNNs) to encode entire dialogues directly, implicitly capturing dialogue context, and conditional VAEs to learn the generative model. FlipDial outperforms the state-of-the-art model in the sequential answering task (one-way visual dialogue) on the VisDial dataset by 5 points in Mean Rank using the generated answers. We are the first to extend this paradigm to full two-way visual dialogue, where our model is capable of generating both questions and answers in sequence based on a visual input, for which we propose a set of novel evaluation measures and metrics.
Published: 2018

31. Faithful Inversion of Generative Models for Effective Amortized Inference

Author: Webb, Stefan, Golinski, Adam, Zinkov, Robert, Siddharth, N., Rainforth, Tom, Teh, Yee Whye, and Wood, Frank
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently. Generally, they require the inversion of the dependency structure in the generative model, as the modeller must learn a mapping from observations to distributions approximating the posterior. Previous approaches have involved inverting the dependency structure in a heuristic way that fails to capture these dependencies correctly, thereby limiting the achievable accuracy of the resulting approximations. We introduce an algorithm for faithfully, and minimally, inverting the graphical model structure of any generative model. Such inverses have two crucial properties: (a) they do not encode any independence assertions that are absent from the model and; (b) they are local maxima for the number of true independencies encoded. We prove the correctness of our approach and empirically show that the resulting minimally faithful inverses lead to better inference amortization than existing heuristic approaches., Comment: To appear at the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, Canada
Published: 2017

32. Learning Disentangled Representations with Semi-Supervised Deep Generative Models

Author: Siddharth, N., Paige, Brooks, van de Meent, Jan-Willem, Desmaison, Alban, Goodman, Noah D., Kohli, Pushmeet, Wood, Frank, and Torr, Philip H. S.
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Learning
Abstract: Variational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables. We further define a general objective for semi-supervised learning in this model class, which can be approximated using an importance sampling procedure. We evaluate our framework's ability to learn disentangled representations, both by qualitative exploration of its generative capacity, and quantitative evaluation of its discriminative ability on a variety of models and datasets., Comment: Accepted for publication at NIPS 2017
Published: 2017

33. Excitonic Properties versus Structure Stability Trade‐Off in Halide Perovskite Photovoltaics Caused by van der Waals Interactions.

Author: Rathod, Siddharth N. and Farajian, Amir A.
Subjects: *BAND gaps, *DENSITY functional theory, *DIELECTRIC function, *STRUCTURAL stability, *PERMITTIVITY
Abstract: Lead halide perovskites, and their derivatives, are among the most promising photovoltaic materials for third generation solar cells. Despite the large number of available works on some of these materials, excitonic properties whose assessment has been challenging are less investigated. These include quantitative measures of excitonic properties variations with van der Waals (vdW) interactions. Consistent comparisons of how vdW interactions affect phononic and optical properties are also desirable. This work focuses on cubic phases of MAPbX3 with X = Cl, Br, I, and MA = methylammonium, using density functional theory simulations including vdW interactions. These cause 30%–38% increase of absolute cohesive energies and 15%–37% reduction of ionic/vibrational contributions to static dielectric constants, along with 10%–29% reduction of exciton Bohr radii and 29%–107% increase of exciton binding energies. The effects on band gaps, frequency‐dependent dielectric functions, and exciton effective masses are less pronounced. Within the Mott–Wannier exciton model, the results suggest a trade‐off between photovoltaic performance and structure stability. The results can help assess stability, feasibility, and performance of hybrid photovoltaic materials. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. Playing Doom with SLAM-Augmented Deep Reinforcement Learning

Author: Bhatti, Shehroze, Desmaison, Alban, Miksik, Ondrej, Nardelli, Nantas, Siddharth, N., and Torr, Philip H. S.
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: A number of recent approaches to policy learning in 2D game domains have been successful going directly from raw input images to actions. However, when employed in complex 3D environments, they typically suffer from challenges related to partial observability, combinatorial exploration spaces, path planning, and a scarcity of rewarding scenarios. Inspired by prior work in human cognition that indicates how humans employ a variety of semantic concepts and abstractions (object categories, localisation, etc.) to reason about the world, we build an agent-model that incorporates such abstractions into its policy-learning framework. We augment the raw image input to a Deep Q-Learning Network (DQN), by adding details of objects and structural elements encountered, along with the agent's localisation. The different components are automatically extracted and composed into a topological representation using on-the-fly object detection and 3D-scene reconstruction. We evaluate the efficacy of our approach in Doom, a 3D first-person combat game that exhibits a number of challenges discussed, and show that our augmented framework consistently learns better, more effective policies.
Published: 2016

35. Study of Friction Stir Welding on Aerospace Grade ZE41AMg Alloy and Its Comparison with Laser Beam Welding on ZE41AMg Alloy

Author: Annamalai, Adithyan, Babu, T. R. Kishore, Karthikeyan, S., Siddharth, N., and Muralidharan, S.; Series Editors: Cavas-Martínez, Francisco, Chaari, Fakher, Gherardini, Francesco, Haddar, Mohamed, Ivanov, Vitalii, Kwon, Young W., and Trojanowska, Justyna; Editors: Dave, Harshit K. and Nedelcu, Dumitru
Published: 2021
Full Text: View/download PDF

36. Brucellosis sacroiliitis masquerading as inflammatory spondyloarthropathy

Author: Alok Gupta, Ashok M Shyam, Parag K Sancheti, and Siddharth N Aiyer
Subjects: brucellosis, inflammatory spondyloarthropathy, sacroiliitis, zoonosis, Orthopedic surgery, RD701-811
Abstract: Brucellosis is the most common zoonosis globally, and it is endemic to the Indian subcontinent. It can mimic a number of febrile illnesses and inflammatory disease conditions. An 18-year-old boy presented with low back pain and a fever of three-month duration. Magnetic resonance imaging revealed a unilateral sacroiliitis, which was being treated as an inflammatory spondyloarthropathy. Because of non-resolving symptoms, a biopsy was performed, which showed a granulomatous inflammation that was consistent with tuberculosis or brucellosis infection. A history of exposure to livestock and consumption of unpasteurized milk led to a clinical suspicion of brucellosis, which was confirmed on a positive serology. He was treated with antibiotics with improvement in symptoms and complete resolution of the sacroiliitis. A high index of suspicion must be maintained for brucellosis, especially in patients with a rural residence, exposure to livestock, and febrile illness with a clinically suspected unilateral sacroiliitis.
Published: 2022
Full Text: View/download PDF

37. A Novel Smart Approach to Plant Health - Automated Detection and Diagnosis of Leaf Diseases

Author: Sangeetha, T., primary, Rajarajan, R., additional, Krishna, S. Rithick, additional, and Sakthi Siddharth, N., additional
Published: 2024
Full Text: View/download PDF

38. On Incorporating Inductive Biases into VAEs.

Author: Ning Miao, Emile Mathieu, Siddharth N, Yee Whye Teh, and Tom Rainforth
Published: 2022

39. Inducing Interpretable Representations with Variational Autoencoders

Author: Siddharth, N., Paige, Brooks, Desmaison, Alban, Van de Meent, Jan-Willem, Wood, Frank, Goodman, Noah D., Kohli, Pushmeet, and Torr, Philip H. S.
Subjects: Statistics - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning
Abstract: We develop a framework for incorporating structured graphical models in the encoders of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference. This allows us to both perform reasoning (e.g. classification) under the structural constraints of a given graphical model, and use deep generative models to deal with messy, high-dimensional domains where it is often difficult to model all the variation. Learning in this framework is carried out end-to-end with a variational objective, applying to both unsupervised and semi-supervised schemes., Comment: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Published: 2016

40. Metabolic Disruption Induced by mTOR Signaling Pathway Inhibition in Regulatory T-Cell Expansion for Clinical Application

Author: Roberto Gedaly, Gabriel Orozco, Alexandre P. Ancheta, Mackenzie Donoho, Siddharth N. Desai, Fanny Chapelin, Aman Khurana, Lillie J. Lewis, Cuiping Zhang, and Francesc Marti
Subjects: regulatory T-cells, transplantation, cellular therapy, mTOR signaling, Cytology, QH573-671
Abstract: Background: Regulatory T cell (Treg) therapy is considered an alternative approach to induce tolerance in transplantation. If successful, this therapy may have implications on immunosuppression minimization/withdrawal to reduce drug-induced toxicity in patients. The aim of this study was to assess the efficacy of the mTORC1/C2 inhibitor, AZD8055, in the manufacturing of clinically competent Treg cells and compare the effects with those induced by rapamycin (RAPA), another mTOR inhibitor commonly used in Treg expansion protocols. Methods: Primary human Treg cells were isolated from leukapheresis product. Cell viability, expansion rates, suppressive function, autophagy, mitochondrial unfolded protein response (mitoUPR), and cell metabolic profile were assessed. Results: We observed a stronger inhibition of the mTORC2 signaling pathway and downstream events triggered by Interleukin 2 (IL2)-receptor in AZD8055-treated cells compared with those treated with RAPA. AZD8055 induced progressive metabolic changes in mitochondrial respiration and glycolytic pathways that disrupted the long-term expansion and suppressive function of Tregs. Unlike RAPA, AZD8055 treatment impaired autophagy and enhanced the mitoUPR cell stress response pathway. Conclusions: A distinct pattern of mTOR inhibition by AZD, compared with RAPA, induced mitochondrial stress response and dysfunction, impaired autophagy, and disrupted cellular bioenergetics, resulting in the loss of proliferative potential and suppressive function of Treg cells.
Published: 2023
Full Text: View/download PDF

41. Gene Xpert/MTB RIF assay for spinal tuberculosis- sensitivity, specificity and clinical utility

Author: Karthek, Vijay, Bhilare, Pramod, Hadgaonkar, Shailesh, Kothari, Ajay, Shyam, Ashok, Sancheti, Parag, and Aiyer, Siddharth N.
Published: 2021
Full Text: View/download PDF

42. Coarse-to-Fine Sequential Monte Carlo for Probabilistic Programs

Author: Stuhlmüller, Andreas, Hawkins, Robert X. D., Siddharth, N., and Goodman, Noah D.
Subjects: Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Many practical techniques for probabilistic inference require a sequence of distributions that interpolate between a tractable distribution and an intractable distribution of interest. Usually, the sequences used are simple, e.g., based on geometric averages between distributions. When models are expressed as probabilistic programs, the models themselves are highly structured objects that can be used to derive annealing sequences that are more sensitive to domain structure. We propose an algorithm for transforming probabilistic programs to coarse-to-fine programs which have the same marginal distribution as the original programs, but generate the data at increasing levels of detail, from coarse to fine. We apply this algorithm to an Ising model, its depth-from-disparity variation, and a factorial hidden Markov model. We show preliminary evidence that the use of coarse-to-fine models can make existing generic inference algorithms more efficient.
Published: 2015

43. The Compositional Nature of Event Representations in the Human Brain

Author: Barbu, Andrei, Siddharth, N., Xiong, Caiming, Corso, Jason J., Fellbaum, Christiane D., Hanson, Catherine, Hanson, Stephen José, Hélie, Sébastien, Malaia, Evguenia, Pearlmutter, Barak A., Siskind, Jeffrey Mark, Talavage, Thomas Michael, and Wilbur, Ronnie B.
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: How does the human brain represent simple compositions of constituents: actors, verbs, objects, directions, and locations? Subjects viewed videos during neuroimaging (fMRI) sessions from which sentential descriptions of those videos were identified by decoding the brain representations based only on their fMRI activation patterns. Constituents (e.g., "fold" and "shirt") were independently decoded from a single presentation. Independent constituent classification was then compared to joint classification of aggregate concepts (e.g., "fold-shirt"); results were similar as measured by accuracy and correlation. The brain regions used for independent constituent classification are largely disjoint and largely cover those used for joint classification. This allows recovery of sentential descriptions of stimulus videos by composing the results of the independent constituent classifiers. Furthermore, classifiers trained on the words one set of subjects think of when watching a video can recognise sentences a different subject thinks of when watching a different video., Comment: 28 pages; 8 figures; 8 tables
Published: 2015

44. A Semi-supervised Deep Generative Model for Human Body Analysis

Author: de Bem, Rodrigo, Ghosh, Arnab, Ajanthan, Thalaiyasingam, Miksik, Ondrej, Siddharth, N., and Torr, Philip; Series Editors: Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Pandu Rangan, C., Steffen, Bernhard, Terzopoulos, Demetri, and Tygar, Doug; Editors: Leal-Taixé, Laura and Roth, Stefan
Published: 2019
Full Text: View/download PDF

45. Clinical and radiological factors associated with postoperative shoulder imbalance and correlation with patient-reported outcomes following scoliosis surgery

Author: Hadgaonkar, Shailesh, Shah, Shubham, Bhilare, Pramod, Kothari, Ajay, Shyam, Ashok, Sancheti, Parag, and Aiyer, Siddharth N.
Published: 2020
Full Text: View/download PDF

46. Indian guidelines on hypertension-IV (2019)

Author: Shah, Siddharth N., Munjal, Y. P., Kamath, Sandhya A., Wander, Gurpreet S., Mehta, Nihar, Mukherjee, Sukumar, Kirpalani, Ashok, Gupta, Pritam, Shah, Hardik, Rohatgi, Ragini, Billimoria, Aspi R., Maiya, M., Das, Mrinal Kanti, Goswami, Kewal C., Sharma, Rajan, Rajapurkar, Mohan M., Chawla, Rajeev, Saboo, Banshi, and Jha, Vivekanand
Published: 2020
Full Text: View/download PDF

47. DGPose: Deep Generative Models for Human Body Analysis

Author: de Bem, Rodrigo, Ghosh, Arnab, Ajanthan, Thalaiyasingam, Miksik, Ondrej, Boukhayma, Adnane, Siddharth, N., and Torr, Philip
Published: 2020
Full Text: View/download PDF

48. Saying What You're Looking For: Linguistics Meets Video Search

Author: Barbu, Andrei, Siddharth, N., and Siskind, Jeffrey Mark
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identical words but entirely different meaning: "The person rode the horse" vs. "The horse rode the person". Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score which indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, this uses knowledge about the intended sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While earlier work was limited to single-word queries which correspond to either verbs or nouns, we show how one can search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 141 queries involving people and horses interacting with each other in 10 full-length Hollywood movies., Comment: 13 pages, 8 figures
Published: 2013

49. Seeing What You're Told: Sentence-Guided Activity Recognition In Video

Author: Siddharth, N., Barbu, Andrei, and Siskind, Jeffrey Mark
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, thereby providing a medium, not only for top-down and bottom-up integration, but also for multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions) in the form of whole sentential descriptions mediated by a grammar, guides the activity-recognition process. Further, the utility and expressiveness of our framework is demonstrated by performing three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions of video, and query-based video search, simply by leveraging the framework in different manners., Comment: To appear in CVPR 2014
Published: 2013

50. The Compositional Nature of Verb and Argument Representations in the Human Brain

Author: Barbu, Andrei, Siddharth, N., Xiong, Caiming, Corso, Jason J., Fellbaum, Christiane D., Hanson, Catherine, Hanson, Stephen José, Hélie, Sébastien, Malaia, Evguenia, Pearlmutter, Barak A., Siskind, Jeffrey Mark, Talavage, Thomas Michael, and Wilbur, Ronnie B.
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: How does the human brain represent simple compositions of objects, actors, and actions? We had subjects view action sequence videos during neuroimaging (fMRI) sessions and identified lexical descriptions of those videos by decoding (SVM) the brain representations based only on their fMRI activation patterns. As a precursor to this result, we had demonstrated that we could reliably and with high probability decode action labels corresponding to one of six action videos (dig, walk, etc.), again while subjects viewed the action sequence during scanning (fMRI). This result was replicated at two different brain imaging sites with common protocols but different subjects, showing common brain areas, including areas known for episodic memory (PHG, MTL, high level visual pathways, etc., i.e. the 'what' and 'where' systems, and TPJ, i.e. 'theory of mind'). Given these results, we were also able to successfully show a key aspect of language compositionality based on simultaneous decoding of object class and actor identity. Finally, combining these novel steps in 'brain reading' allowed us to accurately estimate brain representations supporting compositional decoding of a complex event composed of an actor, a verb, a direction, and an object., Comment: 11 pages, 6 figures
Published: 2013
