112 results for "Niculae, Vlad"
Search Results
2. Sparse and Structured Hopfield Networks
- Author
-
Santos, Saul, Niculae, Vlad, McNamee, Daniel, and Martins, Andre F. T.
- Subjects
Computer Science - Machine Learning - Abstract
Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach., Comment: 20 pages, 4 figures
- Published
- 2024
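The retrieval rule behind this abstract is compact enough to sketch. Below is a minimal NumPy illustration, assuming the standard modern-Hopfield update (query ← Xᵀ · transform(β · X · query)) with softmax swapped for sparsemax; the toy patterns and β are invented, and the paper's Hopfield-Fenchel-Young energies generalize well beyond this special case.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex
    (Martins & Astudillo, 2016); most coordinates come out exactly zero."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max
    return np.maximum(z - tau, 0.0)

# Toy memory: stored patterns as rows, plus a noisy query.
X = np.eye(3)
q = np.array([0.9, 0.2, 0.1])
beta = 4.0

for _ in range(3):                   # a few retrieval iterations
    p = sparsemax(beta * (X @ q))    # sparse attention over stored patterns
    q = X.T @ p                      # update: convex combination of patterns
print(p, q)  # p is exactly one-hot here, i.e., exact memory retrieval
```

With softmax in place of sparsemax, p never reaches an exact one-hot vector; the link between margins, sparsity, and exact retrieval is precisely what the abstract highlights.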
3. On Measuring Context Utilization in Document-Level MT Systems
- Author
-
Mohammed, Wafaa and Niculae, Vlad
- Subjects
Computer Science - Computation and Language - Abstract
Document-level translation models are usually evaluated using general metrics such as BLEU, which are not informative about the benefits of context. Current work on context-aware evaluation, such as contrastive methods, measures translation accuracy only on words that need context for disambiguation. Such measures cannot reveal whether the translation model uses the correct supporting context. We propose to complement accuracy-based evaluation with measures of context utilization. We find that perturbation-based analysis (comparing models' performance when provided with correct versus random context) is an effective measure of overall context utilization. For a finer-grained phenomenon-specific evaluation, we propose to measure how much the supporting context contributes to handling context-dependent discourse phenomena. We show that automatically-annotated supporting context gives similar conclusions to human-annotated context and can be used as an alternative when human annotations are not available. Finally, we highlight the importance of using discourse-rich datasets when assessing context utilization.
- Published
- 2024
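The perturbation-based measure lends itself to a generic sketch. Everything named below (the `translate` and `score` helpers, the three-sentence context window) is hypothetical scaffolding standing in for a document-level MT system and a sentence-level metric; only the correct-vs-random contrast is the point.

```python
import random

def context_gain(model, docs, translate, score):
    """Average quality gain from the true preceding context over a random one.
    docs: list of documents, each a list of (source, reference) pairs.
    translate(model, src, ctx) and score(hyp, ref) are assumed helpers."""
    gains = []
    for doc in docs:
        for i, (src, ref) in enumerate(doc):
            correct_ctx = [s for s, _ in doc[max(0, i - 3):i]]
            other = random.choice(docs)                  # possibly unrelated
            random_ctx = [s for s, _ in random.sample(other, min(3, len(other)))]
            hyp_correct = translate(model, src, correct_ctx)
            hyp_random = translate(model, src, random_ctx)
            gains.append(score(hyp_correct, ref) - score(hyp_random, ref))
    return sum(gains) / len(gains)   # > 0 suggests the model exploits context
```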
4. The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation
- Author
-
Tokarchuk, Evgeniia and Niculae, Vlad
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Continuous-output neural machine translation (CoNMT) replaces the discrete next-word prediction problem with an embedding prediction. The semantic structure of the target embedding space (i.e., closeness of related words) is intuitively believed to be crucial. We challenge this assumption and show that completely random output embeddings can outperform laboriously pretrained ones, especially on larger datasets. Further investigation shows this surprising effect is strongest for rare words, due to the geometry of their embeddings. We shed further light on this finding by designing a mixed strategy that combines random and pre-trained embeddings for different tokens.
- Published
- 2023
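The mechanism is simple to picture: CoNMT trains the decoder to regress onto the target word's embedding, and decoding picks the nearest vocabulary embedding. A toy sketch with fixed random unit-norm embeddings (vocabulary and dimensions invented):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat", "<eos>"]
dim = 8

# Random target embeddings: drawn once, never trained, unit-normalized.
E = rng.standard_normal((len(vocab), dim))
E /= np.linalg.norm(E, axis=1, keepdims=True)

def nearest_word(pred):
    """Decode a predicted vector to the cosine-nearest vocabulary item."""
    sims = E @ (pred / np.linalg.norm(pred))
    return vocab[int(np.argmax(sims))]

# Stand-in for a decoder output: the true embedding plus noise.
pred = E[1] + 0.1 * rng.standard_normal(dim)
print(nearest_word(pred))  # "cat"
```

One intuition consistent with the abstract: random unit vectors in high dimensions are nearly orthogonal, so rare words keep well-separated targets instead of being crowded together by frequency effects.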
5. Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables
- Author
-
Araabi, Ali, Niculae, Vlad, and Monz, Christof
- Subjects
Computer Science - Computation and Language ,68T50 ,I.2.7 - Abstract
Despite the tremendous success of Neural Machine Translation (NMT), its performance on low-resource language pairs remains subpar, partly due to the limited ability to handle previously unseen inputs, i.e., generalization. In this paper, we propose Joint Dropout, a method that addresses the challenge of low-resource neural machine translation by substituting phrases with variables, resulting in a significant enhancement of compositionality, a key aspect of generalization. We observe a substantial improvement in translation quality for language pairs with minimal resources, as seen in BLEU and Direct Assessment scores. Furthermore, we conduct an error analysis and find that Joint Dropout also enhances the generalizability of low-resource NMT in terms of robustness and adaptability across different domains., Comment: Accepted at MT Summit 2023
- Published
- 2023
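A toy illustration of the substitution idea (the span choice, variable token, and alignment below are invented for illustration; the paper's actual phrase-pair extraction is more involved):

```python
def substitute_phrase(src, tgt, src_span, tgt_span, var="<X1>"):
    """Replace an aligned source/target phrase pair with a shared variable,
    producing a more compositional training pair."""
    i, j = src_span
    k, l = tgt_span
    return src[:i] + [var] + src[j:], tgt[:k] + [var] + tgt[l:]

src = "ich habe den roten Apfel gegessen".split()
tgt = "I ate the red apple".split()
# Suppose an aligner matched "den roten Apfel" with "the red apple".
print(substitute_phrase(src, tgt, (2, 5), (2, 5)))
# (['ich', 'habe', '<X1>', 'gegessen'], ['I', 'ate', '<X1>'])
```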
6. Two derivations of Principal Component Analysis on datasets of distributions
- Author
-
Niculae, Vlad
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these perspectives., Comment: 4 pages, 1 figure
- Published
- 2023
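A hint at the variance-maximization derivation, in condensed notation of my own (the note itself should be consulted for the exact formulation): for data consisting of distributions with locations μ_i and covariances Σ_i, the law of total variance gives the variance of a projection wᵀx under the uniform mixture as

```latex
\operatorname{Var}(w^{\top} x)
  = w^{\top} \Big(\underbrace{\operatorname{Cov}(\mu)}_{\text{between locations}}
  + \underbrace{\tfrac{1}{n}\textstyle\sum_{i=1}^{n} \Sigma_i}_{\text{average within}}\Big)\, w,
\qquad
w^{\star} = \operatorname*{arg\,max}_{\|w\|_2 = 1}\;
  w^{\top} \Big(\operatorname{Cov}(\mu) + \tfrac{1}{n}\textstyle\sum_i \Sigma_i\Big)\, w,
```

so the leading component is the top eigenvector of a single pooled covariance matrix, which is the kind of closed form the abstract promises.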
7. Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens
- Author
-
Stap, David, Niculae, Vlad, and Monz, Christof
- Subjects
Computer Science - Computation and Language - Abstract
We argue that translation quality alone is not a sufficient metric for measuring knowledge transfer in multilingual neural machine translation. To support this claim, we introduce Representational Transfer Potential (RTP), which measures representational similarities between languages. We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality, indicating that transfer does occur. Furthermore, we investigate data and language characteristics that are relevant for transfer, and find that multi-parallel overlap is an important yet under-explored feature. Based on this, we develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages by taking advantage of multi-parallel data. We show that our method yields increased translation quality for low- and mid-resource languages across multiple data and model setups., Comment: Accepted to EMNLP 2023 Findings
- Published
- 2023
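A generic flavor of the measurement (not RTP's exact definition, which the paper specifies): compare encoder representations of translation-equivalent sentences across languages, e.g., by mean-pooled cosine similarity. The `encode` helper below is an assumed stand-in.

```python
import numpy as np

def repr_similarity(encode, sents_a, sents_b):
    """Average cosine similarity between mean-pooled encoder states of
    parallel sentences in two languages; encode(sent) is assumed to
    return a (length x dim) array of hidden states."""
    sims = []
    for sa, sb in zip(sents_a, sents_b):
        ha, hb = encode(sa).mean(axis=0), encode(sb).mean(axis=0)
        sims.append(ha @ hb / (np.linalg.norm(ha) * np.linalg.norm(hb)))
    return float(np.mean(sims))
```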
8. DAG Learning on the Permutahedron
- Author
-
Zantedeschi, Valentina, Franceschi, Luca, Kaddour, Jean, Kusner, Matt J., and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimization approaches our formulation has a number of advantages including: 1. validity: optimizes over exact DAGs as opposed to other relaxations optimizing approximate DAGs; 2. modularity: accommodates any edge-optimization procedure, edge structural parameterization, and optimization loss; 3. end-to-end: either alternately iterates between node-ordering and edge-optimization, or optimizes them jointly. We demonstrate, on real-world data problems in protein-signaling and transcriptional network discovery, that our approach lies on the Pareto frontier of two key metrics, the SID and SHD., Comment: The Eleventh International Conference on Learning Representations
- Published
- 2023
9. Discrete Latent Structure in Neural Networks
- Author
-
Niculae, Vlad, Corro, Caio F., Nangia, Nikita, Mihaylova, Tsvetomila, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning ,I.2.6 - Abstract
Many types of data from fields including natural language processing, computer vision, and bioinformatics, are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions. However, effective training is challenging, as neural networks are typically designed for continuous computation. This text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. Our presentation relies on consistent notations for a wide range of models. As such, we reveal many new connections between latent structure learning strategies, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
- Published
- 2023
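Of the three strategies, surrogate gradients are the quickest to demonstrate. A minimal straight-through estimator in PyTorch, shown generically rather than as any particular model from the text: the forward pass emits a hard one-hot vector, while the backward pass uses the softmax Jacobian.

```python
import torch
import torch.nn.functional as F

def straight_through_argmax(scores):
    """Forward: exact one-hot argmax. Backward: gradient of softmax.
    The detach() cancellation hides the non-differentiable part from autograd."""
    soft = F.softmax(scores, dim=-1)
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    return hard + soft - soft.detach()

scores = torch.tensor([[1.0, 2.0, 0.5]], requires_grad=True)
z = straight_through_argmax(scores)                 # tensor([[0., 1., 0.]])
loss = (z * torch.tensor([[0.0, 1.0, 3.0]])).sum()  # toy downstream loss
loss.backward()                                     # flows through the surrogate
print(z, scores.grad)
```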
10. How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?
- Author
-
Araabi, Ali, Monz, Christof, and Niculae, Vlad
- Subjects
Computer Science - Computation and Language ,68T50 ,I.2.7 - Abstract
Neural Machine Translation (NMT) is an open vocabulary problem. As a result, dealing with words that do not occur during training (a.k.a. out-of-vocabulary (OOV) words) has long been a fundamental challenge for NMT systems. The predominant method to tackle this problem is Byte Pair Encoding (BPE), which splits words, including OOV words, into sub-word segments. BPE has achieved impressive results for a wide range of translation tasks in terms of automatic evaluation metrics. While it is often assumed that by using BPE, NMT systems are capable of handling OOV words, the effectiveness of BPE in translating OOV words has not been explicitly measured. In this paper, we study to what extent BPE is successful in translating OOV words at the word level. We analyze the translation quality of OOV words based on word type, number of segments, cross-attention weights, and the frequency of segment n-grams in the training data. Our experiments show that while careful BPE settings seem to be fairly useful in translating OOV words across datasets, a considerable percentage of OOV words are translated incorrectly. Furthermore, we highlight the slightly higher effectiveness of BPE in translating OOV words for special cases, such as named entities and when the languages involved are linguistically close to each other., Comment: 14 pages, 6 figures, 1 table, To be published in AMTA 2022 conference
- Published
- 2022
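How BPE segments an OOV word is easy to visualize with a toy merge table (the merges below are invented; real tables are learned from corpus statistics):

```python
def bpe_segment(word, merges):
    """Greedy BPE: start from characters and repeatedly apply the
    highest-priority learned merge still present in the sequence."""
    symbols = list(word) + ["</w>"]
    while True:
        candidates = [(merges.index(pair), i)
                      for i, pair in enumerate(zip(symbols, symbols[1:]))
                      if pair in merges]
        if not candidates:
            return symbols
        _, i = min(candidates)
        symbols = symbols[:i] + [symbols[i] + symbols[i + 1]] + symbols[i + 2:]

merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er"), ("er", "</w>")]
print(bpe_segment("lowest", merges))
# ['low', 'e', 's', 't', '</w>'] -- an OOV word becomes known sub-word segments
```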
11. Modeling Structure with Undirected Neural Networks
- Author
-
Mihaylova, Tsvetomila, Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
Neural networks are powerful function estimators, leading to their status as a paradigm of choice for modeling structured data. However, unlike other structured representations that emphasize the modularity of the problem -- e.g., factor graphs -- neural networks are usually monolithic mappings from inputs to outputs, with a fixed computation order. This limitation prevents them from capturing different directions of computation and interaction between the modeled variables. In this paper, we combine the representational strengths of factor graphs and of neural networks, proposing undirected neural networks (UNNs): a flexible framework for specifying computations that can be performed in any order. For particular choices, our proposed models subsume and extend many existing architectures: feed-forward, recurrent, self-attention networks, auto-encoders, and networks with implicit layers. We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks: tree-constrained dependency parsing, convolutional image classification, and sequence completion with attention. By varying the computation order, we show how a single UNN can be used both as a classifier and a prototype generator, and how it can fill in missing parts of an input sequence, making them a promising field for further research., Comment: ICML 2022
- Published
- 2022
12. Sparse Communication via Mixed Distributions
- Author
-
Farinhas, António, Aziz, Wilker, Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning - Abstract
Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-Softmax transformation) build continuous relaxations that are discrete approximations in the zero-temperature limit, while others (such as sparsemax transformations and the Hard Concrete distribution) produce discrete/continuous hybrids. In this paper, we build rigorous theoretical foundations for these hybrids, which we call "mixed random variables." Our starting point is a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, we introduce new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. Our framework suggests two strategies for representing and sampling mixed random variables, an extrinsic ("sample-and-project") and an intrinsic one (based on face stratification). We experiment with both approaches on an emergent communication benchmark and on modeling MNIST and Fashion-MNIST data with variational auto-encoders with mixed latent variables., Comment: Accepted for oral presentation at ICLR 2022
- Published
- 2021
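For orientation, the discrete/continuous hybrid behavior traces back to a standard fact (restated here; the paper's measure-theoretic construction goes much further): sparsemax is the Euclidean projection onto the simplex,

```latex
\operatorname{sparsemax}(z) \;=\; \operatorname*{arg\,min}_{p \,\in\, \triangle} \;\|p - z\|_2^2 ,
```

and a projection can land exactly on a low-dimensional face. The extrinsic "sample-and-project" strategy can thus be read as drawing a continuous sample s (say, scores plus noise) and emitting sparsemax(s): with positive probability some coordinates are exactly zero (discrete), while the rest vary continuously.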
13. Sparse Continuous Distributions and Fenchel-Young Losses
- Author
-
Martins, André F. T., Treviso, Marcos, Farinhas, António, Aguiar, Pedro M. Q., Figueiredo, Mário A. T., Blondel, Mathieu, and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax), has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $\Omega$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain ``deformed exponential families,'' which include $\alpha$-entmax and sparsemax ($\alpha=2$) as particular cases. For quadratic energy functions, the resulting densities are $\beta$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $\Omega$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions., Comment: JMLR 2022 camera ready version. arXiv admin note: text overlap with arXiv:2006.07214
- Published
- 2021
14. Learning Binary Decision Trees by Argmin Differentiation
- Author
-
Zantedeschi, Valentina, Kusner, Matt J., and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters, to allow gradients to pass through the program to continuous parameters. We derive customized algorithms to efficiently compute the forward and backward passes. This means that our tree learning procedure can be used as an (implicit) layer in arbitrary deep networks, and can be optimized with arbitrary loss functions. We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings. Further, apart from greedy approaches (which do not have competitive accuracies), our method is faster to train than all other tree-learning baselines we compare with. The code for reproducing the results is available at https://github.com/vzantedeschi/LatentTrees.
- Published
- 2020
15. Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
- Author
-
Mihaylova, Tsvetomila, Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) as well as the recently-proposed SPIGOT - a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases., Comment: EMNLP 2020
- Published
- 2020
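The two estimators differ only in how the downstream gradient is pulled back through the argmax output ẑ. Roughly, in my paraphrase of the standard presentations (Bengio et al., 2013; Peng et al., 2018):

```latex
\text{STE:}\quad \tilde{\nabla}_{\theta} L \;=\; \nabla_{\hat{z}} L ,
\qquad
\text{SPIGOT:}\quad \tilde{\nabla}_{\theta} L \;=\;
  \hat{z} \;-\; \operatorname{proj}_{\operatorname{conv}(\mathcal{Z})}\!\big(\hat{z} - \eta\, \nabla_{\hat{z}} L\big),
```

i.e., SPIGOT takes a gradient step in structure space, projects back onto the polytope, and uses the displacement as the surrogate; the paper's "pulling back the downstream objective" view recovers both as special cases.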
16. Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
- Author
-
Correia, Gonçalo M., Niculae, Vlad, Aziz, Wilker, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization. To achieve this, we parameterize discrete distributions over latent assignments using differentiable sparse mappings: sparsemax and its structured counterparts. In effect, the support of these distributions is greatly reduced, which enables efficient marginalization. We report successful results in three tasks covering a range of latent variable modeling applications: a semisupervised deep generative model, a latent communication game, and a generative model with a bit-vector latent representation. In all cases, we obtain good performance while still achieving the practicality of sampling-based approximations., Comment: Accepted for spotlight presentation at NeurIPS 2020
- Published
- 2020
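The strategy reduces to a few lines once a sparse mapping is available. A sketch assuming the authors' `entmax` package (`pip install entmax`) for sparsemax; the toy scores and per-assignment loss are invented:

```python
import torch
from entmax import sparsemax  # assumed available; any sparse simplex map works

def exact_expected_loss(scores, loss_of):
    """Exact marginalization over a sparse latent distribution:
    only assignments with nonzero probability contribute."""
    p = sparsemax(scores, dim=-1)            # most entries are exactly zero
    support = p.nonzero(as_tuple=True)[0]
    return sum(p[k] * loss_of(int(k)) for k in support)

scores = torch.tensor([2.0, 1.9, -1.0, -3.0], requires_grad=True)
loss = exact_expected_loss(scores, lambda k: float(k))  # toy downstream loss
loss.backward()   # exact gradient: no sampling, no score-function estimator
print(scores.grad)
```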
17. Sparse and Continuous Attention Mechanisms
- Author
-
Martins, André F. T., Farinhas, António, Treviso, Marcos, Niculae, Vlad, Aguiar, Pedro M. Q., and Figueiredo, Mário A. T.
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition ,Statistics - Machine Learning - Abstract
Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g. sparsemax and alpha-entmax), which have varying support, being able to assign zero probability to irrelevant categories. This paper expands that work in two directions: first, we extend alpha-entmax to continuous domains, revealing a link with Tsallis statistics and deformed exponential families. Second, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for alpha in {1,2}. Experiments on attention-based text classification, machine translation, and visual question answering illustrate the use of continuous attention in 1D and 2D, showing that it allows attending to time intervals and compact regions., Comment: Accepted for spotlight presentation at NeurIPS 2020
- Published
- 2020
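Schematically, in condensed notation (consistent with the abstract; details in the paper): a score function f(t) over a continuous domain is mapped to a density, and the attended summary is an integral,

```latex
p_{\text{softmax}}(t) \propto \exp f(t),
\qquad
p_{\text{sparsemax}}(t) = \big[f(t) - \tau\big]_{+}
\;\;\text{with } \tau \text{ such that } \textstyle\int p(t)\,dt = 1,
\qquad
c = \int p(t)\, V(t)\, dt .
```

For quadratic f, the continuous sparsemax density is a truncated parabola with genuinely compact support, which is what lets the mechanism attend to an interval rather than to the whole domain.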
18. Sparse and Structured Visual Attention
- Author
-
Martins, Pedro Henrique, Niculae, Vlad, Marinho, Zita, and Martins, André
- Subjects
Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Visual attention mechanisms are widely used in multimodal tasks, such as visual question answering (VQA). One drawback of softmax-based attention mechanisms is that they assign some probability mass to all image regions, regardless of their adjacency structure and of their relevance to the text. In this paper, to better link the image structure with the text, we replace the traditional softmax attention mechanism with two alternative sparsity-promoting transformations: sparsemax, which is able to select only the relevant regions (assigning zero weight to the rest), and a newly proposed Total-Variation Sparse Attention (TVmax), which further encourages the joint selection of adjacent spatial locations. Experiments in VQA show gains in accuracy as well as higher similarity to human attention, which suggests better interpretability.
- Published
- 2020
19. LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction
- Author
-
Niculae, Vlad and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language ,Statistics - Machine Learning - Abstract
Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables. Recently, the SparseMAP method has been proposed as a differentiable, sparse alternative to maximum a posteriori (MAP) and marginal inference. SparseMAP returns a combination of a small number of structures, a desirable property in some downstream applications. However, SparseMAP requires a tractable MAP inference oracle. This excludes, e.g., loopy graphical models or factor graphs with logic constraints, which generally require approximate inference. In this paper, we introduce LP-SparseMAP, an extension of SparseMAP that addresses this limitation via a local polytope relaxation. LP-SparseMAP uses the flexible and powerful domain specific language of factor graphs for defining and backpropagating through arbitrary hidden structure, supporting coarse decompositions, hard logic constraints, and higher-order correlations. We derive the forward and backward algorithms needed for using LP-SparseMAP as a hidden or output layer. Experiments in three structured prediction tasks show benefits compared to SparseMAP and Structured SVM., Comment: 34 pages, 5 tables, 4 figures. ICML 2020
- Published
- 2020
20. Adaptively Sparse Transformers
- Author
-
Correia, Gonçalo M., Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Computation and Language ,Statistics - Machine Learning - Abstract
Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with $\alpha$-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the $\alpha$ parameter -- which controls the shape and sparsity of $\alpha$-entmax -- allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations., Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019, Hong Kong, China
- Published
- 2019
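The replacement really is drop-in: the authors distribute an `entmax` package, and switching an attention head amounts to one line. A single-query sketch (assumes `pip install entmax`; the learnable-alpha variant has its own module in that package):

```python
import torch
from entmax import entmax15  # alpha = 1.5: differentiable, with exact zeros

def sparse_attention(q, K, V):
    """Scaled dot-product attention with entmax in place of softmax."""
    scores = K @ q / K.shape[-1] ** 0.5
    weights = entmax15(scores, dim=-1)  # low-scoring keys get weight exactly 0
    return weights @ V, weights

q = torch.randn(16)
K, V = torch.randn(10, 16), torch.randn(10, 16)
out, weights = sparse_attention(q, K, V)
print(weights)  # several entries are exactly zero, unlike softmax
```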
21. Notes on Latent Structure Models and SPIGOT
- Author
-
Martins, André F. T. and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
These notes aim to shed light on the recently proposed structured projected intermediate gradient optimization technique (SPIGOT, Peng et al., 2018). SPIGOT is a variant of the straight-through estimator (Bengio et al., 2013) which bypasses gradients of the argmax function by back-propagating a surrogate "gradient." We provide a new interpretation to the proposed gradient and put this technique into perspective, linking it to other methods for training neural networks with discrete latent variables. As a by-product, we suggest alternate variants of SPIGOT which will be further explored in future work., Comment: 7 pages
- Published
- 2019
22. Sparse Sequence-to-Sequence Models
- Author
-
Peters, Ben, Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of $\alpha$-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any $\alpha > 1$. We provide fast algorithms to evaluate these transformations and their gradients, which scale well for large vocabulary sizes. Our models are able to produce sparse alignments and to assign nonzero probability to a short list of plausible outputs, sometimes rendering beam search exact. Experiments on morphological inflection and machine translation reveal consistent gains over dense models., Comment: ACL 2019 Camera Ready
- Published
- 2019
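The family has one defining formula (standard in this line of work): for α ≥ 1,

```latex
\operatorname{entmax}_{\alpha}(z) = \operatorname*{arg\,max}_{p \in \triangle} \; p^{\top} z + H_{\alpha}(p),
\qquad
H_{\alpha}(p) =
\begin{cases}
\frac{1}{\alpha(\alpha - 1)} \sum_{j} \big(p_j - p_j^{\alpha}\big), & \alpha \neq 1,\\[4pt]
-\sum_{j} p_j \log p_j, & \alpha = 1,
\end{cases}
```

recovering softmax at α = 1, sparsemax at α = 2, and sparse outputs for every α > 1, as the abstract states.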
23. Learning with Fenchel-Young Losses
- Author
-
Blondel, Mathieu, Martins, André F. T., and Niculae, Vlad
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choosing the right loss for the right problem, as well as to creating new losses which combine their strengths. In this paper, we introduce Fenchel-Young losses, a generic way to construct a convex loss function for a regularized prediction function. We provide an in-depth study of their properties in a very broad setting, covering all the aforementioned supervised learning tasks, and revealing new connections between sparsity, generalized entropies, and separation margins. We show that Fenchel-Young losses unify many well-known loss functions and make it easy to create useful new ones. Finally, we derive efficient predictive and training algorithms, making Fenchel-Young losses appealing both in theory and practice., Comment: In Journal of Machine Learning Research, volume 21
- Published
- 2019
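The construction fits in one line (the standard definition; see the paper for regularity conditions): given a regularizer Ω with convex conjugate Ω*,

```latex
L_{\Omega}(\theta; y) = \Omega^{*}(\theta) + \Omega(y) - \langle \theta, y \rangle,
\qquad
\nabla_{\theta} L_{\Omega}(\theta; y) = \widehat{y}_{\Omega}(\theta) - y,
\quad
\widehat{y}_{\Omega}(\theta) = \operatorname*{arg\,max}_{p} \; \langle \theta, p \rangle - \Omega(p).
```

Taking Ω to be the negative Shannon entropy on the simplex yields the logistic/softmax loss; taking Ω = ½‖·‖² on the simplex yields the sparsemax loss; structured choices recover losses such as the CRF and SparseMAP losses.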
24. Towards Dynamic Computation Graphs via Sparse Latent Structure
- Author
-
Niculae, Vlad, Martins, André F. T., and Cardie, Claire
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning ,Statistics - Machine Learning ,68T50 ,I.2.6 ,I.2.7 - Abstract
Deep NLP models benefit from underlying structures in the data---e.g., parse trees---typically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribution over latent structures, we propose a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. To the best of our knowledge, our method is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability., Comment: EMNLP 2018; 9 pages (incl. appendix)
- Published
- 2018
25. Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms
- Author
-
Blondel, Mathieu, Martins, André F. T., and Niculae, Vlad
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function. We analyze their properties in depth, showing that they unify many well-known loss functions and make it easy to create useful new ones. Fenchel-Young losses constructed from a generalized entropy, including the Shannon and Tsallis entropies, induce predictive probability distributions. We formulate conditions for a generalized entropy to yield losses with a separation margin, and probability distributions with sparse support. Finally, we derive efficient algorithms, making Fenchel-Young losses appealing both in theory and practice., Comment: In proceedings of AISTATS 2019
- Published
- 2018
26. SparseMAP: Differentiable Sparse Structured Inference
- Author
-
Niculae, Vlad, Martins, André F. T., Blondel, Mathieu, and Cardie, Claire
- Subjects
Statistics - Machine Learning ,Computer Science - Computation and Language ,Computer Science - Learning ,68T50 ,I.2.6 - Abstract
Structured prediction requires searching over a combinatorial number of structures. To tackle it, we introduce SparseMAP: a new method for sparse structured inference, and its natural loss function. SparseMAP automatically selects only a few global structures: it is situated between MAP inference, which picks a single structure, and marginal inference, which assigns probability mass to all structures, including implausible ones. Importantly, SparseMAP can be computed using only calls to a MAP oracle, making it applicable to problems with intractable marginal inference, e.g., linear assignment. Sparsity makes gradient backpropagation efficient regardless of the structure, enabling us to augment deep neural networks with generic and sparse structured hidden layers. Experiments in dependency parsing and natural language inference reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems., Comment: Published in ICML 2018. 14 pages, including appendix
- Published
- 2018
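The defining problem is short (standard statement): with M the marginal polytope, i.e., the convex hull of the structures' indicator vectors,

```latex
\operatorname{SparseMAP}(\theta) = \operatorname*{arg\,max}_{\mu \in \mathcal{M}} \; \langle \theta, \mu \rangle - \tfrac{1}{2} \|\mu\|_2^2 .
```

A quadratic over a polytope attains its optimum at a sparse convex combination of vertices, and conditional-gradient-style active-set methods solve it using only MAP calls, which is exactly the property the abstract highlights.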
27. A Regularized Framework for Sparse and Structured Neural Attention
- Author
-
Niculae, Vlad and Blondel, Mathieu
- Subjects
Statistics - Machine Learning ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes softmax and a slight generalization of the recently-proposed sparsemax as special cases. However, we also show how our framework can incorporate modern structured penalties, resulting in more interpretable attention mechanisms that focus on entire segments or groups of an input. We derive efficient algorithms to compute the forward and backward passes of our attention mechanisms, enabling their use in a neural network trained with backpropagation. To showcase their potential as a drop-in replacement for existing ones, we evaluate our attention mechanisms on three large-scale tasks: textual entailment, machine translation, and sentence summarization. Our attention mechanisms improve interpretability without sacrificing performance; notably, on textual entailment and summarization, we outperform the standard attention mechanisms based on softmax and sparsemax., Comment: In proceedings of NeurIPS 2017; added errata
- Published
- 2017
28. Multi-output Polynomial Networks and Factorization Machines
- Author
-
Blondel, Mathieu, Niculae, Vlad, Otsuka, Takuma, and Ueda, Naonori
- Subjects
Statistics - Machine Learning ,Computer Science - Learning - Abstract
Factorization machines and polynomial networks are supervised polynomial models based on an efficient low-rank decomposition. We extend these models to the multi-output setting, i.e., for learning vector-valued functions, with application to multi-class or multi-task problems. We cast this as the problem of learning a 3-way tensor whose slices share a common basis and propose a convex formulation of that problem. We then develop an efficient conditional gradient algorithm and prove its global convergence, despite the fact that it involves a non-convex basis selection step. On classification tasks, we show that our algorithm achieves excellent accuracy with much sparser models than existing methods. On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy., Comment: Published at NIPS 2017. 17 pages, including appendix
- Published
- 2017
29. Argument Mining with Structured SVMs and RNNs
- Author
-
Niculae, Vlad, Park, Joonsuk, and Cardie, Claire
- Subjects
Computer Science - Computation and Language ,68T50 ,I.2.7 - Abstract
We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets., Comment: Accepted for publication at ACL 2017. 11 pages, 5 figures. Code at https://github.com/vene/marseille and data at http://joonsuk.org/
- Published
- 2017
30. Conversational Markers of Constructive Discussions
- Author
-
Niculae, Vlad and Danescu-Niculescu-Mizil, Cristian
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Social and Information Networks ,Physics - Physics and Society ,Statistics - Machine Learning - Abstract
Group discussions are essential for organizing every aspect of modern life, from faculty meetings to senate debates, from grant review panels to papal conclaves. While costly in terms of time and organization effort, group discussions are commonly seen as a way of reaching better decisions compared to solutions that do not require coordination between the individuals (e.g. voting)---through discussion, the sum becomes greater than the parts. However, this assumption is not irrefutable: anecdotal evidence of wasteful discussions abounds, and in our own experiments we find that over 30% of discussions are unproductive. We propose a framework for analyzing conversational dynamics in order to determine whether a given task-oriented discussion is worth having or not. We exploit conversational patterns reflecting the flow of ideas and the balance between the participants, as well as their linguistic choices. We apply this framework to conversations naturally occurring in an online collaborative world exploration game developed and deployed to support this research. Using this setting, we show that linguistic cues and conversational patterns extracted from the first 20 seconds of a team discussion are predictive of whether it will be a wasteful or a productive one., Comment: To appear at NAACL-HLT 2016. 11pp, 5 fig. Data and other info available at http://vene.ro/constructive/
- Published
- 2016
31. Winning Arguments: Interaction Dynamics and Persuasion Strategies in Good-faith Online Discussions
- Author
-
Tan, Chenhao, Niculae, Vlad, Danescu-Niculescu-Mizil, Cristian, and Lee, Lillian
- Subjects
Computer Science - Social and Information Networks ,Computer Science - Computation and Language ,Physics - Physics and Society - Abstract
Changing someone's opinion is arguably one of the most important challenges of social interaction. The underlying process proves difficult to study: it is hard to know how someone's opinions are formed and whether and how someone's views shift. Fortunately, ChangeMyView, an active community on Reddit, provides a platform where users present their own opinions and reasoning, invite others to contest them, and acknowledge when the ensuing discussions change their original views. In this work, we study these interactions to understand the mechanisms behind persuasion. We find that persuasive arguments are characterized by interesting patterns of interaction dynamics, such as participant entry-order and degree of back-and-forth exchange. Furthermore, by comparing similar counterarguments to the same opinion, we show that language factors play an essential role. In particular, the interplay between the language of the opinion holder and that of the counterargument provides highly predictive cues of persuasiveness. Finally, since even in this favorable setting people may not be persuaded, we investigate the problem of determining whether someone's opinion is susceptible to being changed at all. For this more difficult task, we show that stylistic choices in how the opinion is expressed carry predictive power., Comment: 12 pages, 10 figures, to appear in Proceedings of WWW 2016, data and more at https://chenhaot.com/pages/changemyview.html (v2 made a minor correction on submission rules in ChangeMyView.)
- Published
- 2016
32. Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game
- Author
-
Niculae, Vlad, Kumar, Srijan, Boyd-Graber, Jordan, and Danescu-Niculescu-Mizil, Cristian
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Social and Information Networks ,Physics - Physics and Society ,Statistics - Machine Learning - Abstract
Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal. We reveal that subtle signs of imminent betrayal are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship's fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through language. In contrast, sudden changes in the balance of certain conversational attributes---such as positive sentiment, politeness, or focus on future planning---signal impending betrayal., Comment: To appear at ACL 2015. 10pp, 4 fig. Data and other info available at http://vene.ro/betrayal/
- Published
- 2015
33. QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns
- Author
-
Niculae, Vlad, Suen, Caroline, Zhang, Justine, Danescu-Niculescu-Mizil, Cristian, and Leskovec, Jure
- Subjects
Computer Science - Computation and Language ,Computer Science - Social and Information Networks ,Physics - Physics and Society - Abstract
Given the extremely large pool of events and stories available, media outlets need to focus on a subset of issues and aspects to convey to their audience. Outlets are often accused of exhibiting a systematic bias in this selection process, with different outlets portraying different versions of reality. However, in the absence of objective measures and empirical evidence, the direction and extent of systematicity remains widely disputed. In this paper we propose a framework based on quoting patterns for quantifying and characterizing the degree to which media outlets exhibit systematic bias. We apply this framework to a massive dataset of news articles spanning the six years of Obama's presidency and all of his speeches, and reveal that a systematic pattern does indeed emerge from the outlet's quoting behavior. Moreover, we show that this pattern can be successfully exploited in an unsupervised prediction setting, to determine which new quotes an outlet will select to broadcast. By encoding bias patterns in a low-rank space we provide an analysis of the structure of political media coverage. This reveals a latent media bias space that aligns surprisingly well with political ideology and outlet type. A linguistic analysis exposes striking differences across these latent dimensions, showing how the different types of media outlets portray different realities even when reporting on the same events. For example, outlets mapped to the mainstream conservative side of the latent space focus on quotes that portray a presidential persona disproportionately characterized by negativity., Comment: To appear in the Proceedings of WWW 2015. 11pp, 10 fig. Interactive visualization, data, and other info available at http://snap.stanford.edu/quotus/
- Published
- 2015
34. API design for machine learning software: experiences from the scikit-learn project
- Author
-
Buitinck, Lars, Louppe, Gilles, Blondel, Mathieu, Pedregosa, Fabian, Mueller, Andreas, Grisel, Olivier, Niculae, Vlad, Prettenhofer, Peter, Gramfort, Alexandre, Grobler, Jaques, Layton, Robert, Vanderplas, Jake, Joly, Arnaud, Holt, Brian, and Varoquaux, Gaël
- Subjects
Computer Science - Learning ,Computer Science - Mathematical Software - Abstract
Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
- Published
- 2013
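The interface the paper discusses is the now-familiar estimator API. A canonical usage example (standard scikit-learn, not specific to the paper):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Every estimator exposes fit(); transformers add transform(),
# predictors add predict() -- the shared interface makes composition trivial.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict(X[:3]))
print(model.score(X, y))   # mean accuracy
```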
35. Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens
- Author
-
Stap, David, Niculae, Vlad, and Monz, Christof
- Published
- 2023
36. Romanian Syllabication Using Machine Learning
- Author
-
Dinu, Liviu P., Niculae, Vlad, and Sulea, Octavia-Maria
- Published
- 2013
37. Aspects regarding the construction of safe earthquake staircases
- Author
-
Pirvanus, Ana-Maria, Niculae, Vlad Stefan, and Stoica, Daniel
- Subjects
General Medicine
- Published
- 2021
38. On Target Representation in Continuous-output Neural Machine Translation
- Author
-
Tokarchuk, Evgeniia and Niculae, Vlad
- Published
- 2022
39. Learning Binary Decision Trees by Argmin Differentiation
- Author
-
Zantedeschi, Valentina, Kusner, Matt J., and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning ,binary decision trees ,argmin differentiation ,implicit layer ,self-supervised learning - Abstract
We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters, to allow gradients to pass through the program to continuous parameters. We derive customized algorithms to efficiently compute the forward and backward passes. This means that our tree learning procedure can be used as an (implicit) layer in arbitrary deep networks, and can be optimized with arbitrary loss functions. We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings. Further, apart from greedy approaches (which do not have competitive accuracies), our method is faster to train than all other tree-learning baselines we compare with. The code for reproducing the results is available at https://github.com/vzantedeschi/LatentTrees.
- Published
- 2021
40. Sparse And Structured Visual Attention
- Author
-
Martins, Pedro Henrique, Niculae, Vlad, Marinho, Zita, and Martins, Andre F. T.
- Published
- 2021
41. Aspects regarding the use of the Tuned Mass Dampers
- Author
-
Zainulabdeen K., Abdulfattah, Niculae, Vlad Stefan, and Stoica, Daniel
- Published
- 2021
42. Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
- Author
-
Correia, Gonçalo M., Niculae, Vlad, Aziz, Wilker, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization. To achieve this, we parameterize discrete distributions over latent assignments using differentiable sparse mappings: sparsemax and its structured counterparts. In effect, the support of these distributions is greatly reduced, which enables efficient marginalization. We report successful results in three tasks covering a range of latent variable modeling applications: a semisupervised deep generative model, a latent communication game, and a generative model with a bit-vector latent representation. In all cases, we obtain good performance while still achieving the practicality of sampling-based approximations., Accepted for spotlight presentation at NeurIPS 2020
- Published
- 2020
43. Romanian Syllabication Using Machine Learning
- Author
-
Dinu, Liviu P., Niculae, Vlad, and Sulea, Octavia-Maria
- Published
- 2013
44. Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
- Author
-
Mihaylova, Tsvetomila, Niculae, Vlad, and Martins, André F. T.
- Published
- 2020
45. Aspecte referitoare la construcția de scări sigure pentru cutremur [Aspects regarding the construction of earthquake-safe staircases].
- Author
-
Pîrvănuş, Ana-Maria, Niculae, Vlad Ştefan, and Stoica, Daniel
- Subjects
REINFORCED concrete ,STAIRS ,CASE studies ,PALEOSEISMOLOGY
- Published
- 2021
46. Learning Deep Models with Linguistically-Inspired Structure
- Author
-
Niculae, Vlad
- Published
- 2018
47. Latent Structure Models for Natural Language Processing
- Author
-
Martins, André F. T., Mihaylova, Tsvetomila, Nangia, Nikita, and Niculae, Vlad
- Published
- 2019
48. Sparse Sequence-to-Sequence Models
- Author
-
Peters, Ben, Niculae, Vlad, and Martins, André F. T.
- Published
- 2019
49. Adaptively Sparse Transformers
- Author
-
Correia, Gonçalo M., Niculae, Vlad, and Martins, André F. T.
- Published
- 2019
50. Aspecte privind utilizarea dispozitivelor de amortizare cu masa acordata [Aspects regarding the use of tuned mass dampers].
- Author
-
Abdulfattah, Zainulabdeen K., Niculae, Vlad Stefan, and Stoica, Daniel
- Subjects
TUNED mass dampers ,REINFORCED concrete ,CASE studies ,SKYSCRAPERS ,TALL buildings
- Published
- 2021