112 results for "Niculae, Vlad"
Search Results
2. Sparse and Structured Hopfield Networks
- Author
-
Santos, Saul, Niculae, Vlad, McNamee, Daniel, and Martins, Andre F. T.
- Subjects
Computer Science - Machine Learning - Abstract
Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach., Comment: 20 pages, 4 figures
- Published
- 2024
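The retrieval rule behind this abstract is compact enough to sketch. Below is a minimal NumPy illustration, assuming the standard modern-Hopfield update (query ← Xᵀ · transform(β · X · query)) with softmax swapped for sparsemax; the toy patterns and β are invented, and the paper's Hopfield-Fenchel-Young energies generalize well beyond this special case.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex
    (Martins & Astudillo, 2016); most coordinates come out exactly zero."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max
    return np.maximum(z - tau, 0.0)

# Toy memory: stored patterns as rows, plus a noisy query.
X = np.eye(3)
q = np.array([0.9, 0.2, 0.1])
beta = 4.0

for _ in range(3):                   # a few retrieval iterations
    p = sparsemax(beta * (X @ q))    # sparse attention over stored patterns
    q = X.T @ p                      # update: convex combination of patterns
print(p, q)  # p is exactly one-hot here, i.e., exact memory retrieval
```

With softmax in place of sparsemax, p never reaches an exact one-hot vector; the link between margins, sparsity, and exact retrieval is precisely what the abstract highlights.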
3. On Measuring Context Utilization in Document-Level MT Systems
- Author
-
Mohammed, Wafaa and Niculae, Vlad
- Subjects
Computer Science - Computation and Language - Abstract
Document-level translation models are usually evaluated using general metrics such as BLEU, which are not informative about the benefits of context. Current work on context-aware evaluation, such as contrastive methods, measures translation accuracy only on words that need context for disambiguation. Such measures cannot reveal whether the translation model uses the correct supporting context. We propose to complement accuracy-based evaluation with measures of context utilization. We find that perturbation-based analysis (comparing models' performance when provided with correct versus random context) is an effective measure of overall context utilization. For a finer-grained phenomenon-specific evaluation, we propose to measure how much the supporting context contributes to handling context-dependent discourse phenomena. We show that automatically-annotated supporting context gives similar conclusions to human-annotated context and can be used as an alternative when human annotations are not available. Finally, we highlight the importance of using discourse-rich datasets when assessing context utilization.
- Published
- 2024
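The perturbation-based measure lends itself to a generic sketch. Everything named below (the `translate` and `score` helpers, the three-sentence context window) is hypothetical scaffolding standing in for a document-level MT system and a sentence-level metric; only the correct-vs-random contrast is the point.

```python
import random

def context_gain(model, docs, translate, score):
    """Average quality gain from the true preceding context over a random one.
    docs: list of documents, each a list of (source, reference) pairs.
    translate(model, src, ctx) and score(hyp, ref) are assumed helpers."""
    gains = []
    for doc in docs:
        for i, (src, ref) in enumerate(doc):
            correct_ctx = [s for s, _ in doc[max(0, i - 3):i]]
            other = random.choice(docs)                  # possibly unrelated
            random_ctx = [s for s, _ in random.sample(other, min(3, len(other)))]
            hyp_correct = translate(model, src, correct_ctx)
            hyp_random = translate(model, src, random_ctx)
            gains.append(score(hyp_correct, ref) - score(hyp_random, ref))
    return sum(gains) / len(gains)   # > 0 suggests the model exploits context
```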
4. The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation
- Author
-
Tokarchuk, Evgeniia and Niculae, Vlad
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Continuous-output neural machine translation (CoNMT) replaces the discrete next-word prediction problem with an embedding prediction. The semantic structure of the target embedding space (i.e., closeness of related words) is intuitively believed to be crucial. We challenge this assumption and show that completely random output embeddings can outperform laboriously pretrained ones, especially on larger datasets. Further investigation shows this surprising effect is strongest for rare words, due to the geometry of their embeddings. We shed further light on this finding by designing a mixed strategy that combines random and pre-trained embeddings for different tokens.
- Published
- 2023
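The mechanism is simple to picture: CoNMT trains the decoder to regress onto the target word's embedding, and decoding picks the nearest vocabulary embedding. A toy sketch with fixed random unit-norm embeddings (vocabulary and dimensions invented):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat", "<eos>"]
dim = 8

# Random target embeddings: drawn once, never trained, unit-normalized.
E = rng.standard_normal((len(vocab), dim))
E /= np.linalg.norm(E, axis=1, keepdims=True)

def nearest_word(pred):
    """Decode a predicted vector to the cosine-nearest vocabulary item."""
    sims = E @ (pred / np.linalg.norm(pred))
    return vocab[int(np.argmax(sims))]

# Stand-in for a decoder output: the true embedding plus noise.
pred = E[1] + 0.1 * rng.standard_normal(dim)
print(nearest_word(pred))  # "cat"
```

One intuition consistent with the abstract: random unit vectors in high dimensions are nearly orthogonal, so rare words keep well-separated targets instead of being crowded together by frequency effects.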
5. Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables
- Author
-
Araabi, Ali, Niculae, Vlad, and Monz, Christof
- Subjects
Computer Science - Computation and Language ,68T50 ,I.2.7 - Abstract
Despite the tremendous success of Neural Machine Translation (NMT), its performance on low-resource language pairs remains subpar, partly due to the limited ability to handle previously unseen inputs, i.e., generalization. In this paper, we propose Joint Dropout, a method that addresses the challenge of low-resource neural machine translation by substituting phrases with variables, resulting in a significant enhancement of compositionality, a key aspect of generalization. We observe a substantial improvement in translation quality for language pairs with minimal resources, as seen in BLEU and Direct Assessment scores. Furthermore, we conduct an error analysis and find that Joint Dropout also enhances the generalizability of low-resource NMT in terms of robustness and adaptability across different domains., Comment: Accepted at MT Summit 2023
- Published
- 2023
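A toy illustration of the substitution idea (the span choice, variable token, and alignment below are invented for illustration; the paper's actual phrase-pair extraction is more involved):

```python
def substitute_phrase(src, tgt, src_span, tgt_span, var="<X1>"):
    """Replace an aligned source/target phrase pair with a shared variable,
    producing a more compositional training pair."""
    i, j = src_span
    k, l = tgt_span
    return src[:i] + [var] + src[j:], tgt[:k] + [var] + tgt[l:]

src = "ich habe den roten Apfel gegessen".split()
tgt = "I ate the red apple".split()
# Suppose an aligner matched "den roten Apfel" with "the red apple".
print(substitute_phrase(src, tgt, (2, 5), (2, 5)))
# (['ich', 'habe', '<X1>', 'gegessen'], ['I', 'ate', '<X1>'])
```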
6. Two derivations of Principal Component Analysis on datasets of distributions
- Author
-
Niculae, Vlad
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these perspectives., Comment: 4 pages, 1 figure
- Published
- 2023
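A hint at the variance-maximization derivation, in condensed notation of my own (the note itself should be consulted for the exact formulation): for data consisting of distributions with locations μ_i and covariances Σ_i, the law of total variance gives the variance of a projection wᵀx under the uniform mixture as

```latex
\operatorname{Var}(w^{\top} x)
  = w^{\top} \Big(\underbrace{\operatorname{Cov}(\mu)}_{\text{between locations}}
  + \underbrace{\tfrac{1}{n}\textstyle\sum_{i=1}^{n} \Sigma_i}_{\text{average within}}\Big)\, w,
\qquad
w^{\star} = \operatorname*{arg\,max}_{\|w\|_2 = 1}\;
  w^{\top} \Big(\operatorname{Cov}(\mu) + \tfrac{1}{n}\textstyle\sum_i \Sigma_i\Big)\, w,
```

so the leading component is the top eigenvector of a single pooled covariance matrix, which is the kind of closed form the abstract promises.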
7. Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens
- Author
-
Stap, David, Niculae, Vlad, and Monz, Christof
- Subjects
Computer Science - Computation and Language - Abstract
We argue that translation quality alone is not a sufficient metric for measuring knowledge transfer in multilingual neural machine translation. To support this claim, we introduce Representational Transfer Potential (RTP), which measures representational similarities between languages. We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality, indicating that transfer does occur. Furthermore, we investigate data and language characteristics that are relevant for transfer, and find that multi-parallel overlap is an important yet under-explored feature. Based on this, we develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages by taking advantage of multi-parallel data. We show that our method yields increased translation quality for low- and mid-resource languages across multiple data and model setups., Comment: Accepted to EMNLP 2023 Findings
- Published
- 2023
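A generic flavor of the measurement (not RTP's exact definition, which the paper specifies): compare encoder representations of translation-equivalent sentences across languages, e.g., by mean-pooled cosine similarity. The `encode` helper below is an assumed stand-in.

```python
import numpy as np

def repr_similarity(encode, sents_a, sents_b):
    """Average cosine similarity between mean-pooled encoder states of
    parallel sentences in two languages; encode(sent) is assumed to
    return a (length x dim) array of hidden states."""
    sims = []
    for sa, sb in zip(sents_a, sents_b):
        ha, hb = encode(sa).mean(axis=0), encode(sb).mean(axis=0)
        sims.append(ha @ hb / (np.linalg.norm(ha) * np.linalg.norm(hb)))
    return float(np.mean(sims))
```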
8. DAG Learning on the Permutahedron
- Author
-
Zantedeschi, Valentina, Franceschi, Luca, Kaddour, Jean, Kusner, Matt J., and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimization approaches our formulation has a number of advantages including: 1. validity: optimizes over exact DAGs as opposed to other relaxations optimizing approximate DAGs; 2. modularity: accommodates any edge-optimization procedure, edge structural parameterization, and optimization loss; 3. end-to-end: either alternately iterates between node-ordering and edge-optimization, or optimizes them jointly. We demonstrate, on real-world data problems in protein-signaling and transcriptional network discovery, that our approach lies on the Pareto frontier of two key metrics, the SID and SHD., Comment: The Eleventh International Conference on Learning Representations
- Published
- 2023
9. Discrete Latent Structure in Neural Networks
- Author
-
Niculae, Vlad, Corro, Caio F., Nangia, Nikita, Mihaylova, Tsvetomila, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning ,I.2.6 - Abstract
Many types of data from fields including natural language processing, computer vision, and bioinformatics, are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions. However, effective training is challenging, as neural networks are typically designed for continuous computation. This text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. Our presentation relies on consistent notations for a wide range of models. As such, we reveal many new connections between latent structure learning strategies, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
- Published
- 2023
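Of the three strategies, surrogate gradients are the quickest to demonstrate. A minimal straight-through estimator in PyTorch, shown generically rather than as any particular model from the text: the forward pass emits a hard one-hot vector, while the backward pass uses the softmax Jacobian.

```python
import torch
import torch.nn.functional as F

def straight_through_argmax(scores):
    """Forward: exact one-hot argmax. Backward: gradient of softmax.
    The detach() cancellation hides the non-differentiable part from autograd."""
    soft = F.softmax(scores, dim=-1)
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    return hard + soft - soft.detach()

scores = torch.tensor([[1.0, 2.0, 0.5]], requires_grad=True)
z = straight_through_argmax(scores)                 # tensor([[0., 1., 0.]])
loss = (z * torch.tensor([[0.0, 1.0, 3.0]])).sum()  # toy downstream loss
loss.backward()                                     # flows through the surrogate
print(z, scores.grad)
```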
10. How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?
- Author
-
Araabi, Ali, Monz, Christof, and Niculae, Vlad
- Subjects
Computer Science - Computation and Language ,68T50 ,I.2.7 - Abstract
Neural Machine Translation (NMT) is an open vocabulary problem. As a result, dealing with words that do not occur during training (a.k.a. out-of-vocabulary (OOV) words) has long been a fundamental challenge for NMT systems. The predominant method to tackle this problem is Byte Pair Encoding (BPE), which splits words, including OOV words, into sub-word segments. BPE has achieved impressive results for a wide range of translation tasks in terms of automatic evaluation metrics. While it is often assumed that by using BPE, NMT systems are capable of handling OOV words, the effectiveness of BPE in translating OOV words has not been explicitly measured. In this paper, we study to what extent BPE is successful in translating OOV words at the word level. We analyze the translation quality of OOV words based on word type, number of segments, cross-attention weights, and the frequency of segment n-grams in the training data. Our experiments show that while careful BPE settings seem to be fairly useful in translating OOV words across datasets, a considerable percentage of OOV words are translated incorrectly. Furthermore, we highlight the slightly higher effectiveness of BPE in translating OOV words for special cases, such as named entities and when the languages involved are linguistically close to each other., Comment: 14 pages, 6 figures, 1 table, To be published in AMTA 2022 conference
- Published
- 2022
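How BPE segments an OOV word is easy to visualize with a toy merge table (the merges below are invented; real tables are learned from corpus statistics):

```python
def bpe_segment(word, merges):
    """Greedy BPE: start from characters and repeatedly apply the
    highest-priority learned merge still present in the sequence."""
    symbols = list(word) + ["</w>"]
    while True:
        candidates = [(merges.index(pair), i)
                      for i, pair in enumerate(zip(symbols, symbols[1:]))
                      if pair in merges]
        if not candidates:
            return symbols
        _, i = min(candidates)
        symbols = symbols[:i] + [symbols[i] + symbols[i + 1]] + symbols[i + 2:]

merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er"), ("er", "</w>")]
print(bpe_segment("lowest", merges))
# ['low', 'e', 's', 't', '</w>'] -- an OOV word becomes known sub-word segments
```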
11. Modeling Structure with Undirected Neural Networks
- Author
-
Mihaylova, Tsvetomila, Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
Neural networks are powerful function estimators, leading to their status as a paradigm of choice for modeling structured data. However, unlike other structured representations that emphasize the modularity of the problem -- e.g., factor graphs -- neural networks are usually monolithic mappings from inputs to outputs, with a fixed computation order. This limitation prevents them from capturing different directions of computation and interaction between the modeled variables. In this paper, we combine the representational strengths of factor graphs and of neural networks, proposing undirected neural networks (UNNs): a flexible framework for specifying computations that can be performed in any order. For particular choices, our proposed models subsume and extend many existing architectures: feed-forward, recurrent, self-attention networks, auto-encoders, and networks with implicit layers. We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks: tree-constrained dependency parsing, convolutional image classification, and sequence completion with attention. By varying the computation order, we show how a single UNN can be used both as a classifier and a prototype generator, and how it can fill in missing parts of an input sequence, making them a promising field for further research., Comment: ICML 2022
- Published
- 2022
12. Sparse Communication via Mixed Distributions
- Author
-
Farinhas, António, Aziz, Wilker, Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning - Abstract
Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-Softmax transformation) build continuous relaxations that are discrete approximations in the zero-temperature limit, while others (such as sparsemax transformations and the Hard Concrete distribution) produce discrete/continuous hybrids. In this paper, we build rigorous theoretical foundations for these hybrids, which we call "mixed random variables." Our starting point is a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, we introduce new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. Our framework suggests two strategies for representing and sampling mixed random variables, an extrinsic ("sample-and-project") and an intrinsic one (based on face stratification). We experiment with both approaches on an emergent communication benchmark and on modeling MNIST and Fashion-MNIST data with variational auto-encoders with mixed latent variables., Comment: Accepted for oral presentation at ICLR 2022
- Published
- 2021
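For orientation, the discrete/continuous hybrid behavior traces back to a standard fact (restated here; the paper's measure-theoretic construction goes much further): sparsemax is the Euclidean projection onto the simplex,

```latex
\operatorname{sparsemax}(z) \;=\; \operatorname*{arg\,min}_{p \,\in\, \triangle} \;\|p - z\|_2^2 ,
```

and a projection can land exactly on a low-dimensional face. The extrinsic "sample-and-project" strategy can thus be read as drawing a continuous sample s (say, scores plus noise) and emitting sparsemax(s): with positive probability some coordinates are exactly zero (discrete), while the rest vary continuously.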
13. Sparse Continuous Distributions and Fenchel-Young Losses
- Author
-
Martins, André F. T., Treviso, Marcos, Farinhas, António, Aguiar, Pedro M. Q., Figueiredo, Mário A. T., Blondel, Mathieu, and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax), has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $\Omega$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain ``deformed exponential families,'' which include $\alpha$-entmax and sparsemax ($\alpha=2$) as particular cases. For quadratic energy functions, the resulting densities are $\beta$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $\Omega$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions., Comment: JMLR 2022 camera ready version. arXiv admin note: text overlap with arXiv:2006.07214
- Published
- 2021
14. Learning Binary Decision Trees by Argmin Differentiation
- Author
-
Zantedeschi, Valentina, Kusner, Matt J., and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters, to allow gradients to pass through the program to continuous parameters. We derive customized algorithms to efficiently compute the forward and backward passes. This means that our tree learning procedure can be used as an (implicit) layer in arbitrary deep networks, and can be optimized with arbitrary loss functions. We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings. Further, apart from greedy approaches (which do not have competitive accuracies), our method is faster to train than all other tree-learning baselines we compare with. The code for reproducing the results is available at https://github.com/vzantedeschi/LatentTrees.
- Published
- 2020
15. Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
- Author
-
Mihaylova, Tsvetomila, Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) as well as the recently-proposed SPIGOT - a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases., Comment: EMNLP 2020
- Published
- 2020
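The two estimators differ only in how the downstream gradient is pulled back through the argmax output ẑ. Roughly, in my paraphrase of the standard presentations (Bengio et al., 2013; Peng et al., 2018):

```latex
\text{STE:}\quad \tilde{\nabla}_{\theta} L \;=\; \nabla_{\hat{z}} L ,
\qquad
\text{SPIGOT:}\quad \tilde{\nabla}_{\theta} L \;=\;
  \hat{z} \;-\; \operatorname{proj}_{\operatorname{conv}(\mathcal{Z})}\!\big(\hat{z} - \eta\, \nabla_{\hat{z}} L\big),
```

i.e., SPIGOT takes a gradient step in structure space, projects back onto the polytope, and uses the displacement as the surrogate; the paper's "pulling back the downstream objective" view recovers both as special cases.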
16. Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
- Author
-
Correia, Gonçalo M., Niculae, Vlad, Aziz, Wilker, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization. To achieve this, we parameterize discrete distributions over latent assignments using differentiable sparse mappings: sparsemax and its structured counterparts. In effect, the support of these distributions is greatly reduced, which enables efficient marginalization. We report successful results in three tasks covering a range of latent variable modeling applications: a semisupervised deep generative model, a latent communication game, and a generative model with a bit-vector latent representation. In all cases, we obtain good performance while still achieving the practicality of sampling-based approximations., Comment: Accepted for spotlight presentation at NeurIPS 2020
- Published
- 2020
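The strategy reduces to a few lines once a sparse mapping is available. A sketch assuming the authors' `entmax` package (`pip install entmax`) for sparsemax; the toy scores and per-assignment loss are invented:

```python
import torch
from entmax import sparsemax  # assumed available; any sparse simplex map works

def exact_expected_loss(scores, loss_of):
    """Exact marginalization over a sparse latent distribution:
    only assignments with nonzero probability contribute."""
    p = sparsemax(scores, dim=-1)            # most entries are exactly zero
    support = p.nonzero(as_tuple=True)[0]
    return sum(p[k] * loss_of(int(k)) for k in support)

scores = torch.tensor([2.0, 1.9, -1.0, -3.0], requires_grad=True)
loss = exact_expected_loss(scores, lambda k: float(k))  # toy downstream loss
loss.backward()   # exact gradient: no sampling, no score-function estimator
print(scores.grad)
```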
17. Sparse and Continuous Attention Mechanisms
- Author
-
Martins, André F. T., Farinhas, António, Treviso, Marcos, Niculae, Vlad, Aguiar, Pedro M. Q., and Figueiredo, Mário A. T.
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition ,Statistics - Machine Learning - Abstract
Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g. sparsemax and alpha-entmax), which have varying support, being able to assign zero probability to irrelevant categories. This paper expands that work in two directions: first, we extend alpha-entmax to continuous domains, revealing a link with Tsallis statistics and deformed exponential families. Second, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for alpha in {1,2}. Experiments on attention-based text classification, machine translation, and visual question answering illustrate the use of continuous attention in 1D and 2D, showing that it allows attending to time intervals and compact regions., Comment: Accepted for spotlight presentation at NeurIPS 2020
- Published
- 2020
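Schematically, in condensed notation (consistent with the abstract; details in the paper): a score function f(t) over a continuous domain is mapped to a density, and the attended summary is an integral,

```latex
p_{\text{softmax}}(t) \propto \exp f(t),
\qquad
p_{\text{sparsemax}}(t) = \big[f(t) - \tau\big]_{+}
\;\;\text{with } \tau \text{ such that } \textstyle\int p(t)\,dt = 1,
\qquad
c = \int p(t)\, V(t)\, dt .
```

For quadratic f, the continuous sparsemax density is a truncated parabola with genuinely compact support, which is what lets the mechanism attend to an interval rather than to the whole domain.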
18. Sparse and Structured Visual Attention
- Author
-
Martins, Pedro Henrique, Niculae, Vlad, Marinho, Zita, and Martins, André
- Subjects
Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Visual attention mechanisms are widely used in multimodal tasks, such as visual question answering (VQA). One drawback of softmax-based attention mechanisms is that they assign some probability mass to all image regions, regardless of their adjacency structure and of their relevance to the text. In this paper, to better link the image structure with the text, we replace the traditional softmax attention mechanism with two alternative sparsity-promoting transformations: sparsemax, which is able to select only the relevant regions (assigning zero weight to the rest), and a newly proposed Total-Variation Sparse Attention (TVmax), which further encourages the joint selection of adjacent spatial locations. Experiments in VQA show gains in accuracy as well as higher similarity to human attention, which suggests better interpretability.
- Published
- 2020
19. LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction
- Author
-
Niculae, Vlad and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language ,Statistics - Machine Learning - Abstract
Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables. Recently, the SparseMAP method has been proposed as a differentiable, sparse alternative to maximum a posteriori (MAP) and marginal inference. SparseMAP returns a combination of a small number of structures, a desirable property in some downstream applications. However, SparseMAP requires a tractable MAP inference oracle. This excludes, e.g., loopy graphical models or factor graphs with logic constraints, which generally require approximate inference. In this paper, we introduce LP-SparseMAP, an extension of SparseMAP that addresses this limitation via a local polytope relaxation. LP-SparseMAP uses the flexible and powerful domain specific language of factor graphs for defining and backpropagating through arbitrary hidden structure, supporting coarse decompositions, hard logic constraints, and higher-order correlations. We derive the forward and backward algorithms needed for using LP-SparseMAP as a hidden or output layer. Experiments in three structured prediction tasks show benefits compared to SparseMAP and Structured SVM., Comment: 34 pages, 5 tables, 4 figures. ICML 2020
- Published
- 2020
20. Adaptively Sparse Transformers
- Author
-
Correia, Gonçalo M., Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Computation and Language ,Statistics - Machine Learning - Abstract
Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with $\alpha$-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the $\alpha$ parameter -- which controls the shape and sparsity of $\alpha$-entmax -- allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations., Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019, Hong Kong, China
- Published
- 2019
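The replacement really is drop-in: the authors distribute an `entmax` package, and switching an attention head amounts to one line. A single-query sketch (assumes `pip install entmax`; the learnable-alpha variant has its own module in that package):

```python
import torch
from entmax import entmax15  # alpha = 1.5: differentiable, with exact zeros

def sparse_attention(q, K, V):
    """Scaled dot-product attention with entmax in place of softmax."""
    scores = K @ q / K.shape[-1] ** 0.5
    weights = entmax15(scores, dim=-1)  # low-scoring keys get weight exactly 0
    return weights @ V, weights

q = torch.randn(16)
K, V = torch.randn(10, 16), torch.randn(10, 16)
out, weights = sparse_attention(q, K, V)
print(weights)  # several entries are exactly zero, unlike softmax
```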
21. Notes on Latent Structure Models and SPIGOT
- Author
-
Martins, André F. T. and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
These notes aim to shed light on the recently proposed structured projected intermediate gradient optimization technique (SPIGOT, Peng et al., 2018). SPIGOT is a variant of the straight-through estimator (Bengio et al., 2013) which bypasses gradients of the argmax function by back-propagating a surrogate "gradient." We provide a new interpretation to the proposed gradient and put this technique into perspective, linking it to other methods for training neural networks with discrete latent variables. As a by-product, we suggest alternate variants of SPIGOT which will be further explored in future work., Comment: 7 pages
- Published
- 2019
22. Sparse Sequence-to-Sequence Models
- Author
-
Peters, Ben, Niculae, Vlad, and Martins, André F. T.
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of $\alpha$-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any $\alpha > 1$. We provide fast algorithms to evaluate these transformations and their gradients, which scale well for large vocabulary sizes. Our models are able to produce sparse alignments and to assign nonzero probability to a short list of plausible outputs, sometimes rendering beam search exact. Experiments on morphological inflection and machine translation reveal consistent gains over dense models., Comment: ACL 2019 Camera Ready
- Published
- 2019
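The family has one defining formula (standard in this line of work): for α ≥ 1,

```latex
\operatorname{entmax}_{\alpha}(z) = \operatorname*{arg\,max}_{p \in \triangle} \; p^{\top} z + H_{\alpha}(p),
\qquad
H_{\alpha}(p) =
\begin{cases}
\frac{1}{\alpha(\alpha - 1)} \sum_{j} \big(p_j - p_j^{\alpha}\big), & \alpha \neq 1,\\[4pt]
-\sum_{j} p_j \log p_j, & \alpha = 1,
\end{cases}
```

recovering softmax at α = 1, sparsemax at α = 2, and sparse outputs for every α > 1, as the abstract states.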
23. Learning with Fenchel-Young Losses
- Author
-
Blondel, Mathieu, Martins, André F. T., and Niculae, Vlad
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choosing the right loss for the right problem, as well as to creating new losses which combine their strengths. In this paper, we introduce Fenchel-Young losses, a generic way to construct a convex loss function for a regularized prediction function. We provide an in-depth study of their properties in a very broad setting, covering all the aforementioned supervised learning tasks, and revealing new connections between sparsity, generalized entropies, and separation margins. We show that Fenchel-Young losses unify many well-known loss functions and make it easy to create useful new ones. Finally, we derive efficient predictive and training algorithms, making Fenchel-Young losses appealing both in theory and practice., Comment: In Journal of Machine Learning Research, volume 21
- Published
- 2019
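The construction fits in one line (the standard definition; see the paper for regularity conditions): given a regularizer Ω with convex conjugate Ω*,

```latex
L_{\Omega}(\theta; y) = \Omega^{*}(\theta) + \Omega(y) - \langle \theta, y \rangle,
\qquad
\nabla_{\theta} L_{\Omega}(\theta; y) = \widehat{y}_{\Omega}(\theta) - y,
\quad
\widehat{y}_{\Omega}(\theta) = \operatorname*{arg\,max}_{p} \; \langle \theta, p \rangle - \Omega(p).
```

Taking Ω to be the negative Shannon entropy on the simplex yields the logistic/softmax loss; taking Ω = ½‖·‖² on the simplex yields the sparsemax loss; structured choices recover losses such as the CRF and SparseMAP losses.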
24. Towards Dynamic Computation Graphs via Sparse Latent Structure
- Author
-
Niculae, Vlad, Martins, André F. T., and Cardie, Claire
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning ,Statistics - Machine Learning ,68T50 ,I.2.6 ,I.2.7 - Abstract
Deep NLP models benefit from underlying structures in the data---e.g., parse trees---typically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribution over latent structures, we propose a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. To the best of our knowledge, our method is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability., Comment: EMNLP 2018; 9 pages (incl. appendix)
- Published
- 2018
25. Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms
- Author
-
Blondel, Mathieu, Martins, André F. T., and Niculae, Vlad
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function. We analyze their properties in depth, showing that they unify many well-known loss functions and make it easy to create useful new ones. Fenchel-Young losses constructed from a generalized entropy, including the Shannon and Tsallis entropies, induce predictive probability distributions. We formulate conditions for a generalized entropy to yield losses with a separation margin, and probability distributions with sparse support. Finally, we derive efficient algorithms, making Fenchel-Young losses appealing both in theory and practice., Comment: In proceedings of AISTATS 2019
- Published
- 2018
26. SparseMAP: Differentiable Sparse Structured Inference
- Author
-
Niculae, Vlad, Martins, André F. T., Blondel, Mathieu, and Cardie, Claire
- Subjects
Statistics - Machine Learning ,Computer Science - Computation and Language ,Computer Science - Learning ,68T50 ,I.2.6 - Abstract
Structured prediction requires searching over a combinatorial number of structures. To tackle it, we introduce SparseMAP: a new method for sparse structured inference, and its natural loss function. SparseMAP automatically selects only a few global structures: it is situated between MAP inference, which picks a single structure, and marginal inference, which assigns probability mass to all structures, including implausible ones. Importantly, SparseMAP can be computed using only calls to a MAP oracle, making it applicable to problems with intractable marginal inference, e.g., linear assignment. Sparsity makes gradient backpropagation efficient regardless of the structure, enabling us to augment deep neural networks with generic and sparse structured hidden layers. Experiments in dependency parsing and natural language inference reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems., Comment: Published in ICML 2018. 14 pages, including appendix
- Published
- 2018
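The defining problem is short (standard statement): with M the marginal polytope, i.e., the convex hull of the structures' indicator vectors,

```latex
\operatorname{SparseMAP}(\theta) = \operatorname*{arg\,max}_{\mu \in \mathcal{M}} \; \langle \theta, \mu \rangle - \tfrac{1}{2} \|\mu\|_2^2 .
```

A quadratic over a polytope attains its optimum at a sparse convex combination of vertices, and conditional-gradient-style active-set methods solve it using only MAP calls, which is exactly the property the abstract highlights.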
27. A Regularized Framework for Sparse and Structured Neural Attention
- Author
-
Niculae, Vlad and Blondel, Mathieu
- Subjects
Statistics - Machine Learning ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes softmax and a slight generalization of the recently-proposed sparsemax as special cases. However, we also show how our framework can incorporate modern structured penalties, resulting in more interpretable attention mechanisms that focus on entire segments or groups of an input. We derive efficient algorithms to compute the forward and backward passes of our attention mechanisms, enabling their use in a neural network trained with backpropagation. To showcase their potential as a drop-in replacement for existing ones, we evaluate our attention mechanisms on three large-scale tasks: textual entailment, machine translation, and sentence summarization. Our attention mechanisms improve interpretability without sacrificing performance; notably, on textual entailment and summarization, we outperform the standard attention mechanisms based on softmax and sparsemax., Comment: In proceedings of NeurIPS 2017; added errata
- Published
- 2017
28. Multi-output Polynomial Networks and Factorization Machines
- Author
-
Blondel, Mathieu, Niculae, Vlad, Otsuka, Takuma, and Ueda, Naonori
- Subjects
Statistics - Machine Learning ,Computer Science - Learning - Abstract
Factorization machines and polynomial networks are supervised polynomial models based on an efficient low-rank decomposition. We extend these models to the multi-output setting, i.e., for learning vector-valued functions, with application to multi-class or multi-task problems. We cast this as the problem of learning a 3-way tensor whose slices share a common basis and propose a convex formulation of that problem. We then develop an efficient conditional gradient algorithm and prove its global convergence, despite the fact that it involves a non-convex basis selection step. On classification tasks, we show that our algorithm achieves excellent accuracy with much sparser models than existing methods. On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy., Comment: Published at NIPS 2017. 17 pages, including appendix
- Published
- 2017
29. Argument Mining with Structured SVMs and RNNs
- Author
-
Niculae, Vlad, Park, Joonsuk, and Cardie, Claire
- Subjects
Computer Science - Computation and Language ,68T50 ,I.2.7 - Abstract
We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets., Comment: Accepted for publication at ACL 2017. 11 pages, 5 figures. Code at https://github.com/vene/marseille and data at http://joonsuk.org/
- Published
- 2017
30. Conversational Markers of Constructive Discussions
- Author
-
Niculae, Vlad and Danescu-Niculescu-Mizil, Cristian
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Social and Information Networks ,Physics - Physics and Society ,Statistics - Machine Learning - Abstract
Group discussions are essential for organizing every aspect of modern life, from faculty meetings to senate debates, from grant review panels to papal conclaves. While costly in terms of time and organization effort, group discussions are commonly seen as a way of reaching better decisions compared to solutions that do not require coordination between the individuals (e.g. voting)---through discussion, the sum becomes greater than the parts. However, this assumption is not irrefutable: anecdotal evidence of wasteful discussions abounds, and in our own experiments we find that over 30% of discussions are unproductive. We propose a framework for analyzing conversational dynamics in order to determine whether a given task-oriented discussion is worth having or not. We exploit conversational patterns reflecting the flow of ideas and the balance between the participants, as well as their linguistic choices. We apply this framework to conversations naturally occurring in an online collaborative world exploration game developed and deployed to support this research. Using this setting, we show that linguistic cues and conversational patterns extracted from the first 20 seconds of a team discussion are predictive of whether it will be a wasteful or a productive one., Comment: To appear at NAACL-HLT 2016. 11pp, 5 fig. Data and other info available at http://vene.ro/constructive/
- Published
- 2016
31. Winning Arguments: Interaction Dynamics and Persuasion Strategies in Good-faith Online Discussions
- Author
-
Tan, Chenhao, Niculae, Vlad, Danescu-Niculescu-Mizil, Cristian, and Lee, Lillian
- Subjects
Computer Science - Social and Information Networks ,Computer Science - Computation and Language ,Physics - Physics and Society - Abstract
Changing someone's opinion is arguably one of the most important challenges of social interaction. The underlying process proves difficult to study: it is hard to know how someone's opinions are formed and whether and how someone's views shift. Fortunately, ChangeMyView, an active community on Reddit, provides a platform where users present their own opinions and reasoning, invite others to contest them, and acknowledge when the ensuing discussions change their original views. In this work, we study these interactions to understand the mechanisms behind persuasion. We find that persuasive arguments are characterized by interesting patterns of interaction dynamics, such as participant entry-order and degree of back-and-forth exchange. Furthermore, by comparing similar counterarguments to the same opinion, we show that language factors play an essential role. In particular, the interplay between the language of the opinion holder and that of the counterargument provides highly predictive cues of persuasiveness. Finally, since even in this favorable setting people may not be persuaded, we investigate the problem of determining whether someone's opinion is susceptible to being changed at all. For this more difficult task, we show that stylistic choices in how the opinion is expressed carry predictive power., Comment: 12 pages, 10 figures, to appear in Proceedings of WWW 2016, data and more at https://chenhaot.com/pages/changemyview.html (v2 made a minor correction on submission rules in ChangeMyView.)
- Published
- 2016
32. Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game
- Author
-
Niculae, Vlad, Kumar, Srijan, Boyd-Graber, Jordan, and Danescu-Niculescu-Mizil, Cristian
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Social and Information Networks ,Physics - Physics and Society ,Statistics - Machine Learning - Abstract
Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal. We reveal that subtle signs of imminent betrayal are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship's fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through language. In contrast, sudden changes in the balance of certain conversational attributes---such as positive sentiment, politeness, or focus on future planning---signal impending betrayal., Comment: To appear at ACL 2015. 10pp, 4 fig. Data and other info available at http://vene.ro/betrayal/
- Published
- 2015
33. QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns
- Author
-
Niculae, Vlad, Suen, Caroline, Zhang, Justine, Danescu-Niculescu-Mizil, Cristian, and Leskovec, Jure
- Subjects
Computer Science - Computation and Language ,Computer Science - Social and Information Networks ,Physics - Physics and Society - Abstract
Given the extremely large pool of events and stories available, media outlets need to focus on a subset of issues and aspects to convey to their audience. Outlets are often accused of exhibiting a systematic bias in this selection process, with different outlets portraying different versions of reality. However, in the absence of objective measures and empirical evidence, the direction and extent of systematicity remains widely disputed. In this paper we propose a framework based on quoting patterns for quantifying and characterizing the degree to which media outlets exhibit systematic bias. We apply this framework to a massive dataset of news articles spanning the six years of Obama's presidency and all of his speeches, and reveal that a systematic pattern does indeed emerge from the outlet's quoting behavior. Moreover, we show that this pattern can be successfully exploited in an unsupervised prediction setting, to determine which new quotes an outlet will select to broadcast. By encoding bias patterns in a low-rank space we provide an analysis of the structure of political media coverage. This reveals a latent media bias space that aligns surprisingly well with political ideology and outlet type. A linguistic analysis exposes striking differences across these latent dimensions, showing how the different types of media outlets portray different realities even when reporting on the same events. For example, outlets mapped to the mainstream conservative side of the latent space focus on quotes that portray a presidential persona disproportionately characterized by negativity., Comment: To appear in the Proceedings of WWW 2015. 11pp, 10 fig. Interactive visualization, data, and other info available at http://snap.stanford.edu/quotus/
- Published
- 2015
34. API design for machine learning software: experiences from the scikit-learn project
- Author
-
Buitinck, Lars, Louppe, Gilles, Blondel, Mathieu, Pedregosa, Fabian, Mueller, Andreas, Grisel, Olivier, Niculae, Vlad, Prettenhofer, Peter, Gramfort, Alexandre, Grobler, Jaques, Layton, Robert, Vanderplas, Jake, Joly, Arnaud, Holt, Brian, and Varoquaux, Gaël
- Subjects
Computer Science - Learning ,Computer Science - Mathematical Software - Abstract
Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
- Published
- 2013
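The interface the paper discusses is the now-familiar estimator API. A canonical usage example (standard scikit-learn, not specific to the paper):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Every estimator exposes fit(); transformers add transform(),
# predictors add predict() -- the shared interface makes composition trivial.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict(X[:3]))
print(model.score(X, y))   # mean accuracy
```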
35. Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens
- Author
-
Stap, David, Niculae, Vlad, and Monz, Christof
- Published
- 2023
36. Romanian Syllabication Using Machine Learning
- Author
-
Dinu, Liviu P., Niculae, Vlad, and Sulea, Octavia-Maria
- Published
- 2013
37. Aspects regarding the construction of safe earthquake staircases
- Author
-
Pirvanus, Ana-Maria, Niculae, Vlad Stefan, and Stoica, Daniel
- Subjects
General Medicine
- Published
- 2021
38. On Target Representation in Continuous-output Neural Machine Translation
- Author
-
Tokarchuk, Evgeniia and Niculae, Vlad
- Published
- 2022
39. Learning Binary Decision Trees by Argmin Differentiation
- Author
-
Zantedeschi, Valentina, Kusner, Matt J., and Niculae, Vlad
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning ,binary decision trees ,argmin differentiation ,implicit layer ,self-supervised learning - Abstract
We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters, to allow gradients to pass through the program to continuous parameters. We derive customized algorithms to efficiently compute the forward and backward passes. This means that our tree learning procedure can be used as an (implicit) layer in arbitrary deep networks, and can be optimized with arbitrary loss functions. We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings. Further, apart from greedy approaches (which do not have competitive accuracies), our method is faster to train than all other tree-learning baselines we compare with. The code for reproducing the results is available at https://github.com/vzantedeschi/LatentTrees.
- Published
- 2021
40. Sparse And Structured Visual Attention
- Author
-
Martins, Pedro Henrique, Niculae, Vlad, Marinho, Zita, and Martins, Andre F. T.
- Published
- 2021
41. Aspects regarding the use of the Tuned Mass Dampers
- Author
-
Zainulabdeen K., Abdulfattah, Niculae, Vlad Stefan, and Stoica, Daniel
- Published
- 2021
42. Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
- Author
-
Correia, Gonçalo M., Niculae, Vlad, Aziz, Wilker, and Martins, André F. T.
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with lower-variance reparameterized gradients (e.g., Gumbel-Softmax). In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization. To achieve this, we parameterize discrete distributions over latent assignments using differentiable sparse mappings: sparsemax and its structured counterparts. In effect, the support of these distributions is greatly reduced, which enables efficient marginalization. We report successful results in three tasks covering a range of latent variable modeling applications: a semisupervised deep generative model, a latent communication game, and a generative model with a bit-vector latent representation. In all cases, we obtain good performance while still achieving the practicality of sampling-based approximations., Accepted for spotlight presentation at NeurIPS 2020
- Published
- 2020
43. Romanian Syllabication Using Machine Learning
- Author
-
Dinu, Liviu P., Niculae, Vlad, and Sulea, Octavia-Maria
- Published
- 2013
44. Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
- Author
-
Mihaylova, Tsvetomila, Niculae, Vlad, and Martins, André F. T.
- Published
- 2020
45. Aspecte referitoare la construcția de scări sigure pentru cutremur [Aspects regarding the construction of earthquake-safe staircases].
- Author
-
Pîrvănuş, Ana-Maria, Niculae, Vlad Ştefan, and Stoica, Daniel
- Subjects
REINFORCED concrete ,STAIRS ,CASE studies ,PALEOSEISMOLOGY
- Published
- 2021
46. Learning Deep Models with Linguistically-Inspired Structure
- Author
-
Niculae, Vlad
- Published
- 2018
47. Latent Structure Models for Natural Language Processing
- Author
-
Martins, André F. T., Mihaylova, Tsvetomila, Nangia, Nikita, and Niculae, Vlad
- Published
- 2019
48. Sparse Sequence-to-Sequence Models
- Author
-
Peters, Ben, Niculae, Vlad, and Martins, André F. T.
- Published
- 2019
49. Adaptively Sparse Transformers
- Author
-
Correia, Gonçalo M., Niculae, Vlad, and Martins, André F. T.
- Published
- 2019
50. Aspecte privind utilizarea dispozitivelor de amortizare cu masa acordata [Aspects regarding the use of tuned mass dampers].
- Author
-
Abdulfattah, Zainulabdeen K., Niculae, Vlad Stefan, and Stoica, Daniel
- Subjects
TUNED mass dampers ,REINFORCED concrete ,CASE studies ,SKYSCRAPERS ,TALL buildings
- Published
- 2021