Author: "Hyvarinen, Aapo" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Hyvarinen, Aapo" returned 91 results in total.

1. Provable benefits of annealing for estimating normalizing constants: Importance Sampling, Noise-Contrastive Estimation, and beyond

Author: Chehab, Omar, Hyvarinen, Aapo, and Risteski, Andrej
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Recent research has developed several Monte Carlo methods for estimating the normalization constant (partition function) based on the idea of annealing. This means sampling successively from a path of distributions that interpolate between a tractable "proposal" distribution and the unnormalized "target" distribution. Prominent estimators in this family include annealed importance sampling and annealed noise-contrastive estimation (NCE). Such methods hinge on a number of design choices: which estimator to use, which path of distributions to use and whether to use a path at all; so far, there is no definitive theory on which choices are efficient. Here, we evaluate each design choice by the asymptotic estimation error it produces. First, we show that using NCE is more efficient than the importance sampling estimator, but in the limit of infinitesimal path steps, the difference vanishes. Second, we find that using the geometric path brings down the estimation error from an exponential to a polynomial function of the parameter distance between the target and proposal distributions. Third, we find that the arithmetic path, while rarely used, can offer optimality properties over the universally used geometric path. In fact, in a particular limit, the optimal path is arithmetic. Based on this theory, we finally propose a two-step estimator to approximate the optimal path in an efficient way.
Published: 2023
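
The annealing idea in this abstract can be illustrated with a small Python sketch on a toy problem chosen here for illustration (it is not taken from the paper): a standard Gaussian proposal, an unnormalized Gaussian target with mean mu and standard deviation s (so the true log-ratio of normalizing constants is log s), and the geometric path between them. Because every distribution on this geometric path is itself Gaussian, it can be sampled exactly, so no MCMC is needed. The estimator is a product of per-step importance-sampling ratios, one simple member of the annealed family, not a reproduction of the paper's estimators; all function names are hypothetical.

```python
import math
import random

def log_f0(x):
    # Unnormalized proposal: standard Gaussian, Z0 = sqrt(2*pi)
    return -0.5 * x * x

def log_f1(x, mu, s):
    # Unnormalized target: Gaussian with mean mu and std s, Z1 = sqrt(2*pi) * s
    return -0.5 * ((x - mu) / s) ** 2

def sample_path(t, mu, s, rng):
    # On the geometric path f_t = f0^(1-t) * f1^t, f_t is itself Gaussian with
    # precision (1 - t) + t / s^2 and mean (t * mu / s^2) / precision.
    prec = (1.0 - t) + t / s**2
    return rng.gauss((t * mu / s**2) / prec, 1.0 / math.sqrt(prec))

def annealed_is_log_ratio(mu, s, num_steps, n_per_step, seed=0):
    """Estimate log(Z1 / Z0) as a sum of importance-sampling log-ratios along the path."""
    rng = random.Random(seed)
    dt = 1.0 / num_steps
    total = 0.0
    for k in range(num_steps):
        t_prev = k * dt
        # On the geometric path, f_{t+dt}(x) / f_t(x) = (f1(x) / f0(x)) ** dt
        ratios = []
        for _ in range(n_per_step):
            x = sample_path(t_prev, mu, s, rng)
            ratios.append(math.exp(dt * (log_f1(x, mu, s) - log_f0(x))))
        total += math.log(sum(ratios) / n_per_step)
    return total
```

For example, `annealed_is_log_ratio(3.0, 0.5, 50, 1000)` estimates log(0.5). Setting `num_steps=1` reduces the sketch to plain importance sampling from the proposal, which is one way to see the effect of using a path at all.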

2. Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning

Author: Hyvarinen, Aapo, Khemakhem, Ilyes, and Morioka, Hiroshi
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: A central problem in unsupervised deep learning is how to find useful representations of high-dimensional data, sometimes called "disentanglement". Most approaches are heuristic and lack a proper theoretical foundation. In linear representation learning, independent component analysis (ICA) has been successful in many application areas, and it is principled, i.e., based on a well-defined probabilistic model. However, extension of ICA to the nonlinear case has been problematic due to the lack of identifiability, i.e., uniqueness of the representation. Recently, nonlinear extensions that utilize temporal structure or some auxiliary information have been proposed. Such models are in fact identifiable, and consequently, an increasing number of algorithms have been developed. In particular, some self-supervised algorithms can be shown to estimate nonlinear ICA, even though they were initially proposed from heuristic perspectives. This paper reviews the state of the art of nonlinear ICA theory and algorithms.
Comment: Revised version, to appear in Patterns
Published: 2023

3. Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation

Author: Chehab, Omar, Gramfort, Alexandre, and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Self-supervised learning is an increasingly popular approach to unsupervised learning, achieving state-of-the-art results. A prevalent approach consists in contrasting data points and noise points within a classification task: this requires a good noise distribution which is notoriously hard to specify. While a comprehensive theory is missing, it is widely assumed that the optimal noise distribution should in practice be made equal to the data distribution, as in Generative Adversarial Networks (GANs). We here empirically and theoretically challenge this assumption. We turn to Noise-Contrastive Estimation (NCE) which grounds this self-supervised task as an estimation problem of an energy-based model of the data. This ties the optimality of the noise distribution to the sample efficiency of the estimator, which is rigorously defined as its asymptotic variance, or mean-squared error. In the special case where only the normalization constant is unknown, we show that NCE recovers a family of Importance Sampling estimators for which the optimal noise is indeed equal to the data distribution. However, in the general case where the energy is also unknown, we prove that the optimal noise density is the data density multiplied by a correction term based on the Fisher score. In particular, the optimal noise distribution is different from the data distribution, and is even from a different family. Nevertheless, we soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
Comment: arXiv admin note: text overlap with arXiv:2203.01110
Published: 2023
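
As a rough illustration of the NCE setup discussed in these abstracts, the Python sketch below fits an unnormalized Gaussian model by discriminating data from noise. Everything concrete here is an assumption for illustration: data from N(0, 1), noise from N(0, 2^2) with equal sample sizes, and a model log p(u) = -a*u^2/2 + c in which the log-normalizer c is estimated like any other parameter. Because the log-model is linear in (a, c), the NCE objective is that of ordinary logistic regression with a fixed offset -log q(u), maximized here with Newton's method. The names `nce_fit`, `NOISE_STD` and the other helpers are hypothetical.

```python
import math
import random

NOISE_STD = 2.0  # assumption: noise distribution q = N(0, NOISE_STD^2)

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_noise(u):
    return -0.5 * (u / NOISE_STD) ** 2 - math.log(NOISE_STD * math.sqrt(2 * math.pi))

def features(u):
    # Unnormalized log-model: log p(u) = a * (-u^2 / 2) + c * 1
    return (-0.5 * u * u, 1.0)

def nce_fit(data, noise, iters=25):
    """Fit (a, c) by logistic regression of data (label 1) against noise (label 0)."""
    # Start at the noise density itself, where every classifier output is 0.5
    a = 1.0 / NOISE_STD**2
    c = -math.log(NOISE_STD * math.sqrt(2 * math.pi))
    labelled = [(u, 1.0) for u in data] + [(u, 0.0) for u in noise]
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for u, y in labelled:
            f0, f1 = features(u)
            p = sigmoid(a * f0 + c * f1 - log_noise(u))  # P(label = 1 | u)
            r, w = y - p, p * (1.0 - p)
            g0 += r * f0
            g1 += r * f1
            h00 += w * f0 * f0
            h01 += w * f0 * f1
            h11 += w * f1 * f1
        det = h00 * h11 - h01 * h01
        # Newton step: theta += (sum_i w_i phi_i phi_i^T)^(-1) * gradient
        a += (h11 * g0 - h01 * g1) / det
        c += (-h01 * g0 + h00 * g1) / det
    return a, c
```

With a few thousand samples of each kind, the fitted a should be close to the data precision 1 and c close to the true log-normalizer -log(sqrt(2*pi)), about -0.919. Changing `NOISE_STD` is a quick way to see that the noise choice affects how variable these estimates are, which is the question the two papers above study.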

4. The Optimal Noise in Noise-Contrastive Learning Is Not What You Think

Author: Chehab, Omar, Gramfort, Alexandre, and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Learning a parametric model of a data distribution is a well-known statistical problem that has seen renewed interest as it is brought to scale in deep learning. Framing the problem as a self-supervised task, where data samples are discriminated from noise samples, is at the core of state-of-the-art methods, beginning with Noise-Contrastive Estimation (NCE). Yet, such contrastive learning requires a good noise distribution, which is hard to specify; domain-specific heuristics are therefore widely used. While a comprehensive theory is missing, it is widely assumed that the optimal noise should in practice be made equal to the data, both in distribution and proportion. This setting underlies Generative Adversarial Networks (GANs) in particular. Here, we empirically and theoretically challenge this assumption on the optimal noise. We show that deviating from this assumption can actually lead to better statistical estimators, in terms of asymptotic variance. In particular, the optimal noise distribution is different from the data's and even from a different family.
Published: 2022

5. Disentangling Identifiable Features from Noisy Data with Structured Nonlinear ICA

Author: Hälvä, Hermanni, Le Corff, Sylvain, Lehéricy, Luc, So, Jonathan, Zhu, Yongjie, Gassiat, Elisabeth, and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We introduce a new general identifiable framework for principled disentanglement referred to as Structured Nonlinear Independent Component Analysis (SNICA). Our contribution is to extend the identifiability theory of deep generative models to a very broad class of structured models. While previous works have shown identifiability for specific classes of time-series models, our theorems extend this to more general temporal structures as well as to models with more complex structures such as spatial dependencies. In particular, we establish the major result that identifiability for this framework holds even in the presence of noise of unknown distribution. Finally, as an example of our framework's flexibility, we introduce the first nonlinear ICA model for time-series that combines the following very useful properties: it accounts for both nonstationarity and autocorrelation in a fully unsupervised setting; performs dimensionality reduction; models hidden states; and enables principled estimation and inference by variational maximum-likelihood.
Comment: Accepted for publication at NeurIPS 2021
Published: 2021

6. Autoregressive flow-based causal discovery and inference

Author: Monti, Ricardo Pio, Khemakhem, Ilyes, and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We posit that autoregressive flow models are well-suited to performing a range of causal inference tasks, from causal discovery to making interventional and counterfactual predictions. In particular, we exploit the fact that autoregressive architectures define an ordering over variables, analogous to a causal ordering, in order to propose a single flow architecture to perform all three aforementioned tasks. We first leverage the fact that flow models estimate normalized log-densities of data to derive a bivariate measure of causal direction based on likelihood ratios. Whilst traditional measures of causal direction often require restrictive assumptions on the nature of causal relationships (e.g., linearity), the flexibility of flow models allows for arbitrary causal dependencies. Our approach compares favourably against alternative methods on synthetic data as well as on the Cause-Effect Pairs benchmark dataset. Subsequently, we demonstrate that the invertible nature of flows naturally allows for direct evaluation of both interventional and counterfactual predictions, which require marginalization and conditioning over latent variables respectively. We present examples over synthetic data where autoregressive flows, when trained under the correct causal ordering, are able to make accurate interventional and counterfactual predictions.
Comment: 6 pages, 3 figures. Accepted at the 2nd ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models
Published: 2020

7. Interpretable brain age prediction using linear latent variable models of functional connectivity

Author: Monti, Ricardo Pio, Gibberd, Alex, Roy, Sandipan, Nunes, Matt, Lorenz, Romy, Leech, Robert, Ogawa, Takeshi, Kawanabe, Motoaki, and Hyvarinen, Aapo
Subjects: Statistics - Applications, Quantitative Biology - Neurons and Cognition
Abstract: Neuroimaging-driven prediction of brain age, defined as the predicted biological age of a subject using only brain imaging data, is an exciting avenue of research. In this work we seek to build models of brain age based on functional connectivity while prioritizing model interpretability and understanding. This way, the models serve both to provide accurate estimates of brain age and to allow us to investigate changes in functional connectivity which occur during the ageing process. The methods proposed in this work consist of a two-step procedure: first, linear latent variable models, such as PCA and its extensions, are employed to learn reproducible functional connectivity networks present across a cohort of subjects. The activity within each network is subsequently employed as a feature in a linear regression model to predict brain age. The proposed framework is employed on the data from the CamCAN repository and the inferred brain age models are further demonstrated to generalize using data from two open-access repositories: the Human Connectome Project and the ATR Wide-Age-Range.
Comment: 21 pages, 11 figures
Published: 2019

8. Information criteria for non-normalized models

Author: Matsuda, Takeru, Uehara, Masatoshi, and Hyvarinen, Aapo
Subjects: Mathematics - Statistics Theory, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Many statistical models are given in the form of non-normalized densities with an intractable normalization constant. Since maximum likelihood estimation is computationally intensive for these models, several estimation methods have been developed which do not require explicit computation of the normalization constant, such as noise contrastive estimation (NCE) and score matching. However, model selection methods for general non-normalized models have not been proposed so far. In this study, we develop information criteria for non-normalized models estimated by NCE or score matching. They are approximately unbiased estimators of discrepancy measures for non-normalized models. Simulation results and applications to real data demonstrate that the proposed criteria enable selection of the appropriate non-normalized model in a data-driven manner.
Published: 2019

9. Causal Discovery with General Non-Linear Relationships Using Non-Linear ICA

Author: Monti, Ricardo Pio, Zhang, Kun, and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We consider the problem of inferring causal relationships between two or more passively observed variables. While the problem of such causal discovery has been extensively studied especially in the bivariate setting, the majority of current methods assume a linear causal relationship, and the few methods which consider non-linear dependencies usually make the assumption of additive noise. Here, we propose a framework through which we can perform causal discovery in the presence of general non-linear relationships. The proposed method is based on recent progress in non-linear independent component analysis and exploits the non-stationarity of observations in order to recover the underlying sources or latent disturbances. We show rigorously that in the case of bivariate causal discovery, such non-linear ICA can be used to infer the causal direction via a series of independence tests. We further propose an alternative measure of causal direction based on asymptotic approximations to the likelihood ratio, as well as an extension to multivariate causal discovery. We demonstrate the capabilities of the proposed method via a series of simulation studies and conclude with an application to neuroimaging data.
Published: 2019

10. Neural Empirical Bayes

Author: Saremi, Saeed and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We unify $\textit{kernel density estimation}$ and $\textit{empirical Bayes}$ and address a set of problems in unsupervised learning with a geometric interpretation of those methods, rooted in the $\textit{concentration of measure}$ phenomenon. Kernel density is viewed symbolically as $X\rightharpoonup Y$ where the random variable $X$ is smoothed to $Y= X+N(0,\sigma^2 I_d)$, and empirical Bayes is the machinery to denoise in a least-squares sense, which we express as $X \leftharpoondown Y$. A learning objective is derived by combining these two, symbolically captured by $X \rightleftharpoons Y$. Crucially, instead of using the original nonparametric estimators, we parametrize $\textit{the energy function}$ with a neural network denoted by $\phi$; at optimality, $\nabla \phi \approx -\nabla \log f$ where $f$ is the density of $Y$. The optimization problem is abstracted as interactions of high-dimensional spheres which emerge due to the concentration of isotropic Gaussians. We introduce two algorithmic frameworks based on this machinery: (i) a "walk-jump" sampling scheme that combines Langevin MCMC (walks) and empirical Bayes (jumps), and (ii) a probabilistic framework for $\textit{associative memory}$, called NEBULA, defined à la Hopfield by the $\textit{gradient flow}$ of the learned energy to a set of attractors. We finish the paper by reporting the emergence of very rich "creative memories" as attractors of NEBULA for highly overlapping spheres.
Comment: 23 pages, 10 figures
Published: 2019
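
A toy Python illustration of the walk-jump scheme, under a simplifying assumption that is ours, not the paper's: X is one-dimensional with X ~ N(0, TAU^2), so the smoothed density f of Y = X + N(0, SIGMA^2) is known in closed form and grad phi = -grad log f holds exactly. In the paper phi is a learned neural network, which this sketch does not attempt. The "walk" is an unadjusted Langevin step on Y, and the "jump" is the least-squares (empirical Bayes) denoiser E[X | Y = y] = y + SIGMA^2 * grad log f(y) = y - SIGMA^2 * grad phi(y). The constants and function names are hypothetical.

```python
import math
import random

# Toy setting (assumption): X ~ N(0, TAU^2), Y = X + N(0, SIGMA^2),
# so Y ~ N(0, TAU^2 + SIGMA^2) and phi(y) = y^2 / (2 * VAR_Y) up to a constant.
TAU, SIGMA = 1.0, 0.5
VAR_Y = TAU**2 + SIGMA**2

def grad_phi(y):
    # Exact energy gradient here; in the paper this would be a neural network
    return y / VAR_Y

def jump(y):
    # Empirical Bayes denoising: x_hat(y) = y - SIGMA^2 * grad_phi(y)
    return y - SIGMA**2 * grad_phi(y)

def walk(y, step, rng):
    # One unadjusted Langevin step targeting f, using -grad_phi as the score
    return y - 0.5 * step * grad_phi(y) + math.sqrt(step) * rng.gauss(0.0, 1.0)

def walk_jump(num_chains=1000, num_steps=200, step=0.1, seed=0):
    """Run independent Langevin chains on Y, then jump each final state to X-space."""
    rng = random.Random(seed)
    ys = []
    for _ in range(num_chains):
        y = rng.gauss(0.0, 1.0)
        for _ in range(num_steps):
            y = walk(y, step, rng)
        ys.append(y)
    return ys, [jump(y) for y in ys]
```

In this Gaussian case the jump is simply the shrinkage y * TAU^2 / (TAU^2 + SIGMA^2), so `walk_jump()` returns Langevin samples of Y together with their shrunken, denoised counterparts.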

11. Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning

Author: Hyvarinen, Aapo, Sasaki, Hiroaki, and Turner, Richard E.
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. It is based on augmenting the data by an auxiliary variable, such as the time index, the history of the time series, or any other available information. We propose to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized. This enables the framework to be implemented algorithmically through logistic regression, possibly in a neural network. We provide a comprehensive proof of the identifiability of the model as well as the consistency of our estimation method. The approach not only provides a general theoretical framework combining and generalizing previously proposed nonlinear ICA models and algorithms, but also brings practical advantages.
Comment: Camera-ready version of article accepted for AISTATS2019
Published: 2018
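
The contrastive dataset this abstract describes can be sketched in Python as follows. Positives are the observed pairs (x_t, u_t); negatives keep each x_t but pair it with an auxiliary variable taken from a random permutation of the observed u's. Randomizing by permutation is one simple choice and an assumption here, and the nonlinear logistic-regression classifier (for example a neural network) that would then be trained on these labelled pairs is not shown. The function name is hypothetical.

```python
import random

def make_contrastive_pairs(xs, us, seed=0):
    """Build a labelled dataset for discriminating true vs randomized augmentation.

    Positives: observed pairs (x_t, u_t), label 1.
    Negatives: the same x_t paired with a randomly permuted auxiliary variable, label 0.
    """
    if len(xs) != len(us):
        raise ValueError("xs and us must have the same length")
    rng = random.Random(seed)
    shuffled = list(us)
    rng.shuffle(shuffled)  # breaks the dependence between x and u
    positives = [(x, u, 1) for x, u in zip(xs, us)]
    negatives = [(x, u_star, 0) for x, u_star in zip(xs, shuffled)]
    return positives + negatives
```

Passing the time index as `us` gives the time-series special case mentioned in the abstract; any other per-sample side information can be used the same way.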

12. Estimation of Non-Normalized Mixture Models and Clustering Using Deep Representation

Author: Matsuda, Takeru and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We develop a general method for estimating a finite mixture of non-normalized models. Here, a non-normalized model is defined to be a parametric distribution with an intractable normalization constant. Existing methods for estimating non-normalized models without computing the normalization constant are not applicable to mixture models because they contain more than one intractable normalization constant. The proposed method is derived by extending noise contrastive estimation (NCE), which estimates non-normalized models by discriminating between the observed data and some artificially generated noise. We also propose an extension of NCE with multiple noise distributions. Then, based on the observation that conventional classification learning with neural networks is implicitly assuming an exponential family as a generative model, we introduce a method for clustering unlabeled data by estimating a finite mixture of distributions in an exponential family. Estimation of this mixture model is attained by the proposed extensions of NCE where the training data of neural networks are used as noise. Thus, the proposed method provides a probabilistically principled clustering method that is able to utilize a deep representation. Application to image clustering using a deep neural network gives promising results.
Published: 2018

13. Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

Author: Hyvarinen, Aapo and Morioka, Hiroshi
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new intuitive principle of unsupervised deep learning from time series which uses the nonstationary structure of the data. Our learning principle, time-contrastive learning (TCL), finds a representation which allows optimal discrimination of time segments (windows). Surprisingly, we show how TCL can be related to a nonlinear ICA model, when ICA is redefined to include temporal nonstationarities. In particular, we show that TCL combined with linear ICA estimates the nonlinear ICA model up to point-wise transformations of the sources, and this solution is unique, thus providing the first identifiability result for nonlinear ICA which is rigorous, constructive, as well as very general.
Published: 2016

14. A direct method for estimating a causal ordering in a linear non-Gaussian acyclic model

Author: Shimizu, Shohei, Hyvarinen, Aapo, and Kawahara, Yoshinobu
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, it was shown that use of non-Gaussianity identifies a causal ordering of variables in a linear acyclic model without using any prior knowledge on the network structure, which is not the case with conventional methods. However, existing estimation methods are based on iterative search algorithms and may not converge to a correct solution in a finite number of steps. In this paper, we propose a new direct method to estimate a causal ordering based on non-Gaussianity. In contrast to the previous methods, our algorithm requires no algorithmic parameters and is guaranteed to converge to the right solution within a small fixed number of steps if the data strictly follows the model.
Comment: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)
Published: 2014
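
A rough Python sketch of the bivariate idea behind such direct methods: under a linear non-Gaussian acyclic model, only in the true causal direction is the regressor independent of the least-squares residual. The dependence score used here, the absolute correlation between squared regressor and squared residual, is a crude toy proxy chosen for brevity, not the independence measure used in the paper, and it can fail for some combinations of source distributions. The function names and the uniform-noise example are assumptions.

```python
import math
import random

def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def _corr(u, v):
    u, v = _center(u), _center(v)
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def _residual(target, regressor):
    # Least-squares residual of target regressed on regressor (both centred)
    t, r = _center(target), _center(regressor)
    b = sum(a * c for a, c in zip(t, r)) / sum(c * c for c in r)
    return [a - b * c for a, c in zip(t, r)]

def dependence_proxy(u, r):
    # Toy score |corr(u^2, r^2)|; its population value is 0 when u and r are independent
    u = _center(u)
    return abs(_corr([a * a for a in u], [b * b for b in r]))

def causal_direction(x, y):
    """Return "x->y" or "y->x": the direction whose regressor is less dependent on its residual."""
    d_xy = dependence_proxy(x, _residual(y, x))  # hypothesis: x causes y
    d_yx = dependence_proxy(y, _residual(x, y))  # hypothesis: y causes x
    return "x->y" if d_xy < d_yx else "y->x"
```

For example, with x drawn uniformly on [-sqrt(3), sqrt(3)] and y = 0.8x + 0.6e for independent uniform e, `causal_direction(x, y)` should return "x->y" given a few thousand samples. With Gaussian x and e, both scores are zero in expectation and the output is arbitrary, which is why non-Gaussianity is what makes the ordering identifiable.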

15. Bridging Information Criteria and Parameter Shrinkage for Model Selection

Author: Zhang, Kun, Peng, Heng, Chan, Laiwan, and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: Model selection based on classical information criteria, such as BIC, is generally computationally demanding, but its properties are well studied. On the other hand, model selection based on parameter shrinkage by $\ell_1$-type penalties is computationally efficient. In this paper we make an attempt to combine their strengths, and propose a simple approach that penalizes the likelihood with data-dependent $\ell_1$ penalties as in adaptive Lasso and exploits a fixed penalization parameter. Even for finite samples, its model selection results approximately coincide with those based on information criteria; in particular, we show that in some special cases, this approach and the corresponding information criterion produce exactly the same model. One can also consider this approach as a way to directly determine the penalization parameter in adaptive Lasso to achieve information criteria-like model selection. As extensions, we apply this idea to complex models including Gaussian mixture models and mixtures of factor analyzers, whose model selection is traditionally difficult; by adopting suitable penalties, we provide continuous approximators to the corresponding information criteria, which are easy to optimize and enable efficient model selection.
Comment: 16 pages, 3 figures
Published: 2013

16. ParceLiNGAM: A causal ordering method robust against latent confounders

Author: Tashiro, Tatsuya, Shimizu, Shohei, Hyvarinen, Aapo, and Washio, Takashi
Subjects: Statistics - Machine Learning
Abstract: We consider learning a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. However, the estimation results could be distorted if some assumptions actually are violated. In this paper, we propose a new algorithm for learning causal orders that is robust against one typical violation of the model assumptions: latent confounders. The key idea is to detect latent confounders by testing independence between estimated external influences and find subsets (parcels) that include variables that are not affected by latent confounders. We demonstrate the effectiveness of our method using artificial data and simulated brain imaging data.
Comment: A revised version of this was accepted in Neural Computation. 18 pages and 5 figures. arXiv admin note: substantial text overlap with arXiv:1204.1795
Published: 2013

17. Discovery of non-gaussian linear causal models using ICA

Author: Shimizu, Shohei, Hyvarinen, Aapo, Kano, Yutaka, and Hoyer, Patrik O.
Subjects: Computer Science - Learning, Computer Science - Mathematical Software, Statistics - Machine Learning
Abstract: In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data (Spirtes et al. 2000; Pearl 2000). Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-Gaussian distributions of non-zero variances. The solution relies on the use of the statistical method known as independent component analysis (ICA), and does not require any pre-specified time-ordering of the variables. We provide a complete Matlab package for performing this LiNGAM analysis (short for Linear Non-Gaussian Acyclic Model), and demonstrate the effectiveness of the method using artificially generated data.
Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)
Published: 2012

18. Causal discovery of linear acyclic models with arbitrary distributions

Author: Hoyer, Patrik O., Hyvarinen, Aapo, Scheines, Richard, Spirtes, Peter L., Ramsey, Joseph, Lacerda, Gustavo, and Shimizu, Shohei
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Learning
Abstract: An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations.
Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)
Published: 2012

19. On the Identifiability of the Post-Nonlinear Causal Model

Author: Zhang, Kun and Hyvarinen, Aapo
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: By taking into account the nonlinear effect of the cause, the inner noise effect, and the measurement distortion effect in the observed variables, the post-nonlinear (PNL) causal model has demonstrated its excellent performance in distinguishing cause from effect. However, its identifiability has not been properly addressed, and it is also unclear how to apply it when there are more than two variables. In this paper, we conduct a systematic investigation of its identifiability in the two-variable case. We show that this model is identifiable in most cases; by enumerating all possible situations in which the model is not identifiable, we provide sufficient conditions for its identifiability. Simulations are given to support the theoretical results. Moreover, in the case of more than two variables, we show that the whole causal structure can be found by applying the PNL causal model to each structure in the Markov equivalence class and testing if the disturbance is independent of the direct causes for each variable. In this way the exhaustive search over all possible causal structures is avoided.
Comment: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)
Published: 2012

20. Estimation of causal orders in a linear non-Gaussian acyclic model: a method robust against latent confounders

Author: Tashiro, Tatsuya, Shimizu, Shohei, Hyvarinen, Aapo, and Washio, Takashi
Subjects: Statistics - Machine Learning
Abstract: We consider learning a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. However, the estimation results could be distorted if some assumptions actually are violated. In this paper, we propose a new algorithm for learning causal orders that is robust against one typical violation of the model assumptions: latent confounders. We demonstrate the effectiveness of our method using artificial data.
Comment: 8 pages, 2 figures
Published: 2012

21. Source Separation and Higher-Order Causal Analysis of MEG and EEG

Author: Zhang, Kun and Hyvarinen, Aapo
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: Separation of the sources and analysis of their connectivity have been an important topic in EEG/MEG analysis. To solve this problem in an automatic manner, we propose a two-layer model, in which the sources are conditionally uncorrelated with each other, but not independent; the dependence is caused by the causality in their time-varying variances (envelopes). The model is identified in two steps. We first propose a new source separation technique which takes into account the autocorrelations (which may be time-varying) and time-varying variances of the sources. The causality in the envelopes is then discovered by exploiting a special kind of multivariate GARCH (generalized autoregressive conditional heteroscedasticity) model. The resulting causal diagram gives the effective connectivity between the separated sources; in our experimental results on MEG data, sources with similar functions are grouped together, with negative influences between groups, and the groups are connected via some interesting sources.
Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)
Published: 2012

22. A Family of Computationally Efficient and Simple Estimators for Unnormalized Statistical Models

Author: Pihlaja, Miika, Gutmann, Michael, and Hyvarinen, Aapo
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: We introduce a new family of estimators for unnormalized statistical models. Our family of estimators is parameterized by two nonlinear functions and uses a single sample from an auxiliary distribution, generalizing Maximum Likelihood Monte Carlo estimation of Geyer and Thompson (1992). The family is such that we can estimate the partition function like any other parameter in the model. The estimation is done by optimizing an algebraically simple, well-defined objective function, which allows for the use of dedicated optimization methods. We establish consistency of the estimator family and give an expression for the asymptotic covariance matrix, which enables us to further analyze the influence of the nonlinearities and the auxiliary density on estimation performance. Some estimators in our family are particularly stable for a wide range of auxiliary densities. Interestingly, a specific choice of the nonlinearity establishes a connection between density estimation and classification by nonlinear logistic regression. Finally, the optimal number of auxiliary samples relative to the given amount of data is considered from the perspective of computational efficiency.
Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)
Published: 2012

23. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model

Author: Shimizu, Shohei, Inazumi, Takanori, Sogawa, Yasuhiro, Hyvarinen, Aapo, Kawahara, Yoshinobu, Washio, Takashi, Hoyer, Patrik O., and Bollen, Kenneth
Subjects: Statistics - Machine Learning
Abstract: Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, it was shown that use of non-Gaussianity identifies the full structure of a linear acyclic model, i.e., a causal ordering of variables and their connection strengths, without using any prior knowledge on the network structure, which is not the case with conventional methods. However, existing estimation methods are based on iterative search algorithms and may not converge to a correct solution in a finite number of steps. In this paper, we propose a new direct method to estimate a causal ordering and connection strengths based on non-Gaussianity. In contrast to the previous methods, our algorithm requires no algorithmic parameters and is guaranteed to converge to the right solution within a small fixed number of steps if the data strictly follows the model. Comment: A revised version of this was accepted in Journal of Machine Learning Research
Published: 2011
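
The key step described here, identifying an exogenous variable through non-Gaussianity, can be sketched for two variables. This is a simplified illustration and not DirectLiNGAM itself. It assumes uniform (non-Gaussian) disturbances. In place of the independence measure used in the paper, it substitutes a crude proxy: the absolute covariance between the squared regressor and the squared regression residual. All function names are hypothetical.

```python
import random

def center(x):
    """Subtract the sample mean."""
    m = sum(x) / len(x)
    return [a - m for a in x]

def residual(y, x):
    """Residual of least-squares regression of y on x (both zero-mean)."""
    b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
    return [c - b * a for a, c in zip(x, y)]

def square_dependence(u, v):
    """|cov(u^2, v^2)|; zero for independent u, v. A crude stand-in dependence measure."""
    n = len(u)
    u2 = [a * a for a in u]
    v2 = [b * b for b in v]
    mu, mv = sum(u2) / n, sum(v2) / n
    return abs(sum((a - mu) * (b - mv) for a, b in zip(u2, v2)) / n)

def more_exogenous(x1, x2):
    """Return 1 if x1 looks exogenous (x2 = b*x1 + e), otherwise 2."""
    # If x1 is exogenous, regressing x2 on x1 leaves a residual independent of x1.
    # In the wrong direction the residual stays dependent on the regressor,
    # because the disturbances are non-Gaussian.
    d12 = square_dependence(x1, residual(x2, x1))
    d21 = square_dependence(x2, residual(x1, x2))
    return 1 if d12 < d21 else 2

random.seed(1)
s = 3 ** 0.5  # uniform(-s, s) has unit variance
x1 = center([random.uniform(-s, s) for _ in range(5000)])
x2 = center([0.8 * a + random.uniform(-s, s) for a in x1])
print(more_exogenous(x1, x2))
```

With Gaussian disturbances both residuals would be independent of their regressors, and the test would be uninformative. This is why non-Gaussianity is essential to the approach.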

24. Finding Exogenous Variables in Data with Many More Variables than Observations

Author: Shimizu, Shohei, Washio, Takashi, Hyvarinen, Aapo, and Imoto, Seiya
Subjects: Statistics - Machine Learning
Abstract: Many statistical methods have been proposed to estimate causal models in classical situations with fewer variables than observations (p<n). In this paper, we propose a method to find exogenous variables in a linear non-Gaussian causal model, which requires much smaller sample sizes than conventional methods and works even when p>>n. The key idea is to identify which variables are exogenous based on non-Gaussianity instead of estimating the entire structure of the model. Exogenous variables work as triggers that activate a causal chain in the model, and their identification leads to more efficient experimental designs and better understanding of the causal mechanism. We present experiments with artificial data and real-world gene expression data to evaluate the method. Comment: A revised version of this was published in Proc. ICANN2010
Published: 2009
Full Text: View/download PDF

25. Equivalence of some common linear feature extraction techniques for appearance-based object recognition tasks

Author: Vicente, M. Asuncion, Hoyer, Patrik O., and Hyvarinen, Aapo
Subjects: Machine vision -- Analysis, Object recognition (Computers) -- Analysis, Pattern recognition -- Analysis, Principal components analysis -- Methods
Abstract: Recently, a number of empirical studies have compared the performance of PCA and ICA as feature extraction methods in appearance-based object recognition systems, with mixed and seemingly contradictory results. In this paper, we briefly describe the connection between the two methods and argue that whitened PCA may yield identical results to ICA in some cases. Furthermore, we describe the specific situations in which ICA might significantly improve on PCA. Index Terms--Computer vision, object recognition, principal component analysis, independent component analysis.
Published: 2007

26. Learning with self-supervision on EEG data

Author: Gramfort, Alexandre, primary, Banville, Hubert, additional, Chehab, Omar, additional, Hyvarinen, Aapo, additional, and Engemann, Denis, additional
Published: 2021
Full Text: View/download PDF

27. Independent component analysis of fMRI group studies by self-organizing clustering

Author: Esposito, Fabrizio, Scarabino, Tommaso, Hyvarinen, Aapo, Himberg, Johan, Formisano, Elia, Comani, Silvia, Tedeschi, Gioacchino, Goebel, Rainer, Seifritz, Erich, and Di Salle, Francesco
Published: 2005
Full Text: View/download PDF

28. Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables

Author: Hyvarinen, Aapo
Subjects: Monte Carlo method -- Usage, Estimation theory -- Usage, Neural networks -- Design and construction, Neural network, Business, Computers, Electronics, Electronics and electrical industries
Abstract: Score matching (SM) and contrastive divergence (CD) are two recently proposed methods for estimation of nonnormalized statistical models without computation of the normalization constant (partition function). Although they are based on very different approaches, we show in this letter that they are equivalent in a special case: in the limit of infinitesimal noise in a specific Monte Carlo method. Further, we show how these methods can be interpreted as approximations of pseudolikelihood. Index Terms--Normalization constant, partition function, statistical estimation.
Published: 2007
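
As a worked illustration of score matching (not an example taken from the letter), consider an assumed one-dimensional Gaussian model with unnormalized log-density \(\log q(x) = -\tfrac{\lambda}{2}(x-\mu)^2\). Its normalizing constant \(\sqrt{\lambda/2\pi}\) never enters the calculation. Using the integrated-by-parts form of the score-matching objective:

```latex
\begin{aligned}
\psi(x) &= \frac{\partial}{\partial x}\log q(x) = -\lambda\,(x-\mu), \qquad \psi'(x) = -\lambda,\\
J(\lambda,\mu) &= \mathbb{E}\big[\psi'(x) + \tfrac{1}{2}\psi(x)^2\big] = -\lambda + \tfrac{\lambda^2}{2}\,\mathbb{E}\big[(x-\mu)^2\big],\\
\frac{\partial J}{\partial \mu} &= -\lambda^2\,\mathbb{E}[x-\mu] = 0 \;\Rightarrow\; \hat\mu = \mathbb{E}[x],\\
\frac{\partial J}{\partial \lambda} &= -1 + \lambda\,\mathbb{E}\big[(x-\mu)^2\big] = 0 \;\Rightarrow\; \hat\lambda = 1/\operatorname{Var}(x).
\end{aligned}
```

If the expectations are replaced by sample averages, the estimates are the sample mean and the inverse sample variance. These coincide with maximum likelihood here, yet they are obtained without computing the partition function.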

29. Self-Supervised Representation Learning from Electroencephalography Signals

Author: Banville, Hubert, primary, Albuquerque, Isabela, additional, Hyvarinen, Aapo, additional, Moffat, Graeme, additional, Engemann, Denis-Alexander, additional, and Gramfort, Alexandre, additional
Published: 2019
Full Text: View/download PDF

30. Decoding emotional valence from electroencephalographic rhythmic activity

Author: Celikkanat, Hande, primary, Moriya, Hiroki, additional, Ogawa, Takeshi, additional, Kauppi, Jukka-Pekka, additional, Kawanabe, Motoaki, additional, and Hyvarinen, Aapo, additional
Published: 2017
Full Text: View/download PDF

31. Characterizing Variability of Modular Brain Connectivity with Constrained Principal Component Analysis

Author: University of Helsinki, Department of Computer Science, Hirayama, Jun-ichiro, Hyvarinen, Aapo, Kiviniemi, Vesa, Kawanabe, Motoaki, and Yamashita, Okito
Abstract: Characterizing the variability of resting-state functional brain connectivity across subjects and/or over time has recently attracted much attention. Principal component analysis (PCA) serves as a fundamental statistical technique for such analyses. However, performing PCA on high-dimensional connectivity matrices yields complicated "eigenconnectivity" patterns, for which systematic interpretation is a challenging issue. Here, we overcome this issue with a novel constrained PCA method for connectivity matrices by extending the idea of the previously proposed orthogonal connectivity factorization method. Our new method, modular connectivity factorization (MCF), explicitly introduces the modularity of brain networks as a parametric constraint on eigenconnectivity matrices. In particular, MCF analyzes the variability in both intra- and inter-module connectivities, simultaneously finding network modules in a principled, data-driven manner. The parametric constraint provides a compact module-based visualization scheme with which the result can be intuitively interpreted. We develop an optimization algorithm to solve the constrained PCA problem and validate our method in simulation studies and with a resting-state functional connectivity MRI dataset of 986 subjects. The results show that the proposed MCF method successfully reveals the underlying modular eigenconnectivity patterns in more general situations and is a promising alternative to existing methods.
Published: 2016

32. Fast and Robust Fixed-Point Algorithms for Independent Component Analysis

Author: Hyvarinen, Aapo
Subjects: Principal components analysis -- Methods, Statistics -- Models, Algorithms -- Usage, Business, Computers, Electronics, Electronics and electrical industries
Abstract: Independent component analysis (ICA) is a statistical method for transforming an observed multidimensional random vector into components that are statistically as independent from each other as possible. In this paper, we use a combination of two different approaches for linear ICA: Comon's information-theoretic approach and the projection pursuit approach. Using maximum entropy approximations of differential entropy, we introduce a family of new contrast (objective) functions for ICA. These contrast functions enable both the estimation of the whole decomposition by minimizing mutual information, and estimation of individual independent components as projection pursuit directions. The statistical properties of the estimators based on such contrast functions are analyzed under the assumption of the linear mixture model, and it is shown how to choose contrast functions that are robust and/or of minimum variance. Finally, we introduce simple fixed-point algorithms for practical optimization of the contrast functions. These algorithms optimize the contrast functions very fast and reliably.
Published: 1999

33. DirectLiNGAM: A Direct Method for Learning a Linear Non-Gaussian Structural Equation Model

Author: Hoyer, Patrik O., Inazumi, Takanori, Bollen, Kenneth, Washio, Takashi, Hyvarinen, Aapo, Kawahara, Yoshinobu, Sogawa, Yasuhiro, and Shimizu, Shohei
Abstract: Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, it was shown that use of non-Gaussianity identifies the full structure of a linear acyclic model, i.e., a causal ordering of variables and their connection strengths, without using any prior knowledge on the network structure, which is not the case with conventional methods. However, existing estimation methods are based on iterative search algorithms and may not converge to a correct solution in a finite number of steps. In this paper, we propose a new direct method to estimate a causal ordering and connection strengths based on non-Gaussianity. In contrast to the previous methods, our algorithm requires no algorithmic parameters and is guaranteed to converge to the right solution within a small fixed number of steps if the data strictly follows the model.
Published: 2011
Full Text: View/download PDF

34. Independent component analysis with an inverse problem motivated penalty term

Author: Puuronen, Jouni, primary and Hyvarinen, Aapo, additional
Published: 2015
Full Text: View/download PDF

35. Learning Lp spherical potentials for Markov Random Field models of natural images

Author: Hyvarinen, Aapo
Subjects: Random field, Markov random field, Computer science, General Neuroscience, Pattern recognition, Artificial intelligence, Markov model, Algorithm
Published: 2010

36. Learning Natural Image Structure with a Horizontal Product Model

Author: Hyvarinen, Aapo
Subjects: Cellular and Molecular Neuroscience, Developmental Neuroscience, Cognitive Neuroscience, Neuroscience (miscellaneous)
Published: 2009

37. Simultaneous blind separation and clustering of coactivated EEG/MEG sources for analyzing spontaneous brain activity

Author: Hirayama, Jun-ichiro, primary, Ogawa, Takeshi, additional, and Hyvarinen, Aapo, additional
Published: 2014
Full Text: View/download PDF

38. Dynamic connectivity factorization: Interpretable decompositions of non-stationarity

Author: Hyvarinen, Aapo, primary, Hirayama, Junichiro, additional, and Kawanabe, Motoaki, additional
Published: 2014
Full Text: View/download PDF

39. Independent component analysis with an inverse problem motivated penalty term.

Author: Puuronen, Jouni and Hyvarinen, Aapo
Published: 2015
Full Text: View/download PDF

40. Natural image statistics: Energy-based models estimated by score matching

Author: Koster, Urs, primary and Hyvarinen, Aapo, additional
Published: 2009
Full Text: View/download PDF

41. On the learning of nonlinear visual features from natural images by optimizing response energies

Author: Lindgren, Jussi T., primary and Hyvarinen, Aapo, additional
Published: 2008
Full Text: View/download PDF

42. Unsupervised learning of dependencies between local luminance and contrast in natural images

Author: Lindgren, Jussi T., primary, Hurri, Jarmo, additional, and Hyvarinen, Aapo, additional
Published: 2008
Full Text: View/download PDF

43. Equivalence of Some Common Linear Feature Extraction Techniques for Appearance-Based Object Recognition Tasks

Author: Vicente, M. Asuncion, primary, Hoyer, Patrik O., additional, and Hyvarinen, Aapo, additional
Published: 2007
Full Text: View/download PDF

44. Learning a selectivity-invariance-selectivity feature extraction architecture for images.

Author: Gutmann, Michael U. and Hyvarinen, Aapo
Abstract: Selectivity and invariance are thought to be important ingredients in biological or artificial visual systems. A fundamental problem, however, is to know what the visual system should be selective to and what it should be invariant to. Building a statistical model of images, we learn here a three-layer feature extraction system where the selectivity and invariance emerge from the properties of the images. [ABSTRACT FROM PUBLISHER]
Published: 2012

45. Blind signal separation and independent component analysis

Author: Amari, S.-I, primary, Hyvarinen, Aapo, additional, Lee, Soo-Young, additional, Lee, Te-Won, additional, and Sánchez A., V. David, additional
Published: 2002
Full Text: View/download PDF

46. Blind source separation by nonstationarity of variance: a cumulant-based approach

Author: Hyvarinen, Aapo
Subjects: Signal processing -- Research, Signals and signaling -- Analysis, Business, Computers, Electronics, Electronics and electrical industries
Abstract: Blind separation of source signals usually relies either on the nongaussianity of the signals or on their linear autocorrelations. A third approach was introduced by Matsuoka et al., who showed that source separation can be performed by using the nonstationarity of the signals, in particular the nonstationarity of their variances. In this paper, we show how to interpret the nonstationarity due to a smoothly changing variance in terms of higher order cross-cumulants. This is based on considering the time-correlation of the squares (energies) of the signals and leads to a simple optimization criterion. Using this criterion, we construct a fixed-point algorithm that is computationally very efficient. Index Terms--Blind source separation, cumulants, independent component analysis, nonstationarity, statistical signal processing.
Published: 2001
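
The quantity underlying this criterion, the time-correlation of squared signals expressed as a fourth-order cross-cumulant, can be computed directly. The sketch below assumes zero-mean signals, a lag of τ = 1, and a sinusoidal variance envelope chosen only for illustration. It shows that the lagged energy cumulant is clearly positive for a signal whose variance changes smoothly, and close to zero for a stationary i.i.d. Gaussian signal. It is not the paper's fixed-point separation algorithm.

```python
import math
import random

def lagged_energy_cumulant(y, tau=1):
    """cum(y_t, y_t, y_{t-tau}, y_{t-tau}) for a zero-mean sequence y."""
    a, b = y[tau:], y[:-tau]
    n = len(a)
    e_ab2 = sum((u * v) ** 2 for u, v in zip(a, b)) / n   # E[y_t^2 y_{t-tau}^2]
    e_a2 = sum(u * u for u in a) / n                       # E[y_t^2]
    e_b2 = sum(v * v for v in b) / n                       # E[y_{t-tau}^2]
    e_ab = sum(u * v for u, v in zip(a, b)) / n            # E[y_t y_{t-tau}]
    return e_ab2 - e_a2 * e_b2 - 2.0 * e_ab ** 2

random.seed(2)
n = 20000
# Stationary signal: i.i.d. Gaussian, so the cumulant should be close to 0.
stationary = [random.gauss(0.0, 1.0) for _ in range(n)]
# Nonstationary signal: Gaussian noise with a slowly varying standard deviation.
envelope = [1.0 + 0.9 * math.sin(2.0 * math.pi * t / 1000.0) for t in range(n)]
modulated = [s * random.gauss(0.0, 1.0) for s in envelope]
print(lagged_energy_cumulant(stationary), lagged_energy_cumulant(modulated))
```

With a slowly changing envelope, the cumulant of the modulated signal approximates the variance of the squared envelope. This gives a separation criterion that requires neither nongaussianity nor linear autocorrelations of the sources.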

47. Fast and robust deflationary separation of complex valued signals.

Author: Bingham, Ella and Hyvarinen, Aapo
Published: 2000

48. Independent component analysis by general nonlinear Hebbian-like learning rules

Author: Hyvarinen, Aapo, primary and Oja, Erkki, additional
Published: 1999
Full Text: View/download PDF

49. Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces.

Author: Hyvarinen, Aapo, Hoyer, Patrik, and Mel, Bartlett
Subjects: *INVARIANTS (Mathematics), *INVARIANT subspaces
Abstract: Olshausen and Field (1996) applied the principle of independence maximization by sparse coding to extract features from natural images. This leads to the emergence of oriented linear filters that have simultaneous localization in space and in frequency, thus resembling Gabor functions and simple cell receptive fields. In this article, we show that the same principle of independence maximization can explain the emergence of phase- and shift-invariant features, similar to those found in complex cells. This new kind of emergence is obtained by maximizing the independence between norms of projections on linear subspaces (instead of the independence of simple linear filter outputs). The norms of the projections on such "independent feature subspaces" then indicate the values of invariant features. [ABSTRACT FROM AUTHOR]
Published: 2000
Full Text: View/download PDF
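
This abstract argues that the norm of a projection onto a linear subspace can be invariant where a single linear filter output is not. That point has a simple one-dimensional analogue. The toy sketch below assumes a two-dimensional subspace spanned by a cosine filter and a sine filter of the same frequency (a quadrature pair fixed by hand, not learned from images as in the paper), applied to a sinusoidal input of varying phase. The constants `N` and `FREQ` are hypothetical.

```python
import math

N = 64        # samples per input patch (a whole number of filter periods)
FREQ = 4      # frequency shared by the filters and the input (illustrative)

# Two filters spanning a 2-D "feature subspace": a quadrature pair.
f_cos = [math.cos(2 * math.pi * FREQ * t / N) for t in range(N)]
f_sin = [math.sin(2 * math.pi * FREQ * t / N) for t in range(N)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def responses(phase):
    """Return (single linear filter output, norm of the subspace projection)."""
    x = [math.cos(2 * math.pi * FREQ * t / N + phase) for t in range(N)]
    c, s = dot(f_cos, x), dot(f_sin, x)
    return c, math.hypot(c, s)

for phase in (0.0, math.pi / 4, math.pi / 2, math.pi):
    linear, norm = responses(phase)
    print(f"phase={phase:.2f}  linear={linear:7.2f}  subspace norm={norm:6.2f}")
```

The linear output varies with phase as (N/2)·cos(phase), while the subspace norm stays at N/2. This mirrors the phase-invariant, complex-cell-like behaviour described above.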

50. A fast fixed-point algorithm for independent component analysis.

Author: Hyvarinen, Aapo and Oja, Erkki
Subjects: *FIXED point theory
Abstract: Introduces a novel fast algorithm for independent component analysis, which can be used for blind source separation and feature extraction. Transformation of a neural network learning rule into a fixed-point iteration; Introduction and analysis of the algorithm; Review of kurtosis minimization-maximization and its relation to neural network type learning rules.
Published: 1997
Full Text: View/download PDF
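
The kurtosis-based fixed-point iteration summarized in this entry can be sketched for two sources. The sketch assumes the one-unit update w ← E{z (wᵀz)³} − 3w, followed by normalization. It is applied to data that are already approximately white: two unit-variance uniform sources mixed by a rotation, so an explicit whitening step is omitted. The mixing angle, seed, and function name are illustrative assumptions.

```python
import math
import random

def fastica_one_unit(z, iters=100, tol=1e-8, seed=0):
    """Kurtosis-based one-unit fixed-point iteration on white 2-D data z (list of pairs)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    norm = math.hypot(*w)
    w = [w[0] / norm, w[1] / norm]
    n = len(z)
    for _ in range(iters):
        # w_new = E{z (w^T z)^3} - 3 w
        acc = [0.0, 0.0]
        for z1, z2 in z:
            p = (w[0] * z1 + w[1] * z2) ** 3
            acc[0] += z1 * p
            acc[1] += z2 * p
        w_new = [acc[0] / n - 3.0 * w[0], acc[1] / n - 3.0 * w[1]]
        norm = math.hypot(*w_new)
        w_new = [w_new[0] / norm, w_new[1] / norm]
        # Converged when w_new is parallel to w; the sign may flip between steps.
        if abs(abs(w_new[0] * w[0] + w_new[1] * w[1]) - 1.0) < tol:
            return w_new
        w = w_new
    return w

random.seed(3)
s = math.sqrt(3.0)  # uniform(-s, s) has unit variance and negative kurtosis
sources = [(random.uniform(-s, s), random.uniform(-s, s)) for _ in range(5000)]
theta = 0.6  # angle of the orthogonal (rotation) mixing matrix
mixed = [(math.cos(theta) * a - math.sin(theta) * b,
          math.sin(theta) * a + math.cos(theta) * b) for a, b in sources]
w = fastica_one_unit(mixed)
y = [w[0] * z1 + w[1] * z2 for z1, z2 in mixed]  # estimated independent component
print(w)
```

For sub-Gaussian sources the update flips the sign of w at every step. The convergence test therefore compares directions up to sign, which matches the sign ambiguity inherent in ICA.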
