1. Bayesian mean-parameterized nonnegative binary matrix factorization
- Author
-
Alberto Lumbreras, Cédric Févotte, Louis Filstroff, Criteo AI Lab, Criteo [Paris], Signal et Communications (IRIT-SC), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Centre National de la Recherche Scientifique (CNRS), and ANR-19-P3IA-0004,ANITI,Artificial and Natural Intelligence Toulouse Institute(2019)
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Networks and Communications ,Computer science ,Posterior probability ,Machine Learning (stat.ML) ,02 engineering and technology ,[STAT.OT]Statistics [stat]/Other Statistics [stat.ML] ,Data matrix (multivariate statistics) ,Machine Learning (cs.LG) ,Non-negative matrix factorization ,symbols.namesake ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,Factorization ,Statistics - Machine Learning ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Logical matrix ,Missing data ,Computer Science Applications ,Binary data ,symbols ,020201 artificial intelligence & image processing ,Algorithm ,Information Systems ,Gibbs sampling - Abstract
International audience; Binary data matrices can represent many types of data such as social networks, votes, or gene expression. In some cases, the analysis of binary matrices can be tackled with nonneg-ative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually Bernoulli-distributed. Often, a link function is used to map the factorization to the [0, 1] range, ensuring a valid Bernoulli mean parameter. However, link functions have the potential disadvantage to lead to uninterpretable models. Mean-parameterized NMF, on the contrary, overcomes this problem. We propose a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF). We analyze three models which correspond to three possible constraints that respect the mean-parameterization without the need for link functions. Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors. Next, we extend the proposed models to a nonpara-metric setting where the number of used latent dimensions is automatically driven by the observed data. We analyze the performance of our NBMF methods in multiple datasets for different tasks such as dictionary learning and prediction of missing data. Experiments show that our methods provide similar or superior results than the state of the art, while automatically detecting the number of relevant components.
- Published
- 2020