51. End-to-end Sinkhorn Autoencoder with noise generator
- Author
- Przemysław Spurek, Jan Dubiński, Piotr Nowak, Tomasz Trzciński, Kamil Rafał Deja, and Sandro Christian Wenzel
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning (cs.LG), Statistics - Machine Learning (stat.ML), General Computer Science, General Engineering, General Materials Science, Computing and Computers, Mathematical Physics and Mathematics, machine learning, generative modeling, artificial neural network, autoencoder, noise generator, Monte Carlo method, computer simulation, data modeling, MNIST database
- Abstract
In this work, we propose a novel end-to-end Sinkhorn Autoencoder with a noise generator for efficient data collection simulation. Simulating processes that aim at collecting experimental data is crucial for multiple real-life applications, including nuclear medicine, astronomy, and high energy physics. Contemporary methods, such as Monte Carlo algorithms, provide high-fidelity results at the price of high computational cost. Multiple attempts have been made to reduce this burden, e.g. using generative approaches based on Generative Adversarial Networks or Variational Autoencoders. Although such methods are much faster, they are often unstable in training and do not allow sampling from the entire data distribution. To address these shortcomings, we introduce a novel method, dubbed end-to-end Sinkhorn Autoencoder, that leverages the Sinkhorn algorithm to explicitly align the distributions of encoded real data examples and generated noise. More precisely, we extend the autoencoder architecture by adding a deterministic neural network trained to map noise from a known distribution onto the autoencoder's latent space, which represents the data distribution. We optimise the entire model jointly. Our method outperforms competing approaches on a challenging dataset of simulation data from the Zero Degree Calorimeters of the ALICE experiment at the LHC, as well as on standard benchmarks such as MNIST and CelebA.
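The abstract describes a joint objective: an autoencoder reconstruction loss plus a Sinkhorn-based term that aligns the encoded data distribution with the output of a deterministic noise generator. The sketch below (PyTorch) illustrates that idea only; the class and function names (`SinkhornAE`, `sinkhorn_divergence`), layer sizes, the entropic regularisation strength, the loss weighting, and the simplified log-domain Sinkhorn cost are all illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn


def sinkhorn_divergence(x, y, eps=0.1, n_iters=50):
    """Entropy-regularised optimal-transport cost between two equally weighted
    point clouds, via log-domain Sinkhorn iterations (a simplified stand-in
    for the alignment term described in the abstract)."""
    cost = torch.cdist(x, y, p=2) ** 2                    # pairwise squared Euclidean cost
    n, m = cost.shape
    log_a = torch.full((n,), -math.log(n), device=x.device)  # uniform weights (log)
    log_b = torch.full((m,), -math.log(m), device=x.device)
    f = torch.zeros(n, device=x.device)
    g = torch.zeros(m, device=x.device)
    for _ in range(n_iters):                              # dual potential updates
        f = -eps * torch.logsumexp((g[None, :] - cost) / eps + log_b[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - cost) / eps + log_a[:, None], dim=0)
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    return torch.sum(torch.exp(log_plan) * cost)          # transport cost under the plan


class SinkhornAE(nn.Module):
    """Autoencoder plus a deterministic noise generator mapped onto its latent
    space; all layer sizes here are assumptions for illustration."""

    def __init__(self, data_dim=784, latent_dim=32, noise_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Sigmoid())
        # deterministic network mapping noise from a known distribution
        # onto the autoencoder latent space
        self.noise_generator = nn.Sequential(
            nn.Linear(noise_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.noise_dim = noise_dim

    def loss(self, x, align_weight=1.0):
        z = self.encoder(x)
        x_rec = self.decoder(z)
        noise = torch.randn(x.size(0), self.noise_dim, device=x.device)
        z_gen = self.noise_generator(noise)
        rec_loss = nn.functional.mse_loss(x_rec, x)       # reconstruction term
        align_loss = sinkhorn_divergence(z, z_gen)        # latent alignment term
        return rec_loss + align_weight * align_loss       # joint objective


# One optimisation step on a random stand-in batch of flattened images.
model = SinkhornAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(64, 784)
loss = model.loss(batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because both the reconstruction and alignment terms are differentiable, the encoder, decoder, and noise generator can be optimised jointly in a single backward pass, which is the end-to-end property the abstract emphasises; sampling then amounts to drawing noise, passing it through the noise generator, and decoding.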
- Published
- 2021