Fused adjacency matrices to enhance information extraction: The beer benchmark.

Authors :: Cavallini, Nicola
Savorani, Francesco
Bro, Rasmus
Cocchi, Marina
Source :: Analytica Chimica Acta. Jul2019, Vol. 1061, p70-83. 14p.
Publication Year :: 2019
Abstract: Multivariate exploratory data analysis makes it possible to reveal patterns in, and extract information from, complex multivariate data sets. However, highly complex data may not show evident groupings or trends in the principal component space, e.g. because the variation in the variables is continuous rather than grouped. In these cases, classical exploratory methods may not provide satisfactory results when the aim is to find distinct groupings in the data. To enhance information extraction in such situations, we propose a novel approach inspired by the concept of combining weak classifiers, but in the unsupervised context. The approach is based on the fusion of several adjacency matrices obtained by different distance measures on data from different analytical platforms. This paper presents and discusses the potential of the approach using a benchmark data set of beer samples. The beer data were acquired using three spectroscopic techniques: Visible, Near-Infrared (NIR) and Nuclear Magnetic Resonance (NMR). The results of fusing the three data sets via the proposed approach are compared with those from the single data blocks (Visible, NIR and NMR) and from a standard mid-level data fusion methodology. It is shown that, with the suggested approach, groupings related to beer style and other features are efficiently recovered and are generally more evident.

Highlights
• A new approach to enhance information extraction from highly complex datasets is proposed.
• The approach is based on the fusion of adjacency matrices obtained from different clustering strategies.
• Information extracted from different data blocks is fused, so the approach can also serve as a method for high-level data fusion.
• Visible, NIR and NMR data of beer samples are used as a benchmark for testing the approach.
• The approach can highlight groups more clearly than the single-block and mid-level data-fusion approaches.

[ABSTRACT FROM AUTHOR]
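The general idea of fusing adjacency matrices across data blocks and distance measures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-nearest-neighbour rule used to build each adjacency matrix, the averaging used for fusion, the function names (`knn_adjacency`, `fuse`) and the toy data are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance between two samples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_adjacency(block, dist, k):
    """Symmetric 0/1 adjacency matrix for one data block.

    Assumption: samples i and j are linked if either one is among
    the other's k nearest neighbours under the given distance.
    """
    n = len(block)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(block[i], block[j]),
        )
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj


def fuse(blocks, dists, k=1):
    """Average the adjacency matrices from every (block, distance) pair.

    Assumption: fusion is a plain average, so each entry is the fraction
    of (block, distance) combinations that link the two samples.
    """
    n = len(blocks[0])
    mats = [knn_adjacency(b, d, k) for b in blocks for d in dists]
    return [
        [sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 samples measured on two "platforms" (blocks).
block_a = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
block_b = [[1.0], [1.2], [9.0], [9.1]]

fused = fuse([block_a, block_b], [euclidean, manhattan], k=1)
for row in fused:
    print(row)
```

In this toy case every block and distance measure agrees, so samples 0 and 1 (and samples 2 and 3) are fully connected in the fused matrix while cross-group entries stay at zero. With real data the fused entries would take intermediate values, and groupings would be read from the samples that are linked consistently across blocks and measures.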