A general framework for association analysis of heterogeneous data
- Source :
- Ann. Appl. Stat. 12, no. 3 (2018), 1700-1726
- Publication Year :
- 2018
- Publisher :
- Institute of Mathematical Statistics, 2018.
Abstract
- Multivariate association analysis is of primary interest in many applications. Despite the prevalence of high-dimensional and non-Gaussian data (such as count-valued or binary), most existing methods only apply to low-dimensional data with continuous measurements. Motivated by the Computer Audition Lab 500-song (CAL500) music annotation study, we develop a new framework for the association analysis of two sets of high-dimensional and heterogeneous (continuous/binary/count) data. We model heterogeneous random variables using exponential family distributions, and exploit a structured decomposition of the underlying natural parameter matrices to identify shared and individual patterns for two data sets. We also introduce a new measure of the strength of association, and a permutation-based procedure to test its significance. An alternating iteratively reweighted least squares algorithm is devised for model fitting, and several variants are developed to expedite computation and achieve variable selection. The application to the CAL500 data sheds light on the relationship between acoustic features and semantic annotations, and provides effective means for automatic music annotation and retrieval.
- Subjects :
- FOS: Computer and information sciences
Statistics and Probability
Computer science
Association (object-oriented programming)
joint and individual structure
Exponential family
Feature selection
01 natural sciences
Matrix decomposition
Methodology (stat.ME)
Iteratively reweighted least squares
010104 statistics & probability
Permutation
inter-battery factor analysis
0101 mathematics
Statistics - Methodology
association coefficient
Computer audition
010401 analytical chemistry
0104 chemical sciences
generalized linear model
Modeling and Simulation
Data mining
Statistics, Probability and Uncertainty
computer
Random variable
Details
- ISSN :
- 19326157
- Volume :
- 12
- Database :
- OpenAIRE
- Journal :
- The Annals of Applied Statistics
- Accession number :
- edsair.doi.dedup.....8d17a0be9670f48c1a74b9499400be58
- Full Text :
- https://doi.org/10.1214/17-aoas1127