More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis.

Authors :
Recchia G
Jones MN
Source :
Behavior research methods [Behav Res Methods] 2009 Aug; Vol. 41 (3), pp. 647-56.
Publication Year :
2009

Abstract

Computational models of lexical semantics, such as latent semantic analysis, can automatically generate semantic similarity measures between words from statistical redundancies in text. These measures are useful for experimental stimulus selection and for evaluating a model's cognitive plausibility as a mechanism that people might use to organize meaning in memory. Although humans are exposed to enormous quantities of speech, practical constraints limit the amount of data that many current computational models can learn from. We follow up on previous work evaluating a simple metric of pointwise mutual information. Controlling for confounds in previous work, we demonstrate that this metric benefits from training on extremely large amounts of data and correlates more closely with human semantic similarity ratings than do publicly available implementations of several more complex models. We also present a simple tool for building simple and scalable models from large corpora quickly and efficiently.
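For readers unfamiliar with the metric named in the abstract, pointwise mutual information between two words is conventionally defined as log( p(w1, w2) / (p(w1) p(w2)) ). The following is a minimal sketch of that general definition, not the authors' tool or their exact procedure. It assumes document-level co-occurrence (two words co-occur if they appear in the same document), whitespace tokenization, base-2 logarithms, and no smoothing; the function names `build_counts` and `pmi` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(documents):
    """Count how many documents contain each word and each unordered word pair.

    Assumption: co-occurrence is measured at the document level and
    tokens are obtained by lowercasing and splitting on whitespace.
    """
    word_docs = Counter()
    pair_docs = Counter()
    for doc in documents:
        vocab = sorted(set(doc.lower().split()))
        word_docs.update(vocab)
        # Pairs are stored in sorted order so lookups are order-independent.
        pair_docs.update(combinations(vocab, 2))
    return word_docs, pair_docs, len(documents)


def pmi(w1, w2, word_docs, pair_docs, n_docs):
    """Return log2( p(w1, w2) / (p(w1) * p(w2)) ), or None if undefined.

    Probabilities are estimated as document frequencies divided by n_docs.
    None is returned when either word is unseen or the pair never co-occurs.
    """
    pair = tuple(sorted((w1, w2)))
    joint = pair_docs.get(pair, 0)
    if joint == 0 or word_docs[w1] == 0 or word_docs[w2] == 0:
        return None
    return math.log2(joint * n_docs / (word_docs[w1] * word_docs[w2]))


# Toy corpus, invented purely for illustration.
docs = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the stock market fell",
    "the market rallied",
]
word_docs, pair_docs, n_docs = build_counts(docs)
print(pmi("cat", "chased", word_docs, pair_docs, n_docs))
print(pmi("cat", "the", word_docs, pair_docs, n_docs))
print(pmi("cat", "market", word_docs, pair_docs, n_docs))
```

Because the sketch only accumulates counts, it can be extended to larger corpora by streaming documents through `build_counts`, which is in the spirit of the abstract's point that a simple metric can be trained on very large amounts of data. A word that occurs in every document, such as "the" in the toy corpus, carries no information about its neighbors and receives a PMI of zero with any word it co-occurs with.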

Details

Language :
English
ISSN :
1554-351X
Volume :
41
Issue :
3
Database :
MEDLINE
Journal :
Behavior research methods
Publication Type :
Academic Journal
Accession number :
19587174
Full Text :
https://doi.org/10.3758/BRM.41.3.647