201. Unsupervised Contextual Keyword Relevance Learning and Measurement using PLSA
- Author
- Dalou Kalaivendhan, M. Venkateswarlu, and S. Sudarsun
- Subjects
Text corpus, Probabilistic latent semantic analysis, Computer science, Document classification, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Probabilistic logic, Context (language use), Weighting, ComputingMethodologies_PATTERNRECOGNITION, Unsupervised learning, Relevance (information retrieval), Artificial intelligence, Data mining, Natural language processing
- Abstract
In this paper, we develop a probabilistic approach based on probabilistic latent semantic analysis (PLSA) for discovering and analyzing contextual keyword relevance from the distribution of keywords across a training text corpus. We show experimentally that the approach is flexible enough to classify keywords into different domains based on their context. We have developed a prototype system that projects keyword queries onto a loaded PLSA model and returns the keywords that are most closely correlated. The keyword query is vectorized using the PLSA model in the reduced aspect space, and correlation is derived by calculating a dot product. We also discuss the parameters that control PLSA performance, including (a) the number of aspects, (b) the number of EM iterations, and (c) the weighting functions applied to the term-document matrix (TDM) before training (pre-weighting). We estimate quality by computing precision-recall scores. We also present our experiments applying PLSA to document classification.
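The pipeline the abstract describes (fit PLSA with EM, map keywords into the aspect space, rank by dot product) can be illustrated with a minimal sketch. This is not the authors' implementation: the symmetric PLSA parameterization, the use of P(z|w) as the keyword vector, the averaging of query keywords, and all names (`train_plsa`, `keyword_vector`, `related_keywords`, the toy `vocab` and `tdm`) are assumptions made for illustration, and no pre-weighting is applied.

```python
import random


def train_plsa(tdm, n_aspects, n_iters, seed=0):
    """Fit symmetric PLSA, P(w, d) = sum_z P(z) P(w|z) P(d|z), with EM.

    tdm[w][d] is the count of word w in document d (assumed layout).
    Returns (p_w_z, p_d_z, p_z), where p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_words, n_docs = len(tdm), len(tdm[0])

    def normalized(n):
        v = [rng.random() + 0.01 for _ in range(n)]
        s = sum(v)
        return [x / s for x in v]

    p_z = normalized(n_aspects)
    p_w_z = [normalized(n_words) for _ in range(n_aspects)]
    p_d_z = [normalized(n_docs) for _ in range(n_aspects)]

    for _ in range(n_iters):
        new_w = [[0.0] * n_words for _ in range(n_aspects)]
        new_d = [[0.0] * n_docs for _ in range(n_aspects)]
        new_z = [0.0] * n_aspects
        for w in range(n_words):
            for d in range(n_docs):
                n = tdm[w][d]
                if n == 0:
                    continue
                # E-step: posterior P(z | w, d), up to normalization.
                joint = [p_z[z] * p_w_z[z][w] * p_d_z[z][d]
                         for z in range(n_aspects)]
                total = sum(joint)
                if total == 0:
                    continue
                # Accumulate expected counts for the M-step.
                for z in range(n_aspects):
                    r = n * joint[z] / total
                    new_w[z][w] += r
                    new_d[z][d] += r
                    new_z[z] += r
        # M-step: renormalize expected counts into probabilities.
        grand = sum(new_z)
        for z in range(n_aspects):
            if new_z[z] > 0:
                p_w_z[z] = [x / new_z[z] for x in new_w[z]]
                p_d_z[z] = [x / new_z[z] for x in new_d[z]]
            p_z[z] = new_z[z] / grand
    return p_w_z, p_d_z, p_z


def keyword_vector(w, p_w_z, p_z):
    """Aspect-space vector for word w: P(z|w), proportional to P(w|z) P(z)."""
    joint = [p_z[z] * p_w_z[z][w] for z in range(len(p_z))]
    s = sum(joint)
    return [x / s for x in joint] if s > 0 else joint


def related_keywords(query_ids, vocab, p_w_z, p_z, top_k=3):
    """Rank non-query words by dot product with the averaged query vector."""
    vecs = [keyword_vector(w, p_w_z, p_z) for w in query_ids]
    query = [sum(col) / len(vecs) for col in zip(*vecs)]
    scores = []
    for w, term in enumerate(vocab):
        if w in query_ids:
            continue
        vec = keyword_vector(w, p_w_z, p_z)
        scores.append((sum(a * b for a, b in zip(query, vec)), term))
    scores.sort(reverse=True)
    return scores[:top_k]


# Toy corpus (hypothetical): rows are words, columns are four documents.
vocab = ["goal", "match", "team", "stock", "market", "trade"]
tdm = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
]
p_w_z, p_d_z, p_z = train_plsa(tdm, n_aspects=2, n_iters=50)
results = related_keywords([0], vocab, p_w_z, p_z, top_k=3)
print(results)
```

The two tunable arguments, `n_aspects` and `n_iters`, correspond to parameters (a) and (b) in the abstract. A pre-weighting step for (c) would transform `tdm` before it is passed to `train_plsa`. EM converges to a local optimum, so the ranking can vary with the random seed.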
- Published
- 2006