1. Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings
- Author
Kathleen R. McKeown, Muthu Kumar Chandrasekaran, Kailash Karthik Saravanakumar, and Miguel Ballesteros
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computer science, Event (computing), Computer Science - Artificial Intelligence, I.2.7, Linear classifier, computer.software_genre, Similitude, Computer Science - Information Retrieval, ComputingMethodologies_PATTERNRECOGNITION, Artificial Intelligence (cs.AI), Similarity (network science), Classifier (linguistics), Data mining, Cluster analysis, Adaptation (computer science), computer, Computation and Language (cs.CL), Information Retrieval (cs.IR), Transformer (machine learning model)
- Abstract
We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity along these multiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents.
Comment: To appear in Proceedings of The 16th Conference of the European Chapter of the Association for Computational Linguistics
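As a rough illustration of the clustering loop the abstract describes, the sketch below assigns each incoming document to its most similar cluster or opens a new one, with similarity aggregated over a sparse and a dense representation. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `StreamClusterer`, the fixed `weights` and `threshold`, cosine similarity, and running-mean centroids are all hypothetical stand-ins. In the paper the weights are learned and the assignment decision is made by a neural classifier.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


class StreamClusterer:
    """Hypothetical non-parametric streaming clusterer (illustrative only)."""

    def __init__(self, weights, threshold):
        self.weights = weights      # assumed fixed: representation name -> weight
        self.threshold = threshold  # assumed fixed cut-off for opening a new cluster
        self.centroids = []         # one dict (name -> vector) per cluster
        self.sizes = []             # number of documents per cluster

    def score(self, doc, centroid):
        # Weighted sum of per-representation similarities.
        return sum(w * cosine(doc[name], centroid[name])
                   for name, w in self.weights.items())

    def add(self, doc):
        """Assign one document and return its cluster index."""
        doc = {name: np.asarray(doc[name], dtype=float) for name in self.weights}
        scores = [self.score(doc, c) for c in self.centroids]
        if scores and max(scores) >= self.threshold:
            k = int(np.argmax(scores))
            n = self.sizes[k]
            for name in self.weights:
                # Running mean keeps the centroid current without storing documents.
                self.centroids[k][name] = (self.centroids[k][name] * n + doc[name]) / (n + 1)
            self.sizes[k] = n + 1
            return k
        # No cluster is similar enough: open a new one, so the number of clusters is not fixed.
        self.centroids.append({name: vec.copy() for name, vec in doc.items()})
        self.sizes.append(1)
        return len(self.centroids) - 1


clusterer = StreamClusterer(weights={"sparse": 0.5, "dense": 0.5}, threshold=0.8)
stream = [
    {"sparse": [1.0, 0.0, 0.0], "dense": [1.0, 0.0]},
    {"sparse": [0.9, 0.1, 0.0], "dense": [1.0, 0.1]},
    {"sparse": [0.0, 0.0, 1.0], "dense": [0.0, 1.0]},
]
assignments = [clusterer.add(doc) for doc in stream]
```

Because clusters are created on demand, the number of clusters grows with the stream rather than being set in advance, which is what makes the procedure non-parametric in K.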
- Published
- 2021