1. OUP accepted manuscript
- Author
- Junhui Wang, Xiaotong Shen, Annie Qu, and Yiwen Sun
- Subjects
- Statistics and Probability, Multi-label classification, Similarity (geometry), Applied Mathematics, General Mathematics, Computation, Construct (python library), computer.software_genre, Agricultural and Biological Sciences (miscellaneous), Scalability, Feature (machine learning), Key (cryptography), Pairwise comparison, Data mining, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, computer, Mathematics
- Abstract
- Summary: Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.
- Published
- 2017