Large-scale Vietnamese point-of-interest classification using weak labeling.
- Source :
- Frontiers in artificial intelligence [Front Artif Intell] 2022 Dec 09; Vol. 5, pp. 1020532. Date of Electronic Publication: 2022 Dec 09 (Print Publication: 2022).
- Publication Year :
- 2022
Abstract
- Points of interest (POIs) represent geographic locations by category (e.g., tourist sites, amenities, or shops) and play a prominent role in many location-based applications. However, most POI category labels are crowd-sourced by the community and are therefore often of low quality. In this paper, we introduce the first annotated dataset for the POI category classification task in Vietnamese. A total of 750,000 POIs were collected from WeMap, a Vietnamese digital map. Because large-scale hand-labeling is inherently time-consuming and labor-intensive, we propose a new approach based on weak labeling. The resulting dataset covers 15 categories, with 275,000 weakly labeled POIs for training and 30,000 gold-standard POIs for testing, making it the largest among existing Vietnamese POI datasets. We conduct POI category classification experiments using a strong baseline (BERT-based fine-tuning) on our dataset and find that our approach is efficient and applicable at large scale. The proposed baseline achieves an F1 score of 90% on the test dataset and significantly improves the accuracy of WeMap POI data by 37 percentage points (from 56% to 93%).
- Competing Interests: Author LH was employed by FIMO. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
- Copyright © 2022 Tran, Le, Pham, Luu and Bui.
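The abstract names weak labeling but this record does not describe how the labels were produced. As a rough illustration only, the sketch below shows one common form of weak labeling: keyword rules applied to POI names, abstaining when no rule or more than one rule fires. The categories, keywords, and function names are all hypothetical and are not taken from the paper.

```python
import unicodedata

# Hypothetical keyword rules for illustration; NOT the rules used in the paper.
KEYWORD_RULES = {
    "restaurant": ["nhà hàng", "quán ăn", "phở"],
    "cafe": ["cà phê", "cafe", "coffee"],
    "pharmacy": ["nhà thuốc", "hiệu thuốc"],
}


def normalize(text):
    """Lowercase and NFC-normalize so Vietnamese diacritics compare reliably."""
    return unicodedata.normalize("NFC", text).lower()


def matching_categories(name):
    """Return every category with at least one keyword found in the POI name."""
    lowered = normalize(name)
    return [
        category
        for category, keywords in KEYWORD_RULES.items()
        if any(normalize(keyword) in lowered for keyword in keywords)
    ]


def weak_label(name):
    """Assign a weak label, or None (abstain) if zero or several rules match."""
    matches = matching_categories(name)
    if len(matches) == 1:
        return matches[0]
    return None  # abstain: no evidence, or conflicting evidence


if __name__ == "__main__":
    for poi in ["Nhà hàng Hoa Sen", "Highlands Coffee", "Quán cà phê phở"]:
        print(poi, "->", weak_label(poi))
```

In a pipeline of this kind, POIs that receive a label would form the noisy training set for a classifier, while abstained POIs would be left out or labeled by other means; a separately hand-labeled test set is still needed to measure real accuracy.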
Details
- Language :
- English
- ISSN :
- 2624-8212
- Volume :
- 5
- Database :
- MEDLINE
- Journal :
- Frontiers in artificial intelligence
- Publication Type :
- Academic Journal
- Accession number :
- 36568578
- Full Text :
- https://doi.org/10.3389/frai.2022.1020532