Postal address extraction from the web: a comprehensive survey.
- Source :
- Artificial Intelligence Review; Feb 2022, Vol. 55 Issue 2, p1085-1120, 36p
- Publication Year :
- 2022
Abstract
- The Web is a source of information for Location-Based Service (LBS) applications. These applications lack postal addresses for users' Points of Interest (POIs), such as schools, hospitals, and restaurants, because these locations are annotated manually, either from the yellow pages or by the location owners (users or companies). Our study confirms that Google Maps, a common LBS application, contains only about 32.5% of the public schools officially registered in the documents provided by the Directorate of Education in Egypt. The missing school addresses, however, could be extracted from the Web (e.g., from social media). To the best of our knowledge, no prior survey has compared existing Web postal address extraction approaches. Additionally, all proposed approaches for address extraction are local (they work in specific countries or locations with particular languages) and cannot be used, or even adapted, to work in other countries or locations with other languages. Furthermore, the problem of Web postal address extraction has not been addressed in many countries, such as Arab countries (e.g., Egypt). This paper discusses the problem of address extraction and highlights and compares the techniques recently used to extract addresses from Web pages. In addition, it investigates the discrepancy of knowledge among existing systems. Moreover, it provides a comprehensive review of the geographical gazetteers used in Web postal address extraction approaches and compares their data quality dimensions. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 02692821
- Volume :
- 55
- Issue :
- 2
- Database :
- Complementary Index
- Journal :
- Artificial Intelligence Review
- Publication Type :
- Academic Journal
- Accession number :
- 155185698
- Full Text :
- https://doi.org/10.1007/s10462-021-09983-1