Using Dynamic Pruned N-Gram Model for Identifying the Gender of the User.

Authors :
Ali, Noaman M.
Alshahrani, Abdullah
Alghamdi, Ahmed M.
Novikov, Boris
Source :
Applied Sciences (2076-3417); Jul2022, Vol. 12 Issue 13, pN.PAG-N.PAG, 16p
Publication Year :
2022

Abstract

Organizations analyze customers' personal data to understand and model their behavior. Identifying customers' gender is a significant factor in market analysis that helps plan promotional campaigns, determine target customers, and provide relevant offers. Several techniques have been developed to identify the gender of the user from different types of data, including text, image, speech, and biometrics. The way a profile name is composed differs from one customer to another. Numerical substitutions for specific letters, known as Leet language, impede the gender identification task. Moreover, acronyms, misspellings, and adjacent names impose additional challenges. To identify gender despite these challenges, this work uses the profile names associated with submitted customer reviews. First, we create datasets of profile names extracted from the customers' reviews. Second, we introduce a dynamic pruned n-gram model for identifying the gender of the user. It starts with data segmentation to handle adjacent parts, followed by data conversion and cleaning to resolve the use of Leet language. The next step is feature selection through a dynamic pruned n-gram model, together with correction of recurrent misspellings using fuzzy matching. We evaluate the proposed approach on real data collected from active web resources. The obtained results demonstrate its validity and reliability. [ABSTRACT FROM AUTHOR]
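
The steps named in the abstract (segmentation, Leet conversion and cleaning, n-gram feature selection, fuzzy misspelling correction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Leet mapping, the splitting rule, the fixed frequency cut-off `min_count` (a simple stand-in for the paper's dynamic pruning, whose criterion the abstract does not state), the similarity cut-off, and all function names are assumptions made for illustration. Fuzzy matching here uses the standard-library `difflib`.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Hypothetical Leet substitution table (an assumption, not the paper's mapping).
LEET_MAP = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)


def segment(profile_name):
    """Split a profile name on separators and on lower-to-upper case boundaries."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", profile_name)
    return [part for part in re.split(r"[\s._\-]+", spaced) if part]


def normalize(token):
    """Undo Leet substitutions, lowercase, and drop any remaining non-letters."""
    return re.sub(r"[^a-z]", "", token.translate(LEET_MAP).lower())


def preprocess(profile_name):
    """Segment a profile name, then normalize each part; empty parts are dropped."""
    return [t for t in (normalize(p) for p in segment(profile_name)) if t]


def char_ngrams(token, n_min=2, n_max=3):
    """Return all character n-grams of the token for n in [n_min, n_max]."""
    return [
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    ]


def pruned_ngram_counts(tokens, min_count=2):
    """Count n-grams over all tokens and keep those seen at least min_count times.

    A fixed cut-off is an assumed simplification of the paper's dynamic pruning.
    """
    counts = Counter(g for t in tokens for g in char_ngrams(t))
    return {g: c for g, c in counts.items() if c >= min_count}


def correct_spelling(token, known_names, cutoff=0.7):
    """Map a token to its closest known name, or return it unchanged if none is close."""
    matches = get_close_matches(token, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else token
```

For example, `preprocess("J0hnM4ry")` splits the name at the case boundary and reverses the digit substitutions, and `correct_spelling("johnn", ["john", "mary"])` maps the misspelled token to the closest known name. A classifier trained on the retained n-grams would follow; that step is left out because the abstract does not describe it.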

Details

Language :
English
ISSN :
20763417
Volume :
12
Issue :
13
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
157914741
Full Text :
https://doi.org/10.3390/app12136378