Gender classification based on linguistic analysis: A review.

Authors :
Ali, Haneen Tamim Abd
Nasrawi, Dhamyaa A.
Source :
AIP Conference Proceedings. 2024, Vol. 3220 Issue 1, p1-14. 14p.
Publication Year :
2024

Abstract

Gender classification refers to the process of categorizing individuals into one of two gender categories, male or female, typically based on observable characteristics or information. This classification can be done through various methods, including biological and social ones. In recent years, gender classification has become a topic of increasing interest and debate due to evolving societal understandings of gender. This survey studies the connection between language use and gender in order to classify gender automatically from text and linguistic style. It provides an in-depth analysis of gender classification based on linguistic patterns in written text, highlighting the various approaches, challenges, and future directions in this field. It also covers various datasets used for gender classification, including official papers, emails, and social media messages. The survey divides the selected studies into three parts: handwriting, names, and text, with the main focus on text analyzed linguistically. The findings show that the most used data source is Twitter. Many studies use English, Arabic, and other languages such as Portuguese, Chinese, Spanish, Russian, Brazilian, and German. Moreover, the feature frequently used across studies is Bag of Words (BOW). Many studies rely on machine learning techniques, while few use deep learning. Finally, the key evaluation metrics are accuracy and F1-score. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0094243X
Volume :
3220
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
180170115
Full Text :
https://doi.org/10.1063/5.0234674