1. Gender classification based on linguistic analysis: A review.
- Author
- Ali, Haneen Tamim Abd and Nasrawi, Dhamyaa A.
- Subjects
- *LINGUISTICS, *LINGUISTIC analysis, *DEEP learning, *PORTUGUESE language, *CHINESE language
- Abstract
- Gender classification is the process of assigning individuals to one of two gender categories, male or female, typically based on observable characteristics or information. The classification can rest on various criteria, including biological and social ones. In recent years, gender classification has attracted growing interest and debate as societal understandings of gender evolve. This survey studies the connection between language use and gender, with the aim of classifying gender automatically from text and linguistic style. It provides an in-depth analysis of gender classification based on linguistic patterns in written text, highlighting the approaches, challenges, and future directions in the field. It also covers the datasets used for gender classification, including official papers, emails, and social media messages. The survey divides the selected studies into three groups: handwriting, names, and text, with the main focus on text analyzed linguistically. The findings show that Twitter is the most commonly used data source. Many studies use English and Arabic, as well as other languages such as Portuguese, Chinese, Spanish, Russian, Brazilian, and German. The most frequently used feature is the Bag of Words (BOW). Most studies rely on machine learning techniques, while few use deep learning. Finally, the most important evaluation metrics are accuracy and F1-score. [ABSTRACT FROM AUTHOR]
- Published
- 2024