
Machine-Learning-Based Gender Distribution Prediction from Anonymous News Comments: The Case of Korean News Portal.

Authors :
Suh, Jong Hwan
Source :
Sustainability (2071-1050); Aug2022, Vol. 14 Issue 16, p9939-9939, 17p
Publication Year :
2022

Abstract

Anonymous news comment data from a news portal in South Korea, naver.com, can help conduct gender research and resolve related issues for sustainable societies. Nevertheless, only a small portion of gender information (i.e., gender distribution) is open to the public, and therefore, it has rarely been considered for gender research. Hence, this paper aims to resolve the matter of incomplete gender information and make the anonymous news comment data usable for gender research as new social media big data. This paper proposes a machine-learning-based approach for predicting the gender distribution (i.e., male and female rates) of anonymous news commenters for a news article. Initially, the big data of news articles and their anonymous news comments were collected and divided into labeled and unlabeled datasets (i.e., with and without gender information). The word2vec approach was employed to represent a news article by the characteristics of the news comments. Then, using the labeled dataset, various prediction techniques were evaluated for predicting the gender distribution of anonymous news commenters for a labeled news article. As a result, the neural network was selected as the best prediction technique, and it could accurately predict the gender distribution of anonymous news commenters of the labeled news article. Thus, this study showed that a machine-learning-based approach can overcome the incomplete gender information problem of anonymous social media users. Moreover, when the gender distributions of the unlabeled news articles were predicted using the best neural network model, trained with the labeled dataset, their distribution turned out to differ from that of the labeled news articles. The result indicates that using only the labeled dataset for gender research can result in misleading findings and distorted conclusions. The predicted gender distributions for the unlabeled news articles can help to better understand anonymous news commenters as humans for sustainable societies. Eventually, this study provides a new way for data-driven computational social science with incomplete and anonymous social media big data. [ABSTRACT FROM AUTHOR]
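
The pipeline outlined in the abstract (represent each article by its comments, fit a predictor on labeled articles, then apply it to unlabeled ones) can be illustrated with a minimal sketch. This is not the author's implementation. The embedding table, the example comments, and the function names are all hypothetical, the two-dimensional vectors stand in for trained word2vec vectors, and a single logistic unit stands in for the neural network reported as the best model.

```python
import math

# Hypothetical toy embedding table standing in for trained word2vec vectors.
TOY_EMBEDDINGS = {
    "economy": [0.9, 0.1],
    "stocks": [0.8, 0.2],
    "recipe": [0.1, 0.9],
    "fashion": [0.2, 0.8],
}


def article_vector(comments, embeddings):
    """Represent an article as the mean vector of all known comment tokens."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    count = 0
    for comment in comments:
        for token in comment.lower().split():
            vec = embeddings.get(token)
            if vec is None:
                continue  # skip out-of-vocabulary tokens
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


def predict_male_rate(weights, bias, x):
    """Sigmoid output, so the predicted male rate always lies in (0, 1)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(features, male_rates, epochs=2000, lr=0.5):
    """Fit one logistic unit by SGD on cross-entropy with soft targets."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, male_rates):
            grad = predict_male_rate(weights, bias, x) - y  # d(loss)/d(logit)
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


# Labeled articles: (comments, published male rate). Values are invented.
labeled = [
    (["economy stocks", "stocks"], 0.8),
    (["recipe fashion"], 0.3),
]
features = [article_vector(c, TOY_EMBEDDINGS) for c, _ in labeled]
targets = [rate for _, rate in labeled]
weights, bias = train(features, targets)

# Unlabeled article: no gender distribution is published, so predict it.
unlabeled_comments = ["economy stocks economy"]
male = predict_male_rate(
    weights, bias, article_vector(unlabeled_comments, TOY_EMBEDDINGS)
)
female = 1.0 - male  # the two rates form a distribution summing to 1
print(f"predicted male rate: {male:.2f}, female rate: {female:.2f}")
```

Predicting only the male rate and deriving the female rate as its complement keeps the output a valid two-class distribution. In a real setting, the vectors would come from a word2vec model trained on the comment corpus, and the predictor would be chosen by comparing several techniques on held-out labeled articles, as the abstract describes.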

Details

Language :
English
ISSN :
20711050
Volume :
14
Issue :
16
Database :
Complementary Index
Journal :
Sustainability (2071-1050)
Publication Type :
Academic Journal
Accession number :
158946982
Full Text :
https://doi.org/10.3390/su14169939