An Empirical Study on Authorship Verification for Low Resource Language Using Hyper-Tuned CNN Approach
- Source: IEEE Access, Vol. 11, pp. 80403-80415 (2023)
- Publication Year: 2023
- Publisher: IEEE, 2023
Abstract
- Authorship verification is the task of determining the authorship of a given text by analyzing distinctive aspects of the writer's style, such as vocabulary, syntax, and punctuation. It has attracted significant research attention in domains including intellectual property rights, plagiarism detection, cybercrime investigation, copyright infringement, and forensics. Authorship verification has been studied extensively in many languages, including Western European languages such as Italian and Spanish and Asian languages such as Bengali and Chinese. Urdu, however, has received comparatively little attention despite being a prominent South Asian language. This gap can be attributed to Urdu's intricate and distinctive morphology, which calls for language-specific methods: techniques developed for other languages cannot be applied to it directly. To bridge this gap, we propose an approach to Urdu authorship verification based on Convolutional Neural Networks (CNNs) hyperparameter-tuned with three optimizers: Adam, SGD, and RMSProp. To support this approach, we curated a new corpus, UAVC-22, tailored specifically to Urdu authorship verification. This corpus offers greater robustness in terms of author classes and unique words. We developed nine authorship verification models using three text embedding techniques (Word2Vec, GloVe, and FastText), and we compared them with traditional machine learning models, namely Support Vector Machines (SVM) and Random Forest, to assess the effectiveness of the CNN-based approach. The optimized CNN-ADAM model with FastText embeddings achieved the highest accuracy, 98%, on the UAVC-22 dataset.
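The abstract names the three optimizers but does not give their update rules, learning rates, or exactly how the nine models are formed. As a hedged illustration only, the sketch below applies the textbook SGD, RMSProp, and Adam update rules to a toy one-parameter loss f(w) = (w - 3)^2, not to the paper's CNN. It also assumes, without confirmation from the source, that the nine models are the 3 x 3 grid of optimizers and embeddings. All function names and hyperparameter values here are hypothetical.

```python
import math
from itertools import product

# Assumption: the nine models are every optimizer/embedding pair.
# The abstract lists both sets but does not state this pairing.
OPTIMIZERS = ("SGD", "RMSProp", "Adam")
EMBEDDINGS = ("Word2Vec", "GloVe", "FastText")


def experiment_grid():
    """Hypothetical experiment grid: 3 optimizers x 3 embeddings = 9 configs."""
    return list(product(OPTIMIZERS, EMBEDDINGS))


def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3.
    return 2.0 * (w - 3.0)


def sgd(w, steps=200, lr=0.1):
    # Plain gradient descent: step against the gradient with a fixed rate.
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def rmsprop(w, steps=1000, lr=0.01, rho=0.9, eps=1e-8):
    # Scale each step by a running root-mean-square of recent gradients.
    v = 0.0
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(v) + eps)
    return w


def adam(w, steps=2000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum (first moment) plus RMS scaling (second moment),
    # both bias-corrected for their zero initialization.
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

Starting from `w = 0.0`, all three functions move the parameter close to the minimum at 3. SGD converges geometrically here. RMSProp and Adam rescale each step by a running estimate of the gradient magnitude, so their learning rates and decay rates are usually tuned separately. This is one plausible reading of what "hyper-tuned" means in the abstract, not a description of the paper's actual settings.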
Details
- Language: English
- ISSN: 2169-3536
- Volume: 11
- Database: Directory of Open Access Journals
- Journal: IEEE Access
- Publication Type: Academic Journal
- Accession number: edsdoj.72671f0915b0435c8c692403f3ce40a4
- Document Type: article
- Full Text: https://doi.org/10.1109/ACCESS.2023.3299565