Sentence Classification Using N-Grams in Urdu Language Text
- Source :
- Scientific Programming, Vol 2021 (2021)
- Publication Year :
- 2021
- Publisher :
- Hindawi Limited, 2021.
Abstract
- The use of local languages is becoming common on social media and news channels. People share valuable insights about various topics related to their lives in different languages. A large body of text in various local languages exists on the Internet and contains invaluable information. Analyzing such local-language text can help improve a number of Natural Language Processing (NLP) tasks, and the information extracted from it can be used to develop applications that advance the field of NLP. In this paper, we present an applied research task: multiclass classification, at the sentence level, of Urdu-language text drawn from Twitter, Facebook, and news channels, using N-gram features. Our dataset consists of more than 100,000 instances covering twelve (12) different topics. Random Forest, a well-known machine learning classifier, is used to classify the sentences. It achieved 80.15%, 76.88%, and 64.41% accuracy with unigram, bigram, and trigram features, respectively.
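As a rough illustration of the unigram, bigram, and trigram features the abstract refers to, the sketch below extracts word-level N-grams from a single sentence. It is not the authors' pipeline: the function names `word_ngrams` and `ngram_counts`, the whitespace tokenization, and the sample sentence are all assumptions made for this example, and the record does not say how the paper tokenizes Urdu text or builds its feature vectors.

```python
from collections import Counter


def word_ngrams(sentence, n):
    """Return the word-level n-grams of a sentence as a list of tuples.

    Tokenization is a plain whitespace split, which is an assumption of
    this sketch rather than something stated in the abstract.
    """
    tokens = sentence.split()
    # A sentence with fewer than n tokens yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentence, n):
    """Count how often each n-gram occurs (a simple bag-of-n-grams feature)."""
    return Counter(word_ngrams(sentence, n))


if __name__ == "__main__":
    # Hypothetical sample sentence, used only to show the output shape.
    sentence = "یہ ایک مثال ہے"
    for n in (1, 2, 3):
        print(n, word_ngrams(sentence, n))
```

Counts like these could then be turned into fixed-length vectors and passed to a classifier such as Random Forest, which is the classifier the abstract names.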
- Subjects :
- Learning classifier system
Article Subject
business.industry
Computer science
Bigram
computer.software_genre
language.human_language
Computer Science Applications
QA76.75-76.765
language
The Internet
Social media
Trigram
Computer software
Local language
Urdu
Artificial intelligence
business
computer
Software
Natural language processing
Sentence
Details
- ISSN :
- 1875-919X and 1058-9244
- Volume :
- 2021
- Database :
- OpenAIRE
- Journal :
- Scientific Programming
- Accession number :
- edsair.doi.dedup.....fe6be4684cd459af3b4e51d0a0d44dfd