43 results on '"Stemmer"'
Search Results
2. Classification of Offensive Tweet in Marathi Language Using Machine Learning Models
- Author
-
Kumari, Archana, Garge, Archana, Raj, Priyanshu, Kumar, Gunjan, Singh, Jyoti Prakash, Alryalat, Mohammad, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Dasgupta, Kousik, editor, Mukhopadhyay, Somnath, editor, Mandal, Jyotsna K., editor, and Dutta, Paramartha, editor
- Published
- 2024
3. Building a Multilevel Inflection Handling Stemmer to Improve Search Effectiveness for Urdu Language
- Author
-
Abdul Jabbar, Sajid Iqbal, Abdullah Abdulrhman Alaulamie, and Manzoor Ilahi
- Subjects
Stemmer, information retrieval, Urdu stemmer, lemmatizer, natural language processing, text mining, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Stemming is an essential step in various Natural Language Processing (NLP) applications and is used to reduce different variants of the query words to a standard form to avoid the vocabulary mismatch issue in Information Retrieval (IR) systems. Due to specific grammatical rules and complex morphological structures, finding an effective stemming algorithm for Urdu is a challenging task. Although several stemming algorithms have been proposed for Urdu text, none of them extracts the stem from multilevel inflected forms. To the best of our knowledge, this is the first effort to propose and evaluate a novel Urdu Text Stemmer (UTS) that can deal with multilevel inflected forms in Urdu text. The experimental evaluation of the proposed scheme has been conducted on custom-developed text-based and word-based corpora. The proposed stemming technique is rigorously evaluated and compared with state-of-the-art stemming algorithms. Experimental results demonstrate that UTS outperforms existing Urdu stemmers and achieves an accuracy of 94.92% and 91.8% on the word corpus and the text corpus, respectively. We also evaluated our proposed system in an Information Retrieval application for Urdu, using the Collection for Urdu Retrieval Evaluation (CURE) dataset. In this retrieval setting, our approach improved both recall and precision.
- Published
- 2024
4. From the Jones-Plug to the Amphora: Could Stemmer´s Theory of Language Acquisition Complement Skinner´s Theory of Listener Behavior?
- Author
-
Laporte, Fábio Freire and de Melo, Raquel Maria
- Published
- 2024
5. Develop a Marathi Lemmatizer for Common Nouns and Simple Tenses of Verbs
- Author
-
Kadam, Deepali Prakash, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Das, Swagatam, editor, Saha, Snehanshu, editor, Coello Coello, Carlos A., editor, and Bansal, Jagdish Chand, editor
- Published
- 2023
6. Detection of Spam in SMS Using Machine Learning Algorithms
- Author
-
Terli, Niharika, Chintakayala, Pavan, Angaluri, Venu Madhavi, Sodagudi, Suhasini, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Senjyu, Tomonobu, editor, So-In, Chakchai, editor, and Joshi, Amit, editor
- Published
- 2023
7. Multi Rule-based and Corpus-based for Sundanese Stemmer
- Author
-
Ade Sutedi, Muhammad Rikza Nasrulloh, and Rickard Elsen
- Subjects
corpus-based, multi rule-based, stemmer, sundanese, Electronic computers. Computer science, QA75.5-76.95
- Abstract
The purpose of this study is to develop a stemming method that combines several techniques: morphological analysis (affix and pro-lexeme removal), syllable (canonical) pattern checking, and comparison against corpus data to decide the final stemming result. The algorithm first checks the string length and removes affixes, then checks the syllable pattern of the stripped result, and finally compares it with the corpus data, which determines the final stem. The corpus data were taken from a Sundanese dictionary of single-word entries, used as root words, and from a dataset extracted from an online Sundanese magazine. The results show that affix and pro-lexeme stripping removes the corresponding affixes and pro-lexemes, after which words are compared by syllable pattern and root words are resolved quickly; using the corpus improves accuracy and reduces the over-stemming that occurs during stemming.
- Published
- 2022
8. Cybercrime Detection Using Live Sentiment Analysis
- Author
-
Gambhir, Balvinder Singh, Habibkar, Jatin, Sohrot, Anjesh, Dhumal, Rashmi, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Gupta, Deepak, editor, Goswami, Rajat Subhra, editor, Banerjee, Subhasish, editor, Tanveer, M., editor, and Pachori, Ram Bilas, editor
- Published
- 2022
9. Inflectional and Derivational Hybrid Stemmer for Sentiment Analysis: A Case Study with Marathi Tweets
- Author
-
Patil, Rupali S., Kolhe, Satish R., Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Santosh, KC, editor, Hegadi, Ravindra, editor, and Pal, Umapada, editor
- Published
- 2022
10. Designing Stemmer for Afaraf Text Using Rule Based Approach
- Author
-
Ebrahim, Kelil Ali, Saidhbi, Shaik, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Saini, H. S., editor, Sayal, Rishi, editor, Govardhan, A., editor, and Buyya, Rajkumar, editor
- Published
- 2022
11. Sanskrit Stemmer Design: A Literature Perspective
- Author
-
Nair, Jayashree, Nair, Sooraj S., Abhishek, U., Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Khanna, Ashish, editor, Gupta, Deepak, editor, Bhattacharyya, Siddhartha, editor, Hassanien, Aboul Ella, editor, Anand, Sameer, editor, and Jaiswal, Ajay, editor
- Published
- 2022
12. Effect of Stemming on Hindi Text Classification.
- Author
-
Pimpalshende, Anjusha, Singh, Preety, and Potnurwar, Archana
- Subjects
TEXT summarization, INFORMATION retrieval, ORAL communication, ELECTRONIC records, SUFFIXES & prefixes (Grammar), TEXT processing (Computer science), PARSING (Computer grammar)
- Abstract
Text classification makes it easier to search the large amount of textual data available online by dividing it into smaller, relevant units. Nowadays a large number of digital documents are available in Indian languages. Designing text classifiers for Indian languages is an active research area, so that people can search for and read the documents they need in their local languages. In the proposed work, we design a text classifier for Hindi text documents and show how a stemmer affects the performance of Hindi text classifiers. Stemming is the process of converting words in any language to their base or root forms. Stemmers are used for written documents, not for spoken language. The performance of many applications, such as text summarization, Information Retrieval (IR) systems, text classification systems, and syntactic parsing, can be improved by applying stemmers. A stemmer removes the suffix or prefix of a word to recover the original root word. These root words help in the preprocessing step required by many algorithms. We applied various stemmers to Hindi text classification models. The experimental results show that applying stemmers improves classifier performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
13. Stemmer and phonotactic rules to improve n-gram tagger-based indonesian phonemicization
- Author
-
Suyanto Suyanto, Andi Sunyoto, Rezza Nafi Ismail, Ema Rachmawati, and Warih Maharani
- Subjects
grapheme-to-phoneme conversion, Indonesian language, n-gram, Phonotactic rules, Stemmer, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Phonemicization, or grapheme-to-phoneme conversion (G2P), is the process of converting a word into its pronunciation. It is one of the essential components in speech synthesis, speech recognition, and natural language processing. The deep learning (DL)-based state-of-the-art G2P model generally gives a low phoneme error rate (PER) as well as a low word error rate (WER) for high-resource languages, such as English and European languages, but not for low-resource languages. Therefore, some conventional machine learning (ML)-based G2P models that incorporate specific linguistic knowledge are preferable for low-resource languages. However, these models perform poorly for several low-resource languages because of various issues. For instance, an Indonesian G2P model works well for roots but gives a high PER for derivatives. Most errors come from the ambiguities of some roots and derivative words containing four prefixes: 〈ber〉, 〈meng〉, 〈peng〉, and 〈ter〉. In this research, an Indonesian G2P model based on an n-gram tagger combined with a stemmer and phonotactic rules (NGTSP) is proposed to solve those problems. An investigation based on 5-fold cross-validation, using 50k Indonesian words, shows that the proposed NGTSP gives a much lower PER of 0.78% than the state-of-the-art Transformer-based G2P model (1.14%). It also provides a much faster processing time.
- Published
- 2022
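The phoneme error rate (PER) quoted in the abstract above is conventionally computed as the total edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The following is a minimal sketch of that standard metric, not the authors' evaluation code; the function names and the sample phoneme sequences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """PER over (reference, hypothesis) phoneme-sequence pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total if total else 0.0


# Hypothetical phoneme sequences, for illustration only.
SAMPLE_PAIRS = [
    (["b", "e", "r"], ["b", "e", "r"]),
    (["m", "ə", "ŋ"], ["m", "e", "ŋ"]),
]

if __name__ == "__main__":
    print(f"PER = {phoneme_error_rate(SAMPLE_PAIRS):.2%}")
```

Here one substituted phoneme out of six reference phonemes gives a PER of one sixth.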
14. Rule Based Part of Speech Tagger for Arabic Question Answering System
- Author
-
Al-azani, Samah Ali, Namrata Mahender, C., Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zhang, Junjie James, Series Editor, Bindhu, V., editor, Tavares, João Manuel R. S., editor, Boulogeorgos, Alexandros-Apostolos A., editor, and Vuppalapati, Chandrasekar, editor
- Published
- 2021
15. Designing and Development of Stemmer of Dogri Using Unsupervised Learning
- Author
-
Gupta, Parul, Jamwal, Shubhnandan S., Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Marriwala, Nikhil, editor, Tripathi, C. C, editor, Jain, Shruti, editor, and Mathapathi, Shivakumar, editor
- Published
- 2021
16. Information Retrieval Based on Telugu Cross-Language Transliteration
- Author
-
Narla, Swapna, Koppula, Vijaya Kumar, SuryaNarayana, G., Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Swain, Debabala, editor, Pattnaik, Prasant Kumar, editor, and Athawale, Tushar, editor
- Published
- 2021
17. Using Natural Language Processing to Translate Plain Text into Pythonic Syntax in Kannada
- Author
-
Rao, Vinay, G. B., Sanjana, Guntnur, Sundar, N., Navya Priya, Reddy, Sanjana, K. R., Pavan, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Arai, Kohei, editor, Kapoor, Supriya, editor, and Bhatia, Rahul, editor
- Published
- 2021
18. Linguistic analyzer and its types
- Author
-
Kakhramonovna, Gulyamova Shakhnoza
- Published
- 2021
19. NLIIRS: A Question and Answering System for Unstructured Information Using Annotated Text Segment Comparison
- Author
-
Sarathy, Banerjee Partha, Baisakhi, Chakraborty, Deepak, Tripathi, Hardik, Gupta, Kumar, Sourabh S., Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zhang, Junjie James, Series Editor, Nath, Vijay, editor, and Mandal, J. K., editor
- Published
- 2020
20. Stemmer og stereotyper: Det komplekse samspil mellem kønnede og statiske og dynamiske markører i stemmens etostildeling
- Author
-
Fisker, Thore Keitum, Berg, Kristine Marie, and Fjord, Agnete Bastrup
- Abstract
English abstract: This thesis examines the significance of gendered vocal cues in the attribution of ethos to voices. Drawing upon the masculinity hypothesis and the arousal hypothesis as delineated in Alice Zoghaib (2019), it investigates the complex interplay between dynamic and static vocal cues and their relation to gender and gendered cues in the attribution of ethos. The masculinity hypothesis suggests that voices perceived as low, dull, and smooth are associated with more masculine traits, enhancing perceptions of competence and attitude towards the speaker but decreasing warmth, regardless of the speaker's gender. Conversely, the arousal hypothesis posits that high, bright, and rough voices generate more energy and therefore similarly affect perceptions of competence and attitude while also reducing warmth. Furthermore, the thesis integrates and builds upon the results of Vestgård Sørensen (2010). Using a combination of empirical data gathered through focus group discussions and theoretical analysis, this thesis highlights how ethos based on the sound of voices is influenced by both vocal qualities and gendered expectations. Additionally, it suggests that static cues are not adequate to describe this process. The findings underscore the nuanced ways in which voice contributes to rhetorical effectiveness and the perpetuation of gender stereotypes in communication contexts.
- Published
- 2024
21. Simultaneous Removal of Prefix and Suffix
- Author
-
Pawan Tamta and B. P. Pande
- Subjects
information retrieval (ir), stemmer, conflation, n-gram, potential-stem, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Abstract
This work is an attempt to devise a stemmer that can remove both the prefix and the suffix together from a given word in the English language. For a given input word, our method considers all possible internal N-grams to detect potential stems. We frame a hypothesis in which the stem length is closest to half the length of the input word. A standard English dictionary is employed to identify morphologically correct N-grams in the process. We apply our technique to a random sample of 100 English words, each possessing both a prefix and a suffix. We also compare our proposed stemmer with three standard algorithms from the literature. Empirical results show that our technique performs better than the other stemmers.
- Published
- 2020
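The idea in the abstract above (internal N-grams, a dictionary check, and a preference for stems about half as long as the input) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' algorithm: "internal" is read here as a substring that drops at least one character from each end, the tiny `TOY_DICTIONARY` stands in for a standard English dictionary, and the tie-break toward longer candidates is an arbitrary choice.

```python
# Hypothetical stand-in for a standard English dictionary.
TOY_DICTIONARY = {"help", "comfort", "fort", "agree"}


def internal_ngrams(word, min_len=3):
    """Yield substrings that exclude at least the first and the last character."""
    n = len(word)
    for start in range(1, n - 1):
        for end in range(start + min_len, n):
            yield word[start:end]


def stem_both_ends(word, dictionary, min_len=3):
    """Pick the dictionary-valid internal N-gram whose length is closest to len(word) / 2."""
    target = len(word) / 2
    candidates = [g for g in internal_ngrams(word, min_len) if g in dictionary]
    if not candidates:
        return word  # nothing recognisable inside the word; leave it unchanged
    # Assumed tie-break: among equally close candidates, prefer the longer one.
    return min(candidates, key=lambda g: (abs(len(g) - target), -len(g)))


if __name__ == "__main__":
    for w in ("unhelpful", "uncomfortable", "disagreement"):
        print(w, "->", stem_both_ends(w, TOY_DICTIONARY))
```

In "uncomfortable", both "comfort" and "fort" are valid internal N-grams; the half-length hypothesis selects "comfort" because its length (7) is closer to 6.5 than that of "fort" (4).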
22. DGMS: Dataset Generator Based on Malay Stemmer Algorithm
- Author
-
Abdullah, Zailani, Mohamad, Siti Zaharah, Zulkifli, Norul Syazawini, Herawan, Tutut, Hamdan, Abdul Razak, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Liang, Qilian, Series Editor, Martin, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zhang, Junjie James, Series Editor, Abawajy, Jemal H., editor, Othman, Mohamed, editor, Ghazali, Rozaida, editor, Deris, Mustafa Mat, editor, Mahdin, Hairulnizam, editor, and Herawan, Tutut, editor
- Published
- 2019
23. Efficient Mining of Positive and Negative Itemsets Using K-Means Clustering to Access the Risk of Cancer Patients
- Author
-
Asha, Pandian, Albert Mayan, J., Canessane, Aroul, Barbosa, Simone Diniz Junqueira, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Zelinka, Ivan, editor, Senkerik, Roman, editor, Panda, Ganapati, editor, and Lekshmi Kanthan, Padma Suresh, editor
- Published
- 2018
24. Development of Stemmer for Afar-af text: A Hybrid Approach.
- Author
-
Ebrahim, Kelil Ali
- Subjects
MORPHOLOGY (Grammar), WORD formation (Grammar), AUTOMATIC speech recognition, INFLECTION (Grammar), EMPLOYEE reviews, NATURAL language processing, SEARCH engines
- Abstract
Most natural language processing systems use a stemmer as a distinct module in their architecture. It is especially crucial for developing machine translators, speech recognizers, and search engines. In linguistic morphology, stemming is the process of reducing inflected (or sometimes derived) words to their root, stem, or base form. In this article, a stemming system for Afar-af is presented. The system takes a word or term as input and removes its affixes (suffixes and prefixes) according to a rule-based algorithm. The rules alone are not adequate to describe every pattern of Afar-af word formation. Consequently, in the hybrid approach of this stemmer, N-grams are combined with the rules to handle cases that the rules do not cover. The algorithm follows the well-known Porter algorithm for the English language and is adapted to the grammatical rules of the Afar language. Afar-af morphology was studied and defined in order to model the language and develop an automatic procedure for conflation. The inflectional and derivational morphologies of the language are discussed. Afar-af words are morphologically rich and require an effective stemming algorithm that can handle the diverse morphological patterns associated with words. An evaluation indicates that the system outperforms earlier stemming algorithms for Afar-af, with an accuracy of 98.73 percent. Furthermore, possible extensions of the planned work and further evaluation approaches are briefly reviewed. [ABSTRACT FROM AUTHOR]
- Published
- 2021
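The abstract above does not say how the N-gram component is combined with the rules, so the sketch below shows only one plausible reading: try suffix rules first and, when no rule fires, fall back to the known stem with the highest character-bigram Dice similarity. The rules, the stem list, and the threshold are hypothetical English stand-ins, not Afar-af data and not the author's method.

```python
# Hypothetical suffix rules and known stems (English stand-ins, not Afar-af).
RULES = [("ness", ""), ("ing", "")]
KNOWN_STEMS = ["sing", "kind", "walk"]


def bigrams(word):
    """Set of character bigrams in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient over character-bigram sets."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def hybrid_stem(word, threshold=0.5):
    """Apply the first matching rule; otherwise fall back to bigram similarity."""
    for suffix, replacement in RULES:
        # Require at least three characters to remain after stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)] + replacement
    best = max(KNOWN_STEMS, key=lambda stem: dice(word, stem))
    return best if dice(word, best) >= threshold else word


if __name__ == "__main__":
    for w in ("kindness", "walks", "sang"):
        print(w, "->", hybrid_stem(w))
```

The fallback only replaces a word when the similarity clears the threshold, so an unrelated form such as "sang" is returned unchanged rather than being conflated with "sing".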
25. A RULE-BASED STEMMER FOR PUNJABI ADJECTIVES.
- Author
-
Kaur, Harmanjeet and Buttar, Preetpal Kaur
- Subjects
ADJECTIVES (Grammar), NOUNS, DATABASES, LANGUAGE & languages, ALGORITHMS
- Abstract
This research work is concerned with the development of a rule-based stemmer for stemming of adjectives in the Punjabi language. Stemming is a method of deriving the root word from the inflected word. The proposed Punjabi Adjective Stemmer (PAS) uses a rule-based approach for converting the inflected Punjabi adjectives to their root forms. A database containing valid root adjectives occurring in the Punjabi language has been created. This database stores 1,762 Punjabi root adjectives. When an adjective word is fed to PAS as an input, first it compares the input word with the root database to determine whether the input adjective is a root adjective or an inflected one. If the input adjective is a root adjective, then no stemming is required and the input adjective is returned as the output. Otherwise, the inflected input adjective is sent to the suffix-stripping algorithm to get the corresponding root adjective. The suffix-stripping algorithm uses a set of predefined rules. India is a linguistically rich country with 22 languages recognized officially. But the computational resources developed for these languages are very scarce. Most of the stemmers developed for Punjabi language so far concentrated on nouns and proper names. PAS is the only stemmer developed so far for specifically addressing the problem of stemming of Punjabi adjectives. PAS has an overall accuracy of 88.76%. [ABSTRACT FROM AUTHOR]
- Published
- 2020
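The control flow described in the abstract above (look the word up in a root database first, and send it to suffix stripping only when it is not a root) can be sketched as follows. The abstract does not list the actual Punjabi roots or rules, so `ROOT_ADJECTIVES` and `SUFFIX_RULES` below are hypothetical English stand-ins used only to show the flow.

```python
# Hypothetical stand-ins: the real PAS database holds 1,762 Punjabi root adjectives.
ROOT_ADJECTIVES = {"tall", "small", "happy"}

# Hypothetical (suffix, replacement) rules, checked in order (longest first).
SUFFIX_RULES = [("iest", "y"), ("ier", "y"), ("est", ""), ("er", "")]


def stem_adjective(word):
    """Return the word unchanged if it is a known root; otherwise strip a suffix."""
    if word in ROOT_ADJECTIVES:
        return word  # already a root adjective, no stemming required
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)] + replacement
    return word  # no rule matched


if __name__ == "__main__":
    for w in ("tall", "taller", "happiest"):
        print(w, "->", stem_adjective(w))
```

Checking the root database first avoids stripping a suffix-like ending from a word that is already a root.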
26. Simultaneous Removal of Prefix and Suffix.
- Author
-
Tamta, Pawan and Pande, B. P.
- Subjects
SUFFIXES & prefixes (Grammar), ENGLISH language, HYPOTHESIS, INFORMATION retrieval, ENGLISH word formation
- Abstract
This work is an attempt to devise a stemmer that can remove both the prefix and the suffix together from a given word in the English language. For a given input word, our method considers all possible internal N-grams to detect potential stems. We frame a hypothesis in which the stem length is closest to half the length of the input word. A standard English dictionary is employed to identify morphologically correct N-grams in the process. We apply our technique to a random sample of 100 English words, each possessing both a prefix and a suffix. We also compare our proposed stemmer with three standard algorithms from the literature. Empirical results show that our technique performs better than the other stemmers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
27. Stemming Bahasa Tetun Menggunakan Pendekatan Rule Based
- Author
-
Anita Guterres, Gunawan, and Joan Santoso
- Subjects
Bahasa Tetun, Stemmer, Information technology, T58.5-58.64, Computer software, QA76.75-76.765
- Abstract
Stemming is a very important process for finding the root word of a derived word. The core of the stemming process is removing the affixes from a word. Stemming is essential for information retrieval systems. Stemming algorithms can differ for each language in different countries. The data used are 176 root words in Tetun, the native language of the people of Timor-Leste. This study aims to design a new algorithm suited to stemming Tetun. The first stage of Tetun stemming is a filtering process that removes punctuation, numbers, and unimportant words. Next is a tokenization stage that produces variables consisting of a single word. Each word then goes through the stemming process, which removes prefixes, suffixes, and confixes. The analysis is based on stemming error cases such as overstemming, understemming, unchanged, and spelling exception. In testing, the Tetun stemming algorithm achieved an accuracy of 90.52%.
- Published
- 2019
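The three stages named in the abstract above (filtering, tokenization, then removal of prefixes, suffixes, and confixes) can be sketched as a small pipeline. The abstract does not list the Tetun affixes or stop words, so every list below is a hypothetical English placeholder, and the minimum stem length is an added assumption to avoid stripping very short words.

```python
import re

# Hypothetical placeholders; not real Tetun stop words or affixes.
STOPWORDS = {"the", "and"}
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed")
CONFIXES = (("un", "ing"), ("re", "ed"))  # prefix/suffix pairs removed together
MIN_STEM = 3  # assumed guard: never leave fewer than three characters


def filter_text(text):
    """Stage 1: remove punctuation, numbers, and unimportant words."""
    cleaned = re.sub(r"[^\w\s]|\d", " ", text.lower())
    return " ".join(w for w in cleaned.split() if w not in STOPWORDS)


def tokenize(text):
    """Stage 2: split the filtered text into single-word tokens."""
    return text.split()


def stem(word):
    """Stage 3: strip a confix if one matches, otherwise one prefix and one suffix."""
    for pre, suf in CONFIXES:
        if (word.startswith(pre) and word.endswith(suf)
                and len(word) - len(pre) - len(suf) >= MIN_STEM):
            return word[len(pre):-len(suf)]
    for pre in PREFIXES:
        if word.startswith(pre) and len(word) - len(pre) >= MIN_STEM:
            word = word[len(pre):]
            break
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= MIN_STEM:
            word = word[:-len(suf)]
            break
    return word


def stem_text(text):
    """Run the full pipeline: filter, tokenize, stem."""
    return [stem(token) for token in tokenize(filter_text(text))]


if __name__ == "__main__":
    print(stem_text("The 3 cats replayed, and unlocking!"))
```

Confixes are tried before single affixes so that a matching prefix and suffix pair is removed in one step.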
28. An Enhanced Rule Based Arabic Morphological Analyzer Based on Proposed Assessment Criteria
- Author
-
Maabid, Abdelmawgoud Mohamed, Elghazaly, Tarek, Ghaith, Mervat, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Tan, Ying, editor, Shi, Yuhui, editor, Buarque, Fernando, editor, Gelbukh, Alexander, editor, Das, Swagatam, editor, and Engelbrecht, Andries, editor
- Published
- 2015
29. Automatic Text Summarisation Using an Advanced Stemmer Algorithm: A Case Study of the Xhosa Language.
- Author
-
Ndyalivana, Zukile and Shibeshi, Zelalem
- Subjects
NATURAL language processing, INFORMATION overload, CASE studies, INFORMATION retrieval
- Abstract
In today's world, digital content is becoming increasingly abundant, and people face what is referred to as information overload. Finding a tool that can help with this is of fundamental importance. A tool that can summarise a text without losing its message, coherence, and cohesion is vital. We live in a digital age in which technology saves us time, so users can focus only on the points they are interested in. This is one of the research areas in natural language processing and information retrieval to which this work tries to contribute. It tries to contextualise the tools and technologies developed for other languages in order to automatically summarise textual Xhosa news articles. The work specifically aims to develop a text summariser for textual Xhosa news articles based on extraction methods. In doing so, it examines the literature to understand the techniques and technologies used to analyse the contents of a written text in order to transform and synthesise it. The study also examines the phonology and morphology of the Xhosa language and, finally, designs, implements, and tests an extraction-based automatic news article summariser for the Xhosa language. Two approaches were used to extract relevant sentences: term frequency and sentence position. The Xhosa summariser is evaluated using a test set, with both subjective and objective evaluation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
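The two sentence-selection signals named in the abstract above, term frequency and sentence position, can be combined in an extractive summariser along the following lines. This is a minimal sketch, not the authors' system: the scoring formula, the `position_weight` parameter, and the English sample text are assumptions, and the stemming step the title refers to is omitted.

```python
import re
from collections import Counter


def summarize(text, k=2, position_weight=1.0):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words_per_sentence = [re.findall(r"\w+", s.lower()) for s in sentences]
    tf = Counter(w for words in words_per_sentence for w in words)
    total = sum(tf.values()) or 1

    scores = []
    for i, words in enumerate(words_per_sentence):
        # Average relative frequency of the sentence's terms.
        tf_score = sum(tf[w] for w in words) / (len(words) * total) if words else 0.0
        # Earlier sentences receive a larger position bonus (assumed 1 / rank).
        pos_score = position_weight / (i + 1)
        scores.append(tf_score + pos_score)

    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Illustrative sample text (English, not Xhosa).
SAMPLE_TEXT = "Cats sleep. Dogs bark loudly. Cats and dogs play."

if __name__ == "__main__":
    print(summarize(SAMPLE_TEXT, k=2))
```

Returning the selected sentences in document order, rather than score order, keeps the extract readable.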
30. GUJARATI TEXT MORPHOLOGICAL ANALYZER USING RULE BASED CLASSIFIER.
- Author
-
Shah, Neepa and Boradia, Aneri
- Subjects
DATA scrubbing, MINING methodology, ALGORITHMS, INFORMATION retrieval, NATURAL language processing
- Abstract
Data and information overload on the web is a primary problem faced by people and institutions today. Extracting useful information from a sentence in the Gujarati language and removing suffixes, prefixes, and unnecessary characters is an important data-cleaning step that makes effective use of a database. This is a challenging task for every Indian language because of its rich morphological variation. This paper presents a lightweight morphological analyzer for the Gujarati language using a rule-based classifier. Searching for a given word in Gujarati text is a form of text mining, but the database of Gujarati stems needs to be strengthened so that searching becomes effective and easy. The process of stemming is the basis of text mining. It is used in many types of applications, such as Natural Language Processing (NLP), Information Retrieval (IR), and Text Mining (TM). In this paper we present a stemming algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2019
31. A cognitive inspired unsupervised language-independent text stemmer for Information retrieval.
- Author
-
Alotaibi, Fahd Saleh and Gupta, Vishal
- Subjects
LANGUAGE & languages, INFORMATION retrieval, INFORMATION services, INFORMATION-seeking strategies, INFORMATION resources management
- Abstract
In Information Retrieval systems, stemming handles words that can occur in different morphological forms, and hence matches terms in the documents and the queries that are related in meaning. In this article, we propose a cognitively inspired, language-independent stemmer that learns groups of morphologically related words from the ambient corpus without any linguistic knowledge or human intervention, behaving in the way the human brain works. The main idea of our proposed algorithm is to determine only those variants of the words from the ambient corpus that match the original intent of the query terms. We conducted ad-hoc retrieval experiments in a number of languages of varying morphological complexity using standard TREC, FIRE, and CLEF document collections. The results indicate that stemming improves retrieval accuracy and that the effectiveness of the stemming algorithm increases with the morphological complexity of the language. The results also indicate that our proposed algorithm performs better than stemmers based on linguistic knowledge and other state-of-the-art statistical stemmers in almost all the languages under study. In a multilingual setup, these results are quite encouraging. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
32. Translation Rules for English to Hindi Machine Translation System: Homoeopathy Domain.
- Author
-
Dwivedi, Sanjay and Sukhadeve, Pramod
- Subjects
HOMEOPATHY ,TRANSLATIONS ,MACHINE learning ,ENGLISH grammar ,POLYGLOT dictionaries - Abstract
A rule-based machine translation system embraces a set of grammar rules that are mandatory for mapping the syntactic representations of a source language onto the target language. Such a system requires good linguistic knowledge to write the rules, as well as knowledge sources such as a corpus and a bilingual dictionary. In this paper, we describe the grammar rules intended for our English to Hindi machine translation system, which translates homoeopathic literature, medical reports, prescriptions, etc. The rules follow the transfer-based approach for reordering between the two languages. The paper first discusses our stemmer and its rules. It then discusses the Part of Speech (PoS) tagging rules for categorizing each word of the sentence grammatically, and our homoeopathy corpus in English and Hindi, of size 20085 and 20072 words respectively. Finally, it discusses the agreement/translation rules for translating various homoeopathic sentences. [ABSTRACT FROM AUTHOR]
- Published
- 2015
33. A simple algorithm for the problem of suffix stripping.
- Author
-
Pande, B. P., Tamta, Pawan, and Dhami, H. S.
- Subjects
- *
ENGLISH language , *INFORMATION storage & retrieval systems , *MORPHOLOGY (Grammar) , *LINGUISTIC analysis , *SUFFIXES & prefixes (Grammar) , *NATURAL language processing - Abstract
Suffix stripping is a problem of removing morphological suffixes from a word to get the stem. We present suffix stripping as an unconstrained optimization problem. Free from linguistic or morphological knowledge, a simple algorithm is being developed. Superiority of the algorithm over an established technique for English language is being demonstrated. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
34. Comparative Study of Various Persian Stemmers in the Field of Information Retrieval.
- Author
-
Moghadam, Fatemeh Momenipour and Keyvanpour, MohammadReza
- Subjects
INFORMATION retrieval ,COMPARATIVE studies ,NATURAL language processing ,PERFORMANCE evaluation ,BOOSTING algorithms - Abstract
In linguistics, stemming is the operation of reducing words to their more general form, which is called the 'stem'. Stemming is an important step in information retrieval systems, natural language processing, and text mining. Information retrieval systems are evaluated by metrics like precision and recall and the fundamental superiority of an information retrieval system over another one is measured by them. Stemmers decrease the indexed file, increase the speed of information retrieval systems, and improve the performance of these systems by boosting precision and recall. There are few Persian stemmers and most of them work based on morphological rules. In this paper we carefully study Persian stemmers, which are classified into three main classes: structural stemmers, lookup table stemmers, and statistical stemmers. We describe the algorithms of each class carefully and present the weaknesses and strengths of each Persian stemmer. We also propose some metrics to compare and evaluate each stemmer by them. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
35. The Effect of Combining Different Semantic Relations on Arabic Text Classification.
- Author
-
Yousif, Suhad A., Samawi, Venus W., Elkabani, Islam, and Zantout, Rached
- Subjects
MATHEMATICAL combinations ,SEMANTICS ,MATHEMATICAL domains ,CLASSIFICATION algorithms ,COMPUTER science - Abstract
A massive number of documents are being posted online every minute. The task of document classification requires extensive background work on the content of documents, where keyword-based matching alone may not be sufficient. Much research has been carried out in several languages that has revealed significant results. However, Arabic documents still pose a great challenge due to the nature of the Arabic language. Extracting roots or stems from the breakdown of multiple Arabic words and phrases is an important task that must be completed before applying text classification. The research at hand proposes an algorithm for classifying Arabic text documents using semantic relations between words based on an Arabic thesaurus, mainly synonyms, hyperonyms and hyponyms. The experiments conducted in this study evaluated the results using the F1-measure and compared them to results obtained via other existing methods, such as utilizing stemmers and part-of-speech taggers; the novel method using semantic relations showed an increase of more than 12.6% over the other methods. Arabic-WordNet was utilized as a thesaurus for indicating possible relations to be examined. The obtained results indicate that the domain of the semantic web reveals a variety of options for enhancing text classification, which are highly competitive with current methods. Future work will include identifying the best relations to be utilized among the available 20 relations. [ABSTRACT FROM AUTHOR]
- Published
- 2015
36. RFreeStem un raciniseur pour le malgache
- Author
-
Andonirina Andriamihasinoro, Josiane Mothe, Oihana Coustie, Olivier Teste, Université d'Antananarivo, Systèmes d’Informations Généralisées (IRIT-SIG), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, and Meunier, Romain
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] ,Malagasy ,Racinisation ,Malgache Information systems ,Recherche d'information ,Raciniseur ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Traitement automatique des langues naturelles ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,stemming ,Langues peu outillées ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,Systèmes d'information ,Information retrieval ,under-studied languages ,natural language processing ,stemmer - Abstract
Stemming is a step in text pre-processing that groups together words that are morphologically different but semantically similar, and which therefore, when used in a query in a search engine, should match similar or even identical documents. For many languages, stemmers are rule-based. For languages without tools, the stemming problem remains unsolved. This is the case of Malagasy. This paper analyzes the efficiency of a stemmer, RFreeStem, based on the statistical analysis of texts and without rules. We study the hyperparameters of this stemmer and their influence on the efficiency of the stemming for Malagasy by comparing it to an existing test collection containing manually obtained word roots.
- Published
- 2021
37. RFreeStem : Une méthode de racinisation indépendante de la langue et sans règle
- Author
-
Josiane Mothe, Xavier Baril, Oihana Coustie, Olivier Teste, AIRBUS Operations Ltd., Systèmes d’Informations Généralisées (IRIT-SIG), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse - Jean Jaurès (UT2J), and IUT Toulouse 2 Blagnac
- Subjects
[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB] ,recherche d’information ,racinisation ,sentiment analysis ,fouille de texte ,système d’information ,text mining ,information retrieval ,General Medicine ,stemmer ,information systems ,NLP ,analyse de sentiments - Abstract
With the large expansion of available textual data, text mining has become of special interest. Due to their unstructured nature, such data require important preprocessing steps. Among them, stemming algorithms conflate the variants of words into their stems. However, the most popular algorithms are rule-based, and therefore highly language-dependent. In contrast, corpus-based stemmers often exhibit significant algorithmic complexity, making them inefficient. They do not necessarily provide the extracted stems either, which are required for certain text mining tasks. We propose a new approach, RFreeStem, that is corpus-based and can therefore be applied to many languages. The implementation of our method is flexible and efficient, since it relies on a single pass over the words' n-grams. We also detail a method to extract the stems. Our experiments show that RFreeStem improves the results of text mining tasks, even more than the Porter reference, while providing a stemming solution for poorly resourced languages, which do not benefit from a version of Porter.
- Published
- 2021
- Full Text
- View/download PDF
38. Analysis and Efficient Implementation of Morphological Normalization Methods for the Croatian Language
- Author
-
Markušić, Luka and Šnajder, Jan
- Subjects
stemming ,regular expressions ,NFA ,TECHNICAL SCIENCES. Computing ,TEHNIČKE ZNANOSTI. Računarstvo ,regex ,regularni izrazi ,stemmer ,korjenovatelj ,NLP ,NKA - Abstract
The problem of stemming for the Croatian language was researched, with an emphasis on performance. An overview of the history of stemming for different languages was presented, and the various approaches to implementing such a tool were researched. An approach that tries to improve on the performance of existing implementations was developed based on non-deterministic finite automata. The automata were constructed from prewritten suffix-removal rules and tested on a test set of 40 000 words. The final execution time was compared to that of a few existing implementations; the automaton-based version was among the fastest, but not faster than the starting implementation.
- Published
- 2020
39. Relatives in everyday life: A study based on collaboration with relatives of people with type 1 diabetes
- Author
-
Roosta, Natalie, Meyer, Maria Holmgaard, Aamann, Iben Charlotte, and Phillips, Louise Jane
- Subjects
Samskabelsesmetoder ,Roller ,Sundhedsdiskurser ,IFADIA ,Stemmer ,Samskabelse ,Mikhail Bakhtin ,Pårørende ,Centripetale og centrifugale bevægelser ,Hverdagsliv ,Inddragelse ,Borgerinddragelse ,Sundhedsfremme ,Socialkonstruktivisme ,Workshop ,Diskurs ,Dialogiske processer ,Speciale ,Sundhedsbegreb ,Foucault ,Diabetes type 1 ,Kommunikation ,Sundhedsvæsenet ,Betydningsdannelse ,Lene Otto ,Relatives ,Dialogteori ,Ytringer ,Nikolas Rose ,Relationer ,Memory work ,Diagnosesamfund - Abstract
Research shows that relatives of people with chronic illness often experience impaired quality of life. These relatives may experience various health, social and psychological stresses. Based on this research and our interest in health promotion, we see a need for a focus on relatives. This thesis therefore investigates relatives of people with the chronic illness type 1 diabetes. We created participatory-based knowledge through two workshops using a range of dialogic communication theory and the method of memory work. Using Mikhail Bakhtin's dialogue theory of centrifugal and centripetal language forces, we have found that relatives' voices are clarified as an expression of different discourses. Based on Lene Otto's descriptions of how health discourses are conditioned by social and cultural experiences, our study shows that relatives' individual perspectives are essential for their understanding of discourses. Through these theoretical approaches, we gain knowledge that the relatives, across relationships, see themselves reflected in each other's everyday lives. The relatives speak from different voices and discourses, which develops the relatives' understandings. In conclusion, our study clarifies that voices and discourses can be shared with other relatives of people with type 1 diabetes. If health is an interaction between a prevention perspective and a broader cultural perspective, increased insight into and involvement of relatives of chronically ill patients is needed.
- Published
- 2020
40. Implementasi Algoritma Confix Stripping untuk Pendeteksian Kesalahan pada Tenses
- Author
-
Suryaningrum, Kristien Margi
- Abstract
This study discusses how to detect errors in the writing of English tenses using a stemming method. The algorithm used for stemming is the Confix Stripping algorithm. The Confix-Stripping (CS) algorithm is based on the morphological rules of Indonesian, which are grouped together and encapsulated as affixes, including prefixes, suffixes, infixes, and combinations. The Confix-Stripping (CS) algorithm uses three components: a set of affixes, rules, and a dictionary. This study analyzes the application of the Confix-Stripping (CS) algorithm in an Information Retrieval system. The sentence to be checked is split into individual words, and the position of each word in the sentence is then determined. Once the position of each word has been determined, words positioned as verbs go through stemming to obtain their base form. Once the base form has been obtained, it is changed to match the tense of the sentence. The stemming method with the Confix Stripping algorithm is thus used to detect errors in the writing of English tenses.
- Published
- 2019
41. Повышение эффективности эксплуатации нефтепромыслового оборудования
- Subjects
клапан лифтовый ,привод ШГН ,clamp ,elevator valve ,датчик нагрузок ,канатная подвеска ,pipe holder ,traverse ,rope suspension ,зажим ,траверса ,винт штанговращателя ,rod rotator screw ,муфта НКТ ,ShGN actuator ,tubing coupling ,штанговращатель ,устьевой шток ,трубодержатель ,stemmer ,wellhead rod ,load sensor - Abstract
PJSC "Tatneft" is implementing a program to optimize production and reduce its own costs. Reducing the cost of operating surface oilfield equipment is one of the most important tasks within this program. This article discusses technical solutions aimed at optimizing production processes in the operation of oilfield equipment. In mechanized oil production with sucker-rod pumping units, there are a number of technical problems that have a serious impact on the efficiency of operation of the producing well stock and, in general, on the volume of oil production. These problems include: 1) the imperfection of the existing valve designs for bleeding gases from the annular space of oil wells through the tubing to the wellhead fittings, 2) the lack of any way to carry out technological studies measuring the load on surface drives of sucker-rod pumps (ShGN) equipped with a rod rotator, 3) intensive wear of stuffing-box seals due to damage to the working surface of the polished rod.
- Published
- 2019
- Full Text
- View/download PDF
42. Usporedba jezičnih alata za njemački jezik
- Author
-
Beli, Dorian and Martinčić-Ipšić, Sanda
- Subjects
computer analysis of the German language ,part-of-speech tagger ,corpus ,korjenovatelj, lematizator, obilježivač vrsta riječi, njemački, korpus, računalna analiza njemačkog jezika ,stemmer ,lemmatizer ,German - Abstract
When we talk about computer analysis and understanding of text, tools such as lemmatizers, stemmers and part-of-speech taggers, together with various language corpora, play a major role in the field of computational linguistics. Tools such as these consider the syntax and linguistics of a particular language and, by applying its rules as well as possible, along with the occasional implementation of probabilistic algorithms, attempt to process the given language better. In this final thesis we cover the 4 best-known stemmers, two lemmatizers and two part-of-speech taggers for the German language. In addition to a theoretical treatment of these tools, we also touch on a practical comparison of them, in a separate chapter, on our own texts. The stemmers Snowball, CISTEM, Text::German and UniNE, the lemmatizers GermaLemma and SMOR, and the taggers of the TIGER corpus and Pro3GresDE have their accuracy expressed in percentages. Among the stemmers, CISTEM proved the most successful with 91.23% correct stemmings, followed by Text::German with 88.55%, then Snowball with 82.44%, and finally UniNE, whose accuracy ranges from 78.63% to 80.92%. Of the two part-of-speech taggers, the hybrid Pro3GresDE proved more accurate with 93.55%, compared with 90.32% accuracy for the one included with the TIGER corpus. Among the lemmatizers, SMOR proved more accurate with 94.27% accuracy, followed by GermaLemma with 85.5% accuracy.
- Published
- 2018
43. Fonografens stemmer:en kort introduktion til en revolutionerende opfindelse
- Author
-
Steinskog, Erik
- Published
- 2018
Catalog
Discovery Service for Jio Institute Digital Library