2,598 results on '"PARTS of speech"'
Search Results
2. When Socialism Meets Terrorism: A Computer-Assisted Discursive News Values Analysis of Chinese Newspapers' Coverage of Domestic and International Terrorist Attacks.
- Author
-
Guo, Jingxuan, Mast, Jelle, and Vosters, Rik
- Subjects
- *
PARTS of speech , *CHINESE language , *COLLOCATION (Linguistics) , *SOCIALIST societies , *SOCIAL types - Abstract
Terrorist attacks, as a particular type of negative social event, are a potential threat to the establishment of a "Socialist Harmonious Society" advocated by the Chinese government. To shed light on how Chinese journalists construct reports on terrorist attacks and, more broadly, how non-Western countries report on negative social issues, we compared newspaper coverage of one domestic attack with four international incidents. Starting from Bednarek and Caple's framework, we further developed the computer-assisted Discursive News Values Analysis (DNVA) they proposed by adapting individual definitions of news values and dividing them into 15 subcategories to suit the Chinese context, the topic of terrorism, and the automatic processing of large-scale datasets. Based on our understanding, we developed an open-source list of Chinese lexical indicators by Part of Speech tagging, sentiment, collocation, and concordance analysis, with a view to providing resources to support future quantitative DNVA. Following this approach, we found that while market factors enabled Chinese newspapers to cover domestic terrorist attacks, media adopted specific media strategies to meet political needs, which differed from those used to cover international attacks. These different media strategies can be explained by the Chinese developmentalist ideology and harmonious cultural traditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Tree hierarchical deep convolutional neural network optimized with sheep flock optimization algorithm for sentiment classification of Twitter data.
- Author
-
Sanmugaraja, Lakshmanaprakash and Annamalai, Pandiaraj
- Subjects
- *
CONVOLUTIONAL neural networks , *OPTIMIZATION algorithms , *PARTS of speech , *CLASSIFICATION algorithms , *FEATURE extraction - Abstract
The increasing volume of online reviews and tweets poses significant challenges for sentiment classification because of the difficulty in obtaining annotated training data. This paper aims to enhance sentiment classification of Twitter data by developing a robust model that improves classification accuracy and computational efficiency. The proposed method named Tree Hierarchical Deep Convolutional Neural Network optimized with Sheep Flock Optimization Algorithm for Sentiment Classification of Twitter Data (SCTD-THDCNN-SFOA) utilizes the Stanford Sentiment Treebank dataset. The process begins with pre-processing steps including Tokenization, Stop words Elimination, Filtering, Hashtag Removal, and Multiword Grouping. The Gray Level Co-occurrence Matrix Window Adaptive Algorithm is employed to extract features, such as emoticon counts, punctuation counts, gazetteer word existence, n-grams, and part of speech tags. These features are selected using Entropy–Kurtosis-based Feature Selection approach. Finally, the Tree Hierarchical Deep Convolutional Neural Network enhanced by the Sheep Flock Optimization Algorithm is used to categorize the Twitter data as positive, negative, and neutral sentiments. The proposed SCTD-THDCNN-SFOA method demonstrates superior performance, achieving higher accuracy and lesser computation time than the existing models, respectively. The SCTD-THDCNN-SFOA framework significantly improves the accuracy and efficiency of sentiment classification for Twitter data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Similar, not universal: the cognitive dimensions of conceptual prototypes of basic emotions in English and in Polish.
- Author
-
Bąk, Halszka and Altarriba, Jeanette
- Subjects
- *
PARTS of speech , *ENGLISH language , *EMOTIONS , *GENDER , *PROTOTYPES , *FACIAL expression - Abstract
The current study explores the differences in conceptualisation of the prototypical basic emotion lexicalisations (anger, disgust, fear, joy, sadness, surprise) in English and in Polish. Measures of concreteness, imageability and context availability were collected and analysed across the six semantic categories of basic emotions, across different parts of speech and between the self-determined genders of the study participants. The initial results indicate that within these cognitive dimensions the conceptualisations of basic emotions in English and in Polish are only similar on the more general but not the higher levels of conceptualisation. The folk-psychological division between positive and negative emotions and the grammatical parts of speech reveal similar patterns in basic emotion concepts in both Polish and in English. However, on the higher levels of conceptualisations that include specific basic emotion semantic categories and self-identified gender, marked language-specific differences become apparent. Different negative emotions drive the statistical differences in Polish and in English, and the gender effects on the measures of concreteness, imageability and context availability are opposite from one language to the other. In other words, basic emotions may be broadly mutually intelligible, but not exactly the same when communicated across languages and cultures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Old English Wlītan and Wlātian: Poetic Verbs of Looking (And Seeing).
- Author
-
Klein, Thomas
- Subjects
- *
VERBS , *VERBALS (Grammar) , *PARTS of speech , *ENGLISH language , *GERMANIC languages - Abstract
The article explores the nuanced meanings of two Old English verbs related to visual perception. Topics include the distinction between agentive and experiential verbs in Old English, the rarity and poetic nature of the verbs wlītan and wlātian, and the semantic relationships of these verbs to their cognates in other Germanic languages.
- Published
- 2024
6. Part-of-Speech Features in Bob Dylan's Song Lyrics: A Stylometric Analysis.
- Author
-
Dai, Zheyuan and Liu, Haitao
- Subjects
- *
PARTS of speech , *SONG lyrics , *STYLOMETRY , *VERBS , *ADJECTIVES (Grammar) - Abstract
Honoured as a Nobel Laureate in 2016, Bob Dylan's song lyrics have garnered well-deserved recognition and appreciation for their themes, content and artistic performances. Part-of-speech characteristics are effective in denoting stylistic features of texts in stylometric studies. The present study carried out a quantitative observation of stylistic features of Dylan's lyrics. Specifically, the study focuses on parts of speech like verbs, adjectives, adverbs, nouns and pronouns. Results of the present study reveal that: (1) Based on the distribution of verbs and adjectives, Dylan's lyrics are significantly active texts, and the activity sequences (Q-sequences) have validated the result in a dynamic way; (2) Dylan's lyrics tend to present the characteristics of a written body; (3) Individualism is prominent in Dylan's lyrics accompanied by the wide use of the first-person singular pronouns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Emotional Blackmail Utterance Detection System Based on Chinese Text Similarity Evaluation (基于中文文本相似度评估的情感勒索话语检测系统).
- Author
-
林文晟, 杨观赐, and 钟世吴
- Subjects
- *
PARTS of speech , *TEST systems , *CHINESE language , *SELF-expression , *EVALUATION methodology - Abstract
Emotional blackmail is a way of communication that forces people around them to listen to their requirements by emotional pressure. It can easily lead to negative emotions and even psychological problems of the other side, then affect the communication effect. In order to detect emotional blackmail utterance in daily communication scenarios and improve the communication effect, this paper developed an emotional blackmail utterance detection system based on Chinese text similarity evaluation. Firstly, this paper labelled the collected data based on Susan Forward's emotional blackmail theory, and constructed an emotional blackmail corpus and a test set. Secondly, this paper analyzed the expression modes of emotional blackmail, and designed text similarity evaluation methods based on part of speech and semantic words respectively, then formed an emotional blackmail utterance detection algorithm based on Chinese text similarity evaluation. Thirdly, this paper carried out experiments on the constructed datasets. The average recall and F1-score of the proposed algorithm are 95.21% and 79.95% respectively, which are better than the compared algorithms. Finally, this paper integrated an emotional blackmail utterance detection prototype system based on the proposed algorithm. Under different testing conditions, system test results show that the average recall is 87.24%, which shows good robustness and detection performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. The action reference construction in Mandarin Chinese and typology of lexical flexibility.
- Author
-
Gong, Liwei and Uehara, Satoshi
- Subjects
- *
PARTS of speech , *CONSTRUCTION grammar , *MANDARIN dialects , *CHINESE language , *LEXICAL grammar - Abstract
The parts of speech system and lexical flexibility in Mandarin Chinese (henceforth Chinese) have long been subjects of debate due to the pervasive zero coding of action reference constructions. In this article, we analyze properties of the Chinese Action Reference Construction from the perspective of Radical Construction Grammar (Croft 2001, 2022), focusing on its structural coding, behavioral potential, productivity, and semantic shifts. We also discuss typological features that potentially reinforce lexical flexibility in Chinese, and the implications that the language-specific properties of Chinese present for cross-linguistic discussions of parts of speech. Specifically, reference, instead of predication, is the most flexible information-packaging function in Chinese, challenging the privileged status of predication established in previous studies on parts of speech and lexical flexibility. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Exploring language relations through syntactic distances and geographic proximity.
- Author
-
De Gregorio, Juan, Toral, Raúl, and Sánchez, David
- Subjects
LINGUISTIC typology ,PARTS of speech ,MARKOV processes ,KINSHIP ,SYNTAX (Grammar) - Abstract
Languages are grouped into families that share common linguistic traits. While this approach has been successful in understanding genetic relations between diverse languages, more analyses are needed to accurately quantify their relatedness, especially in less studied linguistic levels such as syntax. Here, we explore linguistic distances using series of parts of speech (POS) extracted from the Universal Dependencies dataset. Within an information-theoretic framework, we show that employing POS trigrams maximizes the possibility of capturing syntactic variations while being at the same time compatible with the amount of available data. Linguistic connections are then established by assessing pairwise distances based on the POS distributions. Intriguingly, our analysis reveals definite clusters that correspond to well known language families and groups, with exceptions explained by distinct morphological typologies. Furthermore, we obtain a significant correlation between language similarity and geographic distance, which underscores the influence of spatial proximity on language kinships. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Verb Form Recognition and Error Detection in English Articles Using Long Short-Term Memory and Grammar Checks.
- Author
-
Hu, Ping and Zhang, Huicheng
- Subjects
- *
RECOGNITION (Psychology) , *SUPPORT vector machines , *PARTS of speech , *ENGLISH language , *VERBS - Abstract
Error checking of verb forms in English articles is beneficial for learning English and improving the fluency of English texts. In this study, long short-term memory (LSTM) was used to recognize the types of errors in verb forms. To maximize the utilization of textual context information, a bidirectional LSTM algorithm was employed. Simulation experiments were then conducted, and the algorithm was evaluated against the support vector machine (SVM) algorithm and the grammar rules-based algorithm. The bidirectional LSTM method demonstrated higher accuracy in recognizing the parts of speech of words and the types of verb form errors in the text. Additionally, the accuracy was more stable when faced with different types of verb form errors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. Awakening the Proto‐Lexicon: A Proto‐Lexicon Gives Learning Advantages for Intentionally Learning a Language.
- Author
-
Mattingley, Wakayo, Panther, Forrest, Todd, Simon, King, Jeanette, Hay, Jennifer, and Keegan, Peter J.
- Subjects
- *
SECOND language acquisition , *PARTS of speech , *LANGUAGE ability , *VOCABULARY , *CLASSROOM environment - Abstract
Previous studies report that exposure to the Māori language on a regular basis allows New Zealand adults who cannot speak Māori to build a proto‐lexicon of Māori—an implicit memory of word forms without detailed knowledge of meaning. How might this knowledge feed into explicit language learning? Is it possible to "awaken" the proto‐lexicon in the context of overt language learning? We investigate whether implicit linguistic knowledge represented in a proto‐lexicon gives any advantages for intentional language learning in a tertiary educational environment. We conducted a three‐task experiment which: (a) assessed participants' Māori proto‐lexicon, (b) assessed their phonotactic knowledge, and (c) tested them on Māori vocabulary that they had been exposed to during the course at two time points. The results show that students with larger Māori proto‐lexicons learn more words in a classroom setting. This study shows that proto‐lexicon acquired from ambient exposure can lead to significant benefits in language learning. A one‐page Accessible Summary of this article in nontechnical language is freely available in the Supporting Information online and at https://oasis‐database.org [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. Toddlers' Verb‐Marking Errors Are Predicted by the Relative Frequency of Uninflected Sequences in Well‐Formed Child‐Directed Speech: A Preregistered Corpus Analysis.
- Author
-
Sawyer, Hannah, Bannard, Colin, and Pine, Julian
- Subjects
- *
VERBAL ability , *GRAMMAR , *MORPHEMICS , *ENGLISH language education in elementary schools , *PARTS of speech - Abstract
Verb‐marking errors such as she play football and daddy singing are a hallmark feature of English‐speaking children's speech. We investigated the proposal that these errors are input‐driven errors of commission arising from the high relative frequency of subject + unmarked verb sequences in well‐formed child‐directed speech. We tested this proposal via a preregistered corpus analysis and asked at what level the effects occur: Is it the relative frequency of specific subject + unmarked verb sequences in the input that is important, or is it simply that verbs become entrenched, such that their frequency of appearance with any third person singular subject accounts for errors? We found that the best predictor of children's verb‐marking errors is the relative frequency of unmarked forms of specific subject + verb sequences. Our results supported the proposal that children's apparent omissions of certain grammatical morphemes are in fact input‐driven errors of commission and provided insight into the mechanisms by which this occurs. A one‐page Accessible Summary of this article in nontechnical language is freely available in the Supporting Information online and at https://oasis‐database.org. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. Signature Dynamics of Development in Second Language Sociolinguistic Competence: Evidence From an Intensive Microlongitudinal Study.
- Author
-
Wirtz, Mason A. and Pfenninger, Simone E.
- Subjects
- *
SECOND language acquisition , *LANGUAGE ability , *PARTS of speech , *VOCABULARY , *SOCIOLINGUISTICS - Abstract
This study is the first to explore microdevelopment in sociolinguistic evaluative judgments of standard German and Austro‐Bavarian dialect by adult second language learners of German by using dense time serial measurements. Intensive longitudinal data (10 observations per participant) were collected from four learners at approximately weekly intervals over 3 months. We employed generalized additive models with superimposed periods of significant change to identify rapid developmental phases in individual developmental trajectories. By triangulating these models with qualitative introspective and retrodictive interview data, we identified environmental and psychological stimuli for change. Learners evinced increasing and decreasing periods of significant change, independent of length of residence. Dynamic constellations of identity‐ and agency‐related variables alongside more intensive social interaction with target‐variety speakers contributed to significant changes. We discuss findings from a complexity perspective and advocate for microlongitudinal studies in variationist second language acquisition to better capture stimuli for change in learners' emerging multivarietal repertoires. A one‐page Accessible Summary of this article in nontechnical language is freely available in the Supporting Information online and at https://oasis‐database.org [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Unraveling the Complexities of Second Language Lexical Stress Processing: The Impact of First Language Transfer, Second Language Proficiency, and Exposure.
- Author
-
Sagarra, Nuria, Fernández‐Arroyo, Laura, Lozano‐Argüelles, Cristina, and Casillas, Joseph V.
- Subjects
- *
SECOND language acquisition , *LANGUAGE ability , *MONOLINGUALISM , *SUFFIXES & prefixes (Grammar) , *PARTS of speech - Abstract
We investigated the role of cue weighting, second language (L2) proficiency, and L2 daily exposure in L2 learning of suprasegmentals different from the first language (L1), using eye‐tracking. Spanish monolinguals, English–Spanish learners, and Mandarin–Spanish learners saw a paroxytone and an oxytone verb (e.g., FIRma–firMÓ "s/he signs–signed"), listened to a sentence containing one of the verbs, and chose the one that they heard. The three languages have contrastive lexical stress, but suprasegmentals have a greater functional load in Mandarin than in English. Monolinguals predicted suffixes accurately with both stress conditions and favored oxytones, but learners predicted suffixes accurately only with oxytones, the condition activating fewer lexical competitors. Monolinguals predicted suffixes accurately sooner but at a slower rate than did learners. L2 proficiency, but not L1 or L2 exposure, facilitated L2 predictions. In conclusion, learners of a contrastive‐stress L1 rely on L2 suprasegmentals to the same extent as monolinguals, regardless of their L1. Lower L2 proficiency and higher cognitive load (more lexical competitors) reduce learners' reliance on suprasegmentals. A one‐page Accessible Summary of this article in nontechnical language is freely available in the Supporting Information online and at https://oasis‐database.org. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Testing the Relationship of Linguistic Complexity to Second Language Learners' Comparative Judgment on Text Difficulty.
- Author
-
Zhang, Xiaopeng and Lu, Xiaofei
- Subjects
- *
SECOND language acquisition , *LANGUAGE ability , *VOCABULARY , *PARTS of speech , *ENGLISH language education - Abstract
This study examined the relationship of linguistic complexity, captured using a set of lexical richness, syntactic complexity, and discoursal complexity indices, to second language (L2) learners' perception of text difficulty, captured using L2 raters' comparative judgment on text comprehensibility and reading speed. Testing materials were 180 texts abridged from college English coursebooks, and raters were 90 advanced Chinese learners of L2 English. Forty‐five raters read paired texts and determined which text was harder to understand in each pair, and another 45 raters read paired texts and determined which text they read faster in each pair. Two stepwise linear regression models containing lexical, syntactic, and discoursal features explained 48.1% and 54.6% of the variance in L2 learners' estimates of text comprehensibility and reading speed, respectively, outperforming four commonly used language readability models. These findings contribute useful insights into the relationship between linguistic complexity and L2 learners' perception of text difficulty. A one‐page Accessible Summary of this article in nontechnical language is freely available in the Supporting Information online and at https://oasis‐database.org [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Measuring Lexical Diversity in Texts: The Twofold Length Problem.
- Author
-
Bestgen, Yves
- Subjects
- *
ENGLISH language education , *LANGUAGE ability , *SECOND language acquisition , *VOCABULARY , *PARTS of speech - Abstract
The impact of text length on the estimation of lexical diversity has captured the attention of the scientific community for more than a century. Numerous indices have been proposed, and many studies have been conducted to evaluate them, but the problem remains. This methodological review provides a critical analysis not only of the most commonly used indices in language learning studies, but also of the length problem itself, as well as of the methodology for evaluating the proposed solutions. Analysis of three data sets of texts produced by English language learners revealed that indices that reduce all texts to the same length using a probabilistic or an algorithmic approach solve the length‐dependency problem; however, all these indices failed to address the second problem, which is their sensitivity to the parameter that determines the length to which the texts are reduced. The paper concludes with recommendations for optimizing lexical diversity analysis. A one‐page Accessible Summary of this article in nontechnical language is freely available in the Supporting Information online and at https://oasis‐database.org [ABSTRACT FROM AUTHOR]
- Published
- 2024
17. Redundancy and Complementarity in Language and the Environment: How Intermodal Information Is Combined to Constrain Learning.
- Author
-
Monaghan, Padraic, Murray, Heather, and Holz, Heiko
- Subjects
- *
STUDY & teaching of artificial languages , *LANGUAGE ability , *SECOND language acquisition , *VOCABULARY , *PARTS of speech - Abstract
To acquire language, learners have to map the language onto the environment, but languages vary as to how much information they include to constrain how a sentence relates to the world. We investigated the conditions under which information within the language and the environment is combined for learning. In a cross‐situational artificial language learning study, participants listened to transitive sentences and viewed two scenes, and selected which scene was described by the sentence. There were three conditions, involving different language variants. All variants had free word order but varied as to whether or not they contained morphosyntactic information that defined the subject and object roles of nouns in the sentence. We found that participants were able to learn information about word order and vocabulary from each variant, demonstrating that learners are not reliant on information within a language only, but can combine constraints from language and environment to support acquisition. Data and analyses are available at: https://osf.io/hxqzc/?view_only=ea6ba6fff6bb468e8de2e8596f029dca A one‐page Accessible Summary of this article in nontechnical language is freely available in the Supporting Information online and at https://oasis‐database.org [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Ensemble learning approach for distinguishing human and computer-generated Arabic reviews.
- Author
-
Alhayan, Fatimah and Himdi, Hanen
- Subjects
CONVOLUTIONAL neural networks ,NATURAL language processing ,MACHINE learning ,ARTIFICIAL intelligence ,PARTS of speech ,DEEP learning - Abstract
While customer reviews are crucial for businesses to maintain their standing in the marketplace, some may employ humans to create favorable reviews for their benefit. However, advances in artificial intelligence have made it less complex to create these reviews, which now rival real ones written by humans. This poses a significant challenge in distinguishing between genuine and artificially generated reviews, thereby impacting consumer trust and decision-making processes. Research has been conducted to classify whether English reviews were authored by humans or computers. However, there is a notable scarcity of similar studies conducted in Arabic. Moreover, the potential of ensemble learning (EL) techniques, such as soft voting, to enhance model performance remains underexplored. This study conducts a comprehensive empirical analysis using various models, including traditional machine learning, deep learning, and transformers, with an investigation into ensemble techniques, like soft voting, to classify human and computer-generated Arabic reviews. Integrating top logistic regression (LR) and convolutional neural network (CNN) models, it achieves an accuracy of 89.70%, akin to AraBERT's 90.0%. Additionally, a thorough textual analysis, covering parts of speech (POS), emotions, and linguistics reveals significant linguistic disparities between human and computer-generated reviews. Notably, computer-generated reviews exhibit a substantially higher proportion of adjectives (6.3%) compared to human reviews (0.46%), providing crucial insights for discerning between the two review types. The results not only advance natural language processing (NLP) in Arabic but also have significant implications for businesses combating the influence of fake reviews on consumer trust and decision-making. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Igama neendidi zentsingiselo eziqulathwe lilo.
- Author
-
Nkosiyane, Neliswa and Made, Zoliswa
- Subjects
PARTS of speech ,SEMANTICS ,POLYSEMY ,AMBIGUITY ,VOCABULARY - Abstract
This work aims to show that a word has no specific meaning where one can say 'this word means this'. In other words, a word is just a label of something at that time and place. This idea shows that a word can only have a meaning when used in a particular context, which means one word may have different meanings when used in different contexts. Also, this work will show that a word can have different meanings when used in different contexts of place, time and reason. The analysis of isiXhosa words will be observed using three theories, context theory, context of situation and the referential theory of meaning. All these theories will give clear meanings of different parts of speech like ambiguity, polysemy, antonymy, synonymy and hyponymy. This work aims to show social life by using a qualitative approach. This research method was chosen because it is a method used to obtain information without using numbers. It is a type of research that requires a person to use their mind because it will always use questions that want to know what, why, where, and how. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. The sound roots of Umóⁿhoⁿ
- Author
-
Julie Marsault
- Subjects
onomatopoeias ,siouan languages ,ideophones ,parts of speech ,instrumental affixes ,Philology. Linguistics ,P1-1091 - Abstract
This paper presents a corpus-based study of lexemes denoting sounds in Umóⁿhoⁿ (oma), a Siouan language of North America. I take as a starting point a list of sound-denoting verbal roots (in short: “sound roots”), presented as onomatopoeia in a paper by Dorsey in 1892, that form a coherent set based on their semantic features – they denote sounds. I describe their morphological and syntactic features and their form-meaning mappings in order to assess (1) whether these features distinguish them from other verbal roots, and (2) how well they fit the cross-linguistic definition of ideophones proposed by Dingemanse in 2019. I show that several salient morphological and syntactic features are repeatedly attested with sound roots. However, the currently available corpus does not provide evidence that the sound roots form a homogeneous class on the morphological and syntactic level, due to the disparity of features attested from one root to the other. Hence I conclude that these roots cannot be considered ideophones in Dingemanse’s sense. Nonetheless, similarities between the sound roots of Umóⁿhoⁿ and ideophones in other languages can be observed. They can be grammatically integrated, by contrast with onomatopoeia, and their meaning extends from sound to other sensory domains.
- Published
- 2024
21. Teko Ideophones: description of a word class
- Author
-
Françoise Rose
- Subjects
parts of speech ,phonosymbolism ,prosody ,reduplication ,pause ,expressivity ,Philology. Linguistics ,P1-1091 - Abstract
The aim of this paper is to present a comprehensive description of the ideophones of Teko, a Tupi language spoken in French Guiana. This word class, previously only briefly described, is defined in this paper through a systematic comparison to nouns and verbs, at various levels: phonology, word structure, prosody, semantic, morphology, syntax and discourse use. When relevant, it offers quantitative analyses with statistical tests to support the comparison. The analysis is based on a lexical database of 177 ideophones, 420 occurrences in texts and a subset of 101 tokens with audio-recording. It particularly investigates in detail various aspects of prosody, including syllabic structure, pitch, intensity and duration, and pauses. Contrary to the common view on ideophones that postulates a rather marginal status of the latter, this paper shows that ideophones are in fact rather well integrated in the linguistic system of Teko. They yet show regularities that require to consider them a distinct word category. This paper aims at contributing to the growing literature on the cross-linguistic definition and description of ideophones.
- Published
- 2024
22. COGMED: a database for Chinese olfactory and gustatory metaphor.
- Author
-
Huang, Jiayu, Chen, Lixin, Huang, Yanyang, Chen, Yuying, and Zou, Laiquan
- Subjects
DATABASES ,PARTS of speech ,STATISTICAL reliability ,METAPHOR ,STATISTICAL correlation ,SMELL ,OLFACTORY perception - Abstract
In recent years, metaphors have been used to understand how abstract concepts are processed, and developing metaphorical database can aid this research agenda. Accordingly, the present study aimed to produce a broad Chinese olfactory and gustatory database of metaphors. Expressions for two sensory types (olfaction/gustation) were compiled and classified according to two conditions (metaphorical/literal) and three parts of speech (adjective/noun/verb). To test the validity of this database, we enrolled 199 participants, of whom 97 completed the olfactory test (319 items) and 102 completed the gustatory test (352 items). Each item was rated on seven dimensions: familiarity, meaningfulness, figurativeness, valence, difficulty, imageability, and naturalness. Descriptive statistics and reliability and correlation analyses were conducted for the items under the two sensory types, two conditions, three parts of speech, and seven dimensions mentioned above. The results showed high internal consistency and split-half reliability for the participants' scores; the correlations between the seven dimensions resembled those of previous studies. We concluded by suggesting several ways the database might be applied to aid future research efforts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Deep-KeywordNet: automated english keyword extraction in documents using deep keyword network based ranking.
- Author
-
Khatun, Rubaya and Sarkar, Arup
- Subjects
CONVOLUTIONAL neural networks ,TIME complexity ,PARTS of speech ,DEEP learning ,WEBSITES - Abstract
During the information retrieval process, individuals locate relevant web pages by entering specific keywords. Nevertheless, if users provide inaccurate keywords or if these keywords are absent from the intended page, the effectiveness of information retrieval will be significantly compromised. Thus, the role of keywords in text processing remains of utmost importance. Particularly in intricate contexts, relying on manual analysis by readers can prove to be both time-intensive and unfeasible. Most existing methods are addressed with limited accuracy, leading to elevated error rates and compromised training capabilities. To overcome these limitations, the proposed approach introduces an automated keyword extraction and ranking system based on deep learning. Several key stages, like data acquisition, pre-processing, tokenization, word-to-vector transformation, keyword classification, and ranking, are used. The effectiveness of this keyword extraction process is evaluated using 500N-KPCrowd, KPTimes, and KP20k datasets. During text pre-processing, eliminating stop words, applying Parts of Speech (PoS) tagging, stemming, and sentence segmentation are undertaken. The pre-processed text is fed into the Deep-KeywordNet model, while the pre-processed input is tokenized into individual words. The Word2Vec (W2V) Skip-gram embedding layer facilitates the categorization of distributed vector representations. The Attention Bidirectional Long Short-Term Memory Gated Convolutional Neural Network (Attn Bi-GCNN), along with the softmax layer, assign class labels, and the network's loss optimization employs the Dwarf Mongoose Algorithm (DMA). Significant keywords are ranked using the Term Frequency-Inverse Average Document Frequency (TF-IADF) model. Remarkably, the overall accuracy achieved through the implementation in PYTHON stands at 98.87%, with a minimized time complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Research of semantic aspects of the Kazakh language when translating into the Kazakh sign language.
- Author
-
Nurgazina, Dana and Kudubayeva, Saule
- Subjects
SIGN language ,MACHINE translating ,PARTS of speech ,TENSE (Grammar) ,LINGUISTICS - Abstract
The article discusses the semantic aspects of Kazakh sign language and its characteristics. Semantics, a field within linguistics, focuses on examining the meanings conveyed by expressions and combinations of signs. The author delves into the inquiry of the degree of similarity between verbal and sign languages, highlighting their fundamental distinctions. The primary objective of the research is to scrutinize the characteristics of parts of speech in the Kazakh language when expressed gesturally, along with the principles governing the translation of verbs and adverbial tenses. The article explains in detail the formulas for translating the text into sign language, based on the subject-object-predicate. Examples are given that illustrate the subject-object relationship and determine who acts as the speaker, "object" or "subject" of the utterance. It is necessary to note that for successful translation it is necessary first to understand the meaning of the sentence. The article concludes by emphasizing the importance of understanding both structural elements and contextual nuances in the fascinating world of the semantics of the Kazakh sign language. It inspires further research aimed at uncovering the complexities and exceptions that contribute to a deep understanding of linguistic nuances in this unique form of communication. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Chinese Clinical Named Entity Recognition Using Multi-Feature Fusion and Multi-Scale Local Context Enhancement.
- Author
-
Li, Meijing, Huang, Runqing, and Qi, Xianxian
- Subjects
CONVOLUTIONAL neural networks ,KNOWLEDGE graphs ,CHINESE language ,PARTS of speech ,SEMANTICS - Abstract
Chinese Clinical Named Entity Recognition (CNER) is a crucial step in extracting medical information and is of great significance in promoting medical informatization. However, CNER poses challenges due to the specificity of clinical terminology, the complexity of Chinese text semantics, and the uncertainty of Chinese entity boundaries. To address these issues, we propose an improved CNER model, which is based on multi-feature fusion and multi-scale local context enhancement. The model simultaneously fuses multi-feature representations of pinyin, radical, Part of Speech (POS), word boundary with BERT deep contextual representations to enhance the semantic representation of text for more effective entity recognition. Furthermore, to address the model's limitation of focusing just on global features, we incorporate Convolutional Neural Networks (CNNs) with various kernel sizes to capture multi-scale local features of the text and enhance the model's comprehension of the text. Finally, we integrate the obtained global and local features, and employ multi-head attention mechanism (MHA) extraction to enhance the model's focus on characters associated with medical entities, hence boosting the model's performance. We obtained 92.74% and 87.80% F1 scores on the two CNER benchmark datasets, CCKS2017 and CCKS2019, respectively. The results demonstrate that our model outperforms the latest models in CNER, showcasing its outstanding overall performance. It can be seen that the CNER model proposed in this study has an important application value in constructing clinical medical knowledge graphs and intelligent Q&A systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Structural and sequential regularities modulate phrase-rate neural tracking.
- Author
-
Zhao, Junyuan, Martin, Andrea E., and Coopmans, Cas W.
- Subjects
- *
ARTIFICIAL satellite tracking , *PARTS of speech , *NATIVE language , *MANDARIN dialects , *HOUSE selling , *ELECTROENCEPHALOGRAPHY - Abstract
Electrophysiological brain activity has been shown to synchronize with the quasi-regular repetition of grammatical phrases in connected speech—so-called phrase-rate neural tracking. Current debate centers around whether this phenomenon is best explained in terms of the syntactic properties of phrases or in terms of syntax-external information, such as the sequential repetition of parts of speech. As these two factors were confounded in previous studies, much of the literature is compatible with both accounts. Here, we used electroencephalography (EEG) to determine if and when the brain is sensitive to both types of information. Twenty native speakers of Mandarin Chinese listened to isochronously presented streams of monosyllabic words, which contained either grammatical two-word phrases (e.g., catch fish, sell house) or non-grammatical word combinations (e.g., full lend, bread far). Within the grammatical conditions, we varied two structural factors: the position of the head of each phrase and the type of attachment. Within the non-grammatical conditions, we varied the consistency with which parts of speech were repeated. Tracking was quantified through evoked power and inter-trial phase coherence, both derived from the frequency-domain representation of EEG responses. As expected, neural tracking at the phrase rate was stronger in grammatical sequences than in non-grammatical sequences without syntactic structure. Moreover, it was modulated by both attachment type and head position, revealing the structure-sensitivity of phrase-rate tracking. We additionally found that the brain tracks the repetition of parts of speech in non-grammatical sequences. These data provide an integrative perspective on the current debate about neural tracking effects, revealing that the brain utilizes regularities computed over multiple levels of linguistic representation in guiding rhythmic computation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. SIFG: an ensemble model for sieving fake news from genuine without metadata by combining syntactic and semantic features.
- Author
-
Alam, Shahid and Khalid, Samina
- Subjects
- *
FAKE news , *METADATA , *VECTOR quantization , *PARTS of speech , *FEATURE selection , *SIEVES , *NATURAL language processing - Abstract
With the emergence of the Internet, social media, and smartphones, almost everyone can now claim that they have news in their pockets. This phenomenon makes people aware of their surroundings and knowledgeable of current events. But, it also presents new challenges, such as automatic detection, machine/bot-generated fake news, limited or no metadata, etc. In this paper, we mitigate some of these challenges by taking a step toward SIeving Fake News from Genuine (SIFG) and presenting a system named SIFG. In SIFG we have used basic and advanced Natural Language Processing techniques to detect online fake news automatically. This makes SIFG independent of the metadata, such as the source, network structure and behavior, temporal propagation, and responses, about the news. We also introduce specific Parts of Speech patterns, utilize Generalized Relevance Learning Vector Quantization for feature selection, and employ ensemble learning to improve the performance of SIFG. When tested with a publicly available online fake news dataset of 10,000 fake and 10,000 genuine news, SIFG shows promising results and achieves an accuracy in the range of 89.85% − 95%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. The role of vision in the acquisition of words: Vocabulary development in blind toddlers.
- Author
-
Campbell, Erin, Casillas, Robyn, and Bergelson, Elika
- Subjects
- *
LANGUAGE acquisition , *TODDLERS development , *CHILDREN'S language , *PARTS of speech , *VISION - Abstract
What is vision's role in driving early word production? To answer this, we assessed parent‐report vocabulary questionnaires administered to congenitally blind children (N = 40, Mean age = 24 months [R: 7–57 months]) and compared the size and contents of their productive vocabulary to those of a large normative sample of sighted children (N = 6574). We found that on average, blind children showed a roughly half‐year vocabulary delay relative to sighted children, amid considerable variability. However, the content of blind and sighted children's vocabulary was statistically indistinguishable in word length, part of speech, semantic category, concreteness, interactiveness, and perceptual modality. At a finer‐grained level, we also found that words' perceptual properties intersect with children's perceptual abilities. Our findings suggest that while an absence of visual input may initially make vocabulary development more difficult, the content of the early productive vocabulary is largely resilient to differences in perceptual access. Research Highlights: Infants and toddlers born blind (with no other diagnoses) show a 7.5 month productive vocabulary delay on average, with wide variability. Across the studied age range (7–57 months), vocabulary delays widened with age. Blind and sighted children's early vocabularies contain similar distributions of word lengths, parts of speech, semantic categories, and perceptual modalities. Blind children (but not sighted children) were more likely to say visual words which could also be experienced through other senses. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Metaphor forces argument overtness.
- Author
-
Reinöhl, Uta and Ellison, T. Mark
- Subjects
- *
CONTRAST effect , *ARGUMENT , *PARTS of speech , *NATIVE language , *METAPHOR - Abstract
This paper uncovers how metaphor forces argument overtness – across languages and parts of speech. It addresses the relationship between semantically unsaturated terms, functors, and the argument terms that complete them. When the component terms' default senses clash semantically, a metaphor arises. In such cases, the argument must be overt, in contrast to literal uses. It is possible to say Everyone was waiting at the hotel. Finally, Kim arrived. By contrast, people do not use arrived metaphorically without a goal argument: Everything had been pointing to that conclusion all along. *Finally, Kim arrived. What they say is Finally, Kim arrived at it. We illustrate the phenomenon with powerful and diverse evidence: three corpus studies (Indo-Aryan languages, British English, Vera'a) and a sentence-completion experiment with around 250 native speakers of English. Both the corpus studies and the experiment show no or almost no exceptions to metaphor-driven argument overtness. The strength of the effect contrasts with a complete lack of speaker awareness. We propose that metaphor-driven argument overtness – as well as the lack of speaker consciousness – is a universal phenomenon that can be accounted for in terms of human language processing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Alma: Fast Lemmatizer and POS Tagger for Arabic.
- Author
-
Jarrar, Mustafa, Akra, Diyam, and Hammouda, Tymaa
- Subjects
PARTS of speech ,ENCYCLOPEDIAS & dictionaries ,DATABASES ,SPEED ,MORPHOLOGY - Abstract
We introduce Alma (ﺍ ﺍ ﻠ ى), an open-source and state-of-the-art lemmatizer, POS tagger, and root tagger for Arabic, boasting both high speed and accuracy. Alma relies on a dictionary of morphological solutions ordered by the frequency of these solutions. This dictionary was developed based on the Qabas lexicographic database. Unlike many Arabic lemmatizers that return a lemma after stripping diacritics, shadda, and hamza (i.e., ambiguous lemma), Alma retrieves unambiguous lemmas (we called true lemmatization). Our POS tagger uses a rich tagset of 40 POS tags. Additionally, our root tagger is the first fully-featured tagger since it uses Qabas, the largest Arabic lexicographic database. We evaluated Alma on the LDC Arabic Treebank (ATB) that contains 339,710 tokens and achieved an 88% F1 score. We also evaluated Alma on the Salma corpus (34k tokens) and obtained a 90% F1 score. Compared to Farasa, MADAMIRA, and Camelira lemmatizers and POS taggers, Alma outperformed all of them in both tasks, excelling in both speed and accuracy. Alma demonstrated superior processing speed, handling 339k tokens in 10.00. Alma is open-source and publicly available at (https://sina.birzeit.edu/alma). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. The morpho-sociolinguistic aspect of the morpheme /-ng∼-eng/ in Sesotho.
- Author
-
Matlosa, Litsepiso, 'Matjotjo, 'Masehloho, and Selebeli, Thuso
- Subjects
LINGUISTICS ,PARTS of speech ,SOCIAL dynamics ,PEERS ,SUFFIXES & prefixes (Grammar) - Abstract
This is a morpho-sociolinguistic article that adopts a qualitative approach to investigate the attachment of the morpheme /-ng∼-eng/ to Sesotho parts of speech and to determine the validity of the justifications provided for this type of suffixation. The study employs the integrated frameworks of item-and-process and variationist sociolinguistics. It is a known fact that human language is inherently a culturally evolving system. That is, it is not monolithic, but instead, is dynamic or variable. Among the social dynamics stimulating language variability is age. In our everyday experience, we witness a generational gap between youngsters and adults in terms of communication. Adults tend to be conservative, while the youth are innovative in their language use. Through their peer groups, the youth are able to establish new linguistic norms which may diffuse into the wider community. In Sesotho, one such linguistic norm is the current suffixation of /-ng∼-eng/ to some parts of speech to which traditionally it was not suffixed. The article further discusses the semantic implications of the attachment of this suffix and the reaction of the elderly towards it. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Age-related Chinese word recognition across different AoA and parts of speech.
- Author
-
Wang, Zining, Zhang, Lina, and Xuan, Bin
- Subjects
WORD recognition ,PARTS of speech ,CHINESE language ,OLDER people ,YOUNG adults ,RECOGNITION (Psychology) - Abstract
The cognitive ability of older adults declines with age, while the language processing experience becomes more abundant. Word recognition is one of the most fundamental abilities in language processing, but how the interaction of cognitive aging and language experience affects word recognition in older adults is still a complex issue. This study focuses on the influence of two factors, age of acquisition (AoA) and parts of speech on Chinese word recognition in older adults. In Experiment 1, we employed a context-free lexical decisions task to compare the differences of word recognition between young and older adults at the conditions of different AoAs and parts of speech. In Experiment 2, participants were required to perform a sentence reading task in order to examine the above effects with contextual involvement, and their eye movements were tracked and analyzed. Combining the results of the two experiments, we found a greater AoA effect in older adults compared to young adults. This effect primarily manifested as a decline in processing efficiency for late-acquired words and was already evident in the early stage of word recognition. In addition, only the older adults exhibit differences in the processing of distinct word components, indicating a progressive pattern of increasing difficulty from nouns to verbs and subsequently to adjectives. The findings suggest that older adults experience a heightened exacerbation of word processing difficulties. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Models of conversion in Modern English.
- Author
-
Rubanets, Tetyana, Kiyko, Svitlana, and Kiyko, Yuriy
- Subjects
ENGLISH language ,NATURAL languages ,PARTS of speech ,ENCYCLOPEDIAS & dictionaries ,STRUCTURAL models - Abstract
This article deals with the structure and semantic characteristics of conversives in the models noun – verb, verb – noun, noun – adjective and adjective – noun. It consists of four stages. The first stage regards the main approaches to the phenomenon of conversion in line with system-structural, communicative-functional and cognitive paradigms, as well as elaborates the definitions used in the work. The purpose of the second stage is to form the research material. By a continuous sample of three academic dictionaries, New Webster's Dictionary of the English Language (2009. 5th ed. London: Pearson education), Macmillan English Dictionary (2006. In M. Rundell (ed.), For advanced learners. London: Palgrave Macmillan) and Longman Dictionary of Contemporary English (2009. 5th ed. London: Pearson education), a total of 18,263 conversives was written out. To avoid repeating the conversives given in the dictionaries, we have developed a sample in which every conversion pair occurred once. The total number of the studied conversives is 10,140 tokens, grouped into 5,070 conversion pairs. The third stage highlights the structural and semantic features of conversives in modern English. It describes the peculiarities of parts of speech and semantic transitions, as well as determines the structural models of conversives and their modifications. In order to establish regular semantic changes, the semantic models of conversion are singled out, quantitative characteristics of each model are established and the most productive transitions from the generative to the derivative are described. At the final stage, the results of the work are summarized and the prospects for further research are outlined. The obtained results will enrich the theory of nomination with the new systematized material of conversives, which are an integral part of natural languages, and the analysis of English language conversives will supplement the theoretical and methodological basis for further study of the phenomenon of conversion in other languages. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Multidimensional Adjectives.
- Author
-
D'Ambrosio, Justin and Hedden, Brian
- Subjects
ADJECTIVES (Grammar) ,PARTS of speech ,SOCIAL choice ,DECISION making ,VAGUENESS (Philosophy) ,PHILOSOPHY - Abstract
Multidimensional adjectives are ubiquitous in natural language. An adjective $F$ is multidimensional just in case whether $F$ applies to an object or pair of objects depends on how those objects stand with respect to multiple underlying dimensions of $F$-ness. Developing a semantics for multidimensional adjectives requires us to address the problem of dimensional aggregation: how do the application conditions of an adjective $F$ in its positive and comparative forms depend on its underlying dimensions? Here we develop a semantics for multidimensional adjectives that incorporates aggregation functions. We then explore an analogy between dimensional aggregation and preference aggregation, bringing results from social choice theory to bear on the number and kind of aggregation functions which are admissible in a context. These results suggest that, for any given adjective, there will often be multiple aggregation functions admissible, meaning that multidimensional comparatives are often vague. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Perspective and spatial experience.
- Author
-
Kerr, Alex
- Subjects
SPACE perception ,PARTS of speech ,SOCIAL choice - Abstract
Distant things look smaller, in a sense. Why? I argue that the reason is not that our experiences have a certain subject matter, or are about certain mind-independent things and features. Instead, distant things look smaller because of our way of perceiving them. I go on to offer a hypothesis about which specific way of perceiving explains why distant things look smaller. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Sõnavara ja konstruktsioonide areng eesti keelt teise keelena omandava noore õppija kõnes aasta jooksul
- Author
-
Reili Argus and Tiina Rüütmaa
- Subjects
acquisition of estonian as a second language ,vocabulary ,constructions ,parts of speech ,estonian ,Philology. Linguistics ,P1-1091 ,Finnic. Baltic-Finnic ,PH91-98.5 - Abstract
Artiklis vaadeldakse eesti keelt teise keelena omandavate 8–10-aastaste laste sõnavara ja konstruktsioonide arengut aasta jooksul 2022. ja 2023. aastal läbi viidud pildikirjelduskatse põhjal. Aasta jooksul suurenes õpilaste sõnavara, areng toimus kõikides sõnaliikides suhteliselt ühtlaselt, siiski oli lekseemide hulga kasv kõige suurem omadus- ja asesõnade puhul. Ka konstruktsioonides toimus märkimisväärne areng, lisandus uusi konstruktsioonitüüpe ning need muutusid komplekssemaks. Sõnavara ja konstruktsioonide hulk oli omavahel seotud: mida suurem oli õppija sõnavara 2022. a pildi kirjelduses, seda rohkem eri konstruktsioone oli lisandunud tema 2023. a pildikirjeldusse. Eri sõnaliikide sõnavara ulatus konstruktsioonide arengut ei paista mõjutavat. *** "Development of vocabulary and constructions in the speech of young learners of Estonian as a second language over the course of one year" The article examines the development of vocabulary and grammatical constructions on both phrase and clause level of children aged 8-10 years acquiring Estonian as a second language on the basis of a picture description test carried out in 2022 and 2023. Over the course of the year students’ vocabulary increased, development occurred relatively evenly in all parts of speech, however, the increase in the number of lexemes was the greatest for adjectives and pronouns. The growth of vocabulary was faster for children whose language skills were weaker in 2022. The constructions also developed significantly over the year: new types of constructions were added and they became more complex. In most children, the growth of structures was focused either on the phrase or on the clause level. 
The correlation between the number of lexemes and the number of constructions was significant: those children whose vocabulary was bigger in 2022 used more constructions in the test conducted in 2023. The scope of the vocabulary of the different parts of speech does not seem to affect the development on the level of constructions. Data from the study on the relationship between vocabulary and constructions strongly support the view that syntactic development is driven by lexicon-level development, i.e. sufficient words must be acquired so that different types of constructions can be recognized and used.
- Published
- 2024
- Full Text
- View/download PDF
37. Unveiling factors influencing judgment variation in sentiment analysis with natural language processing and statistics.
- Author
-
Kellert, Olga, Gómez-Rodríguez, Carlos, and Uz Zaman, Mahmud
- Subjects
- *
JUDGMENT (Psychology) , *SENTIMENT analysis , *NATURAL language processing , *PARTS of speech , *HOTEL ratings & rankings , *HOTEL restaurants - Abstract
TripAdvisor reviews and comparable data sources play an important role in many tasks in Natural Language Processing (NLP), providing a data basis for the identification and classification of subjective judgments, such as hotel or restaurant reviews, into positive or negative polarities. This study explores three important factors influencing variation in crowdsourced polarity judgments, focusing on TripAdvisor reviews in Spanish. Three hypotheses are tested: the role of Part Of Speech (POS), the impact of sentiment words such as "tasty", and the influence of neutral words like "ok" on judgment variation. The study's methodology employs one-word titles, demonstrating their efficacy in studying polarity variation of words. Statistical tests on mean equality are performed on word groups of our interest. The results of this study reveal that adjectives in one-word titles tend to result in lower judgment variation compared to other word types or POS. Sentiment words contribute to lower judgment variation as well, emphasizing the significance of sentiment words in research on polarity judgments, and neutral words are associated with higher judgment variation as expected. However, these effects cannot be always reproduced in longer titles, which suggests that longer titles do not represent the best data source for testing the ambiguity of single words due to the influence on word polarity by other words like negation in longer titles. This empirical investigation contributes valuable insights into the factors influencing polarity variation of words, providing a foundation for NLP practitioners that aim to capture and predict polarity judgments in Spanish and for researchers that aim to understand factors influencing judgment variation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Evaluation of adjective and adverb types for effective Twitter sentiment classification.
- Author
-
Ali, Syed Fahad and Masood, Nayyer
- Subjects
- *
MICROBLOGS , *ADJECTIVES (Grammar) , *PARTS of speech , *FEATURE selection , *SENTIMENT analysis , *CLASSIFICATION - Abstract
Twitter, the largest microblogging platform, has reported more than 330 million active users in recent years. Many users express their sentiments about politics, sports, products, personalities, etc. Sentiment analysis has emerged as a specialized branch of machine learning in which tweets are binary-classified to provide sentimental insights. A major step in sentiment classification is feature selection, which primarily revolves around parts of speech (POS). A few techniques focused merely on single features such as adjectives, adverbs, and verbs, while other techniques examined types of these features, such as comparative adjectives, superlative adjectives, or general adverbs. Furthermore, POS as linguistic entities have also been studied and extensively classified by researchers, such as CLAWS-C7. For sentiment analysis, none of the studies conceptualized all possible POS features under similar conditions to draw firm conclusions. This research is centered on the following objectives: 1) examining the impact of various types of adjectives and adverbs that have not been previously explored for sentiment classification; 2) analyzing potential combinations of adjective and adverb types; 3) conducting a comparison with a benchmark dataset for better classification accuracy. To assess the concept, a renowned human-annotated dataset of tweets is investigated. Results showed that classification accuracy for adjectives is improved up to 83% based on the general superlative adjective, whereas for adverbs, the comparative general adverb also showed a significant accuracy improvement. Their combination with general adjectives and general adverbs also played a substantial role. The unexplored potential of adjective and adverb types proved better in accuracy against a state-of-the-art probabilistic model. In comparison to a lexicon-based model, the proposed model removed the dependency on a lexicon-based dictionary, where each term first needs to be matched for semantic orientation. The outcomes also help reduce processing time where huge volumes of data need to be processed swiftly. This contribution provides knowledge and direction for domain experts. In the future, the proposed technique will be explored for other types of textual data across different domains. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Eye-movement patterns in skilled Arabic readers: effects of specific features of Arabic versus universal factors.
- Author
-
Lahoud, Hend, Eviatar, Zohar, and Kreiner, Hamutal
- Subjects
EYE movements ,UNIVERSAL language ,WORD frequency ,PARTS of speech ,WRITTEN communication ,CURRENT awareness services - Abstract
This study aims to shed light on the contribution of universal versus language specific factors on reading. We examined eye movements of Arabic readers and analyzed effects specific to Arabic such as perceptual complexity, diglossia and morphology, in addition to universal factors such as word length and frequency. Twenty native Arabic speakers read continuous texts in Modern Standard Arabic (MSA) while their eye movements were monitored. A corpus-based analysis was carried out to test effects specific to Arabic and effects of the benchmark eye movement factors. We found that perceptually more complex words received longer fixation durations; moreover, differences in processing words unique in MSA versus words shared between MSA and spoken Arabic Vernacular were found. This is the first indication for these effects during an eye movement reading task. However, the effect of morphological length was not significant when included in the model with all predictors. Lastly, the benchmark factors were significant showing effects for word length, word frequency and part of speech. Short and frequent words are processed faster than longer and less frequent words. Function words are often skipped. We conclude that eye movements of Arabic readers reflect proficient reading, yet they also exhibit an on-going challenge in processing the written language. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion Models.
- Author
-
Zhang, Guangzi, Qian, Yulin, Deng, Juntao, and Cai, Xingquan
- Subjects
PARTS of speech - Abstract
Diffusion models are widely recognized in image generation for their ability to produce high-quality images from text prompts. As the demand for customized models grows, various methods have emerged to capture appearance features. However, the exploration of relations between entities, another crucial aspect of images, has been limited. This study focuses on enabling models to capture and generate high-level semantic images with specific relation concepts, which is a challenging task. To this end, we introduce the Inv-ReVersion framework, which uses inverse relations text expansion to separate the feature fusion of multiple entities in images. Additionally, we employ a weighted contrastive loss to emphasize part of speech, helping the model learn more abstract relation concepts. We also propose a high-frequency suppressor to reduce the time spent on learning low-frequency details, enhancing the model's ability to generate image relations. Compared to existing baselines, our approach can more accurately generate relation concepts between entities without additional computational costs, especially in capturing abstract relation concepts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
41. Development and validation of the trauma-specific emotional counting Stroop paradigm for fMRI study.
- Author
-
Hong, Ji Sun, Lee, Dayoung, Han, Doug Hyun, and Sim, Minyoung
- Subjects
- *
PREFRONTAL cortex , *TEMPORAL lobe , *FUNCTIONAL magnetic resonance imaging , *KOREAN language , *WORD frequency , *PARTS of speech - Abstract
The emotional-counting Stroop (ecStroop) is a cognitive task to evaluate emotional information processing. This study aimed to develop a trauma-specific ecStroop protocol for firefighters and assess its validity as a functional magnetic resonance imaging (fMRI) activation paradigm. To develop the ecStroop protocol, trauma-related words for firefighters were selected from previous studies, and general negative and neutral words were matched corresponding to the number of letters and syllables, parts of speech, and frequency in the Korean language. The negative emotional valence of whole words was investigated in 520 healthy participants. To compare brain activation between three categories, 25 healthy individuals underwent fMRI during the ecStroop task. Eight trauma-related words, eight general negative words, and sixteen neutral words were selected by emotional valence scores. The general negative words were related to increased activation in the right inferior and middle temporal gyrus, right medial frontal gyrus, and left superior frontal gyrus compared to the neutral words. When exposed to the trauma-related words, participants' brain activation was increased in the right inferior temporal gyrus, right medial frontal gyrus, left superior temporal gyrus, and left inferior frontal gyrus as compared to when exposed to the neutral words. The fact that all participants in the phase 2 fMRI study were male could limit generalization to all genders. These findings suggest that the ecStroop paradigm successfully activated the brain regions for emotional processing. This paradigm could be valuable in assessing the trauma-specific neural changes in firefighters.
• Eight trauma-related words, eight general-negative words, and sixteen neutral words were selected to develop a trauma-specific protocol.
• Trauma-related words were associated with the activation within the inferior temporal gyrus and medial frontal gyrus.
• General-negative words were also associated with the activation within the inferior temporal gyrus and medial frontal gyrus.
• There were no significant differences in brain activation between trauma-related and general negative words in healthy participants.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
42. SemSyn: Semantic-Syntactic Similarity Based Automatic Machine Translation Evaluation Metric.
- Author
-
Chauhan, Shweta, Kumar, Rahul, Saxena, Shefali, Kaur, Amandeep, and Daniel, Philemon
- Subjects
- *
MACHINE translating , *NATURAL languages , *PARTS of speech , *WORD order (Grammar) , *LANGUAGE & languages - Abstract
Machine translation evaluation is difficult and challenging for natural languages because different languages behave differently for the same dataset. Lexical-based metrics poorly represent semantic relationships and impose strict identity matching. Moreover, translation and assessment become difficult for morphologically rich target languages with relatively free word order. Most of the standard evaluation metrics consider word order but do not effectively consider sentence structure. In this paper, we propose a novel machine translation evaluation metric, SemSyn, which incorporates both semantic and syntactic similarity. We incorporate the term frequency-inverse document frequency with the earth mover's distance and word embedding to cover the semantic similarity. The part of speech and dependency parsing tags assist in covering syntactic similarity in the sentence structure. Part of speech and dependency parsing tags are extracted from universal dependencies and trained on the SpaCy library. Experimental results show that SemSyn has a higher correlation with human judgment than other evaluation metrics for morphologically rich languages and other languages. [ABSTRACT FROM AUTHOR]
- Published
- 2024
43. NSP-SCD: A corpus construction protocol for child-directed print in understudied languages.
- Author
-
Nag, Sonali, John, Sunila, and Agrawal, Aakash
- Subjects
- *
PSYCHOLINGUISTICS , *CORPORA , *CHILDREN'S language , *LINGUISTICS , *CHILDREN'S books , *PARTS of speech - Abstract
Child-directed print corpora enable systematic psycholinguistic investigations, but this research infrastructure is not available in many understudied languages. Moreover, researchers of understudied languages are dependent on manual tagging because precise automatized parsers are not yet available. One plausible way forward is to limit the intensive work to a small-sized corpus. However, with little systematic enquiry about approaches to corpus construction, it is unclear how robust a small corpus can be made. The current study examines the potential of a non-sequential sampling protocol for small corpus development (NSP-SCD) through a cross-corpora and within-corpus analysis. A corpus comprising 17,584 words was developed by applying the protocol to a larger corpus of 150,595 words from children's books for 3-to-10-year-olds. While the larger corpus will by definition have more instances of unique words and unique orthographic units, still, the selectively sampled small corpus approximated the larger corpus for lexical and orthographic diversity and was equivalent for orthographic representation and word length. Psycholinguistic complexity increased by book level and varied by parts of speech. Finally, in a robustness check of lexical diversity, the non-sequentially sampled small corpus was more efficient compared to a same-sized corpus constructed by simply using all sentences from a few books (402 books vs. seven books). If a small corpus must be used then non-sequential sampling from books stratified by book level makes the corpus statistics better approximate what is found in larger corpora. Overall, the protocol shows promise as a tool to advance the science of child language acquisition in understudied languages. [ABSTRACT FROM AUTHOR]
- Published
- 2024
44. Absolute gradable adjectives and loose talk.
- Author
-
Dinges, Alexander
- Subjects
SEMANTICS ,LANGUAGE & languages ,ADJECTIVES (Grammar) ,NOMINALS (Grammar) ,PARTS of speech - Abstract
Kennedy (Linguist Philos 30:1–45, 2007) forcefully proposes what is now a widely assumed semantics for absolute gradable adjectives. On this semantics, maximum standard adjectives like "straight" and "dry" ascribe a maximal degree of the underlying quantity. Meanwhile, minimum standard adjectives like "bent" and "wet" merely ascribe a non-zero, non-minimal degree of the underlying quantity. This theory clashes with the ordinary intuition that sentences like "The stick is straight" are frequently true while sentences like "The stick is bent" are frequently informative, and fans of the indicated theory of absolute gradable adjectives appeal to loose talk in response. One goal of this paper is to show that all extant theories of loose talk are inconsistent with this response strategy. Another goal is to offer a revised version of Hoek's (Philos Rev 127:151–196, 2018, in: Proceedings of the 22nd Amsterdam Colloquium, 2019) recent theory of loose talk that accommodates absolute gradable adjectives after all, while being defensible against a range of important concerns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
45. School Dictionary Entries with a Pedagogical and Linguistic Approach.
- Author
-
Devecioğlu, Miray and Benzer, Ahmet
- Subjects
ENCYCLOPEDIAS & dictionaries ,SPEECH ,PARTS of speech ,SEMANTICS ,RESERVATION systems ,NATURAL language processing - Abstract
Dictionaries are reference books and online databases that contain information about the meaning of a word and its most correct use in speech. The headwords and definitions of dictionaries vary depending on the target users of the dictionary, the purpose of editing, and the type of dictionary. School dictionaries are works that should be prepared by taking into account pedagogical concerns as well as lexicography and linguistics. A dictionary entry contains a headword and its explanations (definition, grammatical function labels, pronunciation, illustrations, etc.), and this unit is called microstructure in lexicography. This study aims to create a list of what kind of structures (e.g. part of speech labels, example sentences, etc.) an entry in the school dictionary may contain, taking into account linguistic and pedagogical concerns, and then to develop an entry scheme for Turkish school dictionaries based on this list. For this purpose, in our qualitative research, the descriptive method, which is frequently preferred in lexicography studies that determine the current situation in dictionaries, was preferred. The linguistic and pedagogical features included in a school dictionary entry were obtained through content analysis from 3 different languages (Turkish, English and French) and 3 different school dictionaries (Türk Dil Kurumu [TDK], Cambridge and Le Robert) selected as research object. As a result of the study, it was observed that the school dictionary entry contained 6 different structures in the headword part and 21 different structures in the definition part. There are 1 pedagogical and 5 linguistic structures in the headword. There are 10 pedagogical and 11 linguistic structures in the definition. According to the data obtained, the pedagogical structures in school dictionaries were mostly used to make the use of the headword more concrete. 
For example, while indicating the part of speech functions of a word is a linguistic feature, it is pedagogical to give an example usage that clearly indicates the grammatical function of the word. As a result of this study, it was suggested that Turkish school dictionaries include structures such as illustrations and example sentences that will concretize the headword. Unlike English and French, Turkish has a flexible system in which words can have various grammatical functions; that is, the grammatical function of a word with the same morphological appearance (a word that does not have a special suffix for adverbs or adjectives) changes depending on the context. The importance of including an example sentence indicating each grammatical function of the word has therefore been emphasized. As a result of the research, an ideal school dictionary entry scheme was proposed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
46. Information Technology for Recognizing Propaganda, Fakes and Disinformation in Textual Content Based on NLP and Machine Learning Methods.
- Author
-
В. А., Висоцька
- Subjects
MACHINE learning ,ARTIFICIAL intelligence ,WORD recognition ,PARTS of speech ,DISINFORMATION - Abstract
Context. The research is aimed at the application of artificial intelligence for the development and improvement of means of cyber warfare, in particular for combating disinformation, fakes and propaganda in the Internet space, identifying sources of disinformation and inauthentic behavior (bots) of coordinated groups. The implementation of the project will contribute to solving the important and currently relevant issue of information manipulation in the media, because in order to effectively fight against distortion and disinformation, it is necessary to obtain an effective tool for recognizing these phenomena in textual data in order to develop a further strategy to prevent the spread of such data. Objective. The objective of the study is to develop automatic recognition of political propaganda in textual data, built on supervised machine learning and implemented using natural language processing methods. Method. Recognition of the presence of propaganda will occur at two levels: at the general level, that is, at the level of the document, and at the level of individual sentences. To implement the project, feature construction methods were used such as the TF-IDF statistical indicator, the “Bag of Words” vectorization model, the marking of parts of speech, the word2vec model for obtaining vector representations of words, and the recognition of trigger words (reinforcing words, absolute pronouns and “shiny” words). Logistic regression was used as the main modeling algorithm. Results. Machine learning models have been developed to recognize propaganda, fakes and disinformation at the document (article) and sentence level. Both model scores are satisfactory, but the model for document-level propaganda recognition performed almost 1.2 times better (by 20%). Conclusions. The created model shows excellent results in recognizing propaganda, fakes and disinformation in textual content based on NLP and machine learning methods. 
The analysis of the raw data showed that the propaganda recognition model at the document (article) level was able to correctly classify 6097 non-propaganda articles and 694 propaganda articles. 123 propaganda articles and 285 non-propaganda articles were misclassified. The obtained estimate of the model: 0.9433254618697041. The sentence-level propaganda recognition model successfully classified 205 propaganda articles and 1917 non-propaganda articles. The model score is: 0.7437784787942516 (but 731 articles were incorrectly classified). [ABSTRACT FROM AUTHOR]
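A quick arithmetic check, offered as a sketch rather than the author's code: if the counts quoted in the abstract are read as correctly and incorrectly classified items, the reported "model score" is consistent with plain accuracy (correct / total). The function and variable names are illustrative assumptions, and the 731 figure is taken as the total number of sentence-level errors.

```python
def accuracy(correct: int, incorrect: int) -> float:
    """Share of items classified correctly."""
    return correct / (correct + incorrect)


# Document level: 6097 + 694 correct, 123 + 285 misclassified (counts from the abstract).
doc_accuracy = accuracy(6097 + 694, 123 + 285)

# Sentence level: 205 + 1917 correct, 731 misclassified (counts from the abstract).
sent_accuracy = accuracy(205 + 1917, 731)

print(doc_accuracy, sent_accuracy)
print(doc_accuracy / sent_accuracy)  # ratio of the two scores
```

Under this reading, both values agree with the scores quoted in the abstract to several decimal places.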
- Published
- 2024
47. Exploring part of speech (POS) tag sequences in a large-scale learner corpus of L2 English: a developmental perspective.
- Author
-
Lim, Joyce Dong Ok, Mark, Geraldine, Pérez-Paredes, Pascual, and O'Keeffe, Anne
- Subjects
PARTS of speech ,ENGLISH language ,LANGUAGE ability ,PHILOSOPHY of language ,CORPORA ,A priori ,SECOND language acquisition - Abstract
This research explores the POS tag sequences that shape the transition from upper intermediate (B2 CEFR) to near-native proficiency (C2 CEFR) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that POS tag sequences offer a holistic approach to extracting the most commonly used patterns without a starting point of an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot POS-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al., 1996, 1998; and Hunston and Francis, 2000). Findings point to the presence of both core and emergent POS-tag sequences in learner language in the two proficiency levels analysed. These sequences point to the presence of dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research offers new evidence in our understanding of the development of L2 writing in EFL contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
48. Investigating Natural Language Techniques for Accurate Noun and Verb Extraction.
- Author
-
Nair, Reshma P and Thushara, M G
- Subjects
NATURAL languages ,VERB phrases ,VERBS ,NOUNS ,PARTS of speech ,NATURAL language processing - Abstract
Natural language processing (NLP) has witnessed significant advancements in recent decades. Automatically classifying parts of speech, like nouns and verbs, from textual input has transformed text analysis and language understanding. Using natural language processing techniques, we explore various methods for identifying noun and verb phrases automatically, with an emphasis on high accuracy. Our study explores rule-based, statistical, and Machine Learning (ML) approaches for determining the nouns and verbs from sentences. The effectiveness of these approaches is clearly evident, especially when NLP libraries such as SpaCy and the Natural Language Toolkit (NLTK) are used. As well as demonstrating their potential applications across diverse language processing tasks and industries, we conduct comparative research to showcase their advantages and disadvantages. The performance of these methods is also examined in terms of retrieving subject and action terms. SpaCy achieves an impressive accuracy of 95% in noun and verb extraction, while Part-Of-Speech (POS) technology tagging delivers an even higher accuracy of 96%. The results obtained with these methods illustrate how nouns, verbs, and names can be classified in text successfully. [ABSTRACT FROM AUTHOR]
- Published
- 2024
49. Deep Learning based Part-of-Speech tagging for Assamese using RNN and GRU.
- Author
-
Talukdar, Kuwali and Sarma, Shikhar Kumar
- Subjects
NATURAL language processing ,DEEP learning ,RECURRENT neural networks ,PARTS of speech ,INDO-European languages ,RESEARCH personnel - Abstract
Deep Learning (DL) techniques have been widely used in different Natural Language Processing (NLP) tasks. Parts of Speech (PoS) tagging is one where a wide variety of DL techniques have been experimented with across languages. In the present work, Recurrent Neural Network (RNN) and Gated Recurrent Unit (GRU) based Parts of Speech taggers have been trained and modelled for Assamese, a language of the Indo-Aryan family. The Universal Parts of Speech (UPoS) tag set of 17 tags was used for the experiment. A dataset of 30000 sequences has been used for the work; it was originally tagged with the BIS tag set and was customized through conversion from BIS tagged sequences to UPoS tagged sequences. RNN and GRU based systems have been configured using the TensorFlow platform, and performance was measured through accuracy, precision, recall and F1 scores. The accuracy of the RNN based system has been found to be 93.78%. Precision of 94.75 and recall of 93.28 were recorded for the RNN model. Accuracy of 94.38%, precision of 95.44 and recall of 93.7 were recorded for the GRU model. RNN and GRU models respectively yield F1 scores of 94.01 and 94.56. Although PoS tagging with other tag sets like BIS has been attempted by other researchers, UPoS tagging using DL approaches for Assamese is attempted here for the first time. This baseline work, with observed accuracies of 93.78 and 94.38 for RNN and GRU respectively, shall serve as a reference for further work. [ABSTRACT FROM AUTHOR]
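The F1 figures quoted above can be cross-checked with a short sketch. It assumes F1 is the harmonic mean of the reported aggregate precision and recall, which the abstract does not state explicitly; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (percent).
rnn_f1 = f1_score(94.75, 93.28)
gru_f1 = f1_score(95.44, 93.70)

print(round(rnn_f1, 2), round(gru_f1, 2))
```

Rounded to two decimals, these reproduce the 94.01 and 94.56 reported for the RNN and GRU models.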
- Published
- 2024
50. Syntactic Category based Assamese Question Pattern Extraction using N-grams.
- Author
-
Chakraborty, Rita, Deka, Manisha, and Sarma, Shikhar Kr.
- Subjects
PARTS of speech ,NATURAL language processing - Abstract
Syntactic categories are important because they play a major role in the analysis of sentences. Assamese is a versatile language with diverse linguistic features. Analysis and manipulation of such a rich language in terms of its vocabulary is challenging. It requires incorporating syntactic-level annotations such as Parts of Speech (PoS) tags into the text for efficient and effective processing in different Natural Language Processing (NLP) tasks. Our research focuses on extracting PoS tag based patterns of syntactic categories in Assamese question sentences. We also extract the Bi-grams and Tri-grams of these patterns, with their frequencies of occurrence throughout the patterns, so that analysis and processing can be done at a more basic level. The work is a novel attempt at modeling question sentence patterns with the embedded PoS patterns. [ABSTRACT FROM AUTHOR]
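As a rough illustration of what extracting Bi-grams and Tri-grams from PoS patterns can look like, here is a minimal sketch. It is not the authors' implementation: the tag sequences are invented placeholders using Universal PoS labels, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def tag_ngrams(tags: Sequence[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of a PoS tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def count_ngrams(patterns: Iterable[Sequence[str]], n: int) -> Counter:
    """Count n-gram frequencies across a collection of PoS patterns."""
    counts: Counter = Counter()
    for tags in patterns:
        counts.update(tag_ngrams(tags, n))
    return counts


# Hypothetical question patterns (placeholder tags, not real Assamese data).
patterns = [
    ["PRON", "NOUN", "VERB", "PUNCT"],
    ["PRON", "NOUN", "ADV", "VERB", "PUNCT"],
]

bigrams = count_ngrams(patterns, 2)
trigrams = count_ngrams(patterns, 3)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Counting over tag sequences rather than words keeps the patterns independent of vocabulary, which is why the same helper works for any tag set.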
- Published
- 2024
Discovery Service for Jio Institute Digital Library