WETM: A word embedding-based topic model with modified collapsed Gibbs sampling for short text
- Authors
- Rashid, Junaid; Kim, Jungeun; Hussain, Amir; Naseem, Usman
- Subjects
- *GIBBS sampling, *WORD order (Grammar), *DATA mining, *VOCABULARY
- Abstract
• A word embedding-based topic model (WETM) for short text documents.
• Removes the sparsity problem in short text and discovers structural information for topics and words.
• A modified collapsed Gibbs sampling algorithm to estimate the parameters of WETM.
• WETM achieves better classification, topic coherence, topic quality, and clustering results.
• WETM's execution time is lower than that of baseline topic models.

Short texts are a common source of knowledge, and extracting such valuable information is beneficial for several purposes. Traditional topic models are incapable of analyzing the internal structural information of topics. They are mostly based on word co-occurrence at the document level and are often unable to extract semantically relevant topics from short text datasets because of their limited length. Even topic models that are sensitive to word order do not perform well on short texts because of the strong sparsity of the data. In this paper, we propose a novel word embedding-based topic model (WETM) for short text documents to discover the structural information of topics and words and to eliminate the sparsity problem. Moreover, a modified collapsed Gibbs sampling algorithm is proposed to strengthen the semantic coherence of topics in short texts. WETM extracts semantically coherent topics from short texts and finds relationships between words. Extensive experimental results on two real-world datasets show that WETM achieves better topic quality, topic coherence, classification, and clustering results. WETM also requires less execution time than traditional topic models. [ABSTRACT FROM AUTHOR]
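For context on the sampling step the abstract refers to, the following is a minimal sketch of a *standard* collapsed Gibbs sampler for an LDA-style topic model. It is not the paper's modified algorithm or its word-embedding extension; the function name, hyperparameters (`alpha`, `beta`), and toy corpus are all illustrative assumptions.

```python
import random
from collections import defaultdict

def collapsed_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Plain collapsed Gibbs sampler for an LDA-style topic model.

    `docs` is a list of token lists. This illustrates the generic sampling
    step that WETM modifies; it is NOT the paper's algorithm.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})           # vocabulary size
    # random initial topic assignment for every token
    z = [[rng.randrange(num_topics) for _ in d] for d in docs]
    ndk = [[0] * num_topics for _ in docs]          # doc-topic counts
    nkw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                           # topic totals
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            k = z[di][wi]
            ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                k = z[di][wi]
                # remove the current assignment from the counts
                ndk[di][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z = t | rest), up to a constant
                weights = [
                    (ndk[di][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                # sample a new topic proportionally to the weights
                r = rng.random() * sum(weights)
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[di][wi] = k
                ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z, nkw
```

The sparsity problem the abstract highlights arises here because short documents contribute very few co-occurrence counts to `ndk` and `nkw`; WETM's contribution is to inject word-embedding information into this inference step.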
- Published
- 2023