201. Experimental approach : The case of the syntax of attributive adjectives
- Author
Thuilier, Juliette, Cognition, Langues, Langage, Ergonomie (CLLE-ERSS), École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Toulouse - Jean Jaurès (UT2J)-Université Bordeaux Montaigne-Centre National de la Recherche Scientifique (CNRS), Université Paris Diderot
- Subjects
Corpus linguistics, Logistic regression model, Soft constraints, French, Questionnaire, Mixed-effect model, Attributive adjective alternation, Word order variations, Acceptability judgment, [SCCO.LING] Cognitive science/Linguistics, [SHS.LANGUE] Humanities and Social Sciences/Linguistics
- Abstract
In French, as in other Romance languages, attributive adjectives (A) can appear either before or after the noun (N):
(1) une agréable soirée (anteposed) / une soirée agréable (postposed) ‘a nice evening’
Semantically, the general idea is that anteposed As tend to be subsective or intensional, whereas postposed As tend to be intersective (or predicative). However, no semantic property is categorically associated with one position (Abeillé & Godard 1999). Thus, semantics does not account for the entire phenomenon (contra Bouchard 1998), and the choice of position is also driven by syntax. First, As followed by a dependent must be postposed to the N:
(2) une musique agréable à écouter / *une agréable à écouter musique ‘a music nice to hear’
This is the only categorical constraint; the other syntactic constraints do not impose one position but rather favor one over the other. For example, long and complex APs tend to be postposed, in line with the generalization that “heavy” constituents tend to appear last in SVO languages (Behaghel 1909, Hawkins 1994). This means that long As and coordinations of As can appear in both positions, as shown in (3), but tend to prefer postposition.
(3) un petit et confortable canapé / un canapé petit et confortable ‘a small and comfortable sofa’
In this paper, we focus on the syntactic aspects of adjective ordering. We postulate that this phenomenon is governed by various factors that interact in a complex way, each favoring one position over the other. We therefore need an experimental approach to determine which factors are actually involved in the choice and how they interact. Our approach is based on experiments using corpus data and questionnaires. The methodology is inspired by the work of Bresnan et al. (2007) and Bresnan & Ford (2010) on the dative alternation in English.
Using statistical modeling (logistic regression, Agresti 2007; mixed-effect models, Gelman & Hill 2006), we tested syntactic factors found in the literature (Abeillé & Godard 1999, Wilmet 1981, Forsgren 1978, Blinkenberg 1933, among others) against attested data extracted from corpora. We assume that statistical tools allow us to abstract away from variation due to the sampling of the corpora. Moreover, one advantage of mixed-effect logistic regression is that it is predictive: one can build a model on one set of data and use it to predict the choice between anteposition and postposition on unseen data. This way, we can evaluate how well the model generalizes beyond the training set.
To build our database, we first extracted the attributive As that appear in both positions in the syntactically annotated newspaper corpus French Treebank (FTB, Abeillé & Clément 2004), leaving aside As with post-adjectival dependents. We then extracted the same As from the spoken corpus C-ORAL-ROM (CORAL, Cresti & Moneglia 2005). These data were annotated for 10 variables concerning the syntactic environment of each A in context: (1) the A is coordinated; (2) the A is modified by an adverbial element; the NP contains (3) another A in postposition, (4) a relative clause, (5) a PP; the determiner of the NP is (6) demonstrative, (7) possessive, (8) definite; and a measure of collocation for (9) the ordered sequence A+N and (10) the ordered sequence N+A (collocations estimated with χ², Manning & Schütze 1999). We also distinguished two lemmas in context for 9 As: ancien ‘ancient/former’, pur ‘pure’, seul ‘alone/single’, simple ‘simple/modest’, sacré ‘sacred/brilliant’, commun ‘ordinary/shared’, pauvre ‘poor/unfortunate’, propre ‘own/clean’, cher ‘expensive/dear’. The database contains 6621 occurrences of attributive As (4994 in FTB, 1627 in CORAL) representing 171 lemmas, with 68.8% anteposition overall (67% in FTB, 74.2% in CORAL).
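The χ² collocation measure used for variables (9) and (10) can be illustrated with a minimal Python sketch of the 2×2 contingency-table χ² statistic for an ordered word pair, in the spirit of Manning & Schütze (1999). The function names and the counts in the usage lines are hypothetical; the abstract does not specify how the counts were gathered or whether any correction was applied.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Pearson chi-square statistic for a 2x2 contingency table.

    o11: pairs where the first slot is w1 and the second slot is w2
    o12: pairs where the first slot is w1 and the second slot is not w2
    o21: pairs where the first slot is not w1 and the second slot is w2
    o22: all remaining pairs
    """
    n = o11 + o12 + o21 + o22
    denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence of association.
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


def ordered_pair_chi_square(pair_count, first_count, second_count, total_pairs):
    """Chi-square score for the ordered sequence (w1, w2).

    pair_count:   occurrences of w1 immediately followed by w2
    first_count:  occurrences of w1 in the first slot
    second_count: occurrences of w2 in the second slot
    total_pairs:  total number of adjacent word pairs considered
    """
    o11 = pair_count
    o12 = first_count - pair_count
    o21 = second_count - pair_count
    o22 = total_pairs - o11 - o12 - o21
    return chi_square_2x2(o11, o12, o21, o22)


# Hypothetical counts, for illustration only (not taken from FTB or CORAL).
score_a_n = ordered_pair_chi_square(pair_count=40, first_count=60,
                                    second_count=55, total_pairs=100_000)
score_n_a = ordered_pair_chi_square(pair_count=2, first_count=55,
                                    second_count=60, total_pairs=100_000)
print(score_a_n > score_n_a)
```

Computing one score per order, as here, mirrors the two separate variables in the annotation scheme: a pair can be strongly collocational as A+N while being unremarkable as N+A.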
There is variation across lemmas: for instance, the A unique ‘unique’ is anteposed in 20% of cases, whereas sérieux ‘serious’ is anteposed in 51.4% and petit ‘small’ in 98.6%. Moreover, there is less alternation in spoken data than in written data: all 171 lemmas appear in both positions in FTB, while only 56 (25%) of them actually alternate in CORAL. This suggests that As have a more fixed behavior in spoken French than in written French. We hypothesize that, in spoken French, an A is placed outside its preferred (that is, more frequent) position only when the syntactic conditions strongly favor the non-preferred position.
Multi-factorial statistical modeling
We used mixed-effects logistic regression to estimate the probability that anteposition is chosen as a function of 11 predictor variables (the 10 syntactic variables and the mode of production: written or spoken). Building the model consists in estimating the coefficient associated with each variable. Besides the predictor variables, also called fixed effects, mixed-effects models can take into account variation in the data by means of random effects. In our case, the adjectival lemma is the random effect, which models adjectival idiosyncrasies. We built a model with 11 fixed effects and 1 random effect. All the effects are significant and thus contribute to predicting the position of the As. The model has a mean accuracy of 0.88 and a mean concordance probability of C = 0.947 (both under 10-fold cross-validation). These figures indicate that the model’s predictions are very accurate.
Results
Each coefficient associated with a fixed effect can be interpreted as a preference for a position: a positive coefficient indicates a preference for anteposition and a negative one a preference for postposition.
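As a rough illustration of how such coefficients translate into a probability of anteposition, the following Python sketch computes the prediction of a logistic model with a per-lemma random intercept. The variable names and all numeric values are hypothetical and chosen only to show the sign convention; they are not the coefficients estimated in the study.

```python
import math


def anteposition_probability(features, coefficients, intercept, lemma_intercept=0.0):
    """Predicted probability of anteposition under a random-intercept logistic model.

    features:        mapping from variable name to its value for one occurrence
    coefficients:    mapping from variable name to its fixed-effect coefficient
    intercept:       global intercept
    lemma_intercept: adjustment specific to the adjectival lemma (random effect)
    """
    linear = intercept + lemma_intercept
    for name, value in features.items():
        linear += coefficients[name] * value
    # Inverse logit: maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients: positive favors anteposition, negative favors postposition.
EXAMPLE_COEFFICIENTS = {
    "definite_determiner": 0.8,
    "coordinated_adjective": -1.5,
}

baseline = anteposition_probability({}, EXAMPLE_COEFFICIENTS, intercept=0.0)
with_definite = anteposition_probability(
    {"definite_determiner": 1}, EXAMPLE_COEFFICIENTS, intercept=0.0)
with_coordination = anteposition_probability(
    {"coordinated_adjective": 1}, EXAMPLE_COEFFICIENTS, intercept=0.0)
# A lemma that is usually anteposed gets a positive lemma-specific intercept.
usually_anteposed_lemma = anteposition_probability(
    {"coordinated_adjective": 1}, EXAMPLE_COEFFICIENTS,
    intercept=0.0, lemma_intercept=3.0)

print(with_coordination < baseline < with_definite)
```

The lemma-specific intercept shifts the prediction for every occurrence of a given adjective, which is how a model of this kind can capture the fact that some lemmas are almost always anteposed regardless of their syntactic environment.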
Thus the model shows that the nature of the determiner influences the position: demonstrative, possessive and definite determiners favor anteposition. Moreover, APs containing coordinated As or adverbial modifiers tend to be postposed, which confirms that speakers tend to put “heavy” APs after the N. The occurrence of a relative clause, a PP or another A after the N also favors anteposition. Finally, the N that the A combines with affects the choice: the more the A and the N tend to form a collocation in a given order, as in à juste(A) titre(N) ‘understandably’, the more the sequence tends to occur in that order. The corpus model estimates the probability of anteposition of each adjectival occurrence given the syntactic environment, taking into account the specificities of each lemma (random effect).
A questionnaire experiment was then conducted to test whether these probabilities are related to the judgments of native speakers. Our hypothesis is that, for many speakers, the frequency with which anteposition is chosen will correspond to the probability of anteposition estimated by the corpus model. In this study, participants had to choose the preferred order for an A N sequence in context. The questionnaire was made up of 29 sentences extracted from the database (FTB part) and selected according to their estimated probability of anteposition, so that the sample covers the whole range of possible probabilities (from 0 to 1). For each tested A, the participant saw a pair of sentences: the original sentence and a modified version with the A N sequence in the opposite order. In both versions of the sentence, the NP was set in bold, colored letters to help the participant notice the difference within the pair. The pairs, and the sentences within each pair, were randomly ordered in each questionnaire. The participants were contacted via social networks and scientific mailing lists.
141 participants completed the questionnaire online. As predicted, the proportion of choices for anteposed As correlates significantly with the probability of anteposition estimated by the corpus model: 0.74 (p < 0.0001). The correlation suggests that speakers tend to be sensitive to the syntactic constraints used in the corpus model, and thus that statistical modeling based on usage data is an appropriate way of describing and accounting for a rather complex syntactic phenomenon such as the attributive A alternation. This result is in accordance with Bresnan’s (2007) experimental work, which suggests that implicit knowledge of language “is more powerful than has been recognized under the idealizations of categorical models of grammaticality”.
Other Romance languages also display the attributive A alternation:
(4) Italian: un triste racconto / un racconto triste ‘a sad story’
(5) Romanian: o tristă poveste / o poveste tristă ‘a sad story’
(6) Portuguese: um imenso escritório / um escritório imenso ‘a huge office’
(7) Spanish: un inmenso escritorio / un escritorio inmenso ‘a huge office’
Our study on French data suggests that this phenomenon should also be analyzed with quantitative and experimental approaches in other Romance languages. Using the same kind of statistical modeling employed here will allow us to observe and quantify cross-linguistic differences and similarities.
Selected References:
Abeillé, A. & D. Godard. 1999. La position de l’adjectif épithète en français : le poids des mots. Recherches Linguistiques de Vincennes 28, 9-32.
Agresti, A. 2007. An Introduction to Categorical Data Analysis. Wiley.
Behaghel, O. 1909. Von deutscher Wortstellung. Indogermanische Forschungen 25.
Blinkenberg, A. 1933. L’ordre des mots en français moderne. Deuxième partie. Copenhague: Levin & Munksgaard.
Bresnan, J. 2007. Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In S. Featherston & W. Sternefeld (Eds.), Roots: Linguistics in Search of Its Evidential Base, 77-96. Berlin: Mouton de Gruyter.
Bresnan, J., A. Cueni, T. Nikitina & H. Baayen. 2007. Predicting the dative alternation. In Bouma, Kraemer & Zwarts (Eds.), Cognitive Foundations of Interpretation. Amsterdam: Royal Netherlands Academy of Science.
Bresnan, J. & M. Ford. 2010. Predicting syntax: Processing dative constructions in American and Australian varieties of English. Language 86(1).
Forsgren, M. 1978. La place de l’adjectif épithète en français contemporain, étude quantitative et sémantique. Stockholm: Almqvist & Wiksell.
Gelman, A. & J. Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
Hawkins, J. 1994. A Performance Theory of Order and Constituency. Cambridge: Cambridge University Press.
Manning, C. D. & H. Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press.
Wilmet, M. 1981. La place de l’épithète qualificative en français contemporain : étude grammaticale et stylistique. Revue de linguistique romane 45, 17-73.
- Published
- 2013