6 results on '"Bénédicte Pierrejean"'
Search Results
2. Etude de la reproductibilité des word embeddings : repérage des zones stables et instables dans le lexique (Reproducibility of word embeddings : identifying stable and unstable zones in the semantic space).
- Author
- Bénédicte Pierrejean and Ludovic Tanguy
- Published
- 2018
3. Investigating the stability of concrete nouns in word embeddings
- Author
- Ludovic Tanguy and Bénédicte Pierrejean; affiliations: Cognition, Langues, Langage, Ergonomie (CLLE-ERSS); Equipe de Recherche en Syntaxe et Sémantique (ERSS); École pratique des hautes études (EPHE); Université Paris sciences et lettres (PSL); Université Toulouse - Jean Jaurès (UT2J); Université Bordeaux Montaigne; Centre National de la Recherche Scientifique (CNRS)
- Subjects
- Degree (graph theory); Computer science; Stability (learning theory); Engineering and technology; Concreteness; [SHS.LANGUE] Humanities and Social Sciences/Linguistics; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Information systems; Noun; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Word2vec; Artificial intelligence; Set (psychology); Word (computer architecture); Reliability (statistics); Natural language processing
- Abstract
- We know that word embeddings trained using neural-based methods (such as word2vec SGNS) are sensitive to stability problems and that across two models trained using the exact same set of parameters, the nearest neighbors of a word are likely to change. Not all words are equally impacted by this internal instability, and recent studies have investigated features influencing the stability of word embeddings. This stability can be seen as a clue for the reliability of the semantic representation of a word. In this work, we investigate the influence of the degree of concreteness of nouns on the stability of their semantic representation. We show that for English generic corpora, abstract words are more affected by stability problems than concrete words. We also found that to a certain extent, the difference between the degree of concreteness of a noun and its nearest neighbors can partly explain the stability or instability of its neighbors.
- Published
- 2019
4. Toward a Computational Multidimensional Lexical Similarity Measure for Modeling Word Association Tasks in Psycholinguistics
- Author
- Jérôme Farinas, Lola Danet, Bénédicte Pierrejean, Patrice Péran, Xavier de Boissezon, Ludovic Tanguy, Bruno Gaume, Cécile Fabre, Lydia Mai Ho-Dac, Mélanie Jucla, Julien Pinquier, and Nabil Hathout
- Subjects
- Measure (data warehouse); Relation (database); Computer science; Lexical similarity; Neuropsychology; Word Association; Psycholinguistics; Semantic similarity; Multidisciplinary approach; Artificial intelligence; Natural language processing
- Abstract
- This paper presents the first results of a multidisciplinary project, the “Evolex” project, gathering researchers in Psycholinguistics, Neuropsychology, Computer Science, Natural Language Processing and Linguistics. The Evolex project aims at proposing a new data-based inductive method for automatically characterising the relation between pairs of French words collected in psycholinguistics experiments on lexical access. This method takes advantage of several complementary computational measures of semantic similarity. We show that some measures are more correlated than others with the frequency of lexical associations, and that they also differ in the way they capture different semantic relations. This allows us to consider building a multidimensional lexical similarity to automate the classification of lexical associations.
- Published
- 2019
5. Towards Qualitative Word Embeddings Evaluation: Measuring Neighbors Variation
- Author
- Bénédicte Pierrejean and Ludovic Tanguy; affiliations: Cognition, Langues, Langage, Ergonomie (CLLE-ERSS); École pratique des hautes études (EPHE); Université Paris sciences et lettres (PSL); Université Toulouse - Jean Jaurès (UT2J); Université Bordeaux Montaigne; Centre National de la Recherche Scientifique (CNRS)
- Subjects
- Computer science; Pattern recognition; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Engineering and technology; [SHS.LANGUE] Humanities and Social Sciences/Linguistics; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Medical and health sciences; Clinical medicine; Variation (linguistics); Ophthalmology & optometry; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Artificial intelligence; Value (mathematics); Word (computer architecture)
- Abstract
- We propose a method to study the variation lying between different word embedding models trained with different parameters. We explore the variation between models trained with only one varying parameter by observing the distributional neighbors variation and show how changing only one parameter can have a massive impact on a given semantic space. We show that the variation does not affect all words of the semantic space equally. Variation is influenced by parameters such as setting a parameter to its minimum or maximum value, but it also depends on corpus-intrinsic features such as the frequency of a word. We identify semantic classes of words remaining stable across the models trained and specific words having high variation.
- Published
- 2018
6. Predicting Word Embeddings Variability
- Author
- Bénédicte Pierrejean and Ludovic Tanguy; affiliations: Cognition, Langues, Langage, Ergonomie (CLLE-ERSS); École pratique des hautes études (EPHE); Université Paris sciences et lettres (PSL); Université Toulouse - Jean Jaurès (UT2J); Université Bordeaux Montaigne; Centre National de la Recherche Scientifique (CNRS)
- Subjects
- Computer science; Stability (learning theory); Engineering and technology; Space (commercial competition); Experimental psychology; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Electrical engineering, electronic engineering, information engineering; Psychology and cognitive sciences; Word2vec; [SHS.LANGUE] Humanities and Social Sciences/Linguistics; Reliability (statistics); Hyperparameter; Social sciences; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Pattern recognition; Variation (linguistics); Embedding; Artificial intelligence & image processing; Artificial intelligence; Computer Science::Formal Languages and Automata Theory; Word (computer architecture)
- Abstract
- Neural word embedding models (such as those built with word2vec) are known to have stability problems: when retraining a model with the exact same hyperparameters, word neighborhoods may change. We propose a method to estimate such variation, based on the overlap of neighbors of a given word in two models trained with identical hyperparameters. We show that this inherent variation is not negligible, and that it does not affect every word in the same way. We examine the influence of several features that are intrinsic to a word, corpus or embedding model and provide a methodology that can predict the variability (and as such, reliability) of a word representation in a semantic vector space.
- Published
- 2018
Discovery Service for Jio Institute Digital Library