25 results on "Simona Balbi"
Search Results
2. The ideal candidates for the hotel industry vacancies: a conjoint analysis of managers’ preferences
- Author
- Simona Balzano, Massimo Aria, Simona Balbi, and Alfonso Piscitelli
- Published
- 2020
3. BMS: An improved Dunn index for Document Clustering validation
- Author
- Michelangelo Misuraca, Simona Balbi, and Maria Spano
- Subjects
Statistics and Probability, Cosine similarity, k-means clustering, Pattern recognition, Dunn index, Document clustering, Artificial intelligence, Cluster validation, Cluster analysis, Mathematics
- Abstract
Document Clustering aims at organizing a large quantity of unlabeled documents into a smaller number of meaningful and coherent clusters. One of the main unsolved problems in the literature is the lack of a reliable methodology to evaluate the results, although a wide variety of validation measures has been proposed. Validation measures are often unsatisfactory with numerical data, and perform even worse with textual data. Our attention focuses on the use of cosine similarity in the clustering process. Here we propose a new measure based on the same criterion. The effectiveness of the proposal is shown by an extensive comparative study.
- Published
- 2018
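The BMS entry builds on the classical Dunn index computed with cosine similarity. As a hedged sketch (the BMS modification itself is not reproduced here), the classical index with cosine distance 1 - cos divides the smallest distance between documents of different clusters by the largest within-cluster diameter; the function names and toy vectors are illustrative assumptions.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dunn_index(docs, labels):
    """Classical Dunn index: smallest between-cluster distance over largest cluster diameter."""
    clusters = {}
    for vec, label in zip(docs, labels):
        clusters.setdefault(label, []).append(vec)
    groups = list(clusters.values())
    # Diameter of a cluster = largest distance between two of its documents
    max_diameter = max(
        (cosine_distance(u, v) for g in groups for u, v in combinations(g, 2)),
        default=0.0,
    )
    # Separation = smallest distance between documents in different clusters
    min_separation = min(
        cosine_distance(u, v)
        for g, h in combinations(groups, 2)
        for u in g
        for v in h
    )
    # Raises ZeroDivisionError if every cluster has zero diameter
    return min_separation / max_diameter

# Toy document-term counts: two documents on one topic, two on another
docs = [[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 2], [0, 0, 1, 3]]
print(dunn_index(docs, [0, 0, 1, 1]))  # well-separated partition: large value
print(dunn_index(docs, [0, 1, 0, 1]))  # topic-mixing partition: small value
```

Higher values indicate compact, well-separated clusters; because cosine distance ignores document length, two documents with proportional term counts are treated as identical.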
4. Combining different evaluation systems on social media for measuring user satisfaction
- Author
- Michelangelo Misuraca, Simona Balbi, and Germana Scepi
- Subjects
Information retrieval, Computer science, User satisfaction, Library and Information Sciences, Management Science and Operations Research, Rating score, Computer Science Applications, Social media mining, Rating scale, Media Technology, Social media, Information Systems
- Abstract
Web 2.0 allows people to express and share their opinions about the products and services they buy or use. These opinions can be expressed in various ways: numbers, texts, emoticons, pictures, videos, audio, and so on. There has been great interest in strategies for extracting, organising and analysing this kind of information. In a social media mining framework, in particular, the use of textual data has been explored in depth and still represents a challenge. On a rating and review website, user satisfaction can be detected both from a rating scale and from the written text. However, in common practice, there is a lack of algorithms able to combine judgments expressed through both comments and scores. In this paper we propose a strategy to jointly measure the user evaluations obtained from the two systems. Text polarity is detected with a sentiment-based approach and then combined with the associated rating score. The new rating scale has a finer granularity and also enables the reviews to be ranked. We show the effectiveness of our proposal by analysing a set of reviews of the Uffizi Gallery in Florence (Italy) published on TripAdvisor.
- Published
- 2018
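The abstract does not state the exact combination rule, so the following is a hypothetical sketch, not the authors' method. A toy lexicon (the POSITIVE and NEGATIVE word sets are invented for illustration) yields a polarity in [-1, 1], which then shifts the star rating by at most ±0.4 (the weight is an assumption), producing a finer-grained score that can rank reviews.

```python
import re

# Hypothetical toy sentiment lexicon; not the lexicon or approach used in the paper
POSITIVE = {"amazing", "beautiful", "great", "wonderful"}
NEGATIVE = {"crowded", "expensive", "long", "rude"}

def polarity(text: str) -> float:
    """Polarity in [-1, 1]: (positive hits - negative hits) / (all hits), 0 if no hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def combined_score(rating: int, text: str, weight: float = 0.4) -> float:
    """Shift the star rating by at most +/- weight according to text polarity."""
    return rating + weight * polarity(text)

reviews = [
    (4, "Great collection but long queues and crowded rooms"),
    (4, "Amazing and beautiful, a wonderful visit"),
    (5, "Expensive tickets"),
]
ranked = sorted(reviews, key=lambda r: combined_score(*r), reverse=True)
for rating, text in ranked:
    print(f"{combined_score(rating, text):.2f}  {text}")
```

Keeping the weight below 0.5 means the text reorders reviews within a star level but never pushes a review past a neighbouring level; a larger weight would let the text override the score.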
5. A multidisciplinary approach for the characterization of the coastal marine ecosystems of Monte Di Procida (Campania, Italy)
- Author
- Carlo Donadio, Roberta Parisi, Michele Arienzo, Marco Guida, Micla Pennetta, Luciano Ferrara, Giuseppe Aiello, Simona Balbi, Diana Barra, Olga Mangoni, Francesco Bolinesi, and Marco Trifuoggi
- Subjects
Pollution, Geologic Sediments, Trophic status, Foraminifera, Marine pollution, Aquatic Science, Ecotoxicology, Biological indicator, Oceanography, Rivers, Marine ecosystem, Relative species abundance, Ecosystem, Principal Component Analysis, Geography, Ecology, Italy, Environmental geology, Quinqueloculina, Environmental science, Sewage treatment, Environmental Pollution, Water Microbiology, Water Pollutants, Chemical, Channel (geography), Environmental Monitoring
- Abstract
A multidisciplinary survey was carried out on the quality of water and sediments in a coastal protected marine area lying between the inputs from the Bagnoli steel plant to the south and those from a sewage plant, the Volturno River and the Regi Lagni channel to the north. The study integrated chemical-sedimentological data with biological and ecotoxicological analyses to assess anthropogenic pressures and natural variability. The data reveal marked differences in anthropogenic pollution between the southeastern and northwestern zones: the north is affected by both inorganic and organic flows, while the south is influenced by levels of As, Pb and Zn in the sediments above legal limits, deriving from inputs of the Bagnoli brownfield site. Meiobenthic data revealed a higher relative abundance of species sensitive to pollution and environmental stress in the south, i.e. Lobatula lobatula and Rosalina bradyi, whereas in the north relative abundances of the stress-tolerant Quinqueloculina lata, Quinqueloculina pygmaea and Cribroelphidium cuvilleri were recorded.
- Published
- 2016
6. A simultaneous non-symmetrical principal component analysis with a group structure
- Author
- Vincenzo Esposito and Simona Balbi
- Subjects
Factorial, Mathematical optimization, Variables, Pattern recognition, Management Science and Operations Research, General Business, Management and Accounting, Modeling and Simulation, Perception, Principal component analysis, Quality (business), Artificial intelligence, Mathematics, Interpretability
- Abstract
The aim of this paper is to propose an extension of principal component analysis onto a reference subspace (PCAR) to the case where the same dependent variables have been measured on the same statistical units under two or more different observational conditions. As the units belong to the same multidimensional space, we apply orthogonal Procrustes rotations jointly with PCAR to enrich the interpretability of patterns on factorial planes. The proposed technique is applied to a problem of agreement in sensory data analysis, representing the evaluation gaps between the perception of quality by wine experts and by ordinary consumers. The proposed approach makes it possible to explain any detected gaps in terms of the physical–chemical characteristics of the wines. Copyright © 1999 John Wiley & Sons, Ltd.
- Published
- 1999
7. The analysis of structured qualitative data
- Author
- Carlo Lauro and Simona Balbi
- Subjects
Contingency table, Multidimensional analysis, Correspondence analysis, Relationship square, Multiple correspondence analysis, Management of Technology and Innovation, Modeling and Simulation, Resampling, Statistics, Principal component analysis, Data mining, Categorical variable, Mathematics
- Abstract
The aim of this paper is to give an overview of the methodological contribution made by Italian researchers in introducing a priori information into multidimensional data analysis techniques, paying special attention to categorical variables. The basic method is Non-Symmetrical Correspondence Analysis, which enables the analysis of a contingency table when one variable is supposed to depend on the other cross-classified variable. Whereas ordinary correspondence analysis decomposes an association index (Pearson's Φ²) in a principal component sense, the proposed method is based on the decomposition of a predictability index (Goodman and Kruskal's τb). Non-symmetrical correspondence analysis has been extended to more than one dependent or explanatory variable by means of suitable flattening procedures, i.e. by the use of multiple tables and the decomposition of Gray and Williams' multiple and partial τb's; in this way, multiple and partial versions have been proposed. A forward selection procedure for choosing the variables with the highest predictive power is presented. After a brief review of the confirmatory approach to non-symmetrical correspondence analysis, the problem of validating results in terms of analytical stability and replication stability is addressed by means of influence functions and resampling techniques. Copyright © 1999 John Wiley & Sons, Ltd.
- Published
- 1999
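The predictability index that non-symmetrical correspondence analysis decomposes can be computed directly. The sketch below uses the standard Goodman-Kruskal τ formula with rows as the explanatory variable and columns as the response (that orientation is an assumption); it computes only the index, whereas NSCA would additionally apply a weighted SVD to the centred row profiles that make up the numerator.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau: proportional reduction in prediction error for the
    column (response) variable when the row (predictor) category is known."""
    n = sum(sum(row) for row in table)
    p = [[x / n for x in row] for row in table]
    row_mass = [sum(row) for row in p]            # p_i.
    col_mass = [sum(col) for col in zip(*p)]      # p_.j
    base = sum(c * c for c in col_mass)           # sum_j p_.j^2
    if base >= 1.0:
        raise ValueError("the response variable needs at least two observed categories")
    # Numerator: row-mass-weighted squared distances between each conditional
    # row profile p_ij / p_i. and the marginal column profile p_.j
    numerator = sum(
        r * sum((pij / r - c) ** 2 for pij, c in zip(row, col_mass))
        for row, r in zip(p, row_mass)
        if r > 0
    )
    return numerator / (1.0 - base)

print(goodman_kruskal_tau([[30, 10], [10, 50]]))  # about 0.34
```

τ is 0 when every row profile equals the column margin (no predictability) and 1 when each row category determines the column category.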
8. Statistical tools in the joint analysis of closed and open-ended questions
- Author
- Simona Balbi, Nicole Triunfo, C. Davino, and L. Fabbris
- Subjects
Factorial, Closed-ended question, Modalities, Computer science, Questionnaire, Correspondence analysis, Conjoint analysis, Survey data, Textual Data Analysis, Lexical correspondence, Artificial intelligence, Natural language processing
- Abstract
The paper presents some exploratory statistical methods useful in the joint analysis of survey data collected by means of closed and open-ended questions. After a quick review of the main steps needed to transform texts into a numerical table, we focus on Lexical Correspondence Analysis, a popular technique for analysing a lexical table obtained by cross-classifying respondents and free responses. As our interest is often in measuring and visualising the association between socio-demographic characteristics and lexical behaviour, the categories of one or more closed-ended questions are used both to aggregate individuals who are similar with respect to the considered variables and to reduce the sparseness of the lexical table. When dealing with textual data, a non-symmetrical variant of correspondence analysis is introduced and shown to be effective. Furthermore, the advantages of asking for a free description of the desired product in a conjoint analysis questionnaire are shown by applying a factorial conjoint analysis with the lexical table as external information.
- Published
- 2013
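The aggregation step described in the abstract above (grouping respondents by the categories of a closed-ended question before building the lexical table) can be sketched as follows. The naive regex tokenizer, the function name, and the toy answers are assumptions; the paper's own preprocessing (normalisation, frequency thresholds) is not reproduced.

```python
import re
from collections import Counter

def aggregated_lexical_table(answers, categories):
    """Rows: categories of a closed-ended question; columns: words; cells: word counts."""
    counts_by_category = {}
    for text, category in zip(answers, categories):
        words = re.findall(r"\w+", text.lower())  # naive tokenizer (assumption)
        counts_by_category.setdefault(category, Counter()).update(words)
    vocabulary = sorted({w for counts in counts_by_category.values() for w in counts})
    table = {cat: [counts[w] for w in vocabulary] for cat, counts in counts_by_category.items()}
    return vocabulary, table

answers = ["good food", "food too expensive", "good staff"]
age_groups = ["18-30", "31-50", "18-30"]
vocabulary, table = aggregated_lexical_table(answers, age_groups)
print(vocabulary)  # ['expensive', 'food', 'good', 'staff', 'too']
print(table)       # {'18-30': [0, 1, 2, 1, 0], '31-50': [1, 1, 0, 0, 1]}
```

Summing the answers of similar respondents into one row is what reduces the sparseness of the table: three mostly-empty respondent rows become two denser category rows.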
9. Competences and Professional Options of the Italian Graduates: Results from the Textual Analysis of the Degree Course Information Data
- Author
- Corrado Crocetta, Simona Balbi, Susanna Zaccarin, E. Zavarrone, and Maria Francesca Romano; Massimo Attanasio and Vincenza Capursi (Eds.)
- Subjects
Educational systems, Statistical Methods, Academic year, Text mining, Multivariate data analysis, Information data, Political science, Pedagogy, Italian university offer, University system, Latent semantic indexing
- Abstract
The paper, developed within a research project promoted by the National Committee for the Evaluation of the University System (CNVSU; the complete report is available at www.cnvsu.it), analyses the Italian university offer, focusing on how universities communicate their objectives and expected results, also with respect to potential employment. The analysed documents are the course information sheets of all the 3-year degree courses and of some specialised (2-year) degree courses, contained in the OFF.F database of the Ministry of the University (MIUR) for the academic year 2005/06. The research pursued the following aims: 1. reading the educational offer, focusing mainly on the competences foreseen for graduates; 2. reading the foreseen job prospects for the graduates of a course; 3. analysing the consistency of the competences provided with the foreseen employment prospects; 4. analysing the consistency between the competences acquired by 3-year graduates and the competences offered to those who decide to continue with a 2-year specialised degree course. Points 1-3 are analysed with typical text mining procedures, while point 4 draws on methods developed for the analysis of multilingual corpora. Section 2 gives a short overview of the adopted methods, while the following sections offer some general reflections on these topics, referring, as an example, to the principal results for the degree in Statistics.
- Published
- 2010
10. A Doubly Projected Analysis for Lexical Tables
- Author
- Simona Balbi, Michelangelo Misuraca, and C.H. Skiadas
- Subjects
Factorial, Orthogonal Projector, Lexical analysis, Principal component analysis, Factorial Maps, External Information, Row and column spaces, Algorithm, Readability, Mathematics, Reference space
- Abstract
This paper shows how external information contributes to the analysis of a lexical table by enriching the readability of factorial maps. The theoretical framework is given by Principal Component Analysis onto a Reference Subspace, a method based on the orthogonal projection of a correlation structure onto the space spanned by an external set of explanatory variables. In previous papers, the idea of a projected lexical analysis was introduced using a single reference space for terms. Here we consider a double projection strategy involving external informative structures on both documents and terms, i.e. on the rows and columns of a lexical table.
- Published
- 2009
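The orthogonal projection at the core of PCA onto a Reference Subspace can be illustrated in its simplest case. With an explanatory matrix X the projector is X(XᵀX)⁻¹Xᵀ; the sketch below assumes a single centred explanatory variable x, so the projector reduces to x xᵀ / (xᵀx). The variable names and toy values are hypothetical, and a doubly projected analysis would apply an analogous projector on the other side of the table as well.

```python
def project_onto_span(x, columns):
    """Orthogonal projection of each column vector y onto span{x}: x * (x.y) / (x.x)."""
    xx = sum(v * v for v in x)
    return [[xi * sum(a * b for a, b in zip(x, y)) / xx for xi in x] for y in columns]

# Hypothetical centred external variable on 3 documents, and two term columns
x = [1.0, -1.0, 0.0]
term_columns = [[2.0, 0.0, 1.0], [1.0, 1.0, 5.0]]
projected = project_onto_span(x, term_columns)
print(projected)
```

The second column is orthogonal to x, so its projection vanishes: only the part of the lexical structure explained by the external variable survives, and a PCA of the projected matrix then summarises that part.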
11. Introduzione (Introduction)
- Author
- Simona Balbi, Giovanna Boccuzzo, and Maria Gabriella Grassia
- Published
- 2009
12. Profiling and Labour Market Accessibility for the Graduates in Economics at Naples University
- Author
- Simona Balbi, Maria Gabriella Grassia, and Luigi Fabbris
- Subjects
Operations research, Association rule learning, Pseudo-panel, Market evolution, Symbolic data analysis, Engineering management, Semantic marking, Phenomenon, Symbolic objects, Economics, Profiling (information science)
- Abstract
In this paper, after defining a pseudo-panel of groups observed at subsequent times, we propose a strategy for constructing a set of association rules related to different survey occasions. First, we measure the similarity between the systems built at different times to understand the stability of the phenomenon, applying for this purpose a procedure developed for symbolic data analysis. The procedure consists of two phases: the definition of the pseudo-panel and the definition of a system of rules based on the semantic marking technique. Then, the agreement between the systems is measured. We applied this strategy to study labour market accessibility for graduates in Economics at the University of Naples “Federico II” and the evolution of the market over an eight-year span.
- Published
- 2006
13. Visualization Techniques for Non Symmetrical Relations
- Author
- Simona Balbi and Michelangelo Misuraca
- Subjects
Information retrieval, Computer science, Context (language use), Correspondence analysis, Weighting, Singular value decomposition, Artificial intelligence, Natural language processing
- Abstract
Many Text Retrieval strategies are based on Latent Semantic Indexing and its variants, which consider different weighting systems for words and documents. Correspondence Analysis and LSI share the same basic algebraic tool, the Singular Value Decomposition and its generalisation; they differ in how they measure the importance of each element, both in determining and in representing similarities between documents and words. The aim of the paper is to propose a specific factorial approach that visualises the relations between textual data and documents better than classical Correspondence Analysis. Here we consider a term frequency/document frequency index scheme, mainly developed for Text Retrieval, in a textual data analysis context. An application to the Italian Le Monde Diplomatique corpus (about 2000 articles published from 1998 to 2003 in the Italian edition of LMD) shows the effectiveness of the proposal.
- Published
- 2006
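The term frequency/document frequency scheme mentioned in the abstract above comes in several variants; the sketch below assumes the common tf × log(N/df) form, which may differ from the exact weighting used in the paper. The whitespace tokenizer and toy documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Weight each term count by log(N / df); terms found in every document get weight 0."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

weights = tf_idf(["europe crisis europe", "crisis war", "europe peace"])
print(weights[1])  # 'war' (in 1 document) outweighs 'crisis' (in 2 documents)
```

Unlike the chi-squared weighting of Correspondence Analysis, which rescales by row and column masses, this scheme discounts terms only by how many documents contain them.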
14. Procrustes techniques for Text Mining
- Author
- Simona Balbi, Michelangelo Misuraca, S. Zani, A. Cerioli, M. Riani, and M. Vichi
- Subjects
Distance between configurations, Translation, Multilingual corpora, Procrustes, Latent semantic analysis, Computer science, Pattern recognition, Context (language use), Text mining, Artificial intelligence, Natural language processing, Natural language
- Abstract
This paper explores the capability of so-called Latent Semantic Analysis in a multilingual context. In particular, we are interested in assessing how useful it could be in solving linguistic problems from a statistical point of view. Here we focus on the possibility of evaluating the quality of a translation by comparing the latent structures of the original text and of its version in another natural language. Procrustes rotations are introduced in a statistical framework as a tool for reaching this goal. An application to one year of Le Monde Diplomatique and its corresponding Italian edition shows the effectiveness of our proposal.
- Published
- 2006
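A Procrustes comparison of two latent configurations can be sketched in the plane. This hedged example restricts itself to two dimensions and to pure rotation (no reflection or scaling), where the optimal angle has the closed form θ = atan2(Σ(x₁y₂ − x₂y₁), Σ(x₁y₁ + x₂y₂)); the general case uses an SVD. The coordinates are hypothetical stand-ins for the LSA positions of aligned text units in two languages.

```python
import math

def centre(points):
    """Translate a 2-D configuration so that its centroid is at the origin."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    return [(x - mx, y - my) for x, y in points]

def procrustes_rotation_2d(source, target):
    """Best rotation (no reflection, no scaling) of centred source onto centred target.
    Returns the angle in radians and the residual sum of squares."""
    xs, ys = centre(source), centre(target)
    a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(xs, ys))
    b = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in zip(xs, ys))
    theta = math.atan2(b, a)  # maximises the sum of y . R(theta) x
    c, s = math.cos(theta), math.sin(theta)
    rss = sum(
        (c * x1 - s * x2 - y1) ** 2 + (s * x1 + c * x2 - y2) ** 2
        for (x1, x2), (y1, y2) in zip(xs, ys)
    )
    return theta, rss

# Hypothetical 2-D latent coordinates of three aligned text units in two languages
original = [(1.0, 0.0), (0.0, 2.0), (-1.0, 0.0)]
translated = [(0.0, 1.0), (-2.0, 0.0), (0.0, -1.0)]  # same shape, rotated by 90 degrees
angle, residual = procrustes_rotation_2d(original, translated)
print(math.degrees(angle), residual)
```

The residual sum of squares after the best rotation is the quantity of interest: a value near zero means the two latent structures coincide up to rotation, as a faithful translation would suggest.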
15. Contributions of Textual Data Analysis to Text Retrieval
- Author
- Simona Balbi and Emilio Di Meglio; D. Banks, L. House, F.R. McMorris, P. Arabie, and W. Gaul (Eds.)
- Subjects
Euclidean distance, Information retrieval, Computer science, Singular value decomposition, Metric (mathematics), Data analysis, Correspondence analysis, Latent Semantic Indexing, PLS Regression, Regression, Text retrieval
- Abstract
The aim of this paper is to show how Textual Data Analysis techniques, developed in Europe under the influence of the Analyse Multidimensionnelle des Données school, can improve the performance of the LSI retrieval method. A first improvement can be obtained by properly considering the data contained in a lexical table. LSI is based on Euclidean distance, which is not adequate for frequency data. By using the chi-squared metric, on which Correspondence Analysis is based, significant improvements can be achieved. Further improvements can be obtained by considering external information such as keywords, authors, etc. Here an approach to text retrieval with external information based on PLS regression is shown. The suggested strategies are applied in text retrieval experiments on medical journal abstracts.
- Published
- 2004
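The contrast the abstract draws between Euclidean distance and the chi-squared metric can be seen on a toy lexical table. The sketch uses the standard chi-squared distance between row profiles, d²(i, k) = Σⱼ (nᵢⱼ/nᵢ. − nₖⱼ/nₖ.)² / p.ⱼ; the table values are invented so that one document is simply a longer version of another.

```python
import math

def chi2_distance(table, i, k):
    """Chi-squared distance between the row profiles of rows i and k of a lexical table."""
    n = sum(sum(row) for row in table)
    col_mass = [sum(col) / n for col in zip(*table)]  # p_.j
    ri, rk = sum(table[i]), sum(table[k])
    return math.sqrt(sum(
        (a / ri - b / rk) ** 2 / c
        for a, b, c in zip(table[i], table[k], col_mass)
        if c > 0
    ))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Documents x terms; document 1 is document 0 written twice as long
lexical_table = [[10, 0, 5], [20, 0, 10], [0, 5, 5]]
print(euclidean(lexical_table[0], lexical_table[1]))  # large: driven by document length
print(chi2_distance(lexical_table, 0, 1))             # 0.0: identical profiles
```

On raw counts, Euclidean distance separates documents that use words in exactly the same proportions, while the chi-squared metric compares profiles and down-weights frequent terms through the column masses.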
16. A Factorial Technique for Analysing Textual Data with External Information
- Author
- Simona Balbi, Giuseppe Giordano, S. Borra, R. Rocci, M. Vichi, and M. Schader
- Subjects
Factorial, Biplot, Computer science, Context (language use), Lexical table, Data structure, Correspondence analysis, Singular value decomposition (SVD), Data mining, Artificial intelligence, Natural language processing
- Abstract
The paper proposes a method that takes contextual information into account when lexical tables are analysed by means of factorial techniques such as Correspondence Analysis. This kind of information, external to the main data structure, can concern where and how words are used, and also their (syntactic, grammatical, etc.) role inside the corpus. The proposed methodological tool is a technique based on projections onto subspaces spanned by two sets of variables, related to fragments and to words. The matrix to be analysed, called the inter-reference matrix, measures the importance of the association between the external information on words and on fragments. The final outputs are graphical representations that enrich the results of textual data analysis.
- Published
- 2001
17. Some RC33 Communications Problems
- Author
- Katja Lozar Manfreda, Cor van Dijkum, Simona Balbi, Karl van Meter, and Jörg Blasius
- Subjects
Sociology and Political Science
- Published
- 2008
18. Graphical Displays in Nonsymmetrical Correspondence Analysis
- Author
- Simona Balbi
- Subjects
Algebra, Data exploration, Computer science, Algorithm, Correspondence analysis
- Abstract
This chapter focuses on graphical representations in nonsymmetrical correspondence analysis (NSCA). It discusses the conditions under which NSCA is preferable to ordinary correspondence analysis in data exploration and illustrates the importance of NSCA for dealing with survey data. NSCA looks for the orthonormal basis that accounts for the largest part of variability. To generalize NSCA to three-way or even more complex datasets, multiple NSCA and partial NSCA have also been introduced; multiple NSCA consists of transforming a multiway table into a suitable two-way table. The chapter explains the basics of the geometry of NSCA, rules for interpreting NSCA factorial planes, and aids to the interpretation of NSCA maps, and concludes with a discussion of joint plots in NSCA.
- Published
- 1998
19. Textual data analysis for open-questions in repeated surveys
- Author
- Simona Balbi, A. Rizzi, M. Vichi, and H.H. Bock
- Subjects
Vocabulary, Information retrieval, Computer science, Principal matrix, Graphical display, Textual forms, Data analysis, Data mining, Multiway analysis
- Abstract
The paper addresses the problem of analysing answers to an open question observed in different waves of a repeated survey. Multiway data analysis techniques can offer interesting tools for doing so; unfortunately, textual data raise some specific problems. The aim of this paper is to propose the use of non-symmetrical data analysis techniques in order to follow lexical behaviours with respect to a set of explanatory numerical variables, defining groups through time. Furthermore, attention is paid to the definition of a conjoint vocabulary.
- Published
- 1998
20. An Algorithm for Detecting the Number of Knots in Non Linear Principal Component Analysis
- Author
- Gerarda Tessitore and Simona Balbi
- Subjects
Nonlinear system, Principal component analysis, Singular value decomposition, Mean vector, Linear combination, Algorithm, Cross-validation, Mathematics
- Abstract
Principal Component Analysis (PCA) aims at finding “few” linear combinations of the original variables with “maximal” variance, losing as little information as possible in the summarizing process. The usual computational tool in PCA is the singular value decomposition of the observed individuals-by-variables matrix, centred with respect to the mean vector, and its lower-rank approximation (in the least-squares sense). Determining this lower rank is a critical point for the method.
- Published
- 1996
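The least-squares lower-rank approximation mentioned in the PCA abstract above rests on a standard result, the Eckart–Young theorem, stated here for reference; it is not the knot-detection algorithm of the paper. For a centred n × p matrix X with singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0:

```latex
X = \sum_{k=1}^{r} \sigma_k\, u_k v_k^{\top}, \qquad
X_q = \sum_{k=1}^{q} \sigma_k\, u_k v_k^{\top} \quad (q < r),

\min_{\operatorname{rank}(Y) \le q} \lVert X - Y \rVert_F^2
  = \lVert X - X_q \rVert_F^2
  = \sum_{k=q+1}^{r} \sigma_k^2 ,
\qquad
\text{share of variance retained} = \frac{\sum_{k=1}^{q} \sigma_k^2}{\sum_{k=1}^{r} \sigma_k^2}.
```

Choosing q therefore trades the size of the discarded tail of squared singular values against parsimony, which is why the choice of rank is critical.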
21. Analysis of Structured Data
- Author
- Simona Balbi
- Subjects
Information retrieval, Sociology and Political Science, Computer science, Structured interview
- Published
- 2000
22. The textometric concept of active corpus
- Author
- Bénédicte Pincemin, Serge Heiden, and Franck Mazuet; Institut d’Histoire des Représentations et des Idées dans les Modernités (IHRIM): École normale supérieure de Lyon (ENS de Lyon), Université Lumière - Lyon 2 (UL2), Université Jean Moulin - Lyon 3 (UJML), Université de Lyon, Université Blaise Pascal - Clermont-Ferrand 2 (UBP), Université Jean Monnet - Saint-Étienne (UJM), Centre National de la Recherche Scientifique (CNRS), Université Clermont Auvergne (UCA); Centre d'histoire sociale des mondes contemporains (CHS): Université Paris 1 Panthéon-Sorbonne (UP1), CNRS; VADISTAT - Per Simona Balbi, Univ. of Naples Federico II; Michelangelo Misuraca, Germana Scepi, and Maria Spano; ANR-17-CE38-0010, ANTRACT, Analyse Transdisciplinaire des Actualités filmées (1945-1969) (2017)
- Subjects
Digital Humanities, Newsreels, [SHS.STAT] Humanities and Social Sciences/Methods and statistics, Corpus projection, TXM software, Les Actualités françaises, Textometry, Active corpus, Corpus annotation, Film grammar, [SHS.LANGUE] Humanities and Social Sciences/Linguistics, [SHS.HIST] Humanities and Social Sciences/History
- Abstract
Active corpus provides the possibility to apply searching and statistical computing as if the corpus were reduced to selected words, while the full text remains visible in context display. This is mainly implemented in paradigmatic processing, yet it may concern syntagmatic processing or text display too. Here we experiment with active corpus in syntagmatic processing. A projection generates a new corpus in which words are semantic tags that were automatically assigned to the original data in a first step. This new corpus makes it easy to explore tag sequences with any generic textometric tool available, however sparse the original annotation may be. This methodological path was applied to film grammar analysis on 10,000 archival descriptions of news reports. 19 camera shot and angle types were ed through queries and tagged. This annotation became the lexicon of the projected corpus, which was used to study shot sequences. The annotation and projection tools we used are available as utilities in the TXM open-source software and should usefully serve many research projects.
- Published
- 2022
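The corpus projection described in the active-corpus abstract can be illustrated with plain data structures. This is a hypothetical sketch of the idea only, not TXM's implementation or API: annotations map word positions to tags, the projected corpus keeps just the tags in text order, and tag sequences are then explored as ordinary bigrams. The tokens and tag names are invented.

```python
from collections import Counter

def project_corpus(tokens, annotations):
    """Projected corpus: the tags of annotated positions, in text order; untagged words are dropped."""
    return [annotations[i] for i in range(len(tokens)) if i in annotations]

def tag_bigrams(projected):
    """Count consecutive tag pairs in the projected corpus."""
    return Counter(zip(projected, projected[1:]))

# Hypothetical newsreel description with shot-type tags on three positions
tokens = ["close-up", "minister", "wide", "shot", "crowd", "close-up", "banner"]
annotations = {0: "CLOSE_UP", 2: "WIDE_SHOT", 5: "CLOSE_UP"}
projected = project_corpus(tokens, annotations)
print(projected)  # ['CLOSE_UP', 'WIDE_SHOT', 'CLOSE_UP']
print(tag_bigrams(projected))
```

Because untagged words disappear, two tags separated by many words become adjacent in the projected corpus, which is what makes sparse annotations usable for sequence analysis.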
23. A polarity-based strategy for ranking social media reviews
- Author
- Simona Balbi, Michelangelo Misuraca, Germana Scepi, Alessandra Petrucci, and Rosanna Verde
- Subjects
Textual Data, Opinion Mining, Ranking
- Abstract
Opinion Mining methods are widely used to analyse and classify the choices, preferences and behaviours of consumers through the opinions gathered on the Web. On social media such as TripAdvisor, such opinions are usually expressed with a score and a short text. This paper proposes a strategy for ranking reviews using a scale based jointly on the rating and on the text of the reviews.
- Published
- 2017
24. Combinatorial Inference in Geometric Data Analysis
- Author
- Brigitte Le Roux, Mathématiques Appliquées à Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5), Institut National des Sciences Mathématiques et de leurs Interactions (INSMI), Centre National de la Recherche Scientifique (CNRS); Centre de recherches politiques de Sciences Po (CEVIPOF), CNRS, Sciences Po; Simona Balbi, Università degli Studi di Napoli 'Federico II'; Jörg Blasius, University of Bonn; Michael Greenacre, Pompeu Fabra University, Barcelona
- Subjects
Geometric Data Analysis, [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], Combinatorial Inference, Permutation Tests
- Abstract
In this talk, we present statistical inference methods for Geometric Data Analysis (GDA) that are based not on random modelling but on permutation procedures recast in a combinatorial framework. The combinatorial approach, which is entirely free from assumptions, is the most in harmony with inductive data analysis. The methods are applicable to any individuals × variables table with structuring factors on individuals (i.e. external categorical variables not used for constructing the clouds), and either numerical (PCA) or categorized (MCA) variables. In GDA, the usual sampling models, with their drastic assumptions, are simply not appropriate. We first introduce the test comparing the mean of a subcloud to a reference point. Then we develop procedures dealing with the typicality of a subcloud of individuals, with a generalization of test-values. Lastly, we present homogeneity tests for comparing several subclouds. In each case, we define the p-value and a compatibility (confidence) zone.
- Published
- 2015
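A typicality test in the combinatorial spirit of the talk above can be sketched by exhaustive enumeration. Under stated assumptions (Euclidean distance to the cloud's mean point as the test statistic; GDA practice may use another metric), the p-value is the share of all same-size subsets of the cloud whose mean point lies at least as far from the overall mean as the observed subcloud's. Names and data are hypothetical.

```python
import math
from itertools import combinations

def mean_point(points):
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def typicality_p_value(cloud, subcloud_idx):
    """Exact combinatorial p-value: share of same-size subsets whose mean point is
    at least as far from the cloud's mean point as the observed subcloud's mean."""
    g = mean_point(cloud)
    size = len(subcloud_idx)
    observed = math.dist(mean_point([cloud[i] for i in subcloud_idx]), g)
    extreme = total = 0
    for subset in combinations(range(len(cloud)), size):
        total += 1
        if math.dist(mean_point([cloud[i] for i in subset]), g) >= observed - 1e-12:
            extreme += 1
    return extreme / total

cloud = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5)]
print(typicality_p_value(cloud, (4, 5)))  # 1 of the 15 pairs is this extreme
```

No sampling model is involved: the reference set is simply every subset of the observed cloud. Exhaustive enumeration grows combinatorially, so for larger clouds the reference set would have to be sampled.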
25. On the Variance of Eigenvalues in PCA and MCA
- Author
- Jean-Luc Durand, Laboratoire d'Ethologie Expérimentale et Comparée (LEEC), Université Sorbonne Paris Cité (USPC), Université Paris 13 (UP13); Simona Balbi, Jörg Blasius, and Michael Greenacre
- Subjects
[STAT] Statistics [stat], [SHS.STAT] Humanities and Social Sciences/Methods and statistics, Principal component analysis, Contributions to axes, Multiple correspondence analysis, Eigenvalues
- Abstract
In this talk, we show that, in principal component analysis (PCA) and in multiple correspondence analysis (MCA), the strength of the relationship between variables first determines the variance of the eigenvalues (which is an indicator of deviation from sphericity), and second highlights the axes to which the variables contribute the most. In PCA on a correlation matrix, given a set of numerical variables, we call the linkage index of a variable the mean of the squared correlations between this variable and the others, and the average linkage index the mean of the linkage indexes. The following two properties hold: 1. the variance of the eigenvalues is proportional to the average linkage index; 2. for each variable, the variance of the eigenvalues weighted by the contributions of this variable to the axes is proportional to the linkage index of this variable. In MCA, similar properties hold. We illustrate these properties using two classical data sets: school evaluations (Spearman, 1904) for PCA and physical attributes (Burt, 1950) for MCA.
- Published
- 2015
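Property 1 in the abstract above can be checked with a short derivation. The conventions are assumptions, not necessarily the talk's exact definitions: R is a p × p correlation matrix with eigenvalues λ_1, ..., λ_p, the variance of the eigenvalues uses divisor p, and the linkage index of variable j is ℓ_j = (1/(p − 1)) Σ_{k≠j} r_jk².

```latex
\begin{aligned}
\sum_{k=1}^{p} \lambda_k &= \operatorname{tr} R = p
  \quad\Rightarrow\quad \bar{\lambda} = 1, \\
\sum_{k=1}^{p} \lambda_k^2 &= \operatorname{tr}(R^2)
  = \sum_{j=1}^{p}\sum_{k=1}^{p} r_{jk}^2
  = p + \sum_{j \ne k} r_{jk}^2, \\
\operatorname{Var}(\lambda) &= \frac{1}{p}\sum_{k=1}^{p} (\lambda_k - 1)^2
  = \frac{1}{p}\sum_{k=1}^{p} \lambda_k^2 - 1
  = \frac{1}{p}\sum_{j \ne k} r_{jk}^2, \\
\bar{\ell} &= \frac{1}{p}\sum_{j=1}^{p} \ell_j
  = \frac{1}{p(p-1)} \sum_{j \ne k} r_{jk}^2, \\
\operatorname{Var}(\lambda) &= (p-1)\,\bar{\ell}.
\end{aligned}
```

For p = 2 the eigenvalues are 1 ± r, so Var(λ) = r² = ℓ̄, consistent with the factor p − 1 = 1.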
Discovery Service for Jio Institute Digital Library