1. Forward and backward feature selection for query performance prediction
- Author
-
Radu Tudor Ionescu, Sébastien Déjean, Zia Ullah, Josiane Mothe, Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), University of Bucharest (ROMANIA), Institut de Recherche en Informatique de Toulouse - IRIT (Toulouse, France), Institut de Mathématiques de Toulouse UMR5219 (IMT), Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS), University of Bucharest (UniBuc), Systèmes d’Informations Généralisées (IRIT-SIG), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse - Jean Jaurès (UT2J), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Université de Toulouse (UT)-Institut National des Sciences Appliquées (INSA)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), and Université de Toulouse (UT)
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer science ,Inference ,Machine Learning (stat.ML) ,Feature selection ,02 engineering and technology ,QPP ,Machine learning ,computer.software_genre ,Machine Learning (cs.LG) ,Computer Science - Information Retrieval ,Statistics - Machine Learning ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Performance prediction ,Information retrieval ,Relevance (information retrieval) ,Linear regression ,Query performance prediction ,Interpretation (logic) ,business.industry ,AIC criterion ,Recherche d'information ,020207 software engineering ,16. Peace & justice ,Task (computing) ,Predictive model ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,Selective model ,Artificial intelligence ,business ,computer ,Information Retrieval (cs.IR) - Abstract
The goal of query performance prediction (QPP) is to automatically estimate the effectiveness of a search result for any given query, without relevance judgements. Post-retrieval features have been shown to be more effective for this task while being more expensive to compute than pre-retrieval features. Combining multiple post-retrieval features is even more effective, but state-of-the-art QPP methods are impossible to interpret because of the black-box nature of the employed machine learning models. However, interpretation is useful for understanding the predictive model and providing more answers about its behavior. Moreover, combining many post-retrieval features is not applicable to real-world cases, since the query running time is of utter importance. In this paper, we investigate a new framework for feature selection in which the trained model explains well the prediction. We introduce a step-wise (forward and backward) model selection approach where different subsets of query features are used to fit different models from which the system selects the best one. We evaluate our approach on four TREC collections using standard QPP features. We also develop two QPP features to address the issue of query-drift in the query feedback setting. We found that: (1) our model based on a limited number of selected features is as good as more complex models for QPP and better than non-selective models; (2) our model is more efficient than complex models during inference time since it requires fewer features; (3) the predictive model is readable and understandable; and (4) one of our new QPP features is consistently selected across different collections, proving its usefulness., Accepted at SAC 2020
- Published
- 2020
- Full Text
- View/download PDF