41 results for "Martinez-Plumed, Fernando"
Search Results
2. How Resilient are Language Models to Text Perturbations?
- Author
-
Romero-Alvarado, Daniel, Hernández-Orallo, José, Martínez-Plumed, Fernando, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Julian, Vicente, editor, Camacho, David, editor, Yin, Hujun, editor, Alberola, Juan M., editor, Nogueira, Vitor Beires, editor, Novais, Paulo, editor, and Tallón-Ballesteros, Antonio, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Predictable Artificial Intelligence
- Author
-
Zhou, Lexin, Moreno-Casares, Pablo A., Martínez-Plumed, Fernando, Burden, John, Burnell, Ryan, Cheke, Lucy, Ferri, Cèsar, Marcoci, Alexandru, Mehrbakhsh, Behzad, Moros-Daval, Yael, hÉigeartaigh, Seán Ó, Rutar, Danaja, Schellaert, Wout, Voudouris, Konstantinos, and Hernández-Orallo, José
- Subjects
Computer Science - Artificial Intelligence, I.2 - Abstract
We introduce the fundamental ideas and challenges of Predictable AI, a nascent research area that explores the ways in which we can anticipate key validity indicators (e.g., performance, safety) of present and future AI ecosystems. We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems, and thus should be prioritised over performance. We formally characterise predictability, explore its most relevant components, illustrate what can be predicted, and describe alternative candidates for predictors, as well as the trade-offs between maximising validity and predictability. To illustrate these concepts, we present an array of examples covering diverse ecosystem configurations. Predictable AI is related to other areas of technical and non-technical AI research, but has distinctive questions, hypotheses, techniques and challenges. This paper aims to elucidate them, calls for identifying paths towards a landscape of predictably valid AI systems, and outlines the potential impact of this emergent field., Comment: Paper Under Review
- Published
- 2023
4. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Author
-
Srivastava, Aarohi, Rastogi, Abhinav, Rao, Abhishek, Shoeb, Abu Awal Md, Abid, Abubakar, Fisch, Adam, Brown, Adam R., Santoro, Adam, Gupta, Aditya, Garriga-Alonso, Adrià, Kluska, Agnieszka, Lewkowycz, Aitor, Agarwal, Akshat, Power, Alethea, Ray, Alex, Warstadt, Alex, Kocurek, Alexander W., Safaya, Ali, Tazarv, Ali, Xiang, Alice, Parrish, Alicia, Nie, Allen, Hussain, Aman, Askell, Amanda, Dsouza, Amanda, Slone, Ambrose, Rahane, Ameet, Iyer, Anantharaman S., Andreassen, Anders, Madotto, Andrea, Santilli, Andrea, Stuhlmüller, Andreas, Dai, Andrew, La, Andrew, Lampinen, Andrew, Zou, Andy, Jiang, Angela, Chen, Angelica, Vuong, Anh, Gupta, Animesh, Gottardi, Anna, Norelli, Antonio, Venkatesh, Anu, Gholamidavoodi, Arash, Tabassum, Arfa, Menezes, Arul, Kirubarajan, Arun, Mullokandov, Asher, Sabharwal, Ashish, Herrick, Austin, Efrat, Avia, Erdem, Aykut, Karakaş, Ayla, Roberts, B. Ryan, Loe, Bao Sheng, Zoph, Barret, Bojanowski, Bartłomiej, Özyurt, Batuhan, Hedayatnia, Behnam, Neyshabur, Behnam, Inden, Benjamin, Stein, Benno, Ekmekci, Berk, Lin, Bill Yuchen, Howald, Blake, Orinion, Bryan, Diao, Cameron, Dour, Cameron, Stinson, Catherine, Argueta, Cedrick, Ramírez, César Ferri, Singh, Chandan, Rathkopf, Charles, Meng, Chenlin, Baral, Chitta, Wu, Chiyu, Callison-Burch, Chris, Waites, Chris, Voigt, Christian, Manning, Christopher D., Potts, Christopher, Ramirez, Cindy, Rivera, Clara E., Siro, Clemencia, Raffel, Colin, Ashcraft, Courtney, Garbacea, Cristina, Sileo, Damien, Garrette, Dan, Hendrycks, Dan, Kilman, Dan, Roth, Dan, Freeman, Daniel, Khashabi, Daniel, Levy, Daniel, González, Daniel Moseguí, Perszyk, Danielle, Hernandez, Danny, Chen, Danqi, Ippolito, Daphne, Gilboa, Dar, Dohan, David, Drakard, David, Jurgens, David, Datta, Debajyoti, Ganguli, Deep, Emelin, Denis, Kleyko, Denis, Yuret, Deniz, Chen, Derek, Tam, Derek, Hupkes, Dieuwke, Misra, Diganta, Buzan, Dilyar, Mollo, Dimitri Coelho, Yang, Diyi, Lee, Dong-Ho, Schrader, Dylan, Shutova, Ekaterina, Cubuk, Ekin Dogus, 
Segal, Elad, Hagerman, Eleanor, Barnes, Elizabeth, Donoway, Elizabeth, Pavlick, Ellie, Rodola, Emanuele, Lam, Emma, Chu, Eric, Tang, Eric, Erdem, Erkut, Chang, Ernie, Chi, Ethan A., Dyer, Ethan, Jerzak, Ethan, Kim, Ethan, Manyasi, Eunice Engefu, Zheltonozhskii, Evgenii, Xia, Fanyue, Siar, Fatemeh, Martínez-Plumed, Fernando, Happé, Francesca, Chollet, Francois, Rong, Frieda, Mishra, Gaurav, Winata, Genta Indra, de Melo, Gerard, Kruszewski, Germán, Parascandolo, Giambattista, Mariani, Giorgio, Wang, Gloria, Jaimovitch-López, Gonzalo, Betz, Gregor, Gur-Ari, Guy, Galijasevic, Hana, Kim, Hannah, Rashkin, Hannah, Hajishirzi, Hannaneh, Mehta, Harsh, Bogar, Hayden, Shevlin, Henry, Schütze, Hinrich, Yakura, Hiromu, Zhang, Hongming, Wong, Hugh Mee, Ng, Ian, Noble, Isaac, Jumelet, Jaap, Geissinger, Jack, Kernion, Jackson, Hilton, Jacob, Lee, Jaehoon, Fisac, Jaime Fernández, Simon, James B., Koppel, James, Zheng, James, Zou, James, Kocoń, Jan, Thompson, Jana, Wingfield, Janelle, Kaplan, Jared, Radom, Jarema, Sohl-Dickstein, Jascha, Phang, Jason, Wei, Jason, Yosinski, Jason, Novikova, Jekaterina, Bosscher, Jelle, Marsh, Jennifer, Kim, Jeremy, Taal, Jeroen, Engel, Jesse, Alabi, Jesujoba, Xu, Jiacheng, Song, Jiaming, Tang, Jillian, Waweru, Joan, Burden, John, Miller, John, Balis, John U., Batchelder, Jonathan, Berant, Jonathan, Frohberg, Jörg, Rozen, Jos, Hernandez-Orallo, Jose, Boudeman, Joseph, Guerr, Joseph, Jones, Joseph, Tenenbaum, Joshua B., Rule, Joshua S., Chua, Joyce, Kanclerz, Kamil, Livescu, Karen, Krauth, Karl, Gopalakrishnan, Karthik, Ignatyeva, Katerina, Markert, Katja, Dhole, Kaustubh D., Gimpel, Kevin, Omondi, Kevin, Mathewson, Kory, Chiafullo, Kristen, Shkaruta, Ksenia, Shridhar, Kumar, McDonell, Kyle, Richardson, Kyle, Reynolds, Laria, Gao, Leo, Zhang, Li, Dugan, Liam, Qin, Lianhui, Contreras-Ochando, Lidia, Morency, Louis-Philippe, Moschella, Luca, Lam, Lucas, Noble, Lucy, Schmidt, Ludwig, He, Luheng, Colón, Luis Oliveros, Metz, Luke, Şenel, Lütfi Kerem, Bosma, 
Maarten, Sap, Maarten, ter Hoeve, Maartje, Farooqi, Maheen, Faruqui, Manaal, Mazeika, Mantas, Baturan, Marco, Marelli, Marco, Maru, Marco, Quintana, Maria Jose Ramírez, Tolkiehn, Marie, Giulianelli, Mario, Lewis, Martha, Potthast, Martin, Leavitt, Matthew L., Hagen, Matthias, Schubert, Mátyás, Baitemirova, Medina Orduna, Arnaud, Melody, McElrath, Melvin, Yee, Michael A., Cohen, Michael, Gu, Michael, Ivanitskiy, Michael, Starritt, Michael, Strube, Michael, Swędrowski, Michał, Bevilacqua, Michele, Yasunaga, Michihiro, Kale, Mihir, Cain, Mike, Xu, Mimee, Suzgun, Mirac, Walker, Mitch, Tiwari, Mo, Bansal, Mohit, Aminnaseri, Moin, Geva, Mor, Gheini, Mozhdeh, T, Mukund Varma, Peng, Nanyun, Chi, Nathan A., Lee, Nayeon, Krakover, Neta Gur-Ari, Cameron, Nicholas, Roberts, Nicholas, Doiron, Nick, Martinez, Nicole, Nangia, Nikita, Deckers, Niklas, Muennighoff, Niklas, Keskar, Nitish Shirish, Iyer, Niveditha S., Constant, Noah, Fiedel, Noah, Wen, Nuan, Zhang, Oliver, Agha, Omar, Elbaghdadi, Omar, Levy, Omer, Evans, Owain, Casares, Pablo Antonio Moreno, Doshi, Parth, Fung, Pascale, Liang, Paul Pu, Vicol, Paul, Alipoormolabashi, Pegah, Liao, Peiyuan, Liang, Percy, Chang, Peter, Eckersley, Peter, Htut, Phu Mon, Hwang, Pinyu, Miłkowski, Piotr, Patil, Piyush, Pezeshkpour, Pouya, Oli, Priti, Mei, Qiaozhu, Lyu, Qing, Chen, Qinlang, Banjade, Rabin, Rudolph, Rachel Etta, Gabriel, Raefer, Habacker, Rahel, Risco, Ramon, Millière, Raphaël, Garg, Rhythm, Barnes, Richard, Saurous, Rif A., Arakawa, Riku, Raymaekers, Robbe, Frank, Robert, Sikand, Rohan, Novak, Roman, Sitelew, Roman, LeBras, Ronan, Liu, Rosanne, Jacobs, Rowan, Zhang, Rui, Salakhutdinov, Ruslan, Chi, Ryan, Lee, Ryan, Stovall, Ryan, Teehan, Ryan, Yang, Rylan, Singh, Sahib, Mohammad, Saif M., Anand, Sajant, Dillavou, Sam, Shleifer, Sam, Wiseman, Sam, Gruetter, Samuel, Bowman, Samuel R., Schoenholz, Samuel S., Han, Sanghyun, Kwatra, Sanjeev, Rous, Sarah A., Ghazarian, Sarik, Ghosh, Sayan, Casey, Sean, Bischoff, Sebastian, Gehrmann, 
Sebastian, Schuster, Sebastian, Sadeghi, Sepideh, Hamdan, Shadi, Zhou, Sharon, Srivastava, Shashank, Shi, Sherry, Singh, Shikhar, Asaadi, Shima, Gu, Shixiang Shane, Pachchigar, Shubh, Toshniwal, Shubham, Upadhyay, Shyam, Shyamolima, Debnath, Shakeri, Siamak, Thormeyer, Simon, Melzi, Simone, Reddy, Siva, Makini, Sneha Priscilla, Lee, Soo-Hwan, Torene, Spencer, Hatwar, Sriharsha, Dehaene, Stanislas, Divic, Stefan, Ermon, Stefano, Biderman, Stella, Lin, Stephanie, Prasad, Stephen, Piantadosi, Steven T., Shieber, Stuart M., Misherghi, Summer, Kiritchenko, Svetlana, Mishra, Swaroop, Linzen, Tal, Schuster, Tal, Li, Tao, Yu, Tao, Ali, Tariq, Hashimoto, Tatsu, Wu, Te-Lin, Desbordes, Théo, Rothschild, Theodore, Phan, Thomas, Wang, Tianle, Nkinyili, Tiberius, Schick, Timo, Kornev, Timofei, Tunduny, Titus, Gerstenberg, Tobias, Chang, Trenton, Neeraj, Trishala, Khot, Tushar, Shultz, Tyler, Shaham, Uri, Misra, Vedant, Demberg, Vera, Nyamai, Victoria, Raunak, Vikas, Ramasesh, Vinay, Prabhu, Vinay Uday, Padmakumar, Vishakh, Srikumar, Vivek, Fedus, William, Saunders, William, Zhang, William, Vossen, Wout, Ren, Xiang, Tong, Xiaoyu, Zhao, Xinran, Wu, Xinyi, Shen, Xudong, Yaghoobzadeh, Yadollah, Lakretz, Yair, Song, Yangqiu, Bahri, Yasaman, Choi, Yejin, Yang, Yichi, Hao, Yiding, Chen, Yifu, Belinkov, Yonatan, Hou, Yu, Hou, Yufang, Bai, Yuntao, Seid, Zachary, Zhao, Zhuoye, Wang, Zijian, Wang, Zijie J., Wang, Zirui, and Wu, Ziyi
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning, Statistics - Machine Learning - Abstract
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting., Comment: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench
- Published
- 2022
5. Compute and Energy Consumption Trends in Deep Learning Inference
- Author
-
Desislavov, Radosvet, Martínez-Plumed, Fernando, and Hernández-Orallo, José
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence - Abstract
The progress of some AI paradigms such as deep learning is said to be linked to an exponential growth in the number of parameters. There are many studies corroborating these trends, but does this translate into an exponential increase in energy consumption? In order to answer this question we focus on inference costs rather than training costs, as the former account for most of the computing effort, solely because of the multiplicative factors. Also, apart from algorithmic innovations, we account for more specific and powerful hardware (leading to higher FLOPS) that is usually accompanied by important energy efficiency optimisations. We also move the focus from the first implementation of a breakthrough paper towards the consolidated version of the techniques one or two years later. Under this distinctive and comprehensive perspective, we study relevant models in the areas of computer vision and natural language processing: for a sustained increase in performance we see a much softer growth in energy consumption than previously anticipated. The only caveat is, yet again, the multiplicative factor, as future AI increases its penetration and becomes more pervasive., Comment: For a revised and published version refer to: Desislavov, Radosvet, Fernando Martínez-Plumed, and José Hernández-Orallo. Trends in AI inference energy consumption: Beyond the performance-vs-parameter laws of deep learning. Sustainable Computing: Informatics and Systems, Volume 38, April 2023. (https://doi.org/10.1016/j.suscom.2023.100857)
- Published
- 2021
- Full Text
- View/download PDF
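The abstract's core argument (per-inference efficiency gains versus the multiplicative factor of total usage) can be sketched as a back-of-the-envelope calculation. The function and all numbers below are hypothetical illustrations, not the paper's model:

```python
def inference_energy_kwh(flops_per_inference, flops_per_joule, inferences):
    """Rough estimate in the abstract's spirit: per-inference energy shrinks
    as hardware efficiency (FLOPs per joule) rises, but total energy scales
    with the multiplicative factor -- how many inferences are actually run."""
    joules = flops_per_inference / flops_per_joule * inferences
    return joules / 3.6e6  # joules -> kWh

# Same model on 10x more efficient hardware, but served 100x more often.
old = inference_energy_kwh(1e12, 1e11, 1e6)
new = inference_energy_kwh(1e12, 1e12, 1e8)
print(new > old)  # -> True: efficiency gains swamped by usage growth
```
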
6. Assessing AI capabilities with education tests
- Author
-
Staneva, Mila, primary, Baret, Abel, additional, Aso-Mollar, Ángel, additional, Blass, Joseph, additional, Carrión Ponz, Salvador, additional, Conitzer, Vincent, additional, Cortes, Ulises, additional, Dasigi, Pradeep, additional, de Paula, Angel, additional, Galindo, Carlos, additional, Gobert, Janice, additional, Gonzàlez, Jordi, additional, Heintz, Fredrik, additional, Hendler, Jim, additional, Hendrycks, Daniel, additional, Hunter, Lawrence, additional, Izquierdo-Domenech, Juan, additional, Juarez, Maria, additional, Juraco Frias, Aina, additional, Keren, Aviv, additional, Koncel-Kedziorski, Rik, additional, Leake, David, additional, Loe, Bao Sheng, additional, Martinez-Plumed, Fernando, additional, Martin-Hammond, Aqueasha, additional, Matuszek, Cynthia, additional, Mestre Gascón, Antoni, additional, Moreno, Jose Andres, additional, Nakos, Constantine, additional, Olson, Taylor, additional, Rose, Carolyn, additional, Sarvazyan, Areg Mikael, additional, Scassellati, Brian, additional, Schellaert, Wout, additional, Strannegård, Claes, additional, Tan, Neset, additional, Taniguchi, Tadahiro, additional, Vold, Karina, additional, and Wooldridge, Michael, additional
- Published
- 2023
- Full Text
- View/download PDF
7. Fairness and Missing Values
- Author
-
Martínez-Plumed, Fernando, Ferri, Cèsar, Nieves, David, and Hernández-Orallo, José
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning - Abstract
The causes underlying unfair decision making are complex, being internalised in different ways by decision makers, other actors dealing with data and models, and ultimately by the individuals affected by these decisions. One frequent manifestation of all these latent causes arises in the form of missing values: protected groups are more reluctant to give information that could be used against them, delicate information for some groups can be erased by human operators, or data acquisition may simply be less complete and systematic for minority groups. As a result, missing values and bias in data are two phenomena that are tightly coupled. However, most recent techniques, libraries and experimental results dealing with fairness in machine learning have simply ignored missing data. In this paper, we claim that fairness research should not miss the opportunity to deal properly with missing data. To support this claim, (1) we analyse the sources of missing data and bias, and we map the common causes, (2) we find that rows containing missing values are usually fairer than the rest, and should not be treated as the uncomfortable, ugly data that different techniques and libraries discard at the first opportunity, and (3) we study the trade-off between performance and fairness when the rows with missing values are used (either because the technique deals with them directly or by imputation methods). We end the paper with a series of recommended procedures about what to do with missing data when aiming for fair decision making., Comment: Preprint submitted to Decision Support Systems Journal
- Published
- 2019
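Finding (2) in the abstract, that rows with missing values are often fairer, can be illustrated with a toy demographic-parity comparison. The metric choice, data and function name below are hypothetical illustrations, not taken from the paper:

```python
def demographic_parity_gap(rows):
    """Absolute difference in positive-outcome rates across groups;
    rows are (group, outcome) pairs with outcome in {0, 1}. A smaller
    gap means a fairer subset under this (toy) criterion."""
    by_group = {}
    for group, outcome in rows:
        by_group.setdefault(group, []).append(outcome)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return max(rates) - min(rates)

# Hypothetical split of a dataset: complete rows vs. rows with missing values.
complete = [("A", 1), ("A", 1), ("B", 0), ("B", 0)]      # gap 1.0
with_missing = [("A", 1), ("A", 0), ("B", 1), ("B", 0)]  # gap 0.0
print(demographic_parity_gap(complete) > demographic_parity_gap(with_missing))  # -> True
```
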
8. Analysing Results from AI Benchmarks: Key Indicators and How to Obtain Them
- Author
-
Martínez-Plumed, Fernando and Hernández-Orallo, José
- Subjects
Computer Science - Artificial Intelligence - Abstract
Item response theory (IRT) can be applied to the analysis of results from AI benchmarks. The two-parameter IRT model provides two indicators (difficulty and discrimination) on the side of the item (or AI problem) but only one indicator (ability) on the side of the respondent (or AI agent). In this paper we analyse how to make this set of indicators dual by adding a fourth indicator, generality, on the side of the respondent. Generality is meant to be dual to discrimination, and it is based on difficulty. Namely, generality is defined as a new metric that evaluates whether an agent is consistently good at easy problems and bad at difficult ones. With the addition of generality, we see that this set of four key indicators can give us more insight into the results of AI benchmarks. In particular, we explore two popular benchmarks in AI, the Arcade Learning Environment (Atari 2600 games) and the General Video Game AI competition. We provide some guidelines to estimate and interpret these indicators for other AI benchmarks and competitions., Comment: This report is a preliminary version of a related paper with the title "Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality", accepted for publication at IEEE Transactions on Games. Please refer to and cite the journal paper (https://doi.org/10.1109/TG.2018.2883773)
- Published
- 2018
- Full Text
- View/download PDF
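The two-parameter model and the proposed generality indicator from the abstract can be sketched as follows; `transition_band` is one possible reading of "consistently good at easy problems and bad at difficult ones", not the paper's exact definition:

```python
import math

def irt_2pl(theta, difficulty, discrimination):
    """Two-parameter logistic IRT model: probability that a respondent
    with ability `theta` answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def transition_band(results):
    """Toy generality proxy: a general agent solves every item easier than
    some threshold and fails every harder one, so the overlap between its
    hardest success and easiest failure is narrow. `results` is a list of
    (difficulty, solved) pairs."""
    solved = [d for d, ok in results if ok]
    failed = [d for d, ok in results if not ok]
    if not solved or not failed:
        return 0.0  # all solved or all failed: trivially consistent
    return max(0.0, max(solved) - min(failed))

# A consistent agent solves everything up to difficulty ~0.5; an erratic
# agent's success is unrelated to difficulty.
consistent = [(0.1, True), (0.3, True), (0.7, False), (0.9, False)]
erratic = [(0.1, False), (0.3, True), (0.7, True), (0.9, False)]
print(transition_band(consistent))            # -> 0.0 (maximally "general")
print(round(transition_band(erratic), 2))     # -> 0.6 (wide transition band)
```
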
9. General-purpose Declarative Inductive Programming with Domain-Specific Background Knowledge for Data Wrangling Automation
- Author
-
Contreras-Ochando, Lidia, Ferri, César, Hernández-Orallo, José, Martínez-Plumed, Fernando, Ramírez-Quintana, María José, and Katayama, Susumu
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Databases - Abstract
Given one or two examples, humans are good at understanding how to solve a problem independently of its domain, because they are able to detect what the problem is and to choose the appropriate background knowledge according to the context. For instance, presented with the string "8/17/2017" to be transformed to "17th of August of 2017", humans will process this in two steps: (1) they recognise that it is a date and (2) they map the date to the 17th of August of 2017. Inductive Programming (IP) aims at learning declarative (functional or logic) programs from examples. Two key advantages of IP are the use of background knowledge and the ability to synthesise programs from a few input/output examples (as humans do). In this paper we propose to use IP as a means for automating repetitive data manipulation tasks, frequently appearing during the process of data wrangling in many data manipulation problems. Here we show that with the use of general-purpose declarative (programming) languages jointly with generic IP systems and the definition of domain-specific knowledge, many specific data wrangling problems from different application domains can be automatically solved from very few examples. We also propose an integrated benchmark for data wrangling, which we share publicly with the community., Comment: 24 pages
- Published
- 2018
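The date example in the abstract can be mimicked with a small sketch in which the "background knowledge" is a list of candidate formats. This is only an illustration of the two-step idea (recognise, then map), not the authors' inductive programming system, and the format list is hypothetical:

```python
from datetime import datetime

def wrangle_date(raw):
    """Step 1: detect that the string is a date by trying domain-specific
    formats (the background knowledge). Step 2: rewrite it in the target
    representation (ordinal suffix hard-coded as 'th' for brevity)."""
    known_formats = ["%m/%d/%Y", "%d-%m-%Y", "%Y.%m.%d"]  # hypothetical BK
    for fmt in known_formats:
        try:
            date = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        return f"{date.day}th of {date.strftime('%B')} of {date.year}"
    raise ValueError(f"no background format matched {raw!r}")

print(wrangle_date("8/17/2017"))  # -> 17th of August of 2017
```
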
10. A multidisciplinary task-based perspective for evaluating the impact of AI autonomy and generality on the future of work
- Author
-
Fernández-Macías, Enrique, Gómez, Emilia, Hernández-Orallo, José, Loe, Bao Sheng, Martens, Bertin, Martínez-Plumed, Fernando, and Tolan, Songül
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computers and Society, 68T99 - Abstract
This paper presents a multidisciplinary task approach for assessing the impact of artificial intelligence on the future of work. We provide definitions of a task from two main perspectives: socio-economic and computational. We propose to explore ways in which we can integrate or map these perspectives, and link them with the skills or capabilities required by them, for humans and AI systems. Finally, we argue that in order to understand the dynamics of tasks, we have to explore the relevance of autonomy and generality of AI systems for the automation or alteration of the workplace., Comment: AEGAP2018 Workshop at ICML 2018, 7 pages, 1 table
- Published
- 2018
11. Between Progress and Potential Impact of AI: the Neglected Dimensions
- Author
-
Martínez-Plumed, Fernando, Avin, Shahar, Brundage, Miles, Dafoe, Allan, hÉigeartaigh, Sean Ó, and Hernández-Orallo, José
- Subjects
Computer Science - Artificial Intelligence - Abstract
We reframe the analysis of progress in AI by incorporating into an overall framework both the task performance of a system, and the time and resource costs incurred in the development and deployment of the system. These costs include: data, expert knowledge, human oversight, software resources, computing cycles, hardware and network facilities, and (what kind of) time. These costs are distributed over the life cycle of the system, and may place differing demands on different developers and users. The multidimensional performance and cost space we present can be collapsed to a single utility metric that measures the value of the system for different stakeholders. Even without a single utility function, AI advances can be generically assessed by whether they expand the Pareto surface. We label these types of costs as neglected dimensions of AI progress, and explore them using four case studies: Alpha* (Go, Chess, and other board games), ALE (Atari games), ImageNet (Image classification) and Virtual Personal Assistants (Siri, Alexa, Cortana, and Google Assistant). This broader model of progress in AI will lead to novel ways of estimating the potential societal use and impact of an AI system, and the establishment of milestones for future progress.
- Published
- 2018
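The abstract's claim that AI advances can be generically assessed by whether they expand the Pareto surface reduces to a dominance check over the performance-and-cost dimensions. The dimensions and numbers below are hypothetical illustrations:

```python
def dominates(a, b):
    """True if system `a` Pareto-dominates `b`: at least as good on every
    dimension and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def expands_pareto_surface(new, existing):
    """A new system expands the surface iff no existing system dominates it."""
    return not any(dominates(old, new) for old in existing)

# Hypothetical systems as (performance, -data cost, -compute cost), so that
# higher is uniformly better.
existing = [(0.9, -100, -50), (0.7, -10, -5)]
print(expands_pareto_surface((0.8, -20, -10), existing))   # -> True
print(expands_pareto_surface((0.6, -100, -50), existing))  # -> False
```
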
12. CASP-DM: Context Aware Standard Process for Data Mining
- Author
-
Martínez-Plumed, Fernando, Contreras-Ochando, Lidia, Ferri, Cèsar, Flach, Peter, Hernández-Orallo, José, Kull, Meelis, Lachiche, Nicolas, and Ramírez-Quintana, María José
- Subjects
Computer Science - Databases - Abstract
We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) which addresses specific challenges of machine learning and data mining for handling context and model reuse. This new general context-aware process model is mapped onto the CRISP-DM reference model, proposing some new or enhanced outputs.
- Published
- 2017
13. BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation
- Author
-
Contreras-Ochando, Lidia, Ferri, César, Hernández-Orallo, José, Martínez-Plumed, Fernando, Ramírez-Quintana, María José, Katayama, Susumu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Brefeld, Ulf, editor, Fromont, Elisa, editor, Hotho, Andreas, editor, Knobbe, Arno, editor, Maathuis, Marloes, editor, and Robardet, Céline, editor
- Published
- 2020
- Full Text
- View/download PDF
14. Automated Data Transformation with Inductive Programming and Dynamic Background Knowledge
- Author
-
Contreras-Ochando, Lidia, Ferri, Cèsar, Hernández-Orallo, José, Martínez-Plumed, Fernando, Ramírez-Quintana, María José, Katayama, Susumu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Brefeld, Ulf, editor, Fromont, Elisa, editor, Hotho, Andreas, editor, Knobbe, Arno, editor, Maathuis, Marloes, editor, and Robardet, Céline, editor
- Published
- 2020
- Full Text
- View/download PDF
15. Forgetting and consolidation for incremental and cumulative knowledge acquisition systems
- Author
-
Martínez-Plumed, Fernando, Ferri, Cèsar, Hernández-Orallo, José, and Ramírez-Quintana, María José
- Subjects
Computer Science - Artificial Intelligence - Abstract
The application of cognitive mechanisms to support knowledge acquisition is, from our point of view, crucial for making the resulting models coherent, efficient, credible, easy to use and understandable. In particular, there are two characteristic features of intelligence that are essential for knowledge development: forgetting and consolidation. Both play an important role in knowledge bases and learning systems, avoiding possible information overflow and redundancy, and preserving and strengthening important or frequently used rules while removing (or forgetting) useless ones. We present an incremental, lifelong view of knowledge acquisition which tries to improve task after task by determining what to keep, what to consolidate and what to forget, overcoming the stability-plasticity dilemma. In order to do that, we rate rules by introducing several metrics through the first adaptation, to our knowledge, of the Minimum Message Length (MML) principle to a coverage graph, a hierarchical assessment structure which treats evidence and rules in a unified way. The metrics are not only used to forget some of the worst rules, but also to set up a consolidation process that promotes selected rules to the knowledge base, which is also mirrored by a demotion system. We evaluate the framework with a series of tasks in a chess rule learning domain.
- Published
- 2015
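The MML-based rating of rules can be sketched as a two-part code length (rule cost plus exception cost). This is an illustrative stand-in under that standard reading of MML, not the paper's adaptation to coverage graphs, and the rules and costs are hypothetical:

```python
import math

def mml_score(rule_bits, errors, n_examples):
    """Two-part MML-style score: bits to encode the rule plus bits to list
    which of the n examples it gets wrong. Lower is better; the
    worst-scoring rules are candidates for forgetting."""
    exception_bits = math.log2(math.comb(n_examples, errors)) if errors else 0.0
    return rule_bits + exception_bits

# Hypothetical rules: (encoding cost in bits, number of exceptions).
rules = {"compact_rule": (8.0, 2), "baroque_rule": (40.0, 0)}
scores = {name: mml_score(bits, errs, 100) for name, (bits, errs) in rules.items()}
forget_first = max(scores, key=scores.get)  # a short rule with a couple of
print(forget_first)  # -> baroque_rule        exceptions beats a huge exact one
```
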
16. SALER: A Data Science Solution to Detect and Prevent Corruption in Public Administration
- Author
-
Martínez-Plumed, Fernando, Casamayor, Juan Carlos, Ferri, Cèsar, Gómez, Jon Ander, Vendrell Vidal, Eduardo, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Alzate, Carlos, editor, Monreale, Anna, editor, Assem, Haytham, editor, Bifet, Albert, editor, Buda, Teodora Sandra, editor, Caglayan, Bora, editor, Drury, Brett, editor, García-Martín, Eva, editor, Gavaldà, Ricard, editor, Koprinska, Irena, editor, Kramer, Stefan, editor, Lavesson, Niklas, editor, Madden, Michael, editor, Molloy, Ian, editor, Nicolae, Maria-Irina, editor, and Sinn, Mathieu, editor
- Published
- 2019
- Full Text
- View/download PDF
17. On the definition of a general learning system with user-defined operators
- Author
-
Martínez-Plumed, Fernando, Ferri, Cèsar, Hernández-Orallo, José, and Ramírez-Quintana, María-José
- Subjects
Computer Science - Learning - Abstract
In this paper, we push forward the idea of machine learning systems whose operators can be modified and fine-tuned for each problem. This allows us to propose a learning paradigm where users can write (or adapt) their operators, according to the problem, the data representation and the way the information should be navigated. To achieve this goal, data instances, background knowledge, rules, programs and operators are all written in the same functional language, Erlang. Since changing operators affects how the search space needs to be explored, heuristics are learnt as the result of a decision process based on reinforcement learning where each action is defined as a choice of operator and rule. As a result, the architecture can be seen as a 'system for writing machine learning systems' or as a way to explore new operators, where policy reuse (a kind of transfer learning) is allowed. States and actions are represented in a Q matrix, which is actually a table from which a supervised model is learnt. This makes it possible to have a more flexible mapping between old and new problems, since we work with an abstraction of rules and actions. We include some examples showing reuse and the application of the system gErl to IQ problems. In order to evaluate gErl, we test it against some structured problems: a selection of IQ test tasks and some experiments on structured prediction problems (list patterns).
- Published
- 2013
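The decision process described in the abstract, where each action is a choice of operator and rule scored in a Q table, can be sketched as minimal tabular Q-learning. The operators, rules, states and reward below are hypothetical, and this is not gErl itself:

```python
from collections import defaultdict

# Actions are (operator, rule) pairs, as in the abstract.
OPERATORS = ["generalise", "specialise"]  # hypothetical operators
RULES = ["r1", "r2"]                      # hypothetical rules
ACTIONS = [(op, r) for op in OPERATORS for r in RULES]

def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update over (operator, rule) actions."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

Q = defaultdict(float)
# Pretend applying ("generalise", "r1") in state "s0" improved the ruleset.
q_update(Q, "s0", ("generalise", "r1"), reward=1.0, next_state="s1")
best = max(ACTIONS, key=lambda a: Q[("s0", a)])
print(best)  # -> ('generalise', 'r1')
```
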
18. Item response theory in AI: Analysing machine learning classifiers at the instance level
- Author
-
Martínez-Plumed, Fernando, Prudêncio, Ricardo B.C., Martínez-Usó, Adolfo, and Hernández-Orallo, José
- Published
- 2019
- Full Text
- View/download PDF
19. Modelling Machine Learning Models
- Author
-
Fabra-Boluda, Raül, Ferri, Cèsar, Hernández-Orallo, José, Martínez-Plumed, Fernando, Ramírez-Quintana, M. José, Magnani, Lorenzo, Series Editor, and Müller, Vincent C., editor
- Published
- 2018
- Full Text
- View/download PDF
20. Mapping Intelligence: Requirements and Possibilities
- Author
-
Bhatnagar, Sankalp, Alexandrova, Anna, Avin, Shahar, Cave, Stephen, Cheke, Lucy, Crosby, Matthew, Feyereisl, Jan, Halina, Marta, Loe, Bao Sheng, Ó hÉigeartaigh, Seán, Martínez-Plumed, Fernando, Price, Huw, Shevlin, Henry, Weller, Adrian, Winfield, Alan, Hernández-Orallo, José, Magnani, Lorenzo, Series Editor, and Müller, Vincent C., editor
- Published
- 2018
- Full Text
- View/download PDF
21. Identifying the Machine Learning Family from Black-Box Models
- Author
-
Fabra-Boluda, Raül, Ferri, Cèsar, Hernández-Orallo, José, Martínez-Plumed, Fernando, Ramírez-Quintana, María José, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Herrera, Francisco, editor, Damas, Sergio, editor, Montes, Rosana, editor, Alonso, Sergio, editor, Cordón, Óscar, editor, González, Antonio, editor, and Troncoso, Alicia, editor
- Published
- 2018
- Full Text
- View/download PDF
22. A computational analysis of general intelligence tests for evaluating cognitive development
- Author
-
Martínez-Plumed, Fernando, Ferri, Cèsar, Hernández-Orallo, José, and Ramírez-Quintana, María José
- Published
- 2017
- Full Text
- View/download PDF
23. Computer models solving intelligence test problems: Progress and implications
- Author
-
Hernández-Orallo, José, Martínez-Plumed, Fernando, Schmid, Ute, Siebers, Michael, and Dowe, David L.
- Published
- 2016
- Full Text
- View/download PDF
24. Rethink reporting of evaluation results in AI
- Author
-
Burnell, Ryan, primary, Schellaert, Wout, additional, Burden, John, additional, Ullman, Tomer D., additional, Martinez-Plumed, Fernando, additional, Tenenbaum, Joshua B., additional, Rutar, Danaja, additional, Cheke, Lucy G., additional, Sohl-Dickstein, Jascha, additional, Mitchell, Melanie, additional, Kiela, Douwe, additional, Shanahan, Murray, additional, Voorhees, Ellen M., additional, Cohn, Anthony G., additional, Leibo, Joel Z., additional, and Hernandez-Orallo, Jose, additional
- Published
- 2023
- Full Text
- View/download PDF
25. Learning with Configurable Operators and RL-Based Heuristics
- Author
-
Martínez-Plumed, Fernando, Ferri, Cèsar, Hernández-Orallo, José, Ramírez-Quintana, María José, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Appice, Annalisa, editor, Ceci, Michelangelo, editor, Loglisci, Corrado, editor, Manco, Giuseppe, editor, Masciari, Elio, editor, and Ras, Zbigniew W., editor
- Published
- 2013
- Full Text
- View/download PDF
26. Newton Trees
- Author
-
Martínez-Plumed, Fernando, Estruch, Vicent, Ferri, César, Hernández-Orallo, José, Ramírez-Quintana, María José, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, and Li, Jiuyong, editor
- Published
- 2011
- Full Text
- View/download PDF
27. A New AI Evaluation Cosmos: Ready to Play the Game?
- Author
-
Hernandez-Orallo, Jose, Baroni, Marco, Bieger, Jordi, Chmait, Nader, Dowe, David L., Hofmann, Katja, Martinez-Plumed, Fernando, Strannegard, Claes, and Thorissons, Kristinn R.
- Subjects
Artificial intelligence -- Innovations, Artificial intelligence, Business - Abstract
We report on a series of new platforms and events dealing with AI evaluation that may change the way in which AI systems are compared and their progress is measured. [...]
- Published
- 2017
28. Project-Based Learning for Scaffolding Data Scientists’ Skills
- Author
-
Martinez-Plumed, Fernando, primary and Hernandez-Orallo, Jose, additional
- Published
- 2021
- Full Text
- View/download PDF
29. CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories
- Author
-
Martinez-Plumed, Fernando, primary, Contreras-Ochando, Lidia, additional, Ferri, Cesar, additional, Hernandez-Orallo, Jose, additional, Kull, Meelis, additional, Lachiche, Nicolas, additional, Ramirez-Quintana, Maria Jose, additional, and Flach, Peter, additional
- Published
- 2021
- Full Text
- View/download PDF
30. AI Paradigms and AI Safety: Mapping Artefacts and Techniques to Safety Issues
- Author
-
Hernandez-Orallo, Jose, Martinez-Plumed, Fernando, Avin, Shahar, Whittlestone, Jessica, and Ó hÉigeartaigh, Seán
- Subjects
ComputingMethodologies_PATTERNRECOGNITION, GeneralLiterature_MISCELLANEOUS - Abstract
AI safety often analyses a risk or safety issue, such as interruptibility, under a particular AI paradigm, such as reinforcement learning. But what is an AI paradigm and how does it affect the understanding and implications of the safety issue? Is AI safety research covering the most representative paradigms and the right combinations of paradigms with safety issues? Will current research directions in AI safety be able to anticipate more capable and powerful systems yet to come? In this paper we analyse these questions, introducing a distinction between two types of paradigms in AI: artefacts and techniques. We then use experimental data of research and media documents from AI Topics, an official publication of the AAAI, to examine how safety research is distributed across artefacts and techniques. We observe that AI safety research is not sufficiently anticipatory, and is heavily weighted towards certain research paradigms. We identify a need for AI safety to be more explicit about the artefacts and techniques for which a particular issue may be applicable, in order to identify gaps and cover a broader range of issues.
- Published
- 2020
- Full Text
- View/download PDF
31. Dual Indicators to Analyze AI Benchmarks: Difficulty, Discrimination, Ability, and Generality
- Author
-
Martinez-Plumed, Fernando, primary and Hernandez-Orallo, Jose, additional
- Published
- 2020
- Full Text
- View/download PDF
32. Artificial Intelligence at the JRC: 2nd workshop on Artificial Intelligence at the JRC, Ispra 5th July 2019
- Author
-
NATIVI STEFANO, ANASTASAKIS KONSTANTINOS, ASTURIOL BOFILL DAVID, BALAHUR-DOBRESCU ALEXANDRA, BARBAGLIA LUCA, BAUMANN KATHRIN, BESLAY LAURENT, BREMER SUSANNE, CARDONA MELISANDE, CASTILLO CARLOS, CHARISI VASILIKI, CONSOLI SERGIO, CORBAN CHRISTINA, D'ANDRIMONT RAPHAEL, DE PRATO GIUDITTA, DECEUNINCK PIERRE, DELIPETREV BLAGOJ, DEVOS WIM, DOTTORI FRANCESCO, DUCH BROWN NESTOR, FERRARA PASQUALE, FERRI STEFANO, GOMEZ GUTIERREZ EMILIA, GOMEZ LOSADA ALVARO, HALAMODA KENZAOUI BLANKA, HALKIA STAMATIA, HAMON RONAN, HRADEC JIRI, JENKINSON GABRIEL, JUNKLEWITZ HENRIK, KALAS MILAN, KEMPER THOMAS, LEMOINE GUIDO, LOPEZ COBO MONTSERRAT, LORINI VALERIO, MANZAN SEBASTIANO, MARTINEZ SANCHEZ LAURA, MARTINEZ PLUMED FERNANDO, MILENOV PAVEL, NAI FOVINO IGOR, NAPPO DOMENICO, NOCE LUCIA, PAPAZOGLOU MICHAIL, PETRILLO MAURO, PIOVESAN JACOPO, PUERTAS GALLARDO ANTONIO, RIGHI RICCARDO, ROLLAND ETIENNE, SABO FILIP, SALAMON PETER, SAMOILI SOFIA, SANCHEZ MARTIN JOSE IGNACIO, SANCHEZ BELENGUER CARLOS, SEQUEIRA VITOR, SOILLE PIERRE, SYRRIS VASILEIOS, THOMAKOS DIMITRIOS, TOLAN SONGUL, TOSETTI ELISA, VAN DAMME MARIE-SOPHIE, VAN DER VELDE MARIJN, VAZQUEZ-PRADA BAILLET MIGUEL, WHELAN MAURICE, WITTWEHR CLEMENS, WOLFART ERIK, WORTH ANDREW, YORDANOV MOMCHIL, and NATIVI STEFANO
- Abstract
This document presents the contributions discussed at the second institutional workshop on Artificial Intelligence (AI), organized by the Joint Research Centre (JRC) of the European Commission. The workshop was held on 5th July 2019 at the premises of the JRC in Ispra (Italy), with video-conference links to all JRC sites. It aimed to gather JRC specialists on AI and Big Data to share their experience, identify opportunities for meeting EC demands on AI, and explore synergies among the JRC's different working groups on AI. In comparison with the first event, according to JRC Director General Vladimír Šucha, the activities and results presented at this second workshop demonstrated a significant development of AI research and applications by the JRC in different policy areas. He suggested replicating the event at the premises of various policy DGs in order to present and discuss the clear opportunities created by JRC activities. After the opening speech by Director General Vladimír Šucha, the research and innovation presentations were preceded by two talks by Alessandro Annoni and Stefano Nativi. The first dealt with the results of one year of AI@JRC and six months of a fully operational AI&BD community of practice. The second reported the results of the AI competences survey at the JRC. The research and innovation contributions consisted of flash presentations (5 minutes) covering a wide range of areas. This report is structured according to the domain areas addressed by the presenters. While the first part of the workshop was mainly informative, in the second part participants collectively discussed how to move forward and evolve the AI&BD community of practice., JRC.B.6-Digital Economy
- Published
- 2020
33. AI Watch: Methodology to Monitor the Evolution of AI Technologies
- Author
-
MARTINEZ PLUMED FERNANDO, HERNÁNDEZ-ORALLO JOSÉ, and GOMEZ GUTIERREZ EMILIA
- Abstract
In this report, we present a methodology to assess the evolution of AI technologies in the context of the AI WATCH initiative. The methodology is centred on building the AIcollaboratory, a data-driven framework to collect and explore data about AI results, progress and, ultimately, capabilities. From the AIcollaboratory framework we then extract qualitative information related to the state of the art, challenges and trends of AI research and development. This report first describes the administrative context of the study, followed by the proposed methodology to build the AIcollaboratory framework and exploit it for qualitative assessment. In addition, we present some preliminary results of this monitoring process and some conclusions and suggestions for future work. This document is an internal report of the AI WATCH initiative, to be agreed for future work on Task 2 of the administrative arrangement between the Joint Research Centre and DG CNECT., JRC.B.6-Digital Economy
- Published
- 2020
34. AI Watch: Assessing Technology Readiness Levels for Artificial Intelligence
- Author
-
MARTINEZ PLUMED FERNANDO, GOMEZ GUTIERREZ EMILIA, and HERNÁNDEZ-ORALLO JOSÉ
- Subjects
ComputingMethodologies_PATTERNRECOGNITION, GeneralLiterature_MISCELLANEOUS - Abstract
Artificial Intelligence (AI) offers the potential to transform our lives in radical ways. However, the main unanswered questions about this foreseen transformation are when and how this is going to happen. Not only do we lack the tools to determine what achievements will be attained in the near future, but we even underestimate what various technologies in AI are capable of today. Many so-called breakthroughs in AI are simply associated with highly-cited research papers or good performance on some particular benchmarks. Certainly, the translation from papers and benchmark performance to products is faster in AI than in other non-digital sectors. However, it is still the case that research breakthroughs do not directly translate to a technology that is ready to use in real-world environments. This document describes an exemplar-based methodology to categorise and assess several AI research and development technologies, by mapping them into Technology Readiness Levels (TRL) (e.g., maturity and availability levels). We first interpret the nine TRLs in the context of AI and identify different categories in AI to which they can be assigned. We then introduce new bidimensional plots, called readiness-vs-generality charts, where we see that higher TRLs are achievable for low-generality technologies focusing on narrow or specific abilities, while low TRLs are still out of reach for more general capabilities. We include numerous examples of AI technologies in a variety of fields, and show their readiness-vs-generality charts, serving as exemplars. Finally, we use the dynamics of several AI technology exemplars at different generality layers and moments of time to forecast some short-term and mid-term trends for AI., JRC.B.6-Digital Economy
- Published
- 2020
35. AI Watch. Defining Artificial Intelligence. Towards an operational definition and taxonomy of artificial intelligence
- Author
-
Samoili, Sofia, Cobo, Montserrat Lopez, Gomez, Emilia, De Prato, Giuditta, Martinez-Plumed, Fernando, and Delipetrev, Blagoj
- Subjects
Computer and information sciences - Abstract
This report proposes an operational definition of artificial intelligence to be adopted in the context of AI Watch, the Commission knowledge service to monitor the development, uptake and impact of artificial intelligence for Europe. The definition, which will be used as a basis for the AI Watch monitoring activity, is established by means of a flexible scientific methodology that allows regular revision. The operational definition is constituted by a concise taxonomy and a list of keywords that characterise the core domains of the AI research field, and transversal topics such as applications of the former or ethical and philosophical considerations, in line with the wider monitoring objective of AI Watch. The AI taxonomy is designed to inform the AI landscape analysis and is expected to detect AI applications in neighbouring technological domains such as robotics (in a broader sense), neuroscience or the internet of things. The starting point to develop the operational definition is the definition of AI adopted by the High Level Expert Group on artificial intelligence. To derive this operational definition we have followed a mixed methodology. On one hand, we apply natural language processing methods to a large set of AI literature. On the other hand, we carry out a qualitative analysis on 55 key documents including artificial intelligence definitions from three complementary perspectives: policy, research and industry. A valuable contribution of this work is the collection of definitions developed between 1955 and 2019, and the summarisation of the main features of the concept of artificial intelligence as reflected in the relevant literature.
- Published
- 2020
36. AI WATCH. Defining Artificial Intelligence
- Author
-
SAMOILI SOFIA, LOPEZ COBO MONTSERRAT, GOMEZ GUTIERREZ EMILIA, DE PRATO GIUDITTA, MARTINEZ-PLUMED FERNANDO, and DELIPETREV BLAGOJ
- Abstract
This report proposes an operational definition of artificial intelligence to be adopted in the context of AI Watch, the Commission knowledge service to monitor the development, uptake and impact of artificial intelligence for Europe. The definition, which will be used as a basis for the AI Watch monitoring activity, is established by means of a flexible scientific methodology that allows regular revision. The operational definition is constituted by a concise taxonomy and a list of keywords that characterise the core domains of the AI research field, and transversal topics such as applications of the former or ethical and philosophical considerations, in line with the wider monitoring objective of AI Watch. The AI taxonomy is designed to inform the AI landscape analysis and is expected to detect AI applications in neighbouring technological domains such as robotics (in a broader sense), neuroscience or the internet of things. The starting point to develop the operational definition is the definition of AI adopted by the High Level Expert Group on artificial intelligence. To derive this operational definition we have followed a mixed methodology. On one hand, we apply natural language processing methods to a large set of AI literature. On the other hand, we carry out a qualitative analysis on 55 key documents including artificial intelligence definitions from three complementary perspectives: policy, research and industry. A valuable contribution of this work is the collection of definitions developed between 1955 and 2019, and the summarisation of the main features of the concept of artificial intelligence as reflected in the relevant literature., JRC.B.6-Digital Economy
- Published
- 2020
37. A Knowledge Growth and Consolidation Framework for Lifelong Machine Learning Systems
- Author
-
Martinez Plumed, Fernando, primary, Ferri, Cesar, additional, Hernandez Orallo, Jose, additional, and Ramirez Quintana, Maria Jose, additional
- Published
- 2014
- Full Text
- View/download PDF
38. Newton Trees
- Author
-
Martinez-Plumed, Fernando, Estruch, Vicent, Ferri, Cesar, Hernandez-Orallo, Jose, Ramirez-Quintana, Maria Jose, and Li, Jiuyong
39. A framework for categorising AI evaluation instruments
- Author
-
Hernandez-Orallo, Jose, Cheke, Lucy, Tenenbaum, Joshua, Ullman, Tomer, Martinez-Plumed, Fernando, Rutar, Danaja, Burden, John, Burnell, Ryan, Schellaert, Wout, Cohn, Anthony G, Hernández-Orallo, José, Mboli, Julius Sechang, Moros-Daval, Yael, Xiang, Zhiliang, and Zhou, Lexin
- Abstract
The current and future capabilities of Artificial Intelligence (AI) are typically assessed with an ever-increasing number of benchmarks, competitions, tests and evaluation standards, which are meant to work as AI evaluation instruments (EIs). These EIs are not only increasing in number, but also in complexity and diversity, making it hard to understand this evaluation landscape in a meaningful way. In this paper we present an approach for categorising EIs using a set of 18 facets, accompanied by a rubric to allow anyone to apply the framework to any existing or new EI. We apply the rubric to 23 EIs in different domains through a team of raters, and analyse how consistent the rubric is and how well it works to distinguish between EIs and map the evaluation landscape in AI.
40. Training Data Scientists through Project-based Learning_supp1-3302954.pdf
- Author
-
Martinez-Plumed, Fernando, primary
- Full Text
- View/download PDF
41. A framework for categorising AI evaluation instruments
- Author
-
Cohn, Anthony G, Hernández-Orallo, José, Mboli, Julius Sechang, Moros-Daval, Yael, Xiang, Zhiliang, Zhou, Lexin, Hernandez-Orallo, Jose, Cheke, Lucy, Tenenbaum, Joshua, Ullman, Tomer, Martinez-Plumed, Fernando, Rutar, Danaja, Burden, John, Burnell, Ryan, and Schellaert, Wout
- Subjects
QA75 - Abstract
The current and future capabilities of Artificial Intelligence (AI) are typically assessed with an ever-increasing number of benchmarks, competitions, tests and evaluation standards, which are meant to work as AI evaluation instruments (EIs). These EIs are not only increasing in number, but also in complexity and diversity, making it hard to understand this evaluation landscape in a meaningful way. In this paper we present an approach for categorising EIs using a set of 18 facets, accompanied by a rubric to allow anyone to apply the framework to any existing or new EI. We apply the rubric to 23 EIs in different domains through a team of raters, and analyse how consistent the rubric is and how well it works to distinguish between EIs and map the evaluation landscape in AI.
- Published
- 2022