Author: "Rastogi, Charvi" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Rastogi, Charvi"' showing total 29 results

Start Over Author "Rastogi, Charvi"

29 results on '"Rastogi, Charvi"'

1. Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups

Author: Rastogi, Charvi, Teh, Tian Huey, Mishra, Pushkar, Patel, Roma, Ashwood, Zoe, Davani, Aida Mostafazadeh, Diaz, Mark, Paganini, Michela, Parrish, Alicia, Wang, Ding, Prabhakaran, Vinodkumar, Aroyo, Lora, and Rieser, Verena
Subjects: Computer Science - Artificial Intelligence
Abstract: AI systems crucially rely on human ratings, but these ratings are often aggregated, obscuring the inherent diversity of perspectives in real-world phenomenon. This is particularly concerning when evaluating the safety of generative AI, where perceptions and associated harms can vary significantly across socio-cultural contexts. While recent research has studied the impact of demographic differences on annotating text, there is limited understanding of how these subjective variations affect multimodal safety in generative AI. To address this, we conduct a large-scale study employing highly-parallel safety ratings of about 1000 text-to-image (T2I) generations from a demographically diverse rater pool of 630 raters balanced across 30 intersectional groups across age, gender, and ethnicity. Our study shows that (1) there are significant differences across demographic groups (including intersectional groups) on how severe they assess the harm to be, and that these differences vary across different types of safety violations, (2) the diverse rater pool captures annotation patterns that are substantially different from expert raters trained on specific set of safety policies, and (3) the differences we observe in T2I safety are distinct from previously documented group level differences in text-based safety tasks. To further understand these varying perspectives, we conduct a qualitative analysis of the open-ended explanations provided by raters. This analysis reveals core differences into the reasons why different groups perceive harms in T2I generations. Our findings underscore the critical need for incorporating diverse perspectives into safety evaluation of generative AI ensuring these systems are truly inclusive and reflect the values of all users., Comment: 20 pages, 7 figures
Published: 2024

2. Imagen 3

Author: Imagen-Team-Google, Baldridge, Jason, Bauer, Jakob, Bhutani, Mukul, Brichtova, Nicole, Bunner, Andrew, Chan, Kelvin, Chen, Yichang, Dieleman, Sander, Du, Yuqing, Eaton-Rosen, Zach, Fei, Hongliang, de Freitas, Nando, Gao, Yilin, Gladchenko, Evgeny, Colmenarejo, Sergio Gómez, Guo, Mandy, Haig, Alex, Hawkins, Will, Hu, Hexiang, Huang, Huilian, Igwe, Tobenna Peter, Kaplanis, Christos, Khodadadeh, Siavash, Kim, Yelin, Konyushkova, Ksenia, Langner, Karol, Lau, Eric, Luo, Shixin, Mokrá, Soňa, Nandwani, Henna, Onoe, Yasumasa, Oord, Aäron van den, Parekh, Zarana, Pont-Tuset, Jordi, Qi, Hang, Qian, Rui, Ramachandran, Deepak, Rane, Poorva, Rashwan, Abdullah, Razavi, Ali, Riachi, Robert, Srinivasan, Hansa, Srinivasan, Srivatsan, Strudel, Robin, Uria, Benigno, Wang, Oliver, Wang, Su, Waters, Austin, Wolff, Chris, Wright, Auriel, Xiao, Zhisheng, Xiong, Hao, Xu, Keyang, van Zee, Marc, Zhang, Junlin, Zhang, Katie, Zhou, Wenlei, Zolna, Konrad, Aboubakar, Ola, Akbulut, Canfer, Akerlund, Oscar, Albuquerque, Isabela, Anderson, Nina, Andreetto, Marco, Aroyo, Lora, Bariach, Ben, Barker, David, Ben, Sherry, Berman, Dana, Biles, Courtney, Blok, Irina, Botadra, Pankil, Brennan, Jenny, Brown, Karla, Buckley, John, Bunel, Rudy, Bursztein, Elie, Butterfield, Christina, Caine, Ben, Carpenter, Viral, Casagrande, Norman, Chang, Ming-Wei, Chang, Solomon, Chaudhuri, Shamik, Chen, Tony, Choi, John, Churbanau, Dmitry, Clement, Nathan, Cohen, Matan, Cole, Forrester, Dektiarev, Mikhail, Du, Vincent, Dutta, Praneet, Eccles, Tom, Elue, Ndidi, Feden, Ashley, Fruchter, Shlomi, Garcia, Frankie, Garg, Roopal, Ge, Weina, Ghazy, Ahmed, Gipson, Bryant, Goodman, Andrew, Górny, Dawid, Gowal, Sven, Gupta, Khyatti, Halpern, Yoni, Han, Yena, Hao, Susan, Hayes, Jamie, Hertz, Amir, Hirst, Ed, Hou, Tingbo, Howard, Heidi, Ibrahim, Mohamed, Ike-Njoku, Dirichi, Iljazi, Joana, Ionescu, Vlad, Isaac, William, Jana, Reena, Jennings, Gemma, Jenson, Donovon, Jia, Xuhui, Jones, Kerry, Ju, Xiaoen, Kajic, Ivana, Ayan, Burcu Karagol, Kelly, Jacob, Kothawade, Suraj, Kouridi, Christina, Ktena, Ira, Kumakaw, Jolanda, Kurniawan, Dana, Lagun, Dmitry, Lavitas, Lily, Lee, Jason, Li, Tao, Liang, Marco, Li-Calis, Maggie, Liu, Yuchi, Alberca, Javier Lopez, Lu, Peggy, Lum, Kristian, Ma, Yukun, Malik, Chase, Mellor, John, Mosseri, Inbar, Murray, Tom, Nematzadeh, Aida, Nicholas, Paul, Oliveira, João Gabriel, Ortiz-Jimenez, Guillermo, Paganini, Michela, Paine, Tom Le, Paiss, Roni, Parrish, Alicia, Peckham, Anne, Peswani, Vikas, Petrovski, Igor, Pfaff, Tobias, Pirozhenko, Alex, Poplin, Ryan, Prabhu, Utsav, Qi, Yuan, Rahtz, Matthew, Rashtchian, Cyrus, Rastogi, Charvi, Raul, Amit, Rebuffi, Sylvestre-Alvise, Ricco, Susanna, Riedel, Felix, Robinson, Dirk, Rohatgi, Pankaj, Rosgen, Bill, Rumbley, Sarah, Ryu, Moonkyung, Salgado, Anthony, Singla, Sahil, Schroff, Florian, Schumann, Candice, Shah, Tanmay, Shillingford, Brendan, Shivakumar, Kaushik, Shtatnov, Dennis, Singer, Zach, Sluzhaev, Evgeny, Sokolov, Valerii, Sottiaux, Thibault, Stimberg, Florian, Stone, Brad, Stutz, David, Su, Yu-Chuan, Tabellion, Eric, Tang, Shuai, Tao, David, Thomas, Kurt, Thornton, Gregory, Toor, Andeep, Udrescu, Cristian, Upadhyay, Aayush, Vasconcelos, Cristina, Vasiloff, Alex, Voynov, Andrey, Walker, Amanda, Wang, Luyu, Wang, Miaosen, Wang, Simon, Wang, Stanley, Wang, Qifei, Wang, Yuxiao, Weisz, Ágoston, Wiles, Olivia, Wu, Chenxia, Xu, Xingyu Federico, Xue, Andrew, Yang, Jianbo, Yu, Luo, Yurtoglu, Mete, Zand, Ali, Zhang, Han, Zhang, Jiageng, Zhao, Catherine, Zhaxybay, Adilet, Zhou, Miao, Zhu, Shengqi, Zhu, Zhenkai, Bloxwich, Dawn, Bordbar, Mahyar, Cobo, Luis C., Collins, Eli, Dai, Shengyang, Doshi, Tulsee, Dragan, Anca, Eck, Douglas, Hassabis, Demis, Hsiao, Sissie, Hume, Tom, Kavukcuoglu, Koray, King, Helen, Krawczyk, Jack, Li, Yeqing, Meier-Hellstern, Kathy, Orban, Andras, Pinsky, Yury, Subramanya, Amar, Vinyals, Oriol, Yu, Ting, and Zwols, Yori
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.
Published: 2024

3. A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions

Author: Rastogi, Charvi, Song, Xiangchen, Jin, Zhijing, Stelmakh, Ivan, Daumé III, Hal, Zhang, Kun, and Shah, Nihar B.
Subjects: Computer Science - Computers and Society, Computer Science - Digital Libraries
Abstract: Peer review often involves reviewers submitting their independent reviews, followed by a discussion among reviewers of each paper. A question among policymakers is whether the reviewers of a paper should be anonymous to each other during the discussion. We shed light on this by conducting a randomized controlled trial at the UAI 2022 conference. We randomly split the reviewers and papers into two conditions--one with anonymous discussions and the other with non-anonymous discussions, and conduct an anonymous survey of all reviewers, to address the following questions: 1. Do reviewers discuss more in one of the conditions? Marginally more in anonymous (n = 2281, p = 0.051). 2. Does seniority have more influence on final decisions when non-anonymous? Yes, the decisions are closer to senior reviewers' scores in the non-anonymous condition than in anonymous (n = 484, p = 0.04). 3. Are reviewers more polite in one of the conditions? No significant difference in politeness of reviewers' text-based responses (n = 1125, p = 0.72). 4. Do reviewers' self-reported experiences differ across the two conditions? No significant difference for each of the five questions asked (n = 132 and p > 0.3). 5. Do reviewers prefer one condition over the other? Yes, there is a weak preference for anonymous discussions (n = 159 and Cohen's d= 0.25). 6. What do reviewers consider important to make policy on anonymity among reviewers? Reviewers' feeling of safety in expressing their opinions was rated most important, while polite communication among reviewers was rated least important (n = 159). 7. Have reviewers experienced dishonest behavior due to non-anonymity in discussions? Yes, roughly 7% of respondents answered affirmatively (n = 167). Overall, this experiment reveals evidence supporting an anonymous discussion setup in the peer-review process, in terms of the evaluation criteria considered., Comment: 18 pages, 4 figures, 3 tables
Published: 2024

4. Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

Author: Quaye, Jessica, Parrish, Alicia, Inel, Oana, Rastogi, Charvi, Kirk, Hannah Rose, Kahng, Minsuk, van Liemt, Erin, Bartolo, Max, Tsang, Jess, White, Justin, Clement, Nathan, Mosquera, Rafael, Ciro, Juan, Reddi, Vijay Janapa, and Aroyo, Lora
Subjects: Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: With the rise of text-to-image (T2I) generative AI models reaching wide audiences, it is critical to evaluate model robustness against non-obvious attacks to mitigate the generation of offensive images. By focusing on ``implicitly adversarial'' prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a set of difficult safety issues that human creativity is well-suited to uncover. To this end, we built the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing a diverse set of implicitly adversarial prompts. We have assembled a suite of state-of-the-art T2I models, employed a simple user interface to identify and annotate harms, and engaged diverse populations to capture long-tail safety issues that may be overlooked in standard testing. The challenge is run in consecutive rounds to enable a sustained discovery and analysis of safety pitfalls in T2I models. In this paper, we present an in-depth account of our methodology, a systematic study of novel attack strategies and discussion of safety failures revealed by challenge participants. We also release a companion visualization tool for easy exploration and derivation of insights from the dataset. The first challenge round resulted in over 10k prompt-image pairs with machine annotations for safety. A subset of 1.5k samples contains rich human annotations of harm types and attack styles. We find that 14% of images that humans consider harmful are mislabeled as ``safe'' by machines. We have identified new attack strategies that highlight the complexity of ensuring T2I model robustness. Our findings emphasize the necessity of continual auditing and adaptation as new vulnerabilities emerge. We are confident that this work will enable proactive, iterative safety assessments and promote responsible development of T2I models., Comment: 10 pages, 6 figures
Published: 2024

5. Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models

Author: Parrish, Alicia, Kirk, Hannah Rose, Quaye, Jessica, Rastogi, Charvi, Bartolo, Max, Inel, Oana, Ciro, Juan, Mosquera, Rafael, Howard, Addison, Cukierski, Will, Sculley, D., Reddi, Vijay Janapa, and Aroyo, Lora
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, 14J68 (Primary)
Abstract: The generative AI revolution in recent years has been spurred by an expansion in compute power and data quantity, which together enable extensive pre-training of powerful text-to-image (T2I) models. With their greater capabilities to generate realistic and creative content, these T2I models like DALL-E, MidJourney, Imagen or Stable Diffusion are reaching ever wider audiences. Any unsafe behaviors inherited from pretraining on uncurated internet-scraped datasets thus have the potential to cause wide-reaching harm, for example, through generated images which are violent, sexually explicit, or contain biased and derogatory stereotypes. Despite this risk of harm, we lack systematic and structured evaluation datasets to scrutinize model behavior, especially adversarial attacks that bypass existing safety filters. A typical bottleneck in safety evaluation is achieving a wide coverage of different types of challenging examples in the evaluation set, i.e., identifying 'unknown unknowns' or long-tail problems. To address this need, we introduce the Adversarial Nibbler challenge. The goal of this challenge is to crowdsource a diverse set of failure modes and reward challenge participants for successfully finding safety vulnerabilities in current state-of-the-art T2I models. Ultimately, we aim to provide greater awareness of these issues and assist developers in improving the future safety and reliability of generative AI models. Adversarial Nibbler is a data-centric challenge, part of the DataPerf challenge suite, organized and supported by Kaggle and MLCommons.
Published: 2023

6. Supporting Human-AI Collaboration in Auditing LLMs with LLMs

Author: Rastogi, Charvi, Ribeiro, Marco Tulio, King, Nicholas, and Amershi, Saleema
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models are becoming increasingly pervasive and ubiquitous in society via deployment in sociotechnical systems. Yet these language models, be it for classification or generation, have been shown to be biased and behave irresponsibly, causing harm to people at scale. It is crucial to audit these language models rigorously. Existing auditing tools leverage either or both humans and AI to find failures. In this work, we draw upon literature in human-AI collaboration and sensemaking, and conduct interviews with research experts in safe and fair AI, to build upon the auditing tool: AdaTest (Ribeiro and Lundberg, 2022), which is powered by a generative large language model (LLM). Through the design process we highlight the importance of sensemaking and human-AI communication to leverage complementary strengths of humans and generative models in collaborative auditing. To evaluate the effectiveness of the augmented tool, AdaTest++, we conduct user studies with participants auditing two commercial language models: OpenAI's GPT-3 and Azure's sentiment analysis model. Qualitative analysis shows that AdaTest++ effectively leverages human strengths such as schematization, hypothesis formation and testing. Further, with our tool, participants identified a variety of failures modes, covering 26 different topics over 2 tasks, that have been shown before in formal audits and also those previously under-reported., Comment: 21 pages, 3 figures
Published: 2023

7. How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

Author: Rastogi, Charvi, Stelmakh, Ivan, Beygelzimer, Alina, Dauphin, Yann N., Liang, Percy, Vaughan, Jennifer Wortman, Xue, Zhenyu, Daumé III, Hal, Pierson, Emma, and Shah, Nihar B.
Subjects: Computer Science - Machine Learning, Computer Science - Databases, Computer Science - Digital Libraries
Abstract: How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews. The salient results are: (1) Authors have roughly a three-fold overestimate of the acceptance probability of their papers: The median prediction is 70% for an approximately 25% acceptance rate. (2) Female authors exhibit a marginally higher (statistically significant) miscalibration than male authors; predictions of authors invited to serve as meta-reviewers or reviewers are similarly calibrated, but better than authors who were not invited to review. (3) Authors' relative ranking of scientific contribution of two submissions they made generally agree (93%) with their predicted acceptance probabilities, but there is a notable 7% responses where authors think their better paper will face a worse outcome. (4) The author-provided rankings disagreed with the peer-review decisions about a third of the time; when co-authors ranked their jointly authored papers, co-authors disagreed at a similar rate -- about a third of the time. (5) At least 30% of respondents of both accepted and rejected papers said that their perception of their own paper improved after the review process. The stakeholders in peer review should take these findings into account in setting their expectations from peer review.
Published: 2022

8. DataPerf: Benchmarks for Data-Centric AI Development

Author: Mazumder, Mark, Banbury, Colby, Yao, Xiaozhe, Karlaš, Bojan, Rojas, William Gaviria, Diamos, Sudnya, Diamos, Greg, He, Lynn, Parrish, Alicia, Kirk, Hannah Rose, Quaye, Jessica, Rastogi, Charvi, Kiela, Douwe, Jurado, David, Kanter, David, Mosquera, Rafael, Ciro, Juan, Aroyo, Lora, Acun, Bilge, Chen, Lingjiao, Raje, Mehul Smriti, Bartolo, Max, Eyuboglu, Sabri, Ghorbani, Amirata, Goodman, Emmett, Inel, Oana, Kane, Tariq, Kirkpatrick, Christine R., Kuo, Tzu-Sheng, Mueller, Jonas, Thrush, Tristan, Vanschoren, Joaquin, Warren, Margaret, Williams, Adina, Yeung, Serena, Ardalani, Newsha, Paritosh, Praveen, Bat-Leah, Lilith, Zhang, Ce, Zou, James, Wu, Carole-Jean, Coleman, Cody, Ng, Andrew, Mattson, Peter, and Reddi, Vijay Janapa
Subjects: Computer Science - Machine Learning
Abstract: Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry., Comment: NeurIPS 2023 Datasets and Benchmarks Track
Published: 2022

9. Cite-seeing and reviewing: A study on citation bias in peer review.

Author: Stelmakh, Ivan, Rastogi, Charvi, Liu, Ryan, Chawla, Shuchi, Shah, Nihar, and Echenique, Federico
Subjects: Humans, Prospective Studies, Peer Review, Bias, Research Personnel, Machine Learning, Peer Review, Research
Abstract: Citations play an important role in researchers careers as a key factor in evaluation of scientific impact. Many anecdotes advice authors to exploit this fact and cite prospective reviewers to try obtaining a more positive evaluation for their submission. In this work, we investigate if such a citation bias actually exists: Does the citation of a reviewers own work in a submission cause them to be positively biased towards the submission? In conjunction with the review process of two flagship conferences in machine learning and algorithmic economics, we execute an observational study to test for citation bias in peer review. In our analysis, we carefully account for various confounding factors such as paper quality and reviewer expertise, and apply different modeling techniques to alleviate concerns regarding the model mismatch. Overall, our analysis involves 1,314 papers and 1,717 reviewers and detects citation bias in both venues we consider. In terms of the effect size, by citing a reviewers work, a submission has a non-trivial chance of getting a higher score from the reviewer: an expected increase in the score is approximately 0.23 on a 5-point Likert item. For reference, a one-point increase of a score by a single reviewer improves the position of a submission by 11% on average.
Published: 2023

10. A Taxonomy of Human and ML Strengths in Decision-Making to Investigate Human-ML Complementarity

Author: Rastogi, Charvi, Leqi, Liu, Holstein, Kenneth, and Heidari, Hoda
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: Hybrid human-ML systems increasingly make consequential decisions in a wide range of domains. These systems are often introduced with the expectation that the combined human-ML system will achieve complementary performance, that is, the combined decision-making system will be an improvement compared with either decision-making agent in isolation. However, empirical results have been mixed, and existing research rarely articulates the sources and mechanisms by which complementary performance is expected to arise. Our goal in this work is to provide conceptual tools to advance the way researchers reason and communicate about human-ML complementarity. Drawing upon prior literature in human psychology, machine learning, and human-computer interaction, we propose a taxonomy characterizing distinct ways in which human and ML-based decision-making can differ. In doing so, we conceptually map potential mechanisms by which combining human and ML decision-making may yield complementary performance, developing a language for the research community to reason about design of hybrid systems in any decision-making domain. To illustrate how our taxonomy can be used to investigate complementarity, we provide a mathematical aggregation framework to examine enabling conditions for complementarity. Through synthetic simulations, we demonstrate how this framework can be used to explore specific aspects of our taxonomy and shed light on the optimal mechanisms for combining human-ML judgments, Comment: 19 pages, 5 figures, Proceedings of HCOMP
Published: 2022

11. Cite-seeing and Reviewing: A Study on Citation Bias in Peer Review

Author: Stelmakh, Ivan, Rastogi, Charvi, Liu, Ryan, Chawla, Shuchi, Echenique, Federico, and Shah, Nihar B.
Subjects: Computer Science - Digital Libraries, Statistics - Applications
Abstract: Citations play an important role in researchers' careers as a key factor in evaluation of scientific impact. Many anecdotes advice authors to exploit this fact and cite prospective reviewers to try obtaining a more positive evaluation for their submission. In this work, we investigate if such a citation bias actually exists: Does the citation of a reviewer's own work in a submission cause them to be positively biased towards the submission? In conjunction with the review process of two flagship conferences in machine learning and algorithmic economics, we execute an observational study to test for citation bias in peer review. In our analysis, we carefully account for various confounding factors such as paper quality and reviewer expertise, and apply different modeling techniques to alleviate concerns regarding the model mismatch. Overall, our analysis involves 1,314 papers and 1,717 reviewers and detects citation bias in both venues we consider. In terms of the effect size, by citing a reviewer's work, a submission has a non-trivial chance of getting a higher score from the reviewer: an expected increase in the score is approximately 0.23 on a 5-point Likert item. For reference, a one-point increase of a score by a single reviewer improves the position of a submission by 11% on average., Comment: 19 pages, 3 figures
Published: 2022
Full Text: View/download PDF

12. To ArXiv or not to ArXiv: A Study Quantifying Pros and Cons of Posting Preprints Online

Author: Rastogi, Charvi, Stelmakh, Ivan, Shen, Xinwei, Meila, Marina, Echenique, Federico, Chawla, Shuchi, and Shah, Nihar B.
Subjects: Computer Science - Digital Libraries, Statistics - Applications
Abstract: Double-blind conferences have engaged in debates over whether to allow authors to post their papers online on arXiv or elsewhere during the review process. Independently, some authors of research papers face the dilemma of whether to put their papers on arXiv due to its pros and cons. We conduct a study to substantiate this debate and dilemma via quantitative measurements. Specifically, we conducted surveys of reviewers in two top-tier double-blind computer science conferences -- ICML 2021 (5361 submissions and 4699 reviewers) and EC 2021 (498 submissions and 190 reviewers). Our two main findings are as follows. First, more than a third of the reviewers self-report searching online for a paper they are assigned to review. Second, outside the review process, we find that preprints from better-ranked affiliations see a weakly higher visibility, with a correlation of 0.06 in ICML and 0.05 in EC. In particular, papers associated with the top-10-ranked affiliations had a visibility of approximately 11% in ICML and 22% in EC, whereas the remaining papers had a visibility of 7% and 18% respectively., Comment: 17 pages, 3 figures
Published: 2022

13. A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions

Author: Stelmakh, Ivan, Rastogi, Charvi, Shah, Nihar B., Singh, Aarti, and Daumé III, Hal
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Machine Learning, Statistics - Applications
Abstract: Peer review is the backbone of academia and humans constitute a cornerstone of this process, being responsible for reviewing papers and making the final acceptance/rejection decisions. Given that human decision making is known to be susceptible to various cognitive biases, it is important to understand which (if any) biases are present in the peer-review process and design the pipeline such that the impact of these biases is minimized. In this work, we focus on the dynamics of between-reviewers discussions and investigate the presence of herding behaviour therein. In that, we aim to understand whether reviewers and more senior decision makers get disproportionately influenced by the first argument presented in the discussion when (in case of reviewers) they form an independent opinion about the paper before discussing it with others. Specifically, in conjunction with the review process of ICML 2020 -- a large, top tier machine learning conference -- we design and execute a randomized controlled trial with the goal of testing for the conditional causal effect of the discussion initiator's opinion on the outcome of a paper.
Published: 2020

14. Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making

Author: Rastogi, Charvi, Zhang, Yunfeng, Wei, Dennis, Varshney, Kush R., Dhurandhar, Amit, and Tomsett, Richard
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: Several strands of research have aimed to bridge the gap between artificial intelligence (AI) and human decision-makers in AI-assisted decision-making, where humans are the consumers of AI model predictions and the ultimate decision-makers in high-stakes applications. However, people's perception and understanding are often distorted by their cognitive biases, such as confirmation bias, anchoring bias, availability bias, to name a few. In this work, we use knowledge from the field of cognitive science to account for cognitive biases in the human-AI collaborative decision-making setting, and mitigate their negative effects on collaborative performance. To this end, we mathematically model cognitive biases and provide a general framework through which researchers and practitioners can understand the interplay between cognitive biases and human-AI accuracy. We then focus specifically on anchoring bias, a bias commonly encountered in human-AI collaboration. We implement a time-based de-anchoring strategy and conduct our first user experiment that validates its effectiveness in human-AI collaborative decision-making. With this result, we design a time allocation strategy for a resource-constrained setting that achieves optimal human-AI collaboration under some assumptions. We, then, conduct a second user experiment which shows that our time allocation strategy with explanation can effectively de-anchor the human and improve collaborative performance when the AI model has low confidence and is incorrect., Comment: 22 pages, 4 figures
Published: 2020

15. Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions

Author: Rastogi, Charvi, Balakrishnan, Sivaraman, Shah, Nihar B., and Singh, Aarti
Subjects: Statistics - Machine Learning, Computer Science - Information Theory, Computer Science - Machine Learning
Abstract: A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similar to ratings-converted-to-comparisons. Other examples include sports data analysis and peer grading. In this paper, we design two-sample tests for pairwise comparison data and ranking data. For our two-sample test for pairwise comparison data, we establish an upper bound on the sample complexity required to correctly distinguish between the distributions of the two sets of samples. Our test requires essentially no assumptions on the distributions. We then prove complementary lower bounds showing that our results are tight (in the minimax sense) up to constant factors. We investigate the role of modeling assumptions by proving lower bounds for a range of pairwise comparison models (WST, MST,SST, parameter-based such as BTL and Thurstone). We also provide testing algorithms and associated sample complexity bounds for the problem of two-sample testing with partial (or total) ranking data.Furthermore, we empirically evaluate our results via extensive simulations as well as two real-world datasets consisting of pairwise comparisons. By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently. On the other hand, our test recognizes no significant difference in the relative performance of European football teams across two seasons. Finally, we apply our two-sample test on a real-world partial and total ranking dataset and find a statistically significant difference in Sushi preferences across demographic divisions based on gender, age and region of residence., Comment: 40 pages, 4 figures
Published: 2020

16. A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms

Author: Kailkhura, Bhavya, Thiagarajan, Jayaraman J., Rastogi, Charvi, Varshney, Pramod K., and Bremer, Peer-Timo
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence
Abstract: This paper proposes a new approach to construct high quality space-filling sample designs. First, we propose a novel technique to quantify the space-filling property and optimally trade-off uniformity and randomness in sample designs in arbitrary dimensions. Second, we connect the proposed metric (defined in the spatial domain) to the objective measure of the design performance (defined in the spectral domain). This connection serves as an analytic framework for evaluating the qualitative properties of space-filling designs in general. Using the theoretical insights provided by this spatial-spectral analysis, we derive the notion of optimal space-filling designs, which we refer to as space-filling spectral designs. Third, we propose an efficient estimator to evaluate the space-filling properties of sample designs in arbitrary dimensions and use it to develop an optimization framework to generate high quality space-filling designs. Finally, we carry out a detailed performance comparison on two different applications in 2 to 6 dimensions: a) image reconstruction and b) surrogate modeling on several benchmark optimization functions and an inertial confinement fusion (ICF) simulation code. We demonstrate that the propose spectral designs significantly outperform existing approaches especially in high dimensions.
Published: 2017

17. Mobile Sensing of Two-Dimensional Bandlimited Fields on Random Paths

Author: Rastogi, Charvi and Kumar, Animesh
Subjects: Computer Science - Information Theory
Abstract: Mobile sensing has been recently proposed for sampling spatial fields, where mobile sensors record the field along various paths for reconstruction. Classical and contemporary sampling typically assumes that the sampling locations are approximately known. This work explores multiple sampling strategies along random paths to sample and reconstruct a two dimensional bandlimited field. Extensive simulations are carried out, with insights from sensing matrices and their properties, to evaluate the sampling strategies. Their performance is measured by evaluating the stability of field reconstruction from field samples. The effect of location unawareness on some sampling strategies is also evaluated by simulations., Comment: 6 pages, 2 figures; submitted to ICASSP 2018
Published: 2017

18. Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

Author: Quaye, Jessica, primary, Parrish, Alicia, additional, Inel, Oana, additional, Rastogi, Charvi, additional, Kirk, Hannah Rose, additional, Kahng, Minsuk, additional, Van Liemt, Erin, additional, Bartolo, Max, additional, Tsang, Jess, additional, White, Justin, additional, Clement, Nathan, additional, Mosquera, Rafael, additional, Ciro, Juan, additional, Janapa Reddi, Vijay, additional, and Aroyo, Lora, additional
Published: 2024
Full Text: View/download PDF

19. How do authors’ perceptions of their papers compare with co-authors’ perceptions and peer-review decisions?

Author: Rastogi, Charvi, primary, Stelmakh, Ivan, additional, Beygelzimer, Alina, additional, Dauphin, Yann N., additional, Liang, Percy, additional, Wortman Vaughan, Jennifer, additional, Xue, Zhenyu, additional, Daumé III, Hal, additional, Pierson, Emma, additional, and Shah, Nihar B., additional
Published: 2024
Full Text: View/download PDF

20. A Taxonomy of Human and ML Strengths in Decision-Making to Investigate Human-ML Complementarity

Author: Rastogi, Charvi, primary, Leqi, Liu, additional, Holstein, Kenneth, additional, and Heidari, Hoda, additional
Published: 2023
Full Text: View/download PDF

21. Supporting Human-AI Collaboration in Auditing LLMs with LLMs

Author: Rastogi, Charvi, primary, Tulio Ribeiro, Marco, additional, King, Nicholas, additional, Nori, Harsha, additional, and Amershi, Saleema, additional
Published: 2023
Full Text: View/download PDF

22. Investigating the Relative Strengths of Humans and Machine Learning in Decision-Making

Author: Rastogi, Charvi, primary
Published: 2023
Full Text: View/download PDF

23. A large scale randomized controlled trial on herding in peer-review discussions.

Author: Stelmakh, Ivan, Rastogi, Charvi, Shah, Nihar B., Singh, Aarti, and Daumé III, Hal
Subjects: *SCHOLARLY peer review, *ANCHORING effect, *COGNITIVE bias, *RANDOMIZED controlled trials, *MACHINE learning
Abstract: Peer review is the backbone of academia and humans constitute a cornerstone of this process, being responsible for reviewing submissions and making the final acceptance/rejection decisions. Given that human decision-making is known to be susceptible to various cognitive biases, it is important to understand which (if any) biases are present in the peer-review process, and design the pipeline such that the impact of these biases is minimized. In this work, we focus on the dynamics of discussions between reviewers and investigate the presence of herding behaviour therein. Specifically, we aim to understand whether reviewers and discussion chairs get disproportionately influenced by the first argument presented in the discussion when (in case of reviewers) they form an independent opinion about the paper before discussing it with others. In conjunction with the review process of a large, top tier machine learning conference, we design and execute a randomized controlled trial that involves 1,544 papers and 2,797 reviewers with the goal of testing for the conditional causal effect of the discussion initiator's opinion on the outcome of a paper. Our experiment reveals no evidence of herding in peer-review discussions. This observation is in contrast with past work that has documented an undue influence of the first piece of information on the final decision (e.g., anchoring effect) and analyzed herding behaviour in other applications (e.g., financial markets). Regarding policy implications, the absence of the herding effect suggests that the current status quo of the absence of a unified policy towards discussion initiation does not result in an increased arbitrariness of the resulting decisions. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

24. No Rose for MLE: Inadmissibility of MLE for Evaluation Aggregation Under Levels of Expertise

Author: Rastogi, Charvi, primary, Stelmakh, Ivan, additional, Shah, Nihar, additional, and Balakrishnan, Sivaraman, additional
Published: 2022
Full Text: View/download PDF

25. A Unifying Framework for Combining Complementary Strengths of Humans and ML toward Better Predictive Decision-Making

Author: Rastogi, Charvi, Leqi, Liu, Holstein, Kenneth, and Heidari, Hoda
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Human-Computer Interaction, Human-Computer Interaction (cs.HC), Machine Learning (cs.LG)
Abstract: Hybrid human-ML systems are increasingly in charge of consequential decisions in a wide range of domains. A growing body of empirical and theoretical work has advanced our understanding of these systems. However, existing empirical results are mixed, and theoretical proposals are often mutually incompatible. In this work, we propose a unifying framework for understanding conditions under which combining the complementary strengths of humans and ML leads to higher quality decisions than those produced by each of them individually -- a state which we refer to as human-ML complementarity. We focus specifically on the context of human-ML predictive decision-making and investigate optimal ways of combining human and ML predictive decisions, accounting for the underlying sources of variation in their judgments. Within this scope, we present two crucial contributions. First, taking a computational perspective of decision-making and drawing upon prior literature in psychology, machine learning, and human-computer interaction, we introduce a taxonomy characterizing a wide range of criteria across which human and machine decision-making differ. Second, formalizing our taxonomy allows us to study how human and ML predictive decisions should be aggregated optimally. We show that our proposed framework encompasses several existing models of human-ML complementarity as special cases. Last but not least, an initial exploratory analysis of our framework presents a critical insight for future work in human-ML complementarity: the mechanism by which we combine human and ML judgments should be informed by the underlying causes of divergence in their decisions., Comment: 21 pages, 1 figure
Published: 2022
Full Text: View/download PDF

26. Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making

Author: Rastogi, Charvi, primary, Zhang, Yunfeng, additional, Wei, Dennis, additional, Varshney, Kush R., additional, Dhurandhar, Amit, additional, and Tomsett, Richard, additional
Published: 2022
Full Text: View/download PDF

27. Two-Sample Testing on Pairwise Comparison Data and the Role of Modeling Assumptions

Author: Rastogi, Charvi, primary, Balakrishnan, Sivaraman, additional, Shah, Nihar, additional, and Singh, Aarti, additional
Published: 2020
Full Text: View/download PDF

28. Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions.

Author: Rastogi, Charvi, Balakrishnan, Sivaraman, Shah, Nihar B., and Singh, Aarti
Subjects: *DATA modeling, *TURF management, *DISCRETE choice models, *DATA analysis, *SOCIAL choice, *CROWDSOURCING
Abstract: A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise-comparison data provided by people is distributed identically to ratings-converted-to-comparisons. Other applications include sports data analysis and peer grading. In this paper, we design twosample tests for pairwise-comparison data and ranking data. For our two-sample test for pairwise-comparison data, we establish an upper bound on the sample complexity required to correctly test whether the distributions of the two sets of samples are identical. Our test requires essentially no assumptions on the distributions. We then prove complementary lower bounds showing that our results are tight (in the minimax sense) up to constant factors. We investigate the role of modeling assumptions by proving lower bounds for a range of pairwise-comparison models (WST, MST, SST, parameter-based such as BTL and Thurstone). We also provide tests and associated sample complexity bounds for partial (or total) ranking data. Furthermore, we empirically evaluate our results via extensive simulations as well as three real-world data sets consisting of pairwise-comparisons and rankings. By applying our two-sample test on real-world pairwise-comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently. [ABSTRACT FROM AUTHOR]
Published: 2022

29. A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms.

Author: Kailkhura, Bhavya, Thiagarajan, Jayaraman J., Rastogi, Charvi, Varshney, Pramod K., and Bremer, Peer-Timo
Subjects: *ALGORITHMS, *SPATIAL analysis (Statistics), *REGRESSION analysis, *MATHEMATICAL optimization, *DIMENSIONS
Abstract: This paper proposes a new approach to construct high quality space-filling sample designs. First, we propose a novel technique to quantify the space-filling property and optimally trade-o uniformity and randomness in sample designs in arbitrary dimensions. Second, we connect the proposed metric (defined in the spatial domain) to the quality metric of the design performance (defined in the spectral domain). This connection serves as an analytic framework for evaluating the qualitative properties of space-filling designs in general. Us- ing the theoretical insights provided by this spatial-spectral analysis, we derive the notion of optimal space-filling designs, which we refer to as space-filling spectral designs. Third, we propose an e cient estimator to evaluate the space-filling properties of sample designs in arbitrary dimensions and use it to develop an optimization framework for generating high quality space-filling designs. Finally, we carry out a detailed performance comparison on two di erent applications in varying dimensions: a) image reconstruction and b) surro- gate modeling for several benchmark optimization functions and a physics simulation code for inertial confinement fusion (ICF). Our results clearly evidence the superiority of the proposed space-filling designs over existing approaches, particularly in high dimensions. [ABSTRACT FROM AUTHOR]
Published: 2018

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

29 results on '"Rastogi, Charvi"'

1. Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups

2. Imagen 3

3. A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions

4. Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

5. Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models

6. Supporting Human-AI Collaboration in Auditing LLMs with LLMs

7. How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

8. DataPerf: Benchmarks for Data-Centric AI Development

9. Cite-seeing and reviewing: A study on citation bias in peer review.

10. A Taxonomy of Human and ML Strengths in Decision-Making to Investigate Human-ML Complementarity

11. Cite-seeing and Reviewing: A Study on Citation Bias in Peer Review

12. To ArXiv or not to ArXiv: A Study Quantifying Pros and Cons of Posting Preprints Online

13. A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions

14. Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making

15. Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions

16. A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms

17. Mobile Sensing of Two-Dimensional Bandlimited Fields on Random Paths

18. Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

19. How do authors’ perceptions of their papers compare with co-authors’ perceptions and peer-review decisions?

20. A Taxonomy of Human and ML Strengths in Decision-Making to Investigate Human-ML Complementarity

21. Supporting Human-AI Collaboration in Auditing LLMs with LLMs

22. Investigating the Relative Strengths of Humans and Machine Learning in Decision-Making

23. A large scale randomized controlled trial on herding in peer-review discussions.

24. No Rose for MLE: Inadmissibility of MLE for Evaluation Aggregation Under Levels of Expertise

25. A Unifying Framework for Combining Complementary Strengths of Humans and ML toward Better Predictive Decision-Making

26. Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making

27. Two-Sample Testing on Pairwise Comparison Data and the Role of Modeling Assumptions

28. Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions.

29. A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

29 results on '"Rastogi, Charvi"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources