Author: "Mukherjee, Animesh" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Mukherjee, Animesh"' showing total 700 results

Start Over Author "Mukherjee, Animesh"

700 results on '"Mukherjee, Animesh"'

1. DENOASR: Debiasing ASRs through Selective Denoising

Author: Rai, Anand Kumar, Jaiswal, Siddharth D, Prakash, Shubham, Sree, Bendi Pragnya, and Mukherjee, Animesh
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Automatic Speech Recognition (ASR) systems have been examined and shown to exhibit biases toward particular groups of individuals, influenced by factors such as demographic traits, accents, and speech styles. Noise can disproportionately impact speakers with certain accents, dialects, or speaking styles, leading to biased error rates. In this work, we introduce a novel framework DENOASR, which is a selective denoising technique to reduce the disparity in the word error rates between the two gender groups, male and female. We find that a combination of two popular speech denoising techniques, viz. DEMUCS and LE, can be effectively used to mitigate ASR disparity without compromising their overall performance. Experiments using two state-of-the-art open-source ASRs - OpenAI WHISPER and NVIDIA NEMO - on multiple benchmark datasets, including TIE, VOX-POPULI, TEDLIUM, and FLEURS, show that there is a promising reduction in the average word error rate gap across the two gender groups. For a given dataset, the denoising is selectively applied on speech samples having speech intelligibility below a certain threshold, estimated using a small validation sample, thus ameliorating the need for large-scale human-written ground-truth transcripts. Our findings suggest that selective denoising can be an elegant approach to mitigate biases in present-day ASR systems., Comment: Paper accepted at IEEE ICKG 2024
Published: 2024

2. Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models

Author: Banerjee, Somnath, Layek, Sayan, Shrawgi, Hari, Mandal, Rajarshi, Halder, Avik, Kumar, Shanu, Basu, Sagnik, Agrawal, Parag, Hazra, Rima, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
Abstract: As LLMs are increasingly deployed in global applications, the importance of cultural sensitivity becomes paramount, ensuring that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content. Ultimately, this work paves the way for more inclusive and respectful AI systems, fostering a future where LLMs can safely and ethically navigate the complexities of diverse cultural landscapes.
Published: 2024

3. CrowdCounter: A benchmark type-specific multi-target counterspeech dataset

Author: Saha, Punyajoy, Datta, Abhilash, Jana, Abhik, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language
Abstract: Counterspeech presents a viable alternative to banning or suspending users for hate speech while upholding freedom of expression. However, writing effective counterspeech is challenging for moderators/users. Hence, developing suggestion tools for writing counterspeech is the need of the hour. One critical challenge in developing such a tool is the lack of quality and diversity of the responses in the existing datasets. Hence, we introduce a new dataset - CrowdCounter containing 3,425 hate speech-counterspeech pairs spanning six different counterspeech types (empathy, humor, questioning, warning, shaming, contradiction), which is the first of its kind. The design of our annotation platform itself encourages annotators to write type-specific, non-redundant and high-quality counterspeech. We evaluate two frameworks for generating counterspeech responses - vanilla and type-controlled prompts - across four large language models. In terms of metrics, we evaluate the responses using relevance, diversity and quality. We observe that Flan-T5 is the best model in the vanilla framework across different models. Type-specific prompts enhance the relevance of the responses, although they might reduce the language quality. DialoGPT proves to be the best at following the instructions and generating the type-specific counterspeech accurately., Comment: 19 pages, 1 figure, 14 tables, Code available https://github.com/hate-alert/CrowdCounter
Published: 2024

4. Sponsored is the New Organic: Implications of Sponsored Results on Quality of Search Results in the Amazon Marketplace

Author: Dash, Abhisek, Ghosh, Saptarshi, Mukherjee, Animesh, Chakraborty, Abhijnan, and Gummadi, Krishna P.
Subjects: Computer Science - Computers and Society, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Information Retrieval
Abstract: Interleaving sponsored results (advertisements) amongst organic results on search engine result pages (SERP) has become a common practice across multiple digital platforms. Advertisements have catered to consumer satisfaction and fostered competition in digital public spaces; making them an appealing gateway for businesses to reach their consumers. However, especially in the context of digital marketplaces, due to the competitive nature of the sponsored results with the organic ones, multiple unwanted repercussions have surfaced affecting different stakeholders. From the consumers' perspective the sponsored ads/results may cause degradation of search quality and nudge consumers to potentially irrelevant and costlier products. The sponsored ads may also affect the level playing field of the competition in the marketplaces among sellers. To understand and unravel these potential concerns, we analyse the Amazon digital marketplace in four different countries by simulating 4,800 search operations. Our analyses over SERPs consisting 2M organic and 638K sponsored results show items with poor organic ranks (beyond 100th position) appear as sponsored results even before the top organic results on the first page of Amazon SERP. Moreover, we also observe that in majority of the cases, these top sponsored results are costlier and are of poorer quality than the top organic results. We believe these observations can motivate researchers for further deliberation to bring in more transparency and guard rails in the advertising practices followed in digital marketplaces., Comment: This work has been accepted as a full paper in AAAI/ACM conference on Artificial Intelligence, Ethics and Society (AIES) 2024
Published: 2024

5. Breaking the Global North Stereotype: A Global South-centric Benchmark Dataset for Auditing and Mitigating Biases in Facial Recognition Systems

Author: Jaiswal, Siddharth D, Ganai, Animesh, Dash, Abhisek, Ghosh, Saptarshi, and Mukherjee, Animesh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computers and Society
Abstract: Facial Recognition Systems (FRSs) are being developed and deployed globally at unprecedented rates. Most platforms are designed in a limited set of countries but deployed in worldwide, without adequate checkpoints. This is especially problematic for Global South countries which lack strong legislation to safeguard persons facing disparate performance of these systems. A combination of unavailability of datasets, lack of understanding of FRS functionality and low-resource bias mitigation measures accentuate the problem. In this work, we propose a new face dataset composed of 6,579 unique male and female sportspersons from eight countries around the world. More than 50% of the dataset comprises individuals from the Global South countries and is demographically diverse. To aid adversarial audits and robust model training, each image has four adversarial variants, totaling over 40,000 images. We also benchmark five popular FRSs, both commercial and open-source, for the task of gender prediction (and country prediction for one of the open-source models as an example of red-teaming). Experiments on industrial FRSs reveal accuracies ranging from 98.2%--38.1%, with a large disparity between males and females in the Global South (max difference of 38.5%). Biases are also observed in all FRSs between females of the Global North and South (max difference of ~50%). Grad-CAM analysis identifies the nose, forehead and mouth as the regions of interest on one of the open-source FRSs. Utilizing this insight, we design simple, low-resource bias mitigation solutions using few-shot and novel contrastive learning techniques significantly improving the accuracy with disparity between males and females reducing from 50% to 1.5% in one of the settings. In the red-teaming experiment with the open-source Deepface model, contrastive learning proves more effective than simple fine-tuning., Comment: This work has been accepted for publication at AAAI/ACM AIES 2024
Published: 2024

6. Auditing the Grid-Based Placement of Private Label Products on E-commerce Search Result Pages

Author: Jaiswal, Siddharth D, Dash, Abhisek, Shroff, Nitika, Vunnam, Yashwanth Babu, Ghosh, Saptarshi, and Mukherjee, Animesh
Subjects: Computer Science - Computers and Society, Computer Science - Human-Computer Interaction, Computer Science - Information Retrieval
Abstract: E-commerce platforms support the needs and livelihoods of their two most important stakeholders -- customers and producers/sellers. Multiple algorithmic systems, like ``search'' systems mediate the interactions between these stakeholders by connecting customers to producers with relevant items. Search results include (i) private label (PL) products that are manufactured/sold by the platform itself, as well as (ii) third-party products on advertised / sponsored and organic positions. In this paper, we systematically quantify the extent of PL product promotion on e-commerce search results for the two largest e-commerce platforms operating in India -- Amazon.in and Flipkart. By analyzing snapshots of search results across the two platforms, we discover high PL promotion on the initial result pages (~ 15% PLs are advertised on the first SERP of Amazon). Both platforms use different strategies to promote their PL products, such as placing more PLs on the advertised positions -- while Amazon places them on the first, middle, and last rows of the search results, Flipkart places them on the first two positions and the (entire) last column of the search results. We discover that these product placement strategies of both platforms conform with existing user attention strategies proposed in the literature. Finally, to supplement the findings from the collected data, we conduct a survey among 68 participants on Amazon Mechanical Turk. The click pattern from our survey shows that users strongly prefer to click on products placed at positions that correspond to the PL products on the search results of Amazon, but not so strongly on Flipkart. The click-through rate follows previously proposed theoretically grounded user attention distribution patterns in a two-dimensional layout.
Published: 2024

7. Sampling-Based Attack for Centrality Disruption in Complex Networks

Author: Irany, Fariba Afrin, Sarakar, Soumya, Mukherjee, Animesh, and Bhowmick, Sanjukta
Subjects: Computer Science - Social and Information Networks
Abstract: Many mobile networks are represented as graphs to obtain insight to their connectivity and transmission properties. Among these properties centrality resilience, that is, how well centralities, such as closeness and betweennesss, are maintained under attacks is a critical factor for proper functioning of a network. In this paper, we study the centrality resilience of complex networks by developing attack models to disrupt the rank of the top path-based centrality vertices. To develop our attack models, we extend the concept of rich clubs of influential vertices to the more general framework of scattered rich clubs. We define scattered rich clubs as dense subgraphs of high centrality vertices that are spread (scattered) across the network. Finding scattered rich clubs, although of polynomial time complexity, is extremely expensive computationally. We use snowball sampling to identify these important substructures as well as to identify which edges to target in our proposed attack models. Our results over a set of real world networks demonstrate that our proposed algorithm is effective in finding the single or scattered rich clubs efficiently and in successfully disrupting the centrality rankings of the network. To summarize, we propose sampling-based attack models for testing the resilience of networks with respect to centrality rankings. As part of this process, we introduce scattered rich clubs, a generalized form of the rich club model, efficient algorithms to detect them, and demonstrate their relation to network resilience., Comment: 10 pages, 8 figures, 3 tables
Published: 2024

8. Investigating Nudges toward Related Sellers on E-commerce Marketplaces: A Case Study on Amazon

Author: Dash, Abhisek, Chakraborty, Abhijnan, Ghosh, Saptarshi, Mukherjee, Animesh, and Gummadi, Krishna P.
Subjects: Computer Science - Computers and Society, Computer Science - Human-Computer Interaction, Computer Science - Information Retrieval
Abstract: E-commerce marketplaces provide business opportunities to millions of sellers worldwide. Some of these sellers have special relationships with the marketplace by virtue of using their subsidiary services (e.g., fulfillment and/or shipping services provided by the marketplace) -- we refer to such sellers collectively as Related Sellers. When multiple sellers offer to sell the same product, the marketplace helps a customer in selecting an offer (by a seller) through (a) a default offer selection algorithm, (b) showing features about each of the offers and the corresponding sellers (price, seller performance metrics, seller's number of ratings etc.), and (c) finally evaluating the sellers along these features. In this paper, we perform an end-to-end investigation into how the above apparatus can nudge customers toward the Related Sellers on Amazon's four different marketplaces in India, USA, Germany and France. We find that given explicit choices, customers' preferred offers and algorithmically selected offers can be significantly different. We highlight that Amazon is adopting different performance metric evaluation policies for different sellers, potentially benefiting Related Sellers. For instance, such policies result in notable discrepancy between the actual performance metric and the presented performance metric of Related Sellers. We further observe that among the seller-centric features visible to customers, sellers' number of ratings influences their decisions the most, yet it may not reflect the true quality of service by the seller, rather reflecting the scale at which the seller operates, thereby implicitly steering customers toward larger Related Sellers. Moreover, when customers are shown the rectified metrics for the different sellers, their preference toward Related Sellers is almost halved., Comment: This work has been accepted for presentation at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2024. It will appear in Proceedings of the ACM on Human-Computer Interaction
Published: 2024

9. Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

Author: Yimam, Seid Muhie, Dementieva, Daryna, Fischer, Tim, Moskovskiy, Daniil, Rizwan, Naquee, Saha, Punyajoy, Roy, Sarthak, Semmann, Martin, Panchenko, Alexander, Biemann, Chris, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Social and Information Networks
Abstract: Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcation scoring abusive speech based on four aspect -- (i) severity scale; (ii) presence of a target; (iii) context scale; (iv) legal scale -- and suggesting more options of actions like detoxification, counter speech generation, blocking, or, as a final measure, human intervention. Through a thorough analysis of abusive speech regulations across diverse jurisdictions, platforms, and research papers we highlight the gap in preventing measures and advocate for tailored proactive steps to combat its multifaceted manifestations. Our work aims to inform future strategies for effectively addressing abusive speech online.
Published: 2024

10. SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models

Author: Banerjee, Somnath, Tripathy, Soham, Layek, Sayan, Kumar, Shanu, Mukherjee, Animesh, and Hazra, Rima
Subjects: Computer Science - Computation and Language
Abstract: Safety-aligned language models often exhibit fragile and imbalanced safety mechanisms, increasing the likelihood of generating unsafe content. In addition, incorporating new knowledge through editing techniques to language models can further compromise safety. To address these issues, we propose SafeInfer, a context-adaptive, decoding-time safety alignment strategy for generating safe responses to user queries. SafeInfer comprises two phases: the safety amplification phase, which employs safe demonstration examples to adjust the model's hidden states and increase the likelihood of safer outputs, and the safety-guided decoding phase, which influences token selection based on safety-optimized distributions, ensuring the generated content complies with ethical guidelines. Further, we present HarmEval, a novel benchmark for extensive safety evaluations, designed to address potential misuse scenarios in accordance with the policies of leading AI tech giants., Comment: Under review
Published: 2024

11. Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance

Author: Banerjee, Somnath, Halder, Avik, Mandal, Rajarshi, Layek, Sayan, Soboroff, Ian, Hazra, Rima, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language
Abstract: The integration of pretrained language models (PLMs) like BERT and GPT has revolutionized NLP, particularly for English, but it has also created linguistic imbalances. This paper strategically identifies the need for linguistic equity by examining several knowledge editing techniques in multilingual contexts. We evaluate the performance of models such as Mistral, TowerInstruct, OpenHathi, Tamil-Llama, and Kan-Llama across languages including English, German, French, Italian, Spanish, Hindi, Tamil, and Kannada. Our research identifies significant discrepancies in normal and merged models concerning cross-lingual consistency. We employ strategies like 'each language for itself' (ELFI) and 'each language for others' (ELFO) to stress-test these models. Our findings demonstrate the potential for LLMs to overcome linguistic barriers, laying the groundwork for future research in achieving linguistic inclusivity in AI technologies., Comment: Under review
Published: 2024

12. Antitrust, Amazon, and Algorithmic Auditing

Author: Dash, Abhisek, Chakraborty, Abhijnan, Ghosh, Saptarshi, Mukherjee, Animesh, Frankenreiter, Jens, Bechtold, Stefan, and Gummadi, Krishna P.
Subjects: Computer Science - Computers and Society, Computer Science - Human-Computer Interaction, Computer Science - Information Retrieval
Abstract: In digital markets, antitrust law and special regulations aim to ensure that markets remain competitive despite the dominating role that digital platforms play today in everyone's life. Unlike traditional markets, market participant behavior is easily observable in these markets. We present a series of empirical investigations into the extent to which Amazon engages in practices that are typically described as self-preferencing. We discuss how the computer science tools used in this paper can be used in a regulatory environment that is based on algorithmic auditing and requires regulating digital markets at scale., Comment: The paper has been accepted to appear at Journal of Institutional and Theoretical Economics (JITE) 2024
Published: 2024

13. On Zero-Shot Counterspeech Generation by LLMs

Author: Saha, Punyajoy, Agrawal, Aalok, Jana, Abhik, Biemann, Chris, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language
Abstract: With the emergence of numerous Large Language Models (LLM), the usage of such models in various Natural Language Processing (NLP) applications is increasing extensively. Counterspeech generation is one such key task where efforts are made to develop generative models by fine-tuning LLMs with hatespeech - counterspeech pairs, but none of these attempts explores the intrinsic properties of large language models in zero-shot settings. In this work, we present a comprehensive analysis of the performances of four LLMs namely GPT-2, DialoGPT, ChatGPT and FlanT5 in zero-shot settings for counterspeech generation, which is the first of its kind. For GPT-2 and DialoGPT, we further investigate the deviation in performance with respect to the sizes (small, medium, large) of the models. On the other hand, we propose three different prompting strategies for generating different types of counterspeech and analyse the impact of such strategies on the performance of the models. Our analysis shows that there is an improvement in generation quality for two datasets (17%), however the toxicity increase (25%) with increase in model size. Considering type of model, GPT-2 and FlanT5 models are significantly better in terms of counterspeech quality but also have high toxicity as compared to DialoGPT. ChatGPT are much better at generating counter speech than other models across all metrics. In terms of prompting, we find that our proposed strategies help in improving counter speech generation across all the models., Comment: 12 pages, 7 tables, accepted at LREC-COLING 2024
Published: 2024

14. Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs

Author: Nag, Arijit, Mukherjee, Animesh, Ganguly, Niloy, and Chakrabarti, Soumen
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) exhibit impressive zero/few-shot inference and generation quality for high-resource languages (HRLs). A few of them have been trained on low-resource languages (LRLs) and give decent performance. Owing to the prohibitive costs of training LLMs, they are usually used as a network service, with the client charged by the count of input and output tokens. The number of tokens strongly depends on the script and language, as well as the LLM's subword vocabulary. We show that LRLs are at a pricing disadvantage, because the well-known LLMs produce more tokens for LRLs than HRLs. This is because most currently popular LLMs are optimized for HRL vocabularies. Our objective is to level the playing field: reduce the cost of processing LRLs in contemporary LLMs while ensuring that predictive and generative qualities are not compromised. As means to reduce the number of tokens processed by the LLM, we consider code-mixing, translation, and transliteration of LRLs to HRLs. We perform an extensive study using the IndicXTREME classification and six generative tasks dataset, covering 15 Indic and 3 other languages, while using GPT-4 (one of the costliest LLM services released so far) as a commercial LLM. We observe and analyze interesting patterns involving token count, cost, and quality across a multitude of languages and tasks. We show that choosing the best policy to interact with the LLM can reduce cost by 90% while giving better or comparable performance compared to communicating with the LLM in the original LRL.
Published: 2024

15. Understanding how social discussion platforms like Reddit are influencing financial behavior

Author: Thukral, Sachin, Sangwan, Suyash, Chatterjee, Arnab, Dey, Lipika, Agrawal, Aaditya, Chandra, Pramit Kumar, and Mukherjee, Animesh
Subjects: Computer Science - Social and Information Networks
Abstract: This study proposes content and interaction analysis techniques for a large repository created from social media content. Though we have presented our study for a large platform dedicated to discussions around financial topics, the proposed methods are generic and applicable to all platforms. Along with an extension of topic extraction method using Latent Dirichlet Allocation, we propose a few measures to assess user participation, influence and topic affinities specifically. Our study also maps user-generated content to components of behavioral finance. While these types of information are usually gathered through surveys, it is obvious that large scale data analysis from social media can reveal many potentially unknown or rare insights. Characterising users based on their platform behavior to provide critical insights about how communities are formed and trust is established in these platforms using graphical analysis is also studied., Comment: 8 pages, 8 figures, 3 tables, and 1 algorithm; Published in WI-IAT 2022 (The 21st IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology)
Published: 2024
Full Text: View/download PDF

16. DistALANER: Distantly Supervised Active Learning Augmented Named Entity Recognition in the Open Source Software Ecosystem

Author: Banerjee, Somnath, Dutta, Avik, Agrawal, Aaditya, Hazra, Rima, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language
Abstract: With the AI revolution in place, the trend for building automated systems to support professionals in different domains such as the open source software systems, healthcare systems, banking systems, transportation systems and many others have become increasingly prominent. A crucial requirement in the automation of support tools for such systems is the early identification of named entities, which serves as a foundation for developing specialized functionalities. However, due to the specific nature of each domain, different technical terminologies and specialized languages, expert annotation of available data becomes expensive and challenging. In light of these challenges, this paper proposes a novel named entity recognition (NER) technique specifically tailored for the open-source software systems. Our approach aims to address the scarcity of annotated software data by employing a comprehensive two-step distantly supervised annotation process. This process strategically leverages language heuristics, unique lookup tables, external knowledge sources, and an active learning approach. By harnessing these powerful techniques, we not only enhance model performance but also effectively mitigate the limitations associated with cost and the scarcity of expert annotators. It is noteworthy that our model significantly outperforms the state-of-the-art LLMs by a substantial margin. We also show the effectiveness of NER in the downstream task of relation extraction., Comment: Accepted at ECML-PKDD 2024 (Long Paper)
Published: 2024

17. How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

Author: Banerjee, Somnath, Layek, Sayan, Hazra, Rima, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Cryptography and Security
Abstract: In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work zeroes in on a specific issue: to what extent LLMs can be led astray by asking them to generate responses that are instruction-centric such as a pseudocode, a program or a software snippet as opposed to vanilla text. To investigate this question, we introduce TechHazardQA, a dataset containing complex queries which should be answered in both text and instruction-centric formats (e.g., pseudocodes), aimed at identifying triggers for unethical responses. We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses. For evaluation we report the harmfulness score metric as well as judgements from GPT-4 and humans. Overall, we observe that asking LLMs to produce instruction-centric responses enhances the unethical response generation by ~2-38% across the models. As an additional objective, we investigate the impact of model editing using the ROME technique, which further increases the propensity for generating undesirable content. In particular, asking edited LLMs to generate instruction-centric responses further increases the unethical response generation by ~3-16% across the different models., Comment: Under review. {https://huggingface.co/datasets/SoftMINER-Group/TechHazardQA}
Published: 2024

18. InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks

Author: Banerjee, Somnath, Sarkar, Maulindu, Saha, Punyajoy, Mathew, Binny, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language
Abstract: Recently, influence functions present an apparatus for achieving explainability for deep neural models by quantifying the perturbation of individual train instances that might impact a test prediction. Our objectives in this paper are twofold. First we incorporate influence functions as a feedback into the model to improve its performance. Second, in a dataset extension exercise, using influence functions to automatically identify data points that have been initially `silver' annotated by some existing method and need to be cross-checked (and corrected) by annotators to improve the model performance. To meet these objectives, in this paper, we introduce InfFeed, which uses influence functions to compute the influential instances for a target instance. Toward the first objective, we adjust the label of the target instance based on its influencer(s) label. In doing this, InfFeed outperforms the state-of-the-art baselines (including LLMs) by a maximum macro F1-score margin of almost 4% for hate speech classification, 3.5% for stance classification, and 3% for irony and 2% for sarcasm detection. Toward the second objective we show that manually re-annotating only those silver annotated data points in the extension set that have a negative influence can immensely improve the model performance bringing it very close to the scenario where all the data points in the extension set have gold labels. This allows for huge reduction of the number of data points that need to be manually annotated since out of the silver annotated extension dataset, the influence function scheme picks up ~1/1000 points that need manual correction., Comment: Accepted at LREC-COLING 2024 (Long Paper)
Published: 2024

19. Mask-up: Investigating Biases in Face Re-identification for Masked Faces

Author: Jaiswal, Siddharth D, Verma, Ankit Kr., and Mukherjee, Animesh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Human-Computer Interaction
Abstract: AI based Face Recognition Systems (FRSs) are now widely distributed and deployed as MLaaS solutions all over the world, moreso since the COVID-19 pandemic for tasks ranging from validating individuals' faces while buying SIM cards to surveillance of citizens. Extensive biases have been reported against marginalized groups in these systems and have led to highly discriminatory outcomes. The post-pandemic world has normalized wearing face masks but FRSs have not kept up with the changing times. As a result, these systems are susceptible to mask based face occlusion. In this study, we audit four commercial and nine open-source FRSs for the task of face re-identification between different varieties of masked and unmasked images across five benchmark datasets (total 14,722 images). These simulate a realistic validation/surveillance task as deployed in all major countries around the world. Three of the commercial and five of the open-source FRSs are highly inaccurate; they further perpetuate biases against non-White individuals, with the lowest accuracy being 0%. A survey for the same task with 85 human participants also results in a low accuracy of 40%. Thus a human-in-the-loop moderation in the pipeline does not alleviate the concerns, as has been frequently hypothesized in literature. Our large-scale study shows that developers, lawmakers and users of such services need to rethink the design principles behind FRSs, especially for the task of face re-identification, taking cognizance of observed biases., Comment: This work has been submitted to the IEEE for possible publication
Published: 2024

20. TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text

Author: Adak, Sayantan, Agrawal, Daivik, Mukherjee, Animesh, and Aditya, Somak
Subjects: Computer Science - Computation and Language
Abstract: We investigate the knowledge of object affordances in pre-trained language models (LMs) and pre-trained Vision-Language models (VLMs). A growing body of literature shows that PTLMs fail inconsistently and non-intuitively, demonstrating a lack of reasoning and grounding. To take a first step toward quantifying the effect of grounding (or lack thereof), we curate a novel and comprehensive dataset of object affordances -- Text2Afford, characterized by 15 affordance classes. Unlike affordance datasets collected in vision and language domains, we annotate in-the-wild sentences with objects and affordances. Experimental results reveal that PTLMs exhibit limited reasoning abilities when it comes to uncommon object affordances. We also observe that pre-trained VLMs do not necessarily capture object affordances effectively. Through few-shot fine-tuning, we demonstrate improvement in affordance knowledge in PTLMs and VLMs. Our research contributes a novel dataset for language grounding tasks, and presents insights into LM capabilities, advancing the understanding of object affordances. Codes and data are available at https://github.com/sayantan11995/Affordance
Published: 2024

21. Zero shot VLMs for hate meme detection: Are we there yet?

Author: Rizwan, Naquee, Bhaskar, Paramananda, Das, Mithun, Majhi, Swadhin Satyaprakash, Saha, Punyajoy, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Multimedia content on social media is rapidly evolving, with memes gaining prominence as a distinctive form. Unfortunately, some malicious users exploit memes to target individuals or vulnerable communities, making it imperative to identify and address such instances of hateful memes. Extensive research has been conducted to address this issue by developing hate meme detection models. However, a notable limitation of traditional machine/deep learning models is the requirement for labeled datasets for accurate classification. Recently, the research community has witnessed the emergence of several visual language models that have exhibited outstanding performance across various tasks. In this study, we aim to investigate the efficacy of these visual language models in handling intricate tasks such as hate meme detection. We use various prompt settings to focus on zero-shot classification of hateful/harmful memes. Through our analysis, we observe that large VLMs are still vulnerable for zero-shot hate meme detection.
Published: 2024

22. Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi

Author: Das, Mithun, Pandey, Saurabh Kumar, Sethi, Shivansh, Saha, Punyajoy, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Human-Computer Interaction
Abstract: With the rise of online abuse, the NLP community has begun investigating the use of neural architectures to generate counterspeech that can "counter" the vicious tone of such abusive speech and dilute/ameliorate their rippling effect over the social network. However, most of the efforts so far have been primarily focused on English. To bridge the gap for low-resource languages such as Bengali and Hindi, we create a benchmark dataset of 5,062 abusive speech/counterspeech pairs, of which 2,460 pairs are in Bengali and 2,602 pairs are in Hindi. We implement several baseline models considering various interlingual transfer mechanisms with different configurations to generate suitable counterspeech to set up an effective benchmark. We observe that the monolingual setup yields the best performance. Further, using synthetic transfer, language models can generate counterspeech to some extent; specifically, we notice that transferability is better when languages belong to the same language family., Comment: Accepted to the Findings of the ACL: EACL 2024
Published: 2024

23. Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context

Author: Banerjee, Somnath, Sahoo, Amruit, Layek, Sayan, Dutta, Avik, Hazra, Rima, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language
Abstract: In the continuously advancing AI landscape, crafting context-rich and meaningful responses via Large Language Models (LLMs) is essential. Researchers are becoming more aware of the challenges that LLMs with fewer parameters encounter when trying to provide suitable answers to open-ended questions. To address these hurdles, the integration of cutting-edge strategies, augmentation of rich external domain knowledge to LLMs, offers significant improvements. This paper introduces a novel framework that combines graph-driven context retrieval in conjunction to knowledge graphs based enhancement, honing the proficiency of LLMs, especially in domain specific community question answering platforms like AskUbuntu, Unix, and ServerFault. We conduct experiments on various LLMs with different parameter sizes to evaluate their ability to ground knowledge and determine factual accuracy in answers to open-ended questions. Our methodology GraphContextGen consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability to a larger number of use cases. This advancement highlights the importance of pairing context rich data retrieval with LLMs, offering a renewed approach to knowledge sourcing and generation in AI systems. We also show that, due to rich contextual data retrieval, the crucial entities, along with the generated answer, remain factually coherent with the gold answer., Comment: Accepted at EMNLP 2024
Published: 2024

24. Probing LLMs for hate speech detection: strengths and vulnerabilities

Author: Roy, Sarthak, Harshavardhan, Ashish, Mukherjee, Animesh, and Saha, Punyajoy
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Recently efforts have been made by social media platforms as well as researchers to detect hateful or toxic language using large language models. However, none of these works aim to use explanation, additional context and victim community information in the detection process. We utilise different prompt variation, input information and evaluate large language models in zero shot setting (without adding any in-context examples). We select three large language models (GPT-3.5, text-davinci and Flan-T5) and three datasets - HateXplain, implicit hate and ToxicSpans. We find that on average including the target information in the pipeline improves the model performance substantially (~20-30%) over the baseline across the datasets. There is also a considerable effect of adding the rationales/explanations into the pipeline (~10-20%) over the baseline across the datasets. In addition, we further provide a typology of the error cases where these large language models fail to (i) classify and (ii) explain the reason for the decisions they take. Such vulnerable points automatically constitute 'jailbreak' prompts for these models and industry scale safeguard techniques need to be developed to make the models robust against such prompts., Comment: 13 pages, 9 figures, 7 tables, accepted to findings of EMNLP 2023
Published: 2023

25. BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification

Author: Das, Mithun and Mukherjee, Animesh
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The dramatic increase in the use of social media platforms for information sharing has also fueled a steep growth in online abuse. A simple yet effective way of abusing individuals or communities is by creating memes, which often integrate an image with a short piece of text layered on top of it. Such harmful elements are in rampant use and are a threat to online safety. Hence it is necessary to develop efficient models to detect and flag abusive memes. The problem becomes more challenging in a low-resource setting (e.g., Bengali memes, i.e., images with Bengali text embedded on it) because of the absence of benchmark datasets on which AI models could be trained. In this paper we bridge this gap by building a Bengali meme dataset. To setup an effective benchmark we implement several baseline models for classifying abusive memes using this dataset. We observe that multimodal models that use both textual and visual information outperform unimodal models. Our best-performing model achieves a macro F1 score of 70.51. Finally, we perform a qualitative error analysis of the misclassified memes of the best-performing text-based, image-based and multimodal models., Comment: EMNLP 2023 (main conference)
Published: 2023

26. Auditing Gender Analyzers on Text Data

Author: Jaiswal, Siddharth D, Verma, Ankit Kumar, and Mukherjee, Animesh
Subjects: Computer Science - Computers and Society, Computer Science - Computation and Language
Abstract: AI models have become extremely popular and accessible to the general public. However, they are continuously under the scanner due to their demonstrable biases toward various sections of the society like people of color and non-binary people. In this study, we audit three existing gender analyzers -- uClassify, Readable and HackerFactor, for biases against non-binary individuals. These tools are designed to predict only the cisgender binary labels, which leads to discrimination against non-binary members of the society. We curate two datasets -- Reddit comments (660k) and, Tumblr posts (2.05M) and our experimental evaluation shows that the tools are highly inaccurate with the overall accuracy being ~50% on all platforms. Predictions for non-binary comments on all platforms are mostly female, thus propagating the societal bias that non-binary individuals are effeminate. To address this, we fine-tune a BERT multi-label classifier on the two datasets in multiple combinations, observe an overall performance of ~77% on the most realistically deployable setting and a surprisingly higher performance of 90% for the non-binary class. We also audit ChatGPT using zero-shot prompts on a small dataset (due to high pricing) and observe an average accuracy of 58% for Reddit and Tumblr combined (with overall better results for Reddit). Thus, we show that existing systems, including highly advanced ones like ChatGPT are biased, and need better audits and moderation and, that such societal biases can be addressed and alleviated through simple off-the-shelf models like BERT trained on more gender inclusive datasets., Comment: This work has been accepted at IEEE/ACM ASONAM 2023. Please cite the version appearing in the ASONAM proceedings
Published: 2023

27. Modeling interdisciplinary interactions among Physics, Mathematics & Computer Science

Author: Hazra, Rima, Singh, Mayank, Goyal, Pawan, Adhikari, Bibhas, and Mukherjee, Animesh
Subjects: Computer Science - Digital Libraries, Computer Science - Computation and Language
Abstract: Interdisciplinarity has over the recent years have gained tremendous importance and has become one of the key ways of doing cutting edge research. In this paper we attempt to model the citation flow across three different fields -- Physics (PHY), Mathematics (MA) and Computer Science (CS). For instance, is there a specific pattern in which these fields cite one another? We carry out experiments on a dataset comprising more than 1.2 million articles taken from these three fields. We quantify the citation interactions among these three fields through temporal bucket signatures. We present numerical models based on variants of the recently proposed relay-linking framework to explain the citation dynamics across the three disciplines. These models make a modest attempt to unfold the underlying principles of how citation links could have been formed across the three fields over time., Comment: Accepted at Journal of Physics: Complexity
Published: 2023

28. Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

Author: Hazra, Rima, Saha, Agnik, Banerjee, Somnath, and Mukherjee, Animesh
Subjects: Computer Science - Social and Information Networks, Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Community Question Answering (CQA) platforms steadily gain popularity as they provide users with fast responses to their queries. The swiftness of these responses is contingent on a mixture of query-specific and user-related elements. This paper scrutinizes these contributing factors within the context of six highly popular CQA platforms, identified through their standout answering speed. Our investigation reveals a correlation between the time taken to yield the first response to a question and several variables: the metadata, the formulation of the questions, and the level of interaction among users. Additionally, by employing conventional machine learning models to analyze these metadata and patterns of user interaction, we endeavor to predict which queries will receive their initial responses promptly., Comment: Accepted as POSTER
Published: 2023

29. Personality Detection and Analysis using Twitter Data

Author: Datta, Abhilash, Chakraborty, Souvic, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Personality types are important in various fields as they hold relevant information about the characteristics of a human being in an explainable format. They are often good predictors of a person's behaviors in a particular environment and have applications ranging from candidate selection to marketing and mental health. Recently automatic detection of personality traits from texts has gained significant attention in computational linguistics. Most personality detection and analysis methods have focused on small datasets making their experimental observations often limited. To bridge this gap, we focus on collecting and releasing the largest automatically curated dataset for the research community which has 152 million tweets and 56 thousand data points for the Myers-Briggs personality type (MBTI) prediction task. We perform a series of extensive qualitative and quantitative studies on our dataset to analyze the data patterns in a better way and infer conclusions. We show how our intriguing analysis results often follow natural intuition. We also perform a series of ablation studies to show how the baselines perform for our dataset., Comment: Submitted to ASONAM 2023
Published: 2023

30. Duplicate Question Retrieval and Confirmation Time Prediction in Software Communities

Author: Hazra, Rima, Saha, Debanjan, Sahoo, Amruit, Banerjee, Somnath, and Mukherjee, Animesh
Subjects: Computer Science - Information Retrieval, Computer Science - Software Engineering, Computer Science - Social and Information Networks
Abstract: Community Question Answering (CQA) in different domains is growing at a large scale because of the availability of several platforms and huge shareable information among users. With the rapid growth of such online platforms, a massive amount of archived data makes it difficult for moderators to retrieve possible duplicates for a new question and identify and confirm existing question pairs as duplicates at the right time. This problem is even more critical in CQAs corresponding to large software systems like askubuntu where moderators need to be experts to comprehend something as a duplicate. Note that the prime challenge in such CQA platforms is that the moderators are themselves experts and are therefore usually extremely busy with their time being extraordinarily expensive. To facilitate the task of the moderators, in this work, we have tackled two significant issues for the askubuntu CQA platform: (1) retrieval of duplicate questions given a new question and (2) duplicate question confirmation time prediction. In the first task, we focus on retrieving duplicate questions from a question pool for a particular newly posted question. In the second task, we solve a regression problem to rank a pair of questions that could potentially take a long time to get confirmed as duplicates. For duplicate question retrieval, we propose a Siamese neural network based approach by exploiting both text and network-based features, which outperforms several state-of-the-art baseline techniques. Our method outperforms DupPredictor and DUPE by 5% and 7% respectively. For duplicate confirmation time prediction, we have used both the standard machine learning models and neural network along with the text and graph-based features. We obtain Spearman's rank correlation of 0.20 and 0.213 (statistically significant) for text and graph based features respectively., Comment: Full paper accepted at ASONAM 2023: The 2023 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Published: 2023

31. A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

Author: Rai, Anand Kumar, Jaiswal, Siddharth D, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Human-Computer Interaction
Abstract: Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of $\sim9.8$K technical lectures in the English language along with their transcripts delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India. While there exists disparity due to gender, native region, age and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. These results indicate the need of more inclusive and robust ASR systems and more representational datasets for disparity evaluation in them.
Published: 2023

32. Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection

Author: Das, Mithun, Pandey, Saurabh Kumar, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Hate speech is a severe issue that affects many online platforms. So far, several studies have been performed to develop robust hate speech detection systems. Large language models like ChatGPT have recently shown a great promise in performing several tasks, including hate speech detection. However, it is crucial to comprehend the limitations of these models to build robust hate speech detection systems. To bridge this gap, our study aims to evaluate the strengths and weaknesses of the ChatGPT model in detecting hate speech at a granular level across 11 languages. Our evaluation employs a series of functionality tests that reveals various intricate failures of the model which the aggregate metrics like macro F1 or accuracy are not able to unfold. In addition, we investigate the influence of complex emotions, such as the use of emojis in hate speech, on the performance of the ChatGPT model. Our analysis highlights the shortcomings of the generative models in detecting certain types of hate speech and highlighting the need for further research and improvements in the workings of these models.
Published: 2023

33. HateMM: A Multi-Modal Dataset for Hate Video Classification

Author: Das, Mithun, Raj, Rohit, Saha, Punyajoy, Mathew, Binny, Gupta, Manish, and Mukherjee, Animesh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Multimedia
Abstract: Hate speech has become one of the most significant issues in modern society, having implications in both the online and the offline world. Due to this, hate speech research has recently gained a lot of traction. However, most of the work has primarily focused on text media with relatively little work on images and even lesser on videos. Thus, early stage automated video moderation techniques are needed to handle the videos that are being uploaded to keep the platform safe and healthy. With a view to detect and remove hateful content from the video sharing platforms, our work focuses on hate video detection using multi-modalities. To this end, we curate ~43 hours of videos from BitChute and manually annotate them as hate or non-hate, along with the frame spans which could explain the labelling decision. To collect the relevant videos we harnessed search keywords from hate lexicons. We observe various cues in images and audio of hateful videos. Further, we build deep learning multi-modal models to classify the hate videos and observe that using all the modalities of the videos improves the overall hate speech detection performance (accuracy=0.798, macro F1-score=0.790) by ~5.7% compared to the best uni-modal model in terms of macro F1 score. In summary, our work takes the first step toward understanding and modeling hateful videos on video hosting platforms such as BitChute., Comment: Accepted at ICWSM 2023(dataset track)
Published: 2023

34. On the rise of fear speech in online social media

Author: Saha, Punyajoy, Garimella, Kiran, Kalyan, Narla Komal, Pandey, Saurabh Kumar, Meher, Pauras Mangesh, Mathew, Binny, and Mukherjee, Animesh
Subjects: Computer Science - Social and Information Networks, Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Recently, social media platforms are heavily moderated to prevent the spread of online hate speech, which is usually fertile in toxic words and is directed toward an individual or a community. Owing to such heavy moderation, newer and more subtle techniques are being deployed. One of the most striking among these is fear speech. Fear speech, as the name suggests, attempts to incite fear about a target community. Although subtle, it might be highly effective, often pushing communities toward a physical conflict. Therefore, understanding their prevalence in social media is of paramount importance. This article presents a large-scale study to understand the prevalence of 400K fear speech and over 700K hate speech posts collected from Gab.com. Remarkably, users posting a large number of fear speech accrue more followers and occupy more central positions in social networks than users posting a large number of hate speech. They can also reach out to benign users more effectively than hate speech users through replies, reposts, and mentions. This connects to the fact that, unlike hate speech, fear speech has almost zero toxic content, making it look plausible. Moreover, while fear speech topics mostly portray a community as a perpetrator using a (fake) chain of argumentation, hate speech topics hurl direct multitarget insults, thus pointing to why general users could be more gullible to fear speech. Our findings transcend even to other platforms (Twitter and Facebook) and thus necessitate using sophisticated moderation policies and mass awareness to combat fear speech., Comment: 16 pages, 9 tables, 15 figures, accepted in Proceedings of the National Academy of Sciences of the United States of America
Published: 2023
Full Text: View/download PDF

35. Diversity matters: Robustness of bias measurements in Wikidata

Author: Das, Paramita, Karnam, Sai Keerthana, Panda, Anirban, Guda, Bhanu Prakash Reddy, Sarkar, Soumya, and Mukherjee, Animesh
Subjects: Computer Science - Machine Learning, Computer Science - Computers and Society, Computer Science - Information Retrieval
Abstract: With the widespread use of knowledge graphs (KG) in various automated AI systems and applications, it is very important to ensure that information retrieval algorithms leveraging them are free from societal biases. Previous works have depicted biases that persist in KGs, as well as employed several metrics for measuring the biases. However, such studies lack the systematic exploration of the sensitivity of the bias measurements, through varying sources of data, or the embedding algorithms used. To address this research gap, in this work, we present a holistic analysis of bias measurement on the knowledge graph. First, we attempt to reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents. Next, we attempt to unfold the variance in the detection of biases by two different knowledge graph embedding algorithms - TransE and ComplEx. We conduct our extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender. Our results show that the inherent data bias that persists in KG can be altered by specific algorithm bias as incorporated by KG embedding learning algorithms. Further, we show that the choice of the state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender. We observe that the similarity of the biased occupations across demographics is minimal which reflects the socio-cultural differences around the globe. We believe that this full-scale audit of the bias measurement pipeline will raise awareness among the community while deriving insights related to design choices of data and algorithms both and refrain from the popular dogma of ``one-size-fits-all''., Comment: 11 pages
Published: 2023

36. HateProof: Are Hateful Meme Detection Systems really Robust?

Author: Aggarwal, Piush, Chawla, Pranit, Das, Mithun, Saha, Punyajoy, Mathew, Binny, Zesch, Torsten, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Exploiting social media to spread hate has tremendously increased over the years. Lately, multi-modal hateful content such as memes has drawn relatively more traction than uni-modal content. Moreover, the availability of implicit content payloads makes them fairly challenging to be detected by existing hateful meme detection systems. In this paper, we present a use case study to analyze such systems' vulnerabilities against external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings performed by humans with little knowledge about the model can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the model's robustness using contrastive learning as well as an adversarial training-based method - VILLA. Using an ensemble of the above two approaches, in two of our high resolution datasets, we are able to (re)gain back the performance to a large extent for certain attacks. We believe that ours is a first step toward addressing this crucial problem in an adversarial setting and would inspire more such investigations in the future., Comment: Accepted at TheWebConf'2023 (WWW'2023)
Published: 2023
Full Text: View/download PDF

37. Overview of PAN 2024: Multi-author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification Condensed Lab Overview

Author: Ayele, Abinew Ali, Babakov, Nikolay, Bevendorff, Janek, Casals, Xavier Bonet, Chulvi, Berta, Dementieva, Daryna, Elnagar, Ashaf, Freitag, Dayne, Fröbe, Maik, Korenčić, Damir, Mayerl, Maximilian, Moskovskiy, Daniil, Mukherjee, Animesh, Panchenko, Alexander, Potthast, Martin, Rangel, Francisco, Rizwan, Naquee, Rosso, Paolo, Schneider, Florian, Smirnova, Alisa, Stamatatos, Efstathios, Stakovskii, Elisei, Stein, Benno, Taulé, Mariona, Ustalov, Dmitry, Wang, Xintong, Wiegmann, Matti, Yimam, Seid Muhie, Zangerle, Eva, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Goeuriot, Lorraine, editor, Mulhem, Philippe, editor, Quénot, Georges, editor, Schwab, Didier, editor, Di Nunzio, Giorgio Maria, editor, Soulier, Laure, editor, Galuščáková, Petra, editor, García Seco de Herrera, Alba, editor, Faggioli, Guglielmo, editor, and Ferro, Nicola, editor
Published: 2024
Full Text: View/download PDF

38. Overview of PAN 2024: Multi-author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification : Extended Abstract

Author: Bevendorff, Janek, Casals, Xavier Bonet, Chulvi, Berta, Dementieva, Daryna, Elnagar, Ashaf, Freitag, Dayne, Fröbe, Maik, Korenčić, Damir, Mayerl, Maximilian, Mukherjee, Animesh, Panchenko, Alexander, Potthast, Martin, Rangel, Francisco, Rosso, Paolo, Smirnova, Alisa, Stamatatos, Efstathios, Stein, Benno, Taulé, Mariona, Ustalov, Dmitry, Wiegmann, Matti, Zangerle, Eva, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Goharian, Nazli, editor, Tonellotto, Nicola, editor, He, Yulan, editor, Lipani, Aldo, editor, McDonald, Graham, editor, Macdonald, Craig, editor, and Ounis, Iadh, editor
Published: 2024
Full Text: View/download PDF

39. Rationale-Guided Few-Shot Classification to Detect Abusive Language

Author: Saha, Punyajoy, Sheth, Divyanshu, Kedia, Kushal, Mathew, Binny, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Abusive language is a concerning problem in online social media. Past research on detecting abusive language covers different platforms, languages, demographies, etc. However, models trained using these datasets do not perform well in cross-domain evaluation settings. To overcome this, a common strategy is to use a few samples from the target domain to train models to get better performance in that domain (cross-domain few-shot training). However, this might cause the models to overfit the artefacts of those samples. A compelling solution could be to guide the models toward rationales, i.e., spans of text that justify the text's label. This method has been found to improve model performance in the in-domain setting across various NLP tasks. In this paper, we propose RGFS (Rationale-Guided Few-Shot Classification) for abusive language detection. We first build a multitask learning setup to jointly learn rationales, targets, and labels, and find a significant improvement of 6% macro F1 on the rationale detection task over training solely rationale classifiers. We introduce two rationale-integrated BERT-based architectures (the RGFS models) and evaluate our systems over five different abusive language datasets, finding that in the few-shot classification setting, RGFS-based models outperform baseline models by about 7% in macro F1 scores and perform competitively to models finetuned on other source domains. Furthermore, RGFS-based models outperform LIME/SHAP-based approaches in terms of plausibility and are close in performance in terms of faithfulness., Comment: 11 pages, 14 tables, 3 figures, The code repository is https://github.com/punyajoy/RGFS_ECAI
Published: 2022

40. Hate Speech and Offensive Language Detection in Bengali

Author: Das, Mithun, Banerjee, Somnath, Saha, Punyajoy, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language
Abstract: Social media often serves as a breeding ground for various hateful and offensive content. Identifying such content on social media is crucial due to its impact on the race, gender, or religion in an unprejudiced society. However, while there is extensive research in hate speech detection in English, there is a gap in hateful content detection in low-resource languages like Bengali. Besides, a current trend on social media is the use of Romanized Bengali for regular interactions. To overcome the existing research's limitations, in this study, we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement several baseline models for the classification of such hateful posts. We further explore the interlingual transfer mechanism to boost classification performance. Finally, we perform an in-depth error analysis by looking into the misclassified posts by the models. While training actual and Romanized datasets separately, we observe that XLM-Roberta performs the best. Further, we witness that on joint training and few-shot training, MuRIL outperforms other models by interpreting the semantic expressions better. We make our code and dataset public for others., Comment: Accepted at AACL-IJCNLP 2022
Published: 2022

41. Fast Few shot Self-attentive Semi-supervised Political Inclination Prediction

Author: Chakraborty, Souvic, Goyal, Pawan, and Mukherjee, Animesh
Subjects: Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, K.4.1, I.2.7, I.2.6
Abstract: With the rising participation of the common mass in social media, it is increasingly common now for policymakers/journalists to create online polls on social media to understand the political leanings of people in specific locations. The caveat here is that only influential people can make such an online polling and reach out at a mass scale. Further, in such cases, the distribution of voters is not controllable and may be, in fact, biased. On the other hand,if we can interpret the publicly available data over social media to probe the political inclination of users, we will be able to have controllable insights about the survey population, keep the cost of survey low and also collect publicly available data without involving the concerned persons. Hence we introduce a self-attentive semi-supervised framework for political inclination detection to further that objective. The advantage of our model is that it neither needs huge training data nor does it need to store social network parameters. Nevertheless, it achieves an accuracy of 93.7\% with no annotated data; further, with only a few annotated examples per class it achieves competitive performance. We found that the model is highly efficient even in resource-constrained settings, and insights drawn from its predictions match the manual survey outcomes when applied to diverse real-life scenarios., Comment: Accepted to ICADL'22
Published: 2022

42. Decoding Demographic un-fairness from Indian Names

Author: Vahini, Medidoddi, Bantupalli, Jalend, Chakraborty, Souvic, and Mukherjee, Animesh
Subjects: Computer Science - Computers and Society, Computer Science - Computation and Language, Computer Science - Digital Libraries, Computer Science - Machine Learning, Computer Science - Social and Information Networks, J.4, K.4.1
Abstract: Demographic classification is essential in fairness assessment in recommender systems or in measuring unintended bias in online networks and voting systems. Important fields like education and politics, which often lay a foundation for the future of equality in society, need scrutiny to design policies that can better foster equality in resource distribution constrained by the unbalanced demographic distribution of people in the country. We collect three publicly available datasets to train state-of-the-art classifiers in the domain of gender and caste classification. We train the models in the Indian context, where the same name can have different styling conventions (Jolly Abraham/Kumar Abhishikta in one state may be written as Abraham Jolly/Abishikta Kumar in the other). Finally, we also perform cross-testing (training and testing on different datasets) to understand the efficacy of the above models. We also perform an error analysis of the prediction models. Finally, we attempt to assess the bias in the existing Indian system as case studies and find some intriguing patterns manifesting in the complex demographic layout of the sub-continent across the dimensions of gender and caste., Comment: Accepted to SocInfo'22; code hosted at https://github.com/vahini01/IndianDemographics
Published: 2022

43. 'Dummy Grandpa, do you know anything?': Identifying and Characterizing Ad hominem Fallacy Usage in the Wild

Author: Patel, Utkarsh, Mukherjee, Animesh, and Mondal, Mainack
Subjects: Computer Science - Computation and Language
Abstract: Today, participating in discussions on online forums is extremely commonplace and these discussions have started rendering a strong influence on the overall opinion of online users. Naturally, twisting the flow of the argument can have a strong impact on the minds of naive users, which in the long run might have socio-political ramifications, for example, winning an election or spreading targeted misinformation. Thus, these platforms are potentially highly vulnerable to malicious players who might act individually or as a cohort to breed fallacious arguments with a motive to sway public opinion. Ad hominem arguments are one of the most effective forms of such fallacies. Although a simple fallacy, it is effective enough to sway public debates in offline world and can be used as a precursor to shutting down the voice of opposition by slander. In this work, we take a first step in shedding light on the usage of ad hominem fallacies in the wild. First, we build a powerful ad hominem detector with high accuracy (F1 more than 83%, showing a significant improvement over prior work), even for datasets for which annotated instances constitute a very small fraction. We then used our detector on 265k arguments collected from the online debate forum - CreateDebate. Our crowdsourced surveys validate our in-the-wild predictions on CreateDebate data (94% match with manual annotation). Our analysis revealed that a surprising 31.23% of CreateDebate content contains ad hominem fallacy, and a cohort of highly active users post significantly more ad hominem to suppress opposing views. Then, our temporal analysis revealed that ad hominem argument usage increased significantly since the 2016 US Presidential election, not only for topics like Politics, but also for Science and Law. We conclude by discussing important implications of our work to detect and defend against ad hominem fallacies., Comment: Proceedings of the 17th International AAAI Conference on Web and Social Media
Published: 2022

44. Is this bug severe? A text-cum-graph based model for bug severity prediction

Author: Hazra, Rima, Dwivedi, Arpit, and Mukherjee, Animesh
Subjects: Computer Science - Information Retrieval, Computer Science - Software Engineering
Abstract: Repositories of large software systems have become commonplace. This massive expansion has resulted in the emergence of various problems in these software platforms including identification of (i) bug-prone packages, (ii) critical bugs, and (iii) severity of bugs. One of the important goals would be to mine these bugs and recommend them to the developers to resolve them. The first step to this is that one has to accurately detect the extent of severity of the bugs. In this paper, we take up this task of predicting the severity of bugs in the near future. Contextualized neural models built on the text description of a bug and the user comments about the bug help to achieve reasonably good performance. Further information on how the bugs are related to each other in terms of the ways they affect packages can be summarised in the form of a graph and used along with the text to get additional benefits., Comment: Accepted at ECML PKDD 2022, Research and ADS Track
Published: 2022

45. Placing (Historical) Facts on a Timeline: A Classification cum Coref Resolution Approach

Author: Adak, Sayantan, Ahmad, Altaf, Basu, Aditya, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: A timeline provides one of the most effective ways to visualize the important historical facts that occurred over a period of time, presenting the insights that may not be so apparent from reading the equivalent information in textual form. By leveraging generative adversarial learning for important sentence classification and by assimilating knowledge based tags for improving the performance of event coreference resolution we introduce a two staged system for event timeline generation from multiple (historical) text documents. We demonstrate our results on two manually annotated historical text documents. Our results can be extremely helpful for historians, in advancing research in history and in understanding the socio-political landscape of a country as reflected in the writings of famous personas., Comment: Accepted at the main conference of ECML PKDD 2022 as a long paper. The camera-ready version
Published: 2022

46. CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech

Author: Saha, Punyajoy, Singh, Kanishk, Kumar, Adarsh, Mathew, Binny, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Recently, many studies have tried to create generation models to assist counter speakers by providing counterspeech suggestions for combating the explosive proliferation of online hate. However, since these suggestions are from a vanilla generation model, they might not include the appropriate properties required to counter a particular hate speech instance. In this paper, we propose CounterGeDi - an ensemble of generative discriminators (GeDi) to guide the generation of a DialoGPT model toward more polite, detoxified, and emotionally laden counterspeech. We generate counterspeech using three datasets and observe significant improvement across different attribute scores. The politeness and detoxification scores increased by around 15% and 6% respectively, while the emotion in the counterspeech increased by at least 10% across all the datasets. We also experiment with triple-attribute control and observe significant improvement over single attribute results when combining complementing attributes, e.g., politeness, joyfulness and detoxification. In all these experiments, the relevancy of the generated text does not deteriorate due to the application of these controls, Comment: Accepted at IJCAI-ECAI 2022, 10 pages, 2 figures, 11 tables, Code is available at https://github.com/hate-alert/CounterGEDI
Published: 2022

47. HateCheckHIn: Evaluating Hindi Hate Speech Detection Models

Author: Das, Mithun, Saha, Punyajoy, Mathew, Binny, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language
Abstract: Due to the sheer volume of online hate, the AI and NLP communities have started building models to detect such hateful content. Recently, multilingual hate is a major emerging challenge for automated detection where code-mixing or more than one language have been used for conversation in social media. Typically, hate speech detection models are evaluated by measuring their performance on the held-out test data using metrics such as accuracy and F1-score. While these metrics are useful, it becomes difficult to identify using them where the model is failing, and how to resolve it. To enable more targeted diagnostic insights of such multilingual hate speech models, we introduce a set of functionalities for the purpose of evaluation. We have been inspired to design this kind of functionalities based on real-world conversation on social media. Considering Hindi as a base language, we craft test cases for each functionality. We name our evaluation dataset HateCheckHIn. To illustrate the utility of these functionalities , we test state-of-the-art transformer based m-BERT model and the Perspective API., Comment: Accepted at: 13th Edition of its Language Resources and Evaluation Conference. arXiv admin note: text overlap with arXiv:2012.15606 by other authors
Published: 2022

48. Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages

Author: Das, Mithun, Banerjee, Somnath, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Abusive language is a growing concern in many social media platforms. Repeated exposure to abusive speech has created physiological effects on the target users. Thus, the problem of abusive language should be addressed in all forms for online peace and safety. While extensive research exists in abusive speech detection, most studies focus on English. Recently, many smearing incidents have occurred in India, which provoked diverse forms of abusive speech in online space in various languages based on the geographic location. Therefore it is essential to deal with such malicious content. In this paper, to bridge the gap, we demonstrate a large-scale analysis of multilingual abusive speech in Indic languages. We examine different interlingual transfer mechanisms and observe the performance of various multilingual models for abusive speech detection for eight different Indic languages. We also experiment to show how robust these models are on adversarial attacks. Finally, we conduct an in-depth error analysis by looking into the models' misclassified posts across various settings. We have made our code and models public for other researchers., Comment: Accepted at HT '22: 33rd ACM Conference on Hypertext and Social Media
Published: 2022

49. CRUSH: Contextually Regularized and User anchored Self-supervised Hate speech Detection

Author: Chakraborty, Souvic, Dutta, Parag, Roychowdhury, Sumegh, and Mukherjee, Animesh
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Human-Computer Interaction, Computer Science - Social and Information Networks, I.2.7, J.4
Abstract: The last decade has witnessed a surge in the interaction of people through social networking platforms. While there are several positive aspects of these social platforms, the proliferation has led them to become the breeding ground for cyber-bullying and hate speech. Recent advances in NLP have often been used to mitigate the spread of such hateful content. Since the task of hate speech detection is usually applicable in the context of social networks, we introduce CRUSH, a framework for hate speech detection using user-anchored self-supervision and contextual regularization. Our proposed approach secures ~ 1-12% improvement in test set metrics over best performing previous approaches on two types of tasks and multiple popular english social media datasets., Comment: Accepted in NAACL HLT 2022 (Long Paper)
Published: 2022

50. FaiRIR: Mitigating Exposure Bias from Related Item Recommendations in Two-Sided Platforms

Author: Dash, Abhisek, Chakraborty, Abhijnan, Ghosh, Saptarshi, Mukherjee, Animesh, and Gummadi, Krishna P.
Subjects: Computer Science - Information Retrieval, Computer Science - Computers and Society
Abstract: Related Item Recommendations (RIRs) are ubiquitous in most online platforms today, including e-commerce and content streaming sites. These recommendations not only help users compare items related to a given item, but also play a major role in bringing traffic to individual items, thus deciding the exposure that different items receive. With a growing number of people depending on such platforms to earn their livelihood, it is important to understand whether different items are receiving their desired exposure. To this end, our experiments on multiple real-world RIR datasets reveal that the existing RIR algorithms often result in very skewed exposure distribution of items, and the quality of items is not a plausible explanation for such skew in exposure. To mitigate this exposure bias, we introduce multiple flexible interventions (FaiRIR) in the RIR pipeline. We instantiate these mechanisms with two well-known algorithms for constructing related item recommendations -- rating-SVD and item2vec -- and show on real-world data that our mechanisms allow for a fine-grained control on the exposure distribution, often at a small or no cost in terms of recommendation quality, measured in terms of relatedness and user satisfaction., Comment: This work has been accepted as a regular paper in IEEE Transactions on Computational Social Systems 2022 (IEEE TCSS'22)
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

700 results on '"Mukherjee, Animesh"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources