Author: "Wan, Herun" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wan, Herun"' showing total 21 results

1. On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs

Author: Wan, Herun, Luo, Minnan, Su, Zhixiong, Dai, Guang, and Zhao, Xiang
Subjects: Computer Science - Computation and Language
Abstract: Evidence-enhanced detectors present remarkable abilities in identifying malicious social text with related evidence. However, the rise of large language models (LLMs) brings potential risks of evidence pollution to confuse detectors. This paper explores how to manipulate evidence, simulating potential misuse scenarios including basic pollution, and rephrasing or generating evidence by LLMs. To mitigate its negative impact, we propose three defense strategies from both the data and model sides, including machine-generated text detection, a mixture of experts, and parameter updating. Extensive experiments on four malicious social text detection tasks with ten datasets show that evidence pollution, especially the generate strategy, significantly compromises existing detectors. On the other hand, the defense strategies could mitigate evidence pollution, but they face limitations in practical use, such as the need for annotated data and huge inference costs. Further analysis illustrates that polluted evidence is of high quality, would compromise the model calibration, and could be ensembled to amplify the negative impact.
Published: 2024

2. How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis

Author: Wan, Herun, Luo, Minnan, Ma, Zihan, Dai, Guang, and Zhao, Xiang
Subjects: Computer Science - Social and Information Networks, Computer Science - Computers and Society
Abstract: Information spreads faster through social media platforms than traditional media, thus becoming an ideal medium to spread misinformation. Meanwhile, automated accounts, known as social bots, contribute more to misinformation dissemination. In this paper, we explore the interplay between social bots and misinformation on the Sina Weibo platform. We propose a comprehensive and large-scale misinformation dataset, containing 11,393 pieces of misinformation and 16,416 pieces of unbiased real information with multiple modality information, with 952,955 related users. We propose a scalable weakly supervised method to annotate social bots, obtaining 68,040 social bots and 411,635 genuine accounts. To the best of our knowledge, this dataset is the largest dataset containing misinformation and social bots. We conduct comprehensive experiments and analysis on this dataset. Results show that social bots play a central role in misinformation dissemination, participating in news discussions to amplify echo chambers, manipulate public sentiment, and reverse public stances.
Published: 2024

3. Disentangled Noisy Correspondence Learning

Author: Dang, Zhuohang, Luo, Minnan, Wang, Jihong, Jia, Chengyou, Han, Haochen, Wan, Herun, Dai, Guang, Chang, Xiaojun, and Wang, Jingdong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Cross-modal retrieval is crucial in understanding latent correspondences across modalities. However, existing methods implicitly assume well-matched training data, which is impractical as real-world data inevitably involves imperfect alignments, i.e., noisy correspondences. Although some works explore similarity-based strategies to address such noise, they suffer from sub-optimal similarity predictions influenced by modality-exclusive information (MEI), e.g., background noise in images and abstract definitions in texts. This issue arises as MEI is not shared across modalities, thus aligning it in training can markedly mislead similarity predictions. Moreover, although intuitive, directly applying previous cross-modal disentanglement methods suffers from limited noise tolerance and disentanglement efficacy. Inspired by the robustness of information bottlenecks against noise, we introduce DisNCL, a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning, to adaptively balance the extraction of modality-invariant information (MII) and MEI with certifiable optimal cross-modal disentanglement efficacy. DisNCL then enhances similarity predictions in the modality-invariant subspace, thereby greatly boosting similarity-based alleviation strategies for noisy correspondences. Furthermore, DisNCL introduces soft matching targets to model noisy many-to-many relationships inherent in multi-modal input for noise-robust and accurate cross-modal alignment. Extensive experiments confirm DisNCL's efficacy with a 2% average recall improvement. Mutual information estimation and visualization results show that DisNCL learns meaningful MII/MEI subspaces, validating our theoretical analyses.
Published: 2024

4. DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection

Author: Wan, Herun, Feng, Shangbin, Tan, Zhaoxuan, Wang, Heng, Tsvetkov, Yulia, and Luo, Minnan
Subjects: Computer Science - Computation and Language
Abstract: Large language models are limited by challenges in factuality and hallucinations to be directly employed off-the-shelf for judging the veracity of news articles, where factual accuracy is paramount. In this work, we propose DELL that identifies three key stages in misinformation detection where LLMs could be incorporated as part of the pipeline: 1) LLMs could generate news reactions to represent diverse perspectives and simulate user-news interaction networks; 2) LLMs could generate explanations for proxy tasks (e.g., sentiment, stance) to enrich the contexts of news articles and produce experts specializing in various aspects of news understanding; 3) LLMs could merge task-specific experts and provide an overall prediction by incorporating the predictions and confidence scores of varying experts. Extensive experiments on seven datasets with three LLMs demonstrate that DELL outperforms state-of-the-art baselines by up to 16.8% in macro F1-score. Further analysis reveals that the generated reactions and explanations are greatly helpful in misinformation detection, while our proposed LLM-guided expert merging helps produce better-calibrated predictions.
Published: 2024

5. What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection

Author: Feng, Shangbin, Wan, Herun, Wang, Ningnan, Tan, Zhaoxuan, Luo, Minnan, and Tsvetkov, Yulia
Subjects: Computer Science - Computation and Language
Abstract: Social media bot detection has always been an arms race between advancements in machine learning bot detectors and adversarial bot strategies to evade detection. In this work, we bring the arms race to the next level by investigating the opportunities and risks of state-of-the-art large language models (LLMs) in social bot detection. To investigate the opportunities, we design novel LLM-based bot detectors by proposing a mixture-of-heterogeneous-experts framework to divide and conquer diverse user information modalities. To illuminate the risks, we explore the possibility of LLM-guided manipulation of user textual and structured information to evade detection. Extensive experiments with three LLMs on two datasets demonstrate that instruction tuning on merely 1,000 annotated examples produces specialized LLMs that outperform state-of-the-art baselines by up to 9.1% on both datasets, while LLM-guided manipulation strategies could significantly bring down the performance of existing bot detectors by up to 29.6% and harm the calibration and reliability of bot detection systems. Comment: ACL 2024
Published: 2024

6. BotPercent: Estimating Bot Populations in Twitter Communities

Author: Tan, Zhaoxuan, Feng, Shangbin, Sclar, Melanie, Wan, Herun, Luo, Minnan, Choi, Yejin, and Tsvetkov, Yulia
Subjects: Computer Science - Social and Information Networks
Abstract: Twitter bot detection is vital in combating misinformation and safeguarding the integrity of social media discourse. While malicious bots are becoming more and more sophisticated and personalized, standard bot detection approaches are still agnostic to the social environments (henceforth, communities) the bots operate in. In this work, we introduce community-specific bot detection, estimating the percentage of bots given the context of a community. Our method, BotPercent, is an amalgamation of Twitter bot detection datasets and feature-, text-, and graph-based models, adjusted to a particular community on Twitter. We introduce an approach that performs confidence calibration across bot detection models, which addresses generalization issues in existing community-agnostic models targeting individual bots and leads to more accurate community-level bot estimations. Experiments demonstrate that BotPercent achieves state-of-the-art performance in community-level Twitter bot detection across both balanced and imbalanced class distribution settings, outperforming existing approaches and presenting a less biased estimator of Twitter bot populations within the communities we analyze. We then analyze bot rates in several Twitter groups, including users who engage with partisan news media, political communities in different countries, and more. Our results reveal that the presence of Twitter bots is not homogeneous, but exhibits a spatial-temporal distribution with considerable heterogeneity that should be taken into account for content moderation and social media policy making. The implementation of BotPercent is available at https://github.com/TamSiuhin/BotPercent. Comment: Accepted to Findings of EMNLP 2023
Published: 2023

7. FNDPro: Evaluating the Importance of Propagations during Fake News Spread

Author: Wan, Herun, Wang, Ningnan, Zhao, Xiang, Li, Rui, Yang, Hui, and Luo, Minnan
Editors: Onizuka, Makoto, Lee, Jae-Gil, Tong, Yongxin, Xiao, Chuan, Ishikawa, Yoshiharu, Amer-Yahia, Sihem, Jagadish, H. V., and Lu, Kejing
Series Editor: Goos, Gerhard
Founding Editor: Hartmanis, Juris
Editorial Board Members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Published: 2024

8. GraTO: Graph Neural Network Framework Tackling Over-smoothing with Neural Architecture Search

Author: Feng, Xinshun, Wan, Herun, Feng, Shangbin, Wang, Hongrui, Zhou, Jun, Zheng, Qinghua, and Luo, Minnan
Subjects: Computer Science - Machine Learning
Abstract: Current Graph Neural Networks (GNNs) suffer from the over-smoothing problem, which results in indistinguishable node representations and low model performance with more GNN layers. Many methods have been put forward to tackle this problem in recent years. However, existing methods for tackling over-smoothing emphasize model performance and neglect the over-smoothness of node representations. Additionally, different approaches are applied one at a time, while an overall framework to jointly leverage multiple solutions to the over-smoothing challenge is lacking. To solve these problems, we propose GraTO, a framework based on neural architecture search to automatically search for GNN architectures. GraTO adopts a novel loss function to facilitate striking a balance between model performance and representation smoothness. In addition to existing methods, our search space also includes DropAttribute, a novel scheme for alleviating the over-smoothing challenge, to fully leverage diverse solutions. We conduct extensive experiments on six real-world datasets to evaluate GraTO, which demonstrate that GraTO outperforms baselines in the over-smoothing metrics and achieves competitive performance in accuracy. GraTO is especially effective and robust with increasing numbers of GNN layers. Further experiments bear out the quality of node representations learned with GraTO and the effectiveness of the model architecture. We make the code of GraTO available on GitHub (https://github.com/fxsxjtu/GraTO). Comment: accepted at CIKM 2022
Published: 2022

9. BIC: Twitter Bot Detection with Text-Graph Interaction and Semantic Consistency

Author: Lei, Zhenyu, Wan, Herun, Zhang, Wenqian, Feng, Shangbin, Chen, Zilong, Li, Jundong, Zheng, Qinghua, and Luo, Minnan
Subjects: Computer Science - Artificial Intelligence
Abstract: Twitter bots are automatic programs operated by malicious actors to manipulate public opinion and spread misinformation. Research efforts have been made to automatically identify bots based on texts and networks on social media. Existing methods only leverage texts or networks alone, and while few works explored the shallow combination of the two modalities, we hypothesize that the interaction and information exchange between texts and graphs could be crucial for holistically evaluating bot activities on social media. In addition, according to a recent survey (Cresci, 2020), Twitter bots are constantly evolving while advanced bots steal genuine users' tweets and dilute their malicious content to evade detection. This results in greater inconsistency across the timeline of novel Twitter bots, which warrants more attention. In light of these challenges, we propose BIC, a Twitter Bot detection framework with text-graph Interaction and semantic Consistency. Specifically, in addition to separately modeling the two modalities on social media, BIC employs a text-graph interaction module to enable information exchange across modalities in the learning process. In addition, given the stealing behavior of novel Twitter bots, BIC proposes to model semantic consistency in tweets based on attention weights while using it to augment the decision process. Extensive experiments demonstrate that BIC consistently outperforms state-of-the-art baselines on two widely adopted datasets. Further analyses reveal that text-graph interactions and modeling semantic consistency are essential improvements and help combat bot evolution.
Published: 2022

10. TwiBot-22: Towards Graph-Based Twitter Bot Detection

Author: Feng, Shangbin, Tan, Zhaoxuan, Wan, Herun, Wang, Ningnan, Chen, Zilong, Zhang, Binchi, Zheng, Qinghua, Zhang, Wenqian, Lei, Zhenyu, Yang, Shujie, Feng, Xinshun, Zhang, Qingyue, Wang, Hongrui, Liu, Yuhan, Bai, Yuyang, Wang, Heng, Cai, Zijian, Wang, Yanbo, Zheng, Lijing, Ma, Zihan, Li, Jundong, and Luo, Minnan
Subjects: Computer Science - Social and Information Networks, Computer Science - Artificial Intelligence
Abstract: Twitter bot detection has become an increasingly important task to combat misinformation, facilitate social media moderation, and preserve the integrity of the online discourse. State-of-the-art bot detection methods generally leverage the graph structure of the Twitter network, and they exhibit promising performance when confronting novel Twitter bots that traditional methods fail to detect. However, very few of the existing Twitter bot detection datasets are graph-based, and even these few graph-based datasets suffer from limited dataset scale, incomplete graph structure, as well as low annotation quality. In fact, the lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of novel graph-based bot detection approaches. In this paper, we propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diversified entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to promote a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented codes and datasets into the TwiBot-22 evaluation framework, where researchers could consistently evaluate new models and datasets. The TwiBot-22 Twitter bot detection benchmark and evaluation framework are publicly available at https://twibot22.github.io/. Comment: NeurIPS 2022, Datasets and Benchmarks Track
Published: 2022

11. TeST: Temporal–spatial separated transformer for temporal action localization

Author: Wan, Herun, Luo, Minnan, Li, Zhihui, and Wang, Yang
Published: 2025

12. BotRGCN: Twitter Bot Detection with Relational Graph Convolutional Networks

Author: Feng, Shangbin, Wan, Herun, Wang, Ningnan, and Luo, Minnan
Subjects: Computer Science - Social and Information Networks
Abstract: Twitter bot detection is an important and challenging task. Existing bot detection measures fail to address the challenge of community and disguise, falling short of detecting bots that disguise themselves as genuine users and attack collectively. To address these two challenges of Twitter bot detection, we propose BotRGCN, which is short for Bot detection with Relational Graph Convolutional Networks. BotRGCN addresses the challenge of community by constructing a heterogeneous graph from follow relationships and applying relational graph convolutional networks. Apart from that, BotRGCN makes use of multi-modal user semantic and property information to avoid feature engineering and augment its ability to capture bots with diversified disguise. Extensive experiments demonstrate that BotRGCN outperforms competitive baselines on a comprehensive benchmark, TwiBot-20, which provides follow relationships. Comment: accepted at ASONAM 2021
Published: 2021

13. SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detection

Author: Feng, Shangbin, Wan, Herun, Wang, Ningnan, Li, Jundong, and Luo, Minnan
Subjects: Computer Science - Social and Information Networks
Abstract: Twitter has become a major social media platform since its launch in 2006, while complaints about bot accounts have increased recently. Although extensive research efforts have been made, the state-of-the-art bot detection methods fall short of generalizability and adaptability. Specifically, previous bot detectors leverage only a small fraction of user information and are often trained on datasets that only cover a few types of bots. As a result, they fail to generalize to real-world scenarios on the Twittersphere where different types of bots co-exist. Additionally, bots on Twitter are constantly evolving to evade detection. Previous efforts, although effective once in their context, fail to adapt to new generations of Twitter bots. To address the two challenges of Twitter bot detection, we propose SATAR, a self-supervised representation learning framework of Twitter users, and apply it to the task of bot detection. In particular, SATAR generalizes by jointly leveraging the semantics, property and neighborhood information of a specific user. Meanwhile, SATAR adapts by pre-training on a massive number of self-supervised users and fine-tuning on detailed bot detection scenarios. Extensive experiments demonstrate that SATAR outperforms competitive baselines on different bot detection datasets of varying information completeness and collection time. SATAR is also shown to generalize in real-world scenarios and adapt to evolving generations of social media bots. Comment: accepted at CIKM 2021
Published: 2021

14. TwiBot-20: A Comprehensive Twitter Bot Detection Benchmark

Author: Feng, Shangbin, Wan, Herun, Wang, Ningnan, Li, Jundong, and Luo, Minnan
Subjects: Computer Science - Social and Information Networks
Abstract: Twitter has become a vital social media platform while an ample amount of malicious Twitter bots exist and induce undesirable social effects. Successful Twitter bot detection proposals are generally supervised, which rely heavily on large-scale datasets. However, existing benchmarks generally suffer from low levels of user diversity, limited user information and data scarcity. Therefore, these datasets are not sufficient to train and stably benchmark bot detection measures. To alleviate these problems, we present TwiBot-20, a massive Twitter bot detection benchmark, which contains 229,573 users, 33,488,192 tweets, 8,723,736 user property items and 455,958 follow relationships. TwiBot-20 covers diversified bots and genuine users to better represent the real-world Twittersphere. TwiBot-20 also includes three modalities of user information to support both binary classification of single users and community-aware approaches. To the best of our knowledge, TwiBot-20 is the largest Twitter bot detection benchmark to date. We reproduce competitive bot detection methods and conduct a thorough evaluation on TwiBot-20 and two other public datasets. Experiment results demonstrate that existing bot detection measures fail to match their previously claimed performance on TwiBot-20, which suggests that Twitter bot detection remains a challenging task and requires further research efforts. Comment: accepted at CIKM 2021
Published: 2021

15. BIC: Twitter Bot Detection with Text-Graph Interaction and Semantic Consistency

Author: Lei, Zhenyu, primary, Wan, Herun, additional, Zhang, Wenqian, additional, Feng, Shangbin, additional, Chen, Zilong, additional, Li, Jundong, additional, Zheng, Qinghua, additional, and Luo, Minnan, additional
Published: 2023

16. BotPercent: Estimating Bot Populations in Twitter Communities

Author: Tan, Zhaoxuan, primary, Feng, Shangbin, additional, Sclar, Melanie, additional, Wan, Herun, additional, Luo, Minnan, additional, Choi, Yejin, additional, and Tsvetkov, Yulia, additional
Published: 2023

17. BotPercent: Estimating Twitter Bot Populations from Groups to Crowds

Author: Tan, Zhaoxuan, Feng, Shangbin, Sclar, Melanie, Wan, Herun, Luo, Minnan, Choi, Yejin, and Tsvetkov, Yulia
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Social and Information Networks
Abstract: Twitter bot detection has become increasingly important in combating misinformation, identifying malicious online campaigns, and protecting the integrity of social media discourse. While existing bot detection literature mostly focuses on identifying individual bots, it remains underexplored how to estimate the proportion of bots within specific communities and social networks, which has great implications for both content moderators and day-to-day users. In this work, we propose community-level bot detection, a novel approach to estimating the amount of malicious interference in online communities by estimating the percentage of bot accounts. Specifically, we introduce BotPercent, an amalgamation of Twitter bot-detection datasets and feature-, text-, and graph-based models that overcome generalization issues in existing individual-level models, resulting in a more accurate community-level bot estimation. Experiments demonstrate that BotPercent achieves state-of-the-art community-level bot detection performance on the TwiBot-22 benchmark while showing great robustness towards the tampering of specific user features. Armed with BotPercent, we analyze bot rates in different Twitter groups and communities, such as all active Twitter users, users that interact with partisan news media, users that participate in Elon Musk's content moderation votes, and the political communities in different countries and regions. Our experimental results demonstrate that the existence of Twitter bots is not homogeneous, but rather a spatial-temporal distribution whose heterogeneity should be taken into account for content moderation, social media policy making, and more. The BotPercent implementation is available at https://github.com/TamSiuhin/BotPercent
Published: 2023

18. GraTO

Author: Feng, Xinshun, primary, Wan, Herun, additional, Feng, Shangbin, additional, Wang, Hongrui, additional, Zheng, Qinghua, additional, Zhou, Jun, additional, and Luo, Minnan, additional
Published: 2022

19. BotRGCN

Author: Feng, Shangbin, Wan, Herun, Wang, Ningnan, and Luo, Minnan
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Social and Information Networks
Abstract: Twitter bot detection is an important and challenging task. Existing bot detection measures fail to address the challenge of community and disguise, falling short of detecting bots that disguise themselves as genuine users and attack collectively. To address these two challenges of Twitter bot detection, we propose BotRGCN, which is short for Bot detection with Relational Graph Convolutional Networks. BotRGCN addresses the challenge of community by constructing a heterogeneous graph from follow relationships and applying relational graph convolutional networks. Apart from that, BotRGCN makes use of multi-modal user semantic and property information to avoid feature engineering and augment its ability to capture bots with diversified disguise. Extensive experiments demonstrate that BotRGCN outperforms competitive baselines on a comprehensive benchmark, TwiBot-20, which provides follow relationships. Comment: accepted at ASONAM 2021
Published: 2021

20. TwiBot-20: A Comprehensive Twitter Bot Detection Benchmark

Author: Feng, Shangbin, primary, Wan, Herun, additional, Wang, Ningnan, additional, Li, Jundong, additional, and Luo, Minnan, additional
Published: 2021

21. SATAR

Author: Feng, Shangbin, primary, Wan, Herun, additional, Wang, Ningnan, additional, Li, Jundong, additional, and Luo, Minnan, additional
Published: 2021
