
BEYONDWORDS is All You Need: Agentic Generative AI based Social Media Themes Extractor

Authors :
Ghali, Mohammed-Khalil
Farrag, Abdelrahman
Lam, Sarah
Won, Daehan
Publication Year :
2025

Abstract

Thematic analysis of social media posts provides important insight into public discourse, yet traditional methods often struggle to capture the complexity and nuance of large-scale, unstructured text data. This study introduces a novel methodology for thematic analysis that integrates tweet embeddings from pre-trained language models, dimensionality reduction and matrix factorization, and generative AI to identify and refine latent themes. Our approach clusters compressed tweet representations and employs generative AI to extract and articulate themes through agentic chain-of-thought (CoT) prompting, with a secondary LLM for quality assurance. This methodology is applied to tweets from the autistic community, a group that increasingly uses social media to discuss their experiences and challenges. By automating the thematic extraction process, we aim to uncover key insights while preserving the richness of the original discourse. This autism case study demonstrates the utility of the proposed approach for thematic analysis of social media data, offering a scalable and adaptable framework that can be applied to diverse contexts. The results highlight the potential of combining machine learning and generative AI to enhance the depth and accuracy of theme identification in online communities.
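The pipeline the abstract outlines (embed posts, compress the embeddings, cluster them, then prompt an LLM per cluster and have a second LLM review the result) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random toy vectors stand in for pre-trained tweet embeddings, truncated SVD stands in for the paper's dimensionality-reduction and matrix-factorization step, plain k-means is an assumed clustering method, and `build_theme_prompt` / `build_review_prompt` are hypothetical helpers that only assemble prompt text (no LLM is called).

```python
from collections import defaultdict

import numpy as np


def reduce_dimensions(X: np.ndarray, k: int) -> np.ndarray:
    """Project embeddings onto their top-k principal directions.

    Assumption: truncated SVD stands in for the paper's reduction step.
    """
    Xc = X - X.mean(axis=0)  # center each embedding dimension
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]  # n x k compressed representation


def kmeans(Z: np.ndarray, n_clusters: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means (assumed clustering step); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)]
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        # Distance from every point to every center, then nearest-center assignment.
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster goes empty.
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def group_posts(posts: list[str], labels: np.ndarray) -> dict[int, list[str]]:
    """Collect the raw post texts belonging to each cluster label."""
    groups: dict[int, list[str]] = defaultdict(list)
    for post, label in zip(posts, labels):
        groups[int(label)].append(post)
    return dict(groups)


def build_theme_prompt(posts: list[str]) -> str:
    """Hypothetical chain-of-thought prompt asking an LLM to name one cluster's theme."""
    joined = "\n".join(f"- {p}" for p in posts)
    return (
        "Read the posts below. Think step by step: list recurring topics, "
        "group related ones, then state one concise theme.\n" + joined
    )


def build_review_prompt(theme: str, posts: list[str]) -> str:
    """Hypothetical prompt for a second LLM that checks a proposed theme."""
    joined = "\n".join(f"- {p}" for p in posts)
    return (
        f"Proposed theme: {theme}\n"
        "Does this theme faithfully summarize the posts below? "
        "Answer yes or no, then suggest a revision if needed.\n" + joined
    )


if __name__ == "__main__":
    # Toy data: two well-separated groups of 8-dimensional "embeddings".
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.1, (5, 8)) + 3.0,
                   rng.normal(0.0, 0.1, (5, 8)) - 3.0])
    posts = [f"post {i}" for i in range(10)]
    labels = kmeans(reduce_dimensions(X, 2), n_clusters=2)
    for label, members in group_posts(posts, labels).items():
        print(f"cluster {label}:\n{build_theme_prompt(members)}\n")
```

In a real run, each theme prompt's response would be passed, together with the same posts, to `build_review_prompt` for the quality-assurance pass; the choice of reduction, clustering algorithm, and number of clusters here are placeholders rather than the paper's settings.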

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2503.01880
Document Type :
Working Paper