YouTube Video Analysis
- Authors
Bachubhay, Akhil; Chhour, Danny; Deng, Heji; Tran, Trung
- Subjects
Data Analysis, Web Scraper, YouTube, Data Collection, Plotly, NLTK, Frequency Count, Transcripts, Social Media, Comments, Python, Jupyter Notebook
- Abstract
YouTube (youtube.com) is an online video-sharing platform that allows users to upload, view, rate, share, and report videos, add them to playlists, comment on them, and subscribe to other users' channels. Over 2 billion logged-in users visit YouTube each month, and every day people watch over a billion hours of video and generate billions of views. User-generated content (UGC) makes up a substantial portion of the content on YouTube, and a growing number of people post videos there, many of whom become well-known YouTubers. One notable trend for these YouTubers is how their channels grow over time. We were tasked with analyzing how certain YouTubers become successful over time, how the transcripts of their early videos differ from those of later ones, and how comments change as the channel gains fame. This analysis requires two sets of data. The first set is numerical data about the channel: view counts, likes and dislikes, publication dates of videos, interactions between the creator and the audience, and so on. The second set is textual data: the auto-generated transcripts of the videos and the comments from users. Using YouTube APIs and other available helper tools, we scrape the video metadata and export it as CSV files for later study. For the analysis, we generate scatter plots in which each dot represents one video: the x-axis shows the publication date, the y-axis shows the number of views, and the color of the dot encodes another metric (for instance, video duration). With the Python NLTK package, we analyze the transcripts and comments to find which words are spoken most often, which words appear frequently in the comments and whether they are positive or negative, how many words the creator says per minute, and so on.
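The word-frequency and speaking-rate measures described above can be sketched in plain Python. This is a minimal sketch, not the project's code: it swaps NLTK's tokenizer and stopword corpus for a simple regular expression and a small hand-picked stopword list (both assumptions) so that it runs without downloading NLTK data. The helper names `tokenize`, `top_words`, and `words_per_minute` are hypothetical, and sentiment scoring of comments is not shown.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption for this sketch);
# NLTK ships a much fuller English list in nltk.corpus.stopwords.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "in",
             "this", "that", "i", "you"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Count non-stopword tokens across all texts and return the n most common."""
    counts = Counter(
        tok for text in texts for tok in tokenize(text) if tok not in STOPWORDS
    )
    return counts.most_common(n)


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Average speaking rate: total transcript words divided by video length in minutes."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(tokenize(transcript)) / (duration_seconds / 60)


comments = ["Love this game!", "This game is so relaxing", "love the music"]
print(top_words(comments, 2))
print(words_per_minute("one two three four", 120))
```

With NLTK installed, `nltk.FreqDist` (a `Counter` subclass) over the filtered tokens would play the same role as `Counter` here.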
By combining these data, we can generate a more detailed scatter plot to discover whether there is a pattern in how certain YouTubers become increasingly successful. The project was developed using data from a single channel, Biffa Plays Indie Games, but it is expected to work correctly on other channels as well. The final report is available in two versions: YouTubeVideoAnalysisReport.docx (Word) and YouTubeVideoAnalysisReport.pdf (PDF). The final presentation is likewise available as YouTubeVideoAnalysisPresentation.pptx (PowerPoint) and YouTubeVideoAnalysisPresentation.pdf (PDF).
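One way to combine the numerical and textual data for such a plot is to join each video's metadata row with its transcript and derive extra columns, such as duration in seconds and words per minute. The sketch below assumes a hypothetical CSV layout with `video_id`, `published`, `views`, and `duration` columns, where `duration` is the ISO 8601 string (for example `PT1H2M3S`) that the YouTube Data API v3 returns in `contentDetails.duration`. The function names are illustrative, not the project's.

```python
import csv
import io
import re
from datetime import date

# Matches ISO 8601 durations such as "PT1H2M3S" (YouTube Data API v3 format).
# Simplification: durations with a day part (e.g. "P1DT2H") are rejected.
_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_to_seconds(iso: str) -> int:
    """Convert an ISO 8601 'PT#H#M#S' duration string to whole seconds."""
    m = _DURATION_RE.match(iso)
    if m is None:
        raise ValueError(f"unsupported duration: {iso!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def combine(metadata_csv: str, transcripts: dict[str, str]) -> list[dict]:
    """Join metadata rows (hypothetical column names) with transcript word counts.

    Returns one record per video, sorted by publication date, ready for a
    date-vs-views scatter plot colored by duration or words per minute.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(metadata_csv)):
        seconds = duration_to_seconds(rec["duration"])
        # Videos without a transcript count as zero words.
        words = len(transcripts.get(rec["video_id"], "").split())
        rows.append({
            "video_id": rec["video_id"],
            # Keep only the date part of a timestamp like "2016-05-01T10:00:00Z".
            "published": date.fromisoformat(rec["published"][:10]),
            "views": int(rec["views"]),
            "duration_s": seconds,
            "words_per_minute": words / (seconds / 60) if seconds else 0.0,
        })
    rows.sort(key=lambda r: r["published"])
    return rows
```

The resulting records could then be loaded with `pandas.DataFrame(rows)` and passed to `plotly.express.scatter(df, x="published", y="views", color="words_per_minute")`, so that each dot is one video as in the plots described above.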
- Published
2021