1. The Tell-Tale Tweet: extracting, enriching, and modelling people’s activity and movement from social media data
- Author
- Yang, Bing
- Subjects
- Mobility pattern, Social media, Text mining, Clustering, Neural network
- Abstract
Social media data have been used in many studies in recent years. Compared with traditional household travel survey data, social media data cost less and can be obtained from abundant sources. However, several pre-processing tasks are required before social media data can be used for mobility analysis. For instance, a distinction needs to be made between location and activity. In this thesis, text analytics, machine learning, and a method called “Tweet Block” are applied to differentiate between location and activity and to extract more accurate information from social media data, which can then be used for mobility analysis. In Section 3, the focus is on extracting and analysing users’ movement and lifestyle. Unlike the state of the practice, this research clearly distinguishes between location and activity. Text mining techniques were applied to identify location and activity information separately, and a clustering algorithm was applied to analyse the lifestyle of users. Because of this strict distinction between activity and location, less data could be identified than with traditional ways of labelling. To solve this problem, the information extracted from the data was enriched by applying Tweet Block, which identified 1,745 locations and 98 activities that were not identified in the text mining process. With the enriched data in hand, a method was proposed to infer information about users’ movement from data points that were previously unusable (i.e. a single record from a day). The average generated trip rate using this method was 26%-50% higher than with the method used in previous research. Travelling tracks were also generated to analyse the movement of these users. In Section 4, the primary purpose is to build a valid activity prediction model from the data using machine learning algorithms. Land use data were overlaid on the original data set to support the location information. Random Forest (RF) and Neural Network (NN) algorithms were used to build models, and the NN models were kept after model selection. Stratified K-fold cross-validation was used to validate the model.
- Published
- 2020