As online shopping becomes increasingly common, consumers now obtain product evaluation information from online reviews rather than from word of mouth. More than 70% of consumers refer to product reviews on e-commerce platforms when shopping online, and more than 90% of enterprises believe that reviews will play a decisive role in future consumer behavior. Unlike survey data, online product reviews are not shaped by researchers' subjective judgment during data collection, so they can reflect real user experience and emotional tendency. It is therefore important to study how big data from user reviews can drive product design and research and development, for example by accelerating changes in product design, promoting marketing, and improving user satisfaction. Current research on running shoes mainly focuses on product function development, shoe last redesign and market demand classification. To date, no study has explored the factors that consumers attend to when buying and using running shoes from the perspective of e-commerce big data. Understanding consumption trends and consumer preferences for running shoes is of great significance for industrial development and the formulation of marketing strategies. To mine what consumers pay attention to when buying running shoes online, first, the Requests and PyMySQL libraries in Python 3.11 were used to collect sales feature data for the 600 top-selling running shoes on Jingdong Mall, together with 100,000 user comments. Second, the online review text was preprocessed using the precise mode of the Jieba Chinese word segmentation library. Third, Origin 2021 was used to analyze the basic sales characteristics of the running shoes. Fourth, an LDA model with Gibbs sampling was used to cluster the review texts and explore the distribution of product feature words under different topics. 
Finally, SnowNLP was used to assign a sentiment score to each text and derive positive and negative labels, and topic analysis was then performed by sentiment label to compare the topic distributions of positive and negative comments. From the perspective of big data analysis, this paper applied the LDA model to mine 100,000 online reviews of running shoes, carried out word frequency co-occurrence analysis, topic clustering and sentiment analysis on the review data, analyzed the causes of problems along the dimensions of brand, technology and after-sales service, and put forward relevant suggestions. Domestic running shoe brands cover the full range from the entry-level market to the high-end market, but because of global brand effect, accumulated technology and user reputation, they still trail the top brands considerably in sales and in focus on the mid-to-high-end market. Most of the top-selling running shoes take part in spend-threshold discount ("full reduction") and coupon promotions, and the proportions of self-operated, store sales and quality certification labels among them are higher than the overall level; the self-operated and coupon labels significantly promote the purchase of running shoes. When buying running shoes online, consumers mainly pay attention to appearance details, functional attributes, cost performance, wearing feel and service concessions. A minority of consumers express negative sentiment about the wearing experience, product quality and service promotion. Future work can examine these findings in more depth according to the characteristics of different consumers. 
Building on the collected user comment data, user information such as age, region and occupation should be mined further so that comment topics can be mapped to specific user groups, which would support targeted research and development for specific consumer groups and the implementation of precision marketing. In addition, to make the results more generalizable, the amount of data should be increased in the future so that the model better captures varied topics, domains and contexts, thereby improving the reliability and validity of the results. [ABSTRACT FROM AUTHOR]
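
The topic-clustering step named in the abstract (an LDA model fitted with Gibbs sampling) can be illustrated with a minimal sketch. This is not the authors' code: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count and the toy English tokens are all assumptions made for illustration, and the tokens merely stand in for the output of a word segmenter such as Jieba. The sketch uses a standard collapsed Gibbs sampler and only the Python standard library.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, top_n=5, seed=0):
    """Collapsed Gibbs sampling for LDA over pre-tokenized documents.

    Returns (doc_topic, top_words): per-document topic counts and the
    top_n highest-count words for each topic. Illustrative sketch only.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # count of word v in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []                                       # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token from the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    top_words = []
    for k in range(n_topics):
        ranked = sorted(range(n_vocab), key=lambda v: topic_word[k][v], reverse=True)
        top_words.append([vocab[v] for v in ranked[:top_n]])
    return doc_topic, top_words


# Hypothetical toy reviews, already tokenized; not data from the study.
toy_reviews = [
    ["comfortable", "light", "sole", "comfortable"],
    ["delivery", "fast", "coupon", "service"],
    ["light", "sole", "breathable", "comfortable"],
    ["coupon", "service", "delivery", "refund"],
]
doc_topic, top_words = lda_gibbs(toy_reviews, n_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

The same routine could be run separately on reviews labeled positive and negative to compare topic distributions, as the abstract describes. A study at the scale of 100,000 reviews would more plausibly rely on an optimized library implementation than on a pure-Python loop like this one.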