Research on data retrieval and analysis system based on Baidu reptile technology in big data era.

Authors :: Jin, Jiangang
Elhoseny, Mohamed
Yuan, X.
Source :: Journal of Intelligent & Fuzzy Systems; 2020, Vol. 38 Issue 2, p1203-1213, 11p
Publication Year :: 2020
Abstract: With the rapid development of the Internet, the Web has become the main platform for publishing and retrieving information. Finding the information users need quickly and accurately among a large volume of network resources has become an urgent need. Web crawling is a research field that emerged to meet this demand. Building on existing research, the paper designs and implements a distributed web crawler system whose goal is to provide high-quality data support for a network public opinion system. The system addresses the low efficiency, poor scalability, and low automation of single-machine crawlers, improving the speed of webpage collection and the precision of data extraction and expanding the scale of collection. The article closes with screenshots of the system's interfaces and test results. The test results show that the crawler system can effectively collect dynamic web pages, that automatic extraction of web page content achieves high precision, and that the complete crawling system has been realized. [ABSTRACT FROM AUTHOR]

Subjects :: INFORMATION retrieval
SYSTEM analysis
BIG data
WEBSITES
DATA analysis
SCALABILITY
INFORMATION networks
TEMPORAL databases
