Back to Search
Start Over
Canonization rules for detecting different URLs
- Source :
- 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence).
- Publication Year :
- 2016
- Publisher :
- IEEE, 2016.
-
Abstract
- A Uniform Resource Locator (URL) represents the address of a web page in World Wide Web (WWW). A URL grants access to a single web page on the WWW. Here in this paper the main focus is on the URLs addressing the same web page/same content. A web Page can have two or more URLs with which the web page can be accessed. These duplicate URLs can be a serious threat to the entire pipeline of internet searcher administration for indexing and creeping. I am presenting a novel algorithm for detecting canonization rules for normalizing URLs to the original single URL. Here a pattern recognition approach has been used for analyzing textual data. This approach benefits search engines from information about duplicate URLs to optimize the performance of search engine in terms of reduced cost and improved quality.
- Subjects :
- Information retrieval
Computer science
InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
Static web page
02 engineering and technology
Page view
URL redirection
World Wide Web
Rewrite engine
URL normalization
020204 information systems
Web page
0202 electrical engineering, electronic engineering, information engineering
Semantic URL
020201 artificial intelligence & image processing
Web crawler
Subjects
Details
- Database :
- OpenAIRE
- Journal :
- 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence)
- Accession number :
- edsair.doi...........186a25070f83cfc291935c7474695976