TEXTUAL-BASED CLUSTERING OF WEB DOCUMENTS.

Authors :: Brzeminski, Pawel
Pedrycz, Witold
Source :: International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems. Dec2004, Vol. 12 Issue 6, p715-743. 29p.
Publication Year :: 2004
Abstract: In our study we presented an effective method for clustering of Web pages. From flat HTML files we extracted keywords, formed feature vectors as representations of Web pages and applied them to a clustering method. We took advantage of the Fuzzy C-Means clustering algorithm (FCM). We demonstrated an organized and schematic manner of data collection. Various categories of Web pages were retrieved from ODP (Open Directory Project) in order to create our datasets. The results of clustering proved that the method performs well for all datasets. Finally, we presented a comprehensive experimental study examining the behavior of the algorithm for different input parameters, the internal structure of the datasets, and classification experiments. [ABSTRACT FROM AUTHOR]
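
Illustrative sketch (not from the article): the pipeline the abstract outlines, keywords extracted from HTML, turned into feature vectors, then clustered with Fuzzy C-Means, might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation. The sample pages, the fixed vocabulary, the sum-to-one normalisation, the farthest-first seeding, and all function names are hypothetical choices made for illustration; only the alternating membership and center updates are the standard FCM equations.

```python
import math
import re
from collections import Counter


def keyword_vector(html, vocabulary):
    """Strip tags, count vocabulary words, and normalise counts to sum to 1 (assumed weighting)."""
    text = re.sub(r"<[^>]+>", " ", html).lower()
    counts = Counter(re.findall(r"[a-z]+", text))
    vec = [float(counts[word]) for word in vocabulary]
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def farthest_first(points, c):
    """Deterministic seeding (an assumption): start at points[0], then add the farthest point."""
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(math.dist(p, v) for v in centers))
        centers.append(list(nxt))
    return centers


def memberships(points, centers, m):
    """Standard FCM membership update; a point lying exactly on a center belongs to it fully."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, v) for v in centers]
        if min(d) == 0.0:
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    centers = farthest_first(points, c)
    dim = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]  # fuzzified weights u_ij^m
            new_centers.append(
                [sum(wi * p[k] for wi, p in zip(w, points)) / sum(w) for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


# Hypothetical toy data: two sports pages and two cooking pages.
vocabulary = ["football", "goal", "team", "recipe", "bake", "flour"]
pages = [
    "<html><body><h1>Football</h1><p>football match goal team</p></body></html>",
    "<p>team goal football league</p>",
    "<p>recipe oven bake flour</p>",
    "<html><title>recipe</title><p>flour bake sugar recipe</p></html>",
]
points = [keyword_vector(page, vocabulary) for page in pages]
centers, u = fuzzy_c_means(points, c=2)
labels = [row.index(max(row)) for row in u]  # harden fuzzy memberships into labels
print(labels)
```

Each row of `u` holds one page's degrees of membership in the clusters and sums to 1; taking the largest entry per row is only one way to read off a crisp assignment. The fuzzifier `m` and the cluster count `c` are the kind of input parameters whose effect the abstract says the study examines.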

Subjects :: *WEBSITES
*ELECTRONIC records
*ELECTRONIC information resources
*RECORDS management
*HTML (Document markup language)
*ALGORITHMS

Language :: English
ISSN :: 02184885
Volume :: 12
Issue :: 6
Database :: Academic Search Index
Journal :: International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems
Publication Type :: Academic Journal
Accession number :: 16257974
Full Text :: https://doi.org/10.1142/S021848850400317X