
Relational clustering for knowledge discovery in life sciences

Authors :
4401
DIPARTIMENTO DI INFORMATICA, SISTEMISTICA E COMUNICAZIONE
AREA MIN. 01 - SCIENZE MATEMATICHE E INFORMATICHE
Publication Year :
2010

Abstract

Clustering is one of the most common machine learning techniques and has been widely applied in genomics, proteomics and, more generally, the life sciences. Clustering is an unsupervised technique that uses geometric concepts such as distance or similarity to partition objects into groups, so that objects with similar characteristics are placed in the same cluster and dissimilar objects end up in different clusters. In many domains where clustering is applied, background knowledge is available in different forms: labelled data (specifying the category to which an instance belongs); complementary information about the "true" similarity between pairs of objects or about the structure of the relationships in the input data; and user preferences (for example, specifying whether two instances should be in the same cluster or in different clusters). In many real-world applications, such as biological data processing, social network analysis and text mining, data do not exist in isolation: a rich structure of relationships exists among them. A simple example comes from the biological domain, where many relationships between genes and proteins are derived from different experimental conditions. Another, perhaps more familiar, example is Web search, where relations exist between documents and the words they contain, and among web pages, search queries and web users. Our research focuses on how this background knowledge can be incorporated into traditional clustering algorithms to improve the process of pattern discovery (clustering) among instances.
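User preferences of the "same or different cluster" kind are commonly encoded as must-link and cannot-link constraints between pairs of instances. As an illustration only, and not the method proposed in the thesis, the sketch below follows the COP-KMeans idea: ordinary k-means in which each point is assigned to the nearest centre that does not violate a constraint with an already-assigned point. The function names, the index-pair constraint format and the toy data are assumptions made for this example.

```python
import math
import random


def violates(point, cluster, assignment, must_link, cannot_link):
    """True if placing `point` in `cluster` breaks a constraint
    with respect to points that have already been assigned."""
    for a, b in must_link:
        other = b if a == point else a if b == point else None
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else a if b == point else None
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


def cop_kmeans(data, k, must_link=(), cannot_link=(), max_iter=100, seed=0):
    """Constrained k-means; returns a label per point, or None if infeasible."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        assignment = {}
        for i, p in enumerate(data):
            # Try clusters from the nearest centre to the farthest one.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            for c in order:
                if not violates(i, c, assignment, must_link, cannot_link):
                    assignment[i] = c
                    break
            else:
                return None  # no cluster satisfies the constraints for point i
        new_labels = [assignment[i] for i in range(len(data))]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Recompute each centre as the mean of its members (keep it if empty).
        for c in range(k):
            members = [data[i] for i in range(len(data)) if labels[i] == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


# Toy 2-D data: two well-separated groups of two points each.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(cop_kmeans(data, 2))                        # unconstrained
print(cop_kmeans(data, 2, cannot_link=[(0, 1)]))  # force points 0 and 1 apart
```

Because constraints are checked greedily in data order, this style of algorithm can report infeasibility (return `None`) even when a valid partition exists; that is a known limitation of the greedy COP-KMeans assignment rather than of constrained clustering in general.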
In this thesis, we first provide an overview of traditional clustering methods and some important distance measures, and then analyze three particular challenges that we address with different proposed methods: "feature selection", to reduce the high-dimensional input space and remove noise from the data; "mixed data types", to handle both numeric and categorical values in the clustering procedure, as is typical of life science…

The thesis work was carried out within the MIND laboratory (Models In Decision Making and Data Analysis).
ARCHETTI, FRANCESCO
1712
open
Giordani, I
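The "mixed data types" challenge described in the abstract concerns measuring distance between records that contain both numeric and categorical values. One widely used measure for this is Gower's dissimilarity: each numeric feature contributes its absolute difference divided by that feature's range, each categorical feature contributes 0 on a match and 1 on a mismatch, and the contributions are averaged. The sketch below is a generic illustration of that measure, not necessarily the approach taken in the thesis; the function names and the toy records (an expression level, a hypothetical tissue label and a second numeric feature) are assumptions.

```python
def feature_ranges(rows, categorical):
    """Range (max - min) of each numeric column; None marks a categorical column."""
    ranges = []
    for j in range(len(rows[0])):
        if j in categorical:
            ranges.append(None)
        else:
            column = [row[j] for row in rows]
            ranges.append(max(column) - min(column))
    return ranges


def gower_distance(x, y, ranges):
    """Gower-style dissimilarity in [0, 1] between two mixed-type records."""
    total = 0.0
    for xj, yj, r in zip(x, y, ranges):
        if r is None:
            # Categorical feature: 0 if the categories match, 1 otherwise.
            total += 0.0 if xj == yj else 1.0
        elif r > 0:
            # Numeric feature: absolute difference scaled by the column range.
            total += abs(xj - yj) / r
        # A numeric column with zero range contributes nothing.
    return total / len(ranges)


# Toy records: (expression level, tissue label, second numeric feature).
rows = [(5.0, "liver", 1.0), (3.0, "brain", 1.0), (1.0, "liver", 3.0)]
ranges = feature_ranges(rows, categorical={1})
print(gower_distance(rows[0], rows[1], ranges))  # 0.5
```

Scaling numeric differences by the column range keeps every feature's contribution in [0, 1], so a single numeric feature with large units cannot dominate the categorical ones; the resulting dissimilarity matrix can then be fed to any clustering method that accepts precomputed distances.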

Details

Database :
OAIster
Notes :
22, 2008/2009, application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1308906947
Document Type :
Electronic Resource