9 results on "YIZHOU SUN"
Search Results
2. Heterogeneous information networks
- Author
Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu
- Subjects
General Engineering
- Abstract
In 2011, we proposed PathSim to systematically define and compute similarity between nodes in a heterogeneous information network (HIN), where nodes and links are of different types. In the PathSim paper, we introduced, for the first time, HINs with a general network schema and proposed the concept of meta-paths to systematically define new relation types between nodes. In this paper, we summarize the impact of the PathSim paper on both academia and industry. We start with algorithms based on meta-path-based feature engineering, then move on to recent developments in heterogeneous network representation learning, including both shallow network embedding and heterogeneous graph neural networks. We then connect knowledge graphs with HINs and discuss the implications of meta-paths in the symbolic reasoning scenario. Finally, we point out several future directions.
- Published
- 2022
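The meta-path idea summarized in this abstract can be illustrated with a minimal Python sketch. It assumes a toy bibliographic network schema (`Author`, `Paper`, `Venue`, `Term`) and hypothetical helper names `is_valid_meta_path` and `meta_path_relation`: the first checks that a type sequence is a meta-path of the schema, and the second derives the new relation that the meta-path defines by counting path instances between endpoint nodes. It is an illustration only, not code from the paper.

```python
from collections import defaultdict

# Hypothetical toy network schema for a bibliographic HIN: each relation is a
# (type, type) pair, and links can be traversed in either direction.
SCHEMA = {("Author", "Paper"), ("Paper", "Venue"), ("Paper", "Term")}


def is_valid_meta_path(types, schema=SCHEMA):
    """A meta-path is valid if every consecutive pair of types is a schema relation."""
    return all((a, b) in schema or (b, a) in schema for a, b in zip(types, types[1:]))


def meta_path_relation(types, nodes_of_type, links, schema=SCHEMA):
    """Derive the relation defined by a meta-path.

    Returns {(start, end): number of path instances} for paths whose node
    types follow `types` in order. Node ids are assumed unique across types.
    """
    if not is_valid_meta_path(types, schema):
        raise ValueError(f"not a meta-path of this schema: {types}")
    adj = defaultdict(set)
    for u, v in links:  # links are undirected
        adj[u].add(v)
        adj[v].add(u)
    typed = {t: set(ids) for t, ids in nodes_of_type.items()}
    # (start node, current node) -> number of partial path instances so far
    frontier = {(s, s): 1 for s in typed.get(types[0], set())}
    for next_type in types[1:]:
        step = defaultdict(int)
        for (start, cur), count in frontier.items():
            for nb in adj[cur] & typed.get(next_type, set()):
                step[(start, nb)] += count
        frontier = step
    return dict(frontier)


# Toy data: three authors, two papers, one venue.
nodes = {"Author": ["a1", "a2", "a3"], "Paper": ["p1", "p2"], "Venue": ["v1"]}
links = {("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"), ("p1", "v1"), ("p2", "v1")}
coauthor = meta_path_relation(["Author", "Paper", "Author"], nodes, links)  # A-P-A
print(coauthor[("a1", "a2")])  # 1: a1 and a2 share one paper
```

Passing `["Author", "Paper", "Venue", "Paper", "Author"]` instead derives a different relation (authors publishing in the same venues) from the same links, which is the sense in which meta-paths define new relation types.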
3. Network Embedding for Community Detection in Attributed Networks
- Author
Jianbin Huang, Zhongbin Sun, Chenyu Wang, Yizhou Sun, Yang Li, Liang He, Fang He, Heli Sun, and Xiaolin Jia
- Subjects
Theoretical computer science; General Computer Science; Computer science; Node (networking); Community structure; 02 engineering and technology; Autoencoder; Graph drawing; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Embedding; 020201 artificial intelligence & image processing; Cluster analysis; Feature learning; Clustering coefficient
- Abstract
Community detection aims to partition network nodes into a set of clusters such that nodes within the same cluster are more densely connected to each other than to nodes in other clusters. For attributed networks, apart from this density requirement on the topology, the attributes of nodes in the same community should also be homogeneous. Network embedding has proven extremely useful in a variety of tasks, such as node classification, link prediction, and graph visualization, but few works are dedicated to unsupervised embedding of node features tailored to the clustering task, which is vital for community detection and graph clustering. By post-processing with clustering algorithms such as k-means, most existing network embedding methods can be applied to clustering tasks. However, the learned embeddings are not designed for clustering: they capture only the topological and attribute information of the network, and no clustering-oriented information is exploited. In this article, we propose an algorithm named Network Embedding for node Clustering (NEC) to learn network embeddings for node clustering in attributed graphs. Specifically, the presented framework jointly learns graph structure-based representations and clustering-oriented representations. The framework consists of three modules: a graph convolutional autoencoder module, a soft modularity maximization module, and a self-clustering module. The graph convolutional autoencoder module learns node embeddings based on topological structure and node attributes. We introduce soft modularity, which can be easily optimized using gradient descent algorithms, to exploit the community structure of networks. By integrating clustering loss and embedding loss, NEC jointly optimizes the assignment of node cluster labels and learns representations that preserve the local structure of the network. The model can be effectively optimized using a stochastic gradient algorithm.
Empirical experiments on real-world and synthetic networks validate the feasibility and effectiveness of our algorithm on the community detection task, compared with network-embedding-based methods and traditional community detection methods.
- Published
- 2020
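Soft modularity can be sketched as follows, assuming the common relaxation of Newman modularity in which hard community labels are replaced by a row-stochastic membership matrix C, giving Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) Σ_c C_ic C_jc. The abstract does not give NEC's exact formulation, so treat this as an assumed stand-in; the function names are hypothetical, and the sketch only evaluates Q (there is no autoencoder or training loop). Because Q is a polynomial in the entries of C, it is differentiable, which is what makes gradient-based optimization possible.

```python
import math


def softmax_rows(logits):
    """Row-wise softmax: turns unconstrained scores into soft community memberships."""
    out = []
    for row in logits:
        top = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def soft_modularity(adj, membership):
    """Q = 1/(2m) * sum_ij (A_ij - k_i k_j / 2m) * sum_c C_ic C_jc.

    adj: symmetric adjacency matrix of an undirected graph (list of lists)
    membership: n x K matrix C whose rows are soft assignments summing to 1.
    With one-hot rows, Q equals ordinary (Newman) modularity.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            overlap = sum(ci * cj for ci, cj in zip(membership[i], membership[j]))
            q += (adj[i][j] - degrees[i] * degrees[j] / two_m) * overlap
    return q / two_m


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

hard = [[1, 0]] * 3 + [[0, 1]] * 3
soft = softmax_rows([[2.0, -2.0]] * 3 + [[-2.0, 2.0]] * 3)
print(round(soft_modularity(A, hard), 4))  # 0.3571, i.e. 5/14
print(soft_modularity(A, soft))            # lower: memberships are not fully confident
```

Uniform memberships (every row `[0.5, 0.5]`) give Q = 0, so maximizing Q pushes C toward a confident, community-aligned assignment.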
4. Recurrent Meta-Structure for Robust Similarity Measure in Heterogeneous Information Networks
- Author
Shaojie Qiao, Jianbin Huang, Stephen Wambura, Yu Zhou, Yizhou Sun, and Heli Sun
- Subjects
Commuting matrices; General Computer Science; Computer science; 02 engineering and technology; Similarity measure; Semantics; computer.software_genre; Weighting; Schema (genetic algorithms); Ranking; Similarity (network science); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; Cluster analysis; computer
- Abstract
Similarity measurement is one of the fundamental tasks in heterogeneous information network (HIN) analysis. It has been applied in many areas, such as product recommendation, clustering, and Web search. Most existing metrics can provide personalized services for users by taking a meta-path or meta-structure as input. However, these metrics may depend heavily on the user-specified meta-path or meta-structure, and users must know how to select an appropriate one. In this article, we propose a novel similarity measure for HINs, called Recurrent Meta-Structure (RecurMS)-based Similarity (RMSS). As a schematic structure in HINs, the RecurMS provides a unified framework for integrating all meta-paths and meta-structures, and it can be constructed automatically by repeatedly traversing the network schema. To formalize its semantics, the RecurMS is decomposed into several recurrent meta-paths and recurrent meta-trees, and we then define the commuting matrices of these recurrent meta-paths and meta-trees. All of these commuting matrices are combined with different weights. We propose two weighting strategies to determine the weights: the local weighting strategy, which depends on the sparsity of the commuting matrices, and the global weighting strategy, which depends on their strength. As a result, RMSS is defined as the weighted sum of the commuting matrices. Note that RMSS can also provide personalized services for users through the weights of the recurrent meta-paths and meta-trees. Experimental evaluations show that the proposed RMSS is robust and outperforms existing metrics on ranking and clustering tasks.
- Published
- 2019
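The final step of RMSS, a weighted summation of commuting matrices, can be sketched in plain Python. This is a simplified illustration under several assumptions: only recurrent meta-paths are modeled (meta-trees are omitted), each commuting matrix is scaled so that its largest entry is 1, and `density_weights` is a hypothetical stand-in for the local (sparsity-based) weighting strategy, whose exact formula the abstract does not give. The toy relations and all function names are hypothetical.

```python
def matmul(x, y):
    """Plain-Python product of two list-of-lists matrices."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def transpose(m):
    return [list(col) for col in zip(*m)]


def commuting_matrix(relations):
    """Commuting matrix of a meta-path: product of its relation matrices in order."""
    result = relations[0]
    for r in relations[1:]:
        result = matmul(result, r)
    return result


def max_normalize(m):
    """Scale a matrix so its largest entry is 1 (assumed normalization step)."""
    top = max(v for row in m for v in row) or 1
    return [[v / top for v in row] for row in m]


def density_weights(matrices):
    """Hypothetical local weighting: weight proportional to each matrix's non-zero fraction."""
    dens = [sum(1 for row in m for v in row if v) / sum(len(row) for row in m) for m in matrices]
    total = sum(dens)
    return [d / total for d in dens]


def weighted_similarity(matrices, weights):
    """Similarity matrix as the weighted sum of normalized commuting matrices."""
    normed = [max_normalize(m) for m in matrices]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, normed)) for j in range(cols)]
            for i in range(rows)]


# Toy relations: 3 authors x 2 papers, 2 papers x 1 venue.
W_AP = [[1, 0], [1, 1], [0, 1]]
W_PV = [[1], [1]]
apa = commuting_matrix([W_AP, transpose(W_AP)])                           # A-P-A
apvpa = commuting_matrix([W_AP, W_PV, transpose(W_PV), transpose(W_AP)])  # A-P-V-P-A
S = weighted_similarity([apa, apvpa], density_weights([apa, apvpa]))
print(S[0])
```

Passing user-chosen weights to `weighted_similarity` instead of `density_weights(...)` corresponds to the personalized use mentioned in the abstract.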
5. PathSelClus
- Author
Xiao Yu, Yizhou Sun, Jiawei Han, Philip S. Yu, Brandon Norick, and Xifeng Yan
- Subjects
Fuzzy clustering; General Computer Science; business.industry; Computer science; Correlation clustering; Conceptual clustering; Constrained clustering; computer.software_genre; Machine learning; Data stream clustering; CURE data clustering algorithm; Canopy clustering algorithm; Artificial intelligence; Data mining; business; Cluster analysis; computer
- Abstract
Real-world objects of multiple types are often interconnected, forming heterogeneous information networks. A major challenge for link-based clustering in such networks is that it can generate many different results, carrying rather diverse semantic meanings. To generate the desired clustering, we propose to use meta-paths, i.e., paths that connect object types via a sequence of relations, to control clustering with distinct semantics. Nevertheless, it is easier for a user to provide a few examples (seeds) than a weighted combination of sophisticated meta-paths to specify her clustering preference. Thus, we propose to integrate meta-path selection with user-guided clustering to cluster objects in networks, where a user first provides a small set of object seeds for each cluster as guidance. The system then learns a weight for each meta-path that is consistent with the clustering result implied by the guidance, and generates clusters under the learned meta-path weights. A probabilistic approach is proposed to solve the problem, and an effective and efficient iterative algorithm, PathSelClus, is proposed to learn the model, in which the clustering quality and the meta-path weights mutually enhance each other. Our experiments with several clustering tasks on two real networks and one synthetic network demonstrate the power of the algorithm compared with the baselines.
- Published
- 2013
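The clustering step under fixed meta-path weights can be sketched as below. This is deliberately simpler than PathSelClus: it substitutes a nearest-seed rule (average weighted similarity to each cluster's seeds) for the paper's probabilistic model, and it does not learn the weights. The similarity values, the cluster names `DB`/`ML`, and the function names are hypothetical.

```python
def as_matrix(names, rows):
    """Dict-of-dicts similarity matrix built from a square list of rows."""
    return {x: dict(zip(names, row)) for x, row in zip(names, rows)}


def seed_score(obj, members, sims, weights):
    """Average weighted-combined similarity of `obj` to a cluster's seed objects."""
    total = sum(w * sim[obj][s] for s in members for w, sim in zip(weights, sims))
    return total / len(members)


def seed_guided_clusters(objects, seeds, sims, weights):
    """Assign each object to the cluster whose seeds it is most similar to.

    seeds: cluster -> list of seed objects (the user's guidance)
    sims: one similarity matrix per meta-path, as dict[x][y] -> float
    weights: one weight per meta-path (fixed here; PathSelClus learns them)
    """
    seed_label = {s: c for c, members in seeds.items() for s in members}
    assignment = {}
    for obj in objects:
        if obj in seed_label:  # seeds keep the label the user gave them
            assignment[obj] = seed_label[obj]
        else:
            assignment[obj] = max(seeds, key=lambda c: seed_score(obj, seeds[c], sims, weights))
    return assignment


names = ["a", "b", "c", "d"]
# Hypothetical author similarities under a venue-based and a term-based meta-path.
venue_sim = as_matrix(names, [[1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.2, 0.1],
                              [0.1, 0.2, 1.0, 0.9], [0.0, 0.1, 0.9, 1.0]])
term_sim = as_matrix(names, [[1.0, 0.1, 0.8, 0.0], [0.1, 1.0, 0.0, 0.8],
                             [0.8, 0.0, 1.0, 0.1], [0.0, 0.8, 0.1, 1.0]])
seeds = {"DB": ["a"], "ML": ["d"]}
print(seed_guided_clusters(names, seeds, [venue_sim, term_sim], [0.8, 0.2]))
print(seed_guided_clusters(names, seeds, [venue_sim, term_sim], [0.2, 0.8]))
```

Shifting weight from the venue-based meta-path to the term-based one moves authors `b` and `c` to different clusters, which shows how meta-path weights steer the semantics of the result; PathSelClus learns these weights from the seeds instead of taking them as input.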
6. Mining heterogeneous information networks
- Author
Yizhou Sun and Jiawei Han
- Subjects
Point (typography); Relational database; Computer science; Geography, Planning and Development; Network science; computer.software_genre; Data science; Set (abstract data type); Homogeneous; General Earth and Planetary Sciences; Leverage (statistics); Heterogeneous information; Data mining; computer; Water Science and Technology; Meaning (linguistics)
- Abstract
Most objects and data in the real world are of multiple types and interconnected, forming complex, heterogeneous, but often semi-structured information networks. However, most network science researchers focus on homogeneous networks, without distinguishing different types of objects and links in the networks. We view interconnected, multi-typed data, including typical relational database data, as heterogeneous information networks, study how to leverage the rich semantic meaning of structural types of objects and links in the networks, and develop a structural analysis approach to mining semi-structured, multi-typed heterogeneous information networks. In this article, we summarize a set of methodologies that can effectively and efficiently mine useful knowledge from such information networks, and point out some promising research directions.
- Published
- 2013
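Viewing relational data as a heterogeneous information network can be sketched as a conversion from tables to typed nodes and typed links, from which a network schema (the set of linked type pairs) follows directly. The input layout (entity id lists, many-to-one foreign keys as dicts, many-to-many join tables as id pairs) and the function names are assumptions made for illustration.

```python
def tables_to_hin(entities, foreign_keys, join_tables):
    """Turn relational data into typed nodes and typed links (a minimal sketch).

    entities: type -> list of row ids (one node per row of an entity table)
    foreign_keys: list of (src_type, {src_id: dst_id}, dst_type) for many-to-one keys
    join_tables: list of (type_a, type_b, [(a_id, b_id), ...]) for many-to-many tables
    """
    nodes = {(etype, eid) for etype, ids in entities.items() for eid in ids}
    links = set()
    for src_type, fk_values, dst_type in foreign_keys:  # e.g. Paper.venue_id -> Venue
        for src_id, dst_id in fk_values.items():
            links.add(((src_type, src_id), (dst_type, dst_id)))
    for type_a, type_b, pairs in join_tables:  # e.g. Writes(author_id, paper_id)
        for a, b in pairs:
            links.add(((type_a, a), (type_b, b)))
    for u, v in links:
        if u not in nodes or v not in nodes:
            raise ValueError(f"dangling reference: {u} -> {v}")
    return nodes, links


def network_schema(links):
    """Network schema: the set of object-type pairs connected by at least one link."""
    return {(u[0], v[0]) for u, v in links}


entities = {"Author": ["ann", "bob"], "Paper": [1, 2], "Venue": ["KDD"]}
foreign_keys = [("Paper", {1: "KDD", 2: "KDD"}, "Venue")]
join_tables = [("Author", "Paper", [("ann", 1), ("bob", 1), ("bob", 2)])]
nodes, links = tables_to_hin(entities, foreign_keys, join_tables)
print(sorted(network_schema(links)))  # [('Author', 'Paper'), ('Paper', 'Venue')]
```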
7. PathSim
- Author
Philip S. Yu, Xifeng Yan, Yizhou Sun, Jiawei Han, and Tianyi Wu
- Subjects
Search engine; Theoretical computer science; Similarity (network science); Computer science; Nearest neighbor search; Path (graph theory); General Engineering; Domain knowledge; Data mining; Similarity measure; computer.software_genre; computer; Heterogeneous network
- Abstract
Similarity search is a primitive operation in database and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as bibliographic networks and social media networks, it is important to study similarity search in such networks. Intuitively, two objects are similar if they are linked by many paths in the network. However, most existing similarity measures are defined for homogeneous networks and do not take into consideration the different semantic meanings behind paths, so they cannot be directly applied to heterogeneous networks. In this paper, we study similarity search defined among objects of the same type in heterogeneous networks. Moreover, by considering different linkage paths in a network, one can derive various similarity semantics. Therefore, we introduce the concept of meta path-based similarity, where a meta path is a path consisting of a sequence of relations defined between different object types (i.e., structural paths at the meta level). Whether a user would like to explicitly specify a path combination given sufficient domain knowledge, choose the best path by experimental trials, or simply provide training examples to learn it, the meta path forms a common base for a network-based similarity search engine. In particular, under the meta path framework we define a novel similarity measure called PathSim that is able to find peer objects in the network (e.g., find authors in a similar field and with similar reputation), which turns out to be more meaningful in many scenarios than random-walk-based similarity measures. To support fast online query processing for PathSim queries, we develop an efficient solution that partially materializes short meta paths and then concatenates them online to compute top-k results. Experiments on real data sets demonstrate the effectiveness and efficiency of our proposed paradigm.
- Published
- 2011
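PathSim and the idea of concatenating a materialized short meta path online can be sketched as follows. For a symmetric meta path P = (P_l, P_l^-1) with commuting matrix M = M_l M_l^T, PathSim is s(x, y) = 2 M_xy / (M_xx + M_yy). The sketch assumes the half-path matrix M_l (for example, author × venue counts for A-P-V) is precomputed and computes only the query row of M at query time; unlike the paper's solution, it scans all candidates rather than pruning them. The data and function names are hypothetical.

```python
import heapq


def pathsim(m, x, y):
    """PathSim for a symmetric meta path with commuting matrix m:
    s(x, y) = 2 * m[x][y] / (m[x][x] + m[y][y])."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


def pathsim_topk(half, query, k):
    """Top-k PathSim peers of `query` for a symmetric meta path P = (P_l, P_l^-1).

    `half` is the commuting matrix M_l of the half path P_l, assumed precomputed;
    the full matrix M = M_l M_l^T is never built.
    """
    q = half[query]
    # M[x][x] for every object x: the squared norm of its row in M_l.
    diag = [sum(v * v for v in row) for row in half]
    scores = []
    for y, row in enumerate(half):
        if y == query:
            continue
        m_qy = sum(a * b for a, b in zip(q, row))  # entry M[query][y], computed online
        denom = diag[query] + diag[y]
        scores.append((2 * m_qy / denom if denom else 0.0, y))
    return heapq.nlargest(k, scores)


# Toy author x venue path counts (columns: three hypothetical venues).
M_AV = [
    [2, 1, 0],    # author 0
    [4, 2, 0],    # author 1: same field as author 0, similar visibility
    [40, 20, 0],  # author 2: same field, far more publications
    [0, 0, 3],    # author 3: different field
]
print(pathsim_topk(M_AV, 0, 2))  # author 1 ranks above author 2
```

Authors 1 and 2 have the same venue distribution as author 0, so cosine similarity would score both at 1.0; PathSim gives author 1 a score of 0.8 and author 2 only about 0.1, because it also rewards comparable visibility, which matches the "peer objects" behavior described in the abstract.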
8. iNextCube
- Author
Tianyi Wu, ChengXiang Zhai, Bo Zhao, Yintao Yu, Duo Zhang, Chen Chen, Yizhou Sun, Jiawei Han, Binbin Liao, and Cindy Xide Lin
- Subjects
Information retrieval; Data exploration; Text database; Computer science; business.industry; Online analytical processing; General Engineering; Database schema; Full text search; Probabilistic database; computer.software_genre; Database design; Database testing; Text mining; Data model; Database theory; Data mining; business; computer; Intelligent database; Data administration; Database model
- Abstract
Nowadays, most business, administration, and/or scientific databases contain both structured attributes and text attributes. We call a database that consists of both multidimensional structured data and narrative text data a multidimensional text database. Searching, OLAP, and mining such databases pose many research challenges. To enhance the power of data analysis, interesting entities and relationships can be extracted from such databases to derive heterogeneous information networks, which in turn will substantially increase the power and flexibility of data exploration in such databases. Based on our previous studies on TextCube [1], TopicCube [2], and information network analysis, such as RankClus [3] and NetClus [4], we construct iNextCube, an information-Network-enhanced text Cube. In this demo, we show the power of iNextCube in the search and analysis of two multidimensional text databases: (i) a DBLP-based CS bibliographic database, and (ii) an online news database.
- Published
- 2009
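The notion of a multidimensional text database can be illustrated with a minimal text-cube-style aggregation: records are grouped into cells by their structured dimension values, and each cell accumulates term counts from the text attribute. This is a simplified sketch, not the TextCube or iNextCube implementation; the record layout, whitespace tokenization, and the function name `text_cube_cells` are assumptions.

```python
from collections import Counter, defaultdict


def text_cube_cells(records, dims):
    """Aggregate term counts of the text attribute for every cell over `dims`.

    records: dicts with structured attributes plus a "text" field (assumed layout)
    dims: dimensions to group by; fewer dims rolls up, more dims drills down.
    """
    cells = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[d] for d in dims)  # the cell this record falls into
        cells[key].update(rec["text"].lower().split())
    return dict(cells)


records = [
    {"venue": "KDD", "year": 2009, "text": "Network clustering"},
    {"venue": "KDD", "year": 2010, "text": "network ranking"},
    {"venue": "SIGIR", "year": 2009, "text": "text retrieval"},
]
by_venue = text_cube_cells(records, ["venue"])
by_venue_year = text_cube_cells(records, ["venue", "year"])  # drill down
apex = text_cube_cells(records, [])  # roll up to the single all-records cell
print(by_venue[("KDD",)].most_common(1))  # [('network', 2)]
```

Grouping by an empty list of dimensions yields the single all-records cell, and adding dimensions drills down, mirroring OLAP roll-up and drill-down on the structured attributes.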
9. Mining knowledge from interconnected data
- Author
Philip S. Yu, Xifeng Yan, Yizhou Sun, and Jiawei Han
- Subjects
Set (abstract data type); Evolving networks; business.industry; Computer science; Computer data storage; General Engineering; Leverage (statistics); Network science; Information repository; business; Data science; Meaning (linguistics); Network analysis
- Abstract
Most objects and data in the real world are interconnected, forming complex, heterogeneous, but often semi-structured information networks. However, most people consider a database merely a data repository that supports data storage and retrieval, rather than one or a set of heterogeneous information networks that contain rich, inter-related, multi-typed data and information. Most network science researchers study only homogeneous networks, without distinguishing the different types of objects and links in the networks. In this tutorial, we view databases and other interconnected data as heterogeneous information networks, and study how to leverage the rich semantic meaning of the types of objects and links in the networks. We systematically introduce technologies that can effectively and efficiently mine useful knowledge from such information networks.
- Published
- 2012
Discovery Service for Jio Institute Digital Library