411 results on '"Han Wook-Shin"'
Search Results
402. Efficient Streaming Detection of Hidden Clusters in Big Data Using Subspace Stream Clustering
- Author
Hassani, Marwan, Seidl, Thomas, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Han, Wook-Shin, editor, Lee, Mong Li, editor, Muliantara, Agus, editor, Sanjaya, Ngurah Agus, editor, Thalheim, Bernhard, editor, and Zhou, Shuigeng, editor
- Published
- 2014
403. Vertical Bit-Packing: Optimizing Operations on Bit-Packed Vectors Leveraging SIMD Instructions
- Author
Faust, Martin, Grund, Martin, Berning, Tim, Schwalb, David, Plattner, Hasso, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Han, Wook-Shin, editor, Lee, Mong Li, editor, Muliantara, Agus, editor, Sanjaya, Ngurah Agus, editor, Thalheim, Bernhard, editor, and Zhou, Shuigeng, editor
- Published
- 2014
404. Scalable and parallelizable influence maximization with Random Walk Ranking and Rank Merge Pruning.
- Author
Kim, Seungkeol, Kim, Dongeun, Oh, Jinoh, Hwang, Jeong-Hyon, Han, Wook-Shin, Chen, Wei, and Yu, Hwanjo
- Subjects
*RANDOM walks, *VIRAL marketing, *ONLINE social networks, *GRAPH theory, *PARALLEL processing
- Abstract
As social networking services become a large part of modern life, interest in applications using social networks has rapidly increased. One interesting application is viral marketing, which can be formulated in graph theory as the influence maximization problem. Specifically, the goal of the influence maximization problem is to find a set of k nodes (corresponding to individuals in a social network) whose influence spread is maximum. Several methods have been proposed to tackle this problem, but to select the k most influential nodes they suffer from the high computational cost of approximating the influence spread of every individual node. In this paper, we propose an effective pruning method for the influence maximization problem based on Random Walk and Rank Merge. The key idea is to efficiently find and prune out uninfluential nodes in order to dramatically reduce the amount of computation needed to evaluate influence spread. Our experimental results demonstrate the efficiency of the proposed method compared to previous state-of-the-art methods. Additionally, our method is easily parallelizable, resulting in further speedup. [ABSTRACT FROM AUTHOR]
- Published
- 2017
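The computational cost this abstract describes is easiest to see in the classic greedy baseline for influence maximization. The Python sketch below assumes the Independent Cascade diffusion model with a single propagation probability `p` (the abstract names no diffusion model) and estimates influence spread by Monte Carlo simulation. It is not the authors' Random Walk Ranking / Rank Merge method; it only shows the unpruned loop, in which every remaining node is re-evaluated in every round, that such pruning is meant to shortcut. All function names are illustrative.

```python
import random
from typing import Dict, List, Optional, Set

Graph = Dict[int, List[int]]  # adjacency list: node -> out-neighbours


def simulate_ic(graph: Graph, seeds: Set[int], p: float, rng: random.Random) -> int:
    """Run one Independent Cascade simulation and return how many nodes end up active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active node gets one chance to activate each inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: Set[int], p: float, runs: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the expected influence spread of a seed set."""
    if not seeds:
        return 0.0
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_influence_max(graph: Graph, k: int, p: float = 0.1, runs: int = 200, seed: int = 0) -> List[int]:
    """Pick k seeds greedily by estimated marginal gain, with no pruning at all."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen: List[int] = []
    for _ in range(min(k, len(nodes))):
        base = estimate_spread(graph, set(chosen), p, runs, rng)
        best: Optional[int] = None
        best_gain = float("-inf")
        # Every remaining node is simulated in every round: this is the cost
        # that pruning uninfluential nodes is meant to avoid.
        for u in sorted(nodes - set(chosen)):
            gain = estimate_spread(graph, set(chosen) | {u}, p, runs, rng) - base
            if gain > best_gain:
                best, best_gain = u, gain
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    star = {0: [1, 2, 3, 4], 5: [6]}
    print(greedy_influence_max(star, k=2, p=1.0))
```

With `p = 1.0` every edge fires, so the demo is deterministic: node 0 reaches five nodes and is chosen first, and node 5 adds two more in the second round, giving `[0, 5]`.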
405. Influence maximization based on reachability sketches in dynamic graphs.
- Author
Kim, Dongeun, Hyeon, Dongmin, Oh, Jinoh, Han, Wook-Shin, and Yu, Hwanjo
- Subjects
*GRAPH theory, *ONLINE social networks, *VIRAL marketing, *ALGORITHMS, *PROBLEM solving
- Abstract
Influence maximization is the problem of selecting the most influential nodes in a given graph. The problem is applicable to viral marketing and is actively researched as social networks become a medium of information propagation. To address the challenges of influence maximization, previous works approximate the influence evaluations to reduce running time while guaranteeing the quality of those evaluations. We propose a new influence maximization algorithm that overcomes the limitations of the state-of-the-art algorithms. We also design our algorithm to process update operations on dynamic graphs. Our algorithm outperforms the state-of-the-art algorithms TIM+ and SKIM in running time, and its influence spread is comparable to theirs. Our experiments show that processing update operations is faster than rerunning the baselines after each update. Additional experiments with synthetic graphs show that the update process preserves the quality of influence spread. [ABSTRACT FROM AUTHOR]
- Published
- 2017
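Reachability sketches summarize how many nodes each node can reach without storing the full reachable sets. The Python sketch below shows one standard construction, a bottom-k sketch: every node gets a random rank, a node's sketch keeps the k smallest ranks among the nodes it reaches, and (k - 1) divided by the k-th smallest rank estimates the reachable-set size. This is an illustrative assumption: the abstract does not describe the paper's sketch construction or how it handles graph updates, and in influence-maximization work such sketches are usually built over sampled instances of the probabilistic graph rather than over a single deterministic graph as done here for simplicity. Function names are hypothetical.

```python
import random
from collections import deque
from typing import Dict, Iterable, List, Set

Graph = Dict[int, List[int]]  # adjacency list: node -> out-neighbours


def reachable_set(graph: Graph, src: int) -> Set[int]:
    """All nodes reachable from src (including src), found by BFS."""
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def assign_ranks(nodes: Iterable[int], seed: int = 0) -> Dict[int, float]:
    """Give every node an independent uniform random rank in [0, 1)."""
    rng = random.Random(seed)
    return {v: rng.random() for v in nodes}


def bottom_k_sketch(graph: Graph, src: int, ranks: Dict[int, float], k: int) -> List[float]:
    """Keep only the k smallest ranks among the nodes reachable from src."""
    return sorted(ranks[v] for v in reachable_set(graph, src))[:k]


def estimate_reach(sketch: List[float], k: int) -> float:
    """Estimate the reachable-set size from a bottom-k sketch."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k reachable nodes: the sketch is exact
    return (k - 1) / sketch[k - 1]  # standard bottom-k cardinality estimator


if __name__ == "__main__":
    chain = {i: [i + 1] for i in range(1999)}  # 0 -> 1 -> ... -> 1999
    ranks = assign_ranks(range(2000), seed=42)
    # Prints an estimate of the true reachable-set size (2000 nodes from node 0).
    print(estimate_reach(bottom_k_sketch(chain, 0, ranks, k=64), k=64))
```

The sketch size k trades accuracy for space: the estimate's relative error shrinks roughly like 1/sqrt(k), while each node stores only k numbers regardless of how many nodes it reaches.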
406. DB-IR integration using tight-coupling in the Odysseus DBMS.
- Author
Whang, Kyu-Young, Lee, Jae-Gil, Lee, Min-Jae, Han, Wook-Shin, Kim, Min-Soo, and Kim, Jun-Sung
- Subjects
*DATA integration, *DATABASES, *INFORMATION retrieval, *SYSTEMS design, *QUERY (Information retrieval system)
- Abstract
As many recent applications require integration of structured data and text data, unifying database (DB) and information retrieval (IR) technologies has become one of the major challenges in our field. There have been active discussions on the system architecture for DB-IR integration, but no clear agreement has been reached yet. Along this direction, we have advocated the use of the tight-coupling architecture and developed a novel structure of the IR index as well as tightly-coupled query processing algorithms. In tight-coupling, the text data type is supported at the storage-system level just like a built-in data type, so that the query processor can efficiently handle queries involving both structured data and text data. In this paper, for archival purposes, we consolidate our achievements reported in non-regular publications over the last ten years or so, extending them with greater detail on the IR index and the query processing algorithms. All the features in this paper are fully implemented in the Odysseus DBMS, which has been under development at KAIST for over 23 years. We show that Odysseus significantly outperforms two open-source DBMSs and one open-source search engine (with some exceptions) in processing DB-IR integration queries. These results demonstrate the superiority of the tight-coupling architecture for DB-IR integration. [ABSTRACT FROM AUTHOR]
- Published
- 2015
407. Tightly-coupled spatial database features in the Odysseus/OpenGIS DBMS for high-performance.
- Author
Whang, Kyu-Young, Lee, Jae-Gil, Kim, Min-Soo, Lee, Min-Jae, Lee, Ki-Hoon, Han, Wook-Shin, and Kim, Jun-Sung
- Subjects
*GEOGRAPHIC information systems, *DATABASE management, *INFORMATION storage & retrieval systems, *GEODATABASES, *DATABASE management software
- Abstract
Conventional object-relational database management system (ORDBMS) vendors provide extension mechanisms for adding user-defined types and functions to their own DBMSs. These extension mechanisms are implemented using a high-level (typically, SQL-level) interface. We call this mechanism loose-coupling. The advantage of loose-coupling is that it is easy to implement. However, it is not suitable for implementing new data types and operations in large databases when high performance is required. We have earlier proposed the tight-coupling architecture (Whang et al.) to satisfy this requirement. In tight-coupling, new data types and operations are integrated into the core of the DBMS engine in the extensible type layer. Thus, they are supported consistently and with high performance. This tight-coupling architecture is being used to incorporate information retrieval features and spatial database features into the Odysseus ORDBMS, which has been under development at KAIST/AITrc for 19 years. In this paper, we introduce the tightly-coupled spatial database features of Odysseus/OpenGIS. By taking advantage of tight-coupling, Odysseus/OpenGIS provides excellent performance in processing spatial queries as well as flexible concurrency control and recovery on spatial data. We demonstrate this performance through extensive experiments. Finally, we present sample applications of a geographical information system (GIS) implemented using Odysseus/OpenGIS. [ABSTRACT FROM AUTHOR]
- Published
- 2010
408. An efficient algorithm for updating regular expression indexes in RDF databases.
- Author
Lee J, Kasperovics R, Han WS, Lee JH, Kim MS, and Cho H
- Subjects
- Programming Languages, Semantics, Terminology as Topic, Algorithms, Biological Ontologies, Database Management Systems, Databases, Factual, Information Storage and Retrieval methods, Natural Language Processing
- Abstract
The Resource Description Framework (RDF) is widely used for sharing biomedical data, such as gene ontology or the online protein database UniProt. SPARQL is a native query language for RDF that supports regular expressions in queries for which exact values are either irrelevant or unknown. Using regular expression indexes in SPARQL query processing improves the performance of queries containing regular expressions by up to two orders of magnitude. In this study, we address the update operation for regular expression indexes in RDF databases. We identify major performance problems of straightforward index update algorithms and propose a new algorithm that exploits unique properties of regular expression indexes to increase performance. Our contributions can be summarised as follows: (1) we propose an efficient update algorithm for regular expression indexes in RDF databases, (2) we build a prototype system for the proposed algorithm in C++, and (3) we conduct extensive experiments demonstrating that our algorithm improves on the straightforward approaches by an order of magnitude.
- Published
- 2015
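Neither RDF abstract in this listing says what structure a "regular expression index" uses. One common design for accelerating regular-expression matching is an n-gram index, so the Python sketch below assumes a trigram index over string literals. It contrasts a straightforward update (delete every trigram, re-insert every trigram) with an incremental one that touches only the trigrams that actually differ. This illustrates the general idea only, not the authors' update algorithm; `TrigramIndex`, the value ids, and the `required_literal` parameter are hypothetical (a real system would derive the literal from the pattern itself).

```python
import re
from collections import defaultdict
from typing import Dict, Set


def trigrams(s: str) -> Set[str]:
    """All length-3 substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    """Toy trigram index over string literals, keyed by a hypothetical value id."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # trigram -> value ids
        self.values: Dict[int, str] = {}                       # value id -> literal

    def _unlink(self, vid: int, grams: Set[str]) -> None:
        for g in grams:
            ids = self.postings[g]
            ids.discard(vid)
            if not ids:
                del self.postings[g]  # drop empty posting lists

    def insert(self, vid: int, value: str) -> None:
        self.values[vid] = value
        for g in trigrams(value):
            self.postings[g].add(vid)

    def delete(self, vid: int) -> None:
        self._unlink(vid, trigrams(self.values.pop(vid)))

    def update(self, vid: int, new_value: str) -> None:
        # Incremental: only trigrams in the symmetric difference of the old and
        # new values change, instead of deleting and re-inserting all of them.
        old, new = trigrams(self.values[vid]), trigrams(new_value)
        self._unlink(vid, old - new)
        for g in new - old:
            self.postings[g].add(vid)
        self.values[vid] = new_value

    def search(self, pattern: str, required_literal: str) -> Set[int]:
        """Filter by trigrams of a literal every match must contain, then verify with re."""
        grams = trigrams(required_literal)
        if grams:
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        else:
            candidates = set(self.values)  # literal too short to filter: scan everything
        rx = re.compile(pattern)
        return {vid for vid in candidates if rx.search(self.values[vid])}


if __name__ == "__main__":
    idx = TrigramIndex()
    idx.insert(1, "biological_process")
    idx.insert(2, "protein kinase activity")
    idx.insert(3, "kinase inhibitor")
    print(idx.search(r"kinase\s+act", "kinase"))
    idx.update(3, "kinase activator")
    print(idx.search(r"kinase\s+act", "kinase"))
```

The index only narrows the candidate set; the final `re.search` check keeps results exact, so a trigram filter can never produce false positives, only wasted verifications.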
409. Developing a hybrid dictionary-based bio-entity recognition technique.
- Author
Song M, Yu H, and Han WS
- Subjects
- Data Mining methods, Medical Informatics methods, Vocabulary, Controlled
- Abstract
Background: Bio-entity extraction is a pivotal component of information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. Methods: This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique applies text mining techniques, such as Part of Speech (POS) expansion, stemming, and the exploitation of contextual cues, in the stage that merges similar entities, to further improve performance. Results: The experimental results show that the proposed technique achieves the best or at least equivalent F-measure among the compared techniques, GENIA, MESH, UMLS, and combinations of these three resources. Conclusions: The results imply that the performance of dictionary-based extraction techniques is largely influenced by the information resources used to build the dictionary. In addition, the edit distance algorithm shows steady precision with three different dictionaries, whereas the context-only technique achieves high recall with three different dictionaries.
- Published
- 2015
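Edit distance can be viewed as a shortest path through the grid of prefix pairs of two strings, and the textbook dynamic program below computes that path cost row by row. The Python sketch assumes unit-cost Levenshtein distance and a simple case-insensitive threshold lookup against a dictionary; the abstract does not specify the operation costs or thresholds the authors use, and `fuzzy_lookup` is a hypothetical helper, not part of their system.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = cur
    return prev[-1]


def fuzzy_lookup(term: str, dictionary: Sequence[str], max_dist: int = 1) -> List[Tuple[str, int]]:
    """Dictionary entries within max_dist edits of term (case-insensitive), closest first."""
    hits = [(entry, levenshtein(term.lower(), entry.lower())) for entry in dictionary]
    return sorted((h for h in hits if h[1] <= max_dist), key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(fuzzy_lookup("IL-2", ["IL-2", "IL-12", "IL2", "TNF"]))
```

Keeping only two rows makes memory linear in the length of the shorter dimension while the running time stays proportional to the product of the two string lengths.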
410. Processing SPARQL queries with regular expressions in RDF databases.
- Author
Lee J, Pham MD, Lee J, Han WS, Cho H, Yu H, and Lee JH
- Subjects
- Algorithms, Internet, Knowledge Bases, Programming Languages, Semantics, Computational Biology methods, Databases, Factual, Information Storage and Retrieval methods, Software
- Abstract
Background: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources, such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL, the W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because of the diversity of users' requests for extracting information from RDF data and users' lack of knowledge about the exact value of each fact in RDF databases, it is desirable to use SPARQL queries with regular expression patterns to query RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed for querying a text corpus, or only support matching over paths in an RDF graph. Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model to integrate the proposed framework into existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
- Published
- 2011
411. Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS.
- Author
Yu H, Kim T, Oh J, Ko I, Kim S, and Han WS
- Subjects
- Data Interpretation, Statistical, Feedback, Reproducibility of Results, User-Computer Interface, Algorithms, Artificial Intelligence, Computational Biology methods, Database Management Systems, PubMed
- Abstract
Background: Finding relevant articles in PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking them according to a learned relevance function. However, learning and ranking are usually done offline, without being integrated with keyword queries, and users have to provide a large number of training documents to get reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed. Results: RefMed supports multi-level relevance feedback by using RankSVM as the learning method, and thus achieves higher accuracy with less feedback. RefMed "tightly" integrates RankSVM into an RDBMS to support both keyword queries and multi-level relevance feedback in real time; the tight coupling of RankSVM and the DBMS substantially improves processing time. An efficient parameter selection method for RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves high learning accuracy in real time without a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions: RefMed is the first multi-level relevance feedback system for PubMed, and it achieves high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.
- Published
- 2010
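RankSVM learns from multi-level feedback by turning graded labels into pairwise preferences: for every pair of documents with different relevance levels, the difference of their feature vectors should score positively under a linear weight vector. The Python sketch below shows only that pairwise formulation, trained with plain stochastic subgradient descent on the regularized hinge loss rather than a dedicated SVM solver. RefMed's integration of RankSVM into the RDBMS and its validation-free parameter selection are not reproduced, the function names are hypothetical, and the feature vectors in the demo are made-up toy values.

```python
from itertools import combinations
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def pairwise_differences(X: Sequence[Vector], y: Sequence[int]) -> List[Vector]:
    """For every pair with different relevance levels, emit x_higher - x_lower."""
    diffs = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue  # same level: no preference to learn
        hi, lo = (i, j) if y[i] > y[j] else (j, i)
        diffs.append([a - b for a, b in zip(X[hi], X[lo])])
    return diffs


def train_rank_svm(X: Sequence[Vector], y: Sequence[int], C: float = 1.0,
                   epochs: int = 200, lr: float = 0.01) -> Vector:
    """Linear RankSVM via stochastic subgradient descent on
    0.5 * ||w||^2 + C * sum(max(0, 1 - w . d)) over the pairwise differences d."""
    diffs = pairwise_differences(X, y)
    w = [0.0] * len(X[0])
    if not diffs:
        return w
    n = len(diffs)
    for _ in range(epochs):
        for d in diffs:
            violated = dot(w, d) < 1.0  # pair not yet ranked with margin 1
            for k in range(len(w)):
                # Regularizer is spread evenly over the n per-pair terms.
                grad = w[k] / n - (C * d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


def rank(w: Vector, X: Sequence[Vector]) -> List[int]:
    """Document indices sorted by descending score w . x."""
    scores = [dot(w, x) for x in X]
    return sorted(range(len(X)), key=lambda i: -scores[i])


if __name__ == "__main__":
    # Toy feature vectors with relevance levels 2 (high), 1 (partial), 0 (irrelevant).
    X = [[3.0, 0.1], [2.0, 0.5], [1.0, 0.9], [0.0, 0.3]]
    y = [2, 1, 1, 0]
    print(rank(train_rank_svm(X, y), X))
```

Because only pairs with different levels generate constraints, graded feedback (for example, "highly relevant" versus "partially relevant") yields more training pairs than binary labels from the same number of judged documents.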
Discovery Service for Jio Institute Digital Library