Author: "Fan, Wenfei" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Fan, Wenfei"' showing total 625 results


601. Boosting graph computation with generic methods: partitioning and incrementalization

Author: Xu, Ruiqi, Fan, Wenfei, and Libkin, Leonid
Subjects: graph computation, incremental computation, graph partition
Abstract: In this thesis we develop a package of generic methods for boosting the velocity of graph computations, regarding partitioning and incrementalization. The former is to deal with the challenges of volume and velocity over big data, and make computations scalable. The latter is to speed up computations for dynamic graph analytics. On the one hand, existing partitioners handle graphs in a “one-size-fits-all” way and do not consider the applications running on top of them. This results in sub-optimal performance for distributed computations. On the other hand, current incremental graph algorithms are designed ad hoc, which requires a lot of effort even from domain experts. Worse still, most of these algorithms lack provable guarantees for their efficiency. To this end, this thesis presents a series of generic yet effective approaches for boosting the efficiency and scalability of graph computation. Firstly, for graph partitioning, we propose an application-driven hybrid partitioning strategy that takes the top-level application into consideration. Given a graph algorithm A, the strategy learns a cost model for A as polynomial regression. We develop partitioners that, given the learned cost model, refine an edge-cut or vertex-cut partition to a hybrid partition and reduce the parallel cost of A. Moreover, we extend the partitioners to handle multiple cost models of mixed workloads in one batch. Secondly, we study generic guidelines for incrementalizing graph algorithms. We identify a class of incrementalizable algorithms abstracted in a fixpoint model. We show how to deduce an incremental algorithm A∆ from such an algorithm A. Moreover, A∆ can be made bounded relative to A, i.e., its cost is determined by the size of changes to graphs and changes to the affected area that is necessarily checked by batch algorithm A. 
We provide generic conditions under which a deduced algorithm A∆ is guaranteed to be correct and relatively bounded, by adopting the same logic and data structures of A, at most using timestamps as an additional auxiliary structure. Finally, we go beyond the incrementalization of generic graph algorithms, and focus on graph partitioners where the algorithms are heuristic and exact results are not required. We propose to incrementalize widely-used graph partitioners A into heuristically-bounded incremental algorithms A∆. Given graph G, updates ∆G to G and a partition A(G) of G by A, A∆ computes changes ∆O to A(G) such that (1) applying ∆O to A(G) produces a new partition of the updated graph although it may not be exactly the one derived by A, (2) it retains the same bounds on balance and cut sizes as A, and (3) ∆O is decided by ∆G alone. We show that we can deduce A∆ from both vertex-cut and edge-cut partitioners A, retaining their bounds
Published: 2021
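
Illustration (not part of the catalog record): the abstract above describes deducing an incremental algorithm A∆ from a batch fixpoint algorithm A, reusing the logic and data structures of A. The following is a minimal sketch of that general idea, assuming single-source shortest paths as the batch algorithm and edge insertions as the only kind of update. The function names dijkstra and insert_edge and the dict-of-dicts graph encoding are hypothetical choices made for this example, not taken from the thesis.

```python
import heapq

INF = float("inf")


def dijkstra(graph, source):
    """Batch algorithm A: shortest distances from `source`.

    `graph` maps each node to a dict {neighbor: edge_weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def insert_edge(graph, dist, u, v, w):
    """Incremental counterpart (assumed form): insert edge (u, v, w) and
    repair `dist` in place, touching only nodes whose distance improves.

    Returns the set of nodes whose distance changed.
    """
    out = graph.setdefault(u, {})
    out[v] = min(w, out.get(v, INF))
    graph.setdefault(v, {})
    changed = set()
    if u not in dist:
        return changed  # u is unreachable, so nothing can improve
    start = dist[u] + out[v]
    if start >= dist.get(v, INF):
        return changed  # the new edge does not shorten any path
    dist[v] = start
    changed.add(v)
    heap = [(start, v)]
    # Same relaxation logic as the batch algorithm, seeded at v only.
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for y, wy in graph.get(x, {}).items():
            nd = d + wy
            if nd < dist.get(y, INF):
                dist[y] = nd
                changed.add(y)
                heapq.heappush(heap, (nd, y))
    return changed
```

After each call to insert_edge, dist should equal what dijkstra would return on the updated graph, while the work done depends on the nodes whose distances actually change rather than on the whole graph. Edge deletions need a different repair step and are outside this sketch.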

602. Querying graphs on large-scale data

Author: Li, Yuanhao, Fan, Wenfei, and Libkin, Leonid
Subjects: Graph data management, Graph algorithms
Abstract: This doctoral thesis presents the results of my work on querying graphs on large-scale data, from both the data perspective and the query perspective. We first propose a scheme to reduce large graphs into small ones. It contracts obsolete parts, stars, cliques and paths into supernodes. We then build a hierarchical scheme to further reduce the graph, under limited resources. For both the contraction scheme and the hierarchy, we show that they are generic and lossless. We show that the same contracted graph is able to support multiple query classes at the same time, no matter whether their queries are label-based or not, local or non-local. Moreover, existing algorithms for these queries can be readily adapted to compute exact answers by using the synopses when possible, and decontracting the supernodes only when necessary. Using real-life graphs, we experimentally verify the efficiency and effectiveness of our contraction schemes. Meanwhile, we propose an extension of graph patterns, referred to as conditional graph patterns (CGPs). CGPs allow one to express several conventional queries in a conditioned one, and annotate similar patterns to compute answers for all patterns in a single enumeration. In a CGP, one can specify a simple condition on each edge such that the edge exists if and only if the condition is satisfied. We show that CGPs allow us to catch missing links, increase the expressivity of graph functional dependencies, and provide a succinct representation of graph patterns. We study their consistency, matching, incremental matching and containment problems and settle their complexity bounds. We develop algorithms for matching, incremental matching and parallel matching of CGPs, and for (incremental, parallel) multi-CGP matching and optimization. We empirically verify the efficiency and effectiveness of our algorithms on real-life and synthetic graphs.
Published: 2021

603. XPath Rewriting Using Views: The More the Merrier

Author: Cautis, Bogdan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Du, Xiaoyong, editor, Fan, Wenfei, editor, Wang, Jianmin, editor, Peng, Zhiyong, editor, and Sharaf, Mohamed A., editor
Published: 2011

604. Preface of the 2nd International Workshop on XML Data Management

Author: Lu, Jiaheng, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Du, Xiaoyong, editor, Fan, Wenfei, editor, Wang, Jianmin, editor, Peng, Zhiyong, editor, and Sharaf, Mohamed A., editor
Published: 2011

605. Preface to the 2nd International Workshop on Unstructured Data Management (USDM 2011)

Author: Wang, Tengjiao, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Du, Xiaoyong, editor, Fan, Wenfei, editor, Wang, Jianmin, editor, Peng, Zhiyong, editor, and Sharaf, Mohamed A., editor
Published: 2011

606. Change Tracer: A Protégé Plug-In for Ontology Recovery and Visualization

Author: Khattak, Asad Masood, Latif, Khalid, Pervez, Zeeshan, Fatima, Iram, Lee, Sungyoung, Lee, Young-Koo, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Du, Xiaoyong, editor, Fan, Wenfei, editor, Wang, Jianmin, editor, Peng, Zhiyong, editor, and Sharaf, Mohamed A., editor
Published: 2011

607. Information Networks Mining and Analysis

Author: Yu, Philip S., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Du, Xiaoyong, editor, Fan, Wenfei, editor, Wang, Jianmin, editor, Peng, Zhiyong, editor, and Sharaf, Mohamed A., editor
Published: 2011

608. On the Querying for Places on the Mobile Web

Author: Jensen, Christian S., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Du, Xiaoyong, editor, Fan, Wenfei, editor, Wang, Jianmin, editor, Peng, Zhiyong, editor, and Sharaf, Mohamed A., editor
Published: 2011

609. iMiner: From Passive Searching to Active Pushing

Author: Li, Deyi, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Du, Xiaoyong, editor, Fan, Wenfei, editor, Wang, Jianmin, editor, Peng, Zhiyong, editor, and Sharaf, Mohamed A., editor
Published: 2011

610. Forms-XML: Generating Form-Based User Interfaces for XML Vocabularies

Author: Kuo, Y. S., Shih, N. C., Tseng, Lendle, Hu, Hsun-Cheng, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Fan, Wenfei, editor, Wu, Zhaohui, editor, and Yang, Jun, editor
Published: 2005

611. Towards effective analysis of big graphs: from scalability to quality

Author: Tian, Chao, Fan, Wenfei, Libkin, Leonid, and Engineering and Physical Sciences Research Council (EPSRC)
Subjects: linear arithmetic expressions, graph data management, graph dependencies, quality, NGDs, graph querying, subgraph isomorphism, clean graphs, scalability, graph analysis
Abstract: This thesis investigates the central issues underlying graph analysis, namely, scalability and quality. We first study the incremental problems for graph queries, which aim to compute the changes to the old query answer, in response to the updates to the input graph. The incremental problem is called bounded if its cost is decided by the sizes of the query and the changes only. No matter how desirable, however, our first results are negative: for common graph queries such as graph traversal, connectivity, keyword search and pattern matching, their incremental problems are unbounded. In light of the negative results, we propose two new characterizations for the effectiveness of incremental computation, and show that the incremental computations above can still be effectively conducted, by either reducing the computations on big graphs to small data, or incrementalizing batch algorithms by minimizing unnecessary recomputation. We next study the problems with regards to improving the quality of the graphs. To uniquely identify entities represented by vertices in a graph, we propose a class of keys that are recursively defined in terms of graph patterns, and are interpreted with subgraph isomorphism. As an application, we study the entity matching problem, which is to find all pairs of entities in a graph that are identified by a given set of keys. Although the problem is proved to be intractable, and cannot be parallelized in logarithmic rounds, we provide two parallel scalable algorithms for it. In addition, to catch numeric inconsistencies in real-life graphs, we extend graph functional dependencies with linear arithmetic expressions and comparison predicates, referred to as NGDs. Indeed, NGDs strike a balance between expressivity and complexity, since if we allow non-linear arithmetic expressions, even of degree at most 2, the satisfiability and implication problems become undecidable. 
A localizable incremental algorithm is developed to detect errors using NGDs, where the cost is determined by small neighbors of nodes in the updates instead of the entire graph. Finally, a rule-based method to clean graphs is proposed. We extend graph entity dependencies (GEDs) as data quality rules. Given a graph, a set of GEDs and a block of ground truth, we fix violations of GEDs in the graph by combining data repairing and object identification. The method finds certain fixes to errors detected by GEDs, i.e., as long as the GEDs and the ground truth are correct, the fixes are assured correct as their logical consequences. Several fundamental results underlying the method are established, and an algorithm is developed to implement the method. We also parallelize the method and guarantee to reduce its running time with the increase of processors.
Published: 2017

612. GRAPE: Parallel Graph Query Engine

Author: Xu, Jingbo, Fan, Wenfei, and Libkin, Leonid
Subjects: graph computation, incremental evaluation, distributed system
Abstract: The need for graph computations is evident in a multitude of use cases. To support computations on large-scale graphs, several parallel systems have been developed. However, existing graph systems require users to recast algorithms into new models, which makes parallel graph computation a privilege of experienced users only. Moreover, real world applications often require much more complex graph processing workflows than previously evaluated. In response to these challenges, the thesis presents GRAPE, a distributed graph computation system, shipped with various applications for social network analysis, social media marketing and functional dependencies on graphs. Firstly, the thesis presents the foundation of GRAPE. The principled approach of GRAPE is based on partial evaluation and incremental computation. Sequential graph algorithms can be plugged into GRAPE with minor changes, and get parallelized as a whole. The termination and correctness are guaranteed under a monotonic condition. Secondly, as an application on GRAPE, the thesis proposes graph-pattern association rules (GPARs) for social media marketing. GPARs help users discover regularities between entities in social graphs and identify potential customers by exploring social influence. The thesis studies the problem of discovering top-k diversified GPARs and the problem of identifying potential customers with GPARs. Although both are NP-hard, parallel scalable algorithms on GRAPE are developed, which guarantee a polynomial speedup over sequential algorithms with the increase of processors. Thirdly, the thesis proposes quantified graph patterns (QGPs), an extension of graph patterns by supporting simple counting quantifiers on edges. QGPs naturally express universal and existential quantification, numeric and ratio aggregates, as well as negation. The thesis proves that the matching problem of QGPs remains NP-complete in the absence of negation, and is DP-complete for general QGPs. 
In addition, the thesis introduces quantified graph association rules defined with QGPs, to identify potential customers in social media marketing. Finally, to address the issue of data consistency, the thesis proposes a class of functional dependencies for graphs, referred to as GFDs. GFDs capture both attribute-value dependencies and topological structures of entities. The satisfiability and implication problems for GFDs are studied and proved to be coNP-complete and NP-complete, respectively. The thesis also proves that the validation problem for GFDs is coNP-complete. The parallel algorithms developed on GRAPE verify that GFDs provide an effective approach to detecting inconsistencies in knowledge and social graphs.
Published: 2017

613. Querying big data with bounded data access

Author: Cao, Yang, Fan, Wenfei, and Libkin, Leonid
Subjects: scale independence, bounded resource, bounded evaluability, query evaluation
Abstract: Query answering over big data is cost-prohibitive. A linear scan of a dataset D may take days with a solid state device if D is of PB size and years if D is of EB size. In other words, polynomial-time (PTIME) algorithms for query evaluation are already not feasible on big data. To tackle this, we propose querying big data with bounded data access, such that the cost of query evaluation is independent of the scale of D. First of all, we propose a class of boundedly evaluable queries. A query Q is boundedly evaluable under a set A of access constraints if for any dataset D that satisfies constraints in A, there exists a subset DQ ⊆ D such that (a) Q(DQ) = Q(D), and (b) the time for identifying DQ from D, and hence the size |DQ| of DQ, are independent of |D|. That is, we can compute Q(D) by accessing a bounded amount of data no matter how big D grows. We study the problem of deciding whether a query is boundedly evaluable under A. It is known that the problem is undecidable for FO without access constraints. We show that, in the presence of access constraints, it is decidable in 2EXPSPACE for positive fragments of FO queries, but is already EXPSPACE-hard even for CQ. To handle the undecidability and high complexity of the analysis, we develop effective syntax for boundedly evaluable queries under A, referred to as queries covered by A, such that, (a) any boundedly evaluable query under A is equivalent to a query covered by A, (b) each covered query is boundedly evaluable, and (c) it is efficient to decide whether Q is covered by A. On top of DBMS, we develop practical algorithms for checking whether queries are covered by A, and generating bounded plans if so. For queries that are not boundedly evaluable, we extend bounded evaluability to resource-bounded approximation and bounded query rewriting using views. 
(1) Resource-bounded approximation is parameterized with a resource ratio a ∈ (0,1], such that for any query Q and dataset D, it computes approximate answers with an accuracy bound h by accessing at most a|D| tuples. It is based on extended access constraints and a new accuracy measure. (2) Bounded query rewriting tackles the problem by incorporating bounded evaluability with views, such that the queries can be exactly answered by accessing cached views and a bounded amount of data in D. We study the problem of deciding whether a query has a bounded rewriting, establish its complexity bounds, and develop effective syntax for FO queries with a bounded rewriting. Finally, we extend bounded evaluability to graph pattern queries, by extending access constraints to graph data. We characterize bounded evaluability for subgraph and simulation patterns and develop practical algorithms for associated problems.
Published: 2016
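
Illustration (not part of the catalog record): the abstract above defines a query as boundedly evaluable when its answer can be computed from a subset DQ of D whose size does not depend on |D|. The following is a minimal sketch of that idea under stated assumptions: an access constraint is modeled as an index on an attribute X together with a bound N on the number of distinct Y-values per X-value. The class AccessConstraint, the function friend_cities and the friend/city relations are hypothetical names invented for this example.

```python
from collections import defaultdict


class AccessConstraint:
    """Assumed model of an access constraint X -> (Y, N): an index on X
    such that every X-value has at most N distinct Y-values."""

    def __init__(self, rows, bound):
        self.bound = bound
        self.index = defaultdict(set)
        self.fetched = 0  # counts tuples accessed through the index
        for x, y in rows:
            self.index[x].add(y)
            if len(self.index[x]) > bound:
                raise ValueError(f"constraint violated for key {x!r}")

    def fetch(self, x):
        """Return the Y-values for x, touching at most `bound` tuples."""
        ys = self.index.get(x, set())
        self.fetched += len(ys)
        return ys


def friend_cities(friend_idx, city_idx, pid):
    """Query: the cities in which the friends of `pid` live.

    The plan accesses at most N_friend + N_friend * N_city tuples,
    a bound that does not depend on the total number of rows.
    """
    cities = set()
    for f in friend_idx.fetch(pid):
        cities |= city_idx.fetch(f)
    return cities
```

In this sketch the dataset is never scanned: the plan only follows the two indices, so adding rows about unrelated people leaves the number of accessed tuples unchanged.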

614. Querying graphs with data

Author: Vrgoc, Domagoj, Libkin, Leonid, Fan, Wenfei, Santhanam, Rahul, and Engineering and Physical Sciences Research Council (EPSRC)
Subjects: big data, graph database, query language
Abstract: Graph data is becoming more and more pervasive. Indeed, services such as Social Networks or the Semantic Web can no longer rely on the traditional relational model, as its structure is somewhat too rigid for the applications they have in mind. For this reason we have seen a continuous shift towards more non-standard models. First it was the semi-structured data in the 1990s and XML in 2000s, but even such models seem to be too restrictive for new applications that require navigational properties naturally modelled by graphs. Social networks fit into the graph model by their very design: users are nodes and their connections are specified by graph edges. The W3C committee, on the other hand, describes RDF, the model underlying the Semantic Web, by using graphs. The situation is quite similar with crime detection networks and tracking workflow provenance, namely they all have graphs inbuilt into their definition. With pervasiveness of graph data the important question of querying and maintaining it has emerged as one of the main priorities, both in theoretical and applied sense. Currently there seem to be two approaches to handling such data. On the one hand, to extract the actual data, practitioners use traditional relational languages that completely disregard various navigational patterns connecting the data. What makes this data interesting in modern applications, however, is precisely its ability to compactly represent intricate topological properties that envelop the data. To overcome this issue several languages that allow querying graph topology have been proposed and extensively studied. The problem with these languages is that they concentrate on navigation only, thus disregarding the data that is actually stored in the database. What we propose in this thesis is the ability to do both. 
Namely, we will study how query languages can be designed to allow specifying not only how the data is connected, but also how data changes along paths and patterns connecting it. To this end we will develop several query languages and show how adding different data manipulation capabilities and different navigational features affects the complexity of main reasoning tasks. The story here is somewhat similar to the early success of the relational data model, where theoretical considerations led to a better understanding of what makes certain tasks more challenging than others. Here we aim for languages that are both efficient and capable of expressing a wide variety of queries of interest to several groups of practitioners. To do so we will analyse how different requirements affect the language at hand and at the end provide a good base of primitives whose inclusion into a language should be considered, based on the applications one has in mind. Namely, we consider how adding a specific operation, mechanism, or capability to the language affects practical tasks that such an addition plans to tackle. In the end we arrive at several languages, all of them with their pros and cons, giving us a good overview of how specific capabilities of the language affect the design goals, thus providing a sound basis for practitioners to choose from, based on their requirements.
Published: 2014

615. Improving data quality : data consistency, deduplication, currency and accuracy

Author: Yu, Wenyuan, Fan, Wenfei, and Libkin, Leonid
Subjects: Data accuracy, Data currency, Data quality, Deduplication, Data consistency
Abstract: Data quality is one of the key problems in data management. An unprecedented amount of data has been accumulated and has become a valuable asset of an organization. The value of the data relies greatly on its quality. However, data is often dirty in real life. It may be inconsistent, duplicated, stale, inaccurate or incomplete, which can reduce its usability and increase the cost of businesses. Consequently the need for improving data quality arises, which comprises five central issues, namely, data consistency, data deduplication, data currency, data accuracy and information completeness. This thesis presents the results of our work on the first four issues: data consistency, deduplication, currency and accuracy. The first part of the thesis investigates incremental verifications of data consistencies in distributed data. Given a distributed database D, a set S of conditional functional dependencies (CFDs), the set V of violations of the CFDs in D, and updates ΔD to D, it is to find, with minimum data shipment, changes ΔV to V in response to ΔD. Although the problems are intractable, we show that they are bounded: there exist algorithms to detect errors such that their computational cost and data shipment are both linear in the size of ΔD and ΔV, independent of the size of the database D. Such incremental algorithms are provided for both vertically and horizontally partitioned data, and we show that the algorithms are optimal. The second part of the thesis studies the interaction between record matching and data repairing. Record matching, the main technique underlying data deduplication, aims to identify tuples that refer to the same real-world object, and repairing is to make a database consistent by fixing errors in the data using constraints. These are treated as separate processes in most data cleaning systems, based on heuristic solutions. 
However, our studies show that repairing can effectively help us identify matches, and vice versa. To capture the interaction, a uniform framework that seamlessly unifies repairing and matching operations is proposed to clean a database based on integrity constraints, matching rules and master data. The third part of the thesis presents our study of finding certain fixes that are absolutely correct for data repairing. Data repairing methods based on integrity constraints are normally heuristic, and they may not find certain fixes. Worse still, they may even introduce new errors when attempting to repair the data, which may not work well when repairing critical data such as medical records, in which a seemingly minor error often has disastrous consequences. We propose a framework and an algorithm to find certain fixes, based on master data, a class of editing rules and user interactions. A prototype system is also developed. The fourth part of the thesis introduces inferring data currency and consistency for conflict resolution, where data currency aims to identify the current values of entities, and conflict resolution is to combine tuples that pertain to the same real-world entity into a single tuple and resolve conflicts, which is also an important issue for data deduplication. We show that data currency and consistency help each other in resolving conflicts. We study a number of associated fundamental problems, and develop an approach for conflict resolution by inferring data currency and consistency. The last part of the thesis reports our study of data accuracy on the longstanding relative accuracy problem which is to determine, given tuples t1 and t2 that refer to the same entity e, whether t1[A] is more accurate than t2[A], i.e., t1[A] is closer to the true value of the A attribute of e than t2[A]. 
We introduce a class of accuracy rules and an inference system with a chase procedure to deduce relative accuracy, and the related fundamental problems are studied. We also propose a framework and algorithms for inferring accurate values with users’ interaction.
Published: 2013

616. Graph pattern matching on social network analysis

Author: Wang, Xin, Fan, Wenfei, Geerts, Floris, and Libkin, Leonid
Subjects: graph pattern matching, social network
Abstract: Graph pattern matching is fundamental to social network analysis. Its effectiveness for identifying social communities and social positions, making recommendations and so on has been repeatedly demonstrated. However, social network analysis raises new challenges to graph pattern matching. As real-life social graphs are typically large, it is often prohibitively expensive to conduct graph pattern matching over such large graphs, e.g., NP-complete for subgraph isomorphism, cubic time for bounded simulation, and quadratic time for simulation. These hinder the applicability of graph pattern matching on social network analysis. In response to these challenges, the thesis presents a series of effective techniques for querying large, dynamic, and distributively stored social networks. First of all, we propose a notion of query preserving graph compression, to compress large social graphs relative to a class Q of queries. We then develop both batch and incremental compression strategies for two commonly used pattern queries. Via both theoretical analysis and experimental studies, we show that (1) using compressed graphs Gr benefits graph pattern matching dramatically; and (2) the computation of Gr as well as its maintenance can be processed efficiently. Secondly, we investigate the distributed graph pattern matching problem, and explore parallel computation for graph pattern matching. We show that our techniques possess the following performance guarantees: (1) each site is visited only once; (2) the total network traffic is independent of the size of G; and (3) the response time is decided by the size of the largest fragment of G rather than the size of the entire G. Furthermore, we show how these distributed algorithms can be implemented in the MapReduce framework. Thirdly, we study the problem of answering graph pattern matching using views, since view-based techniques have proven effective for speeding up query evaluation. 
We propose a notion of pattern containment to characterise graph pattern matching using views, and introduce efficient algorithms to answer graph pattern matching using views. Moreover, we identify three problems related to graph pattern containment, and provide efficient algorithms for containment checking (approximation when the problem is intractable). Fourthly, we revise graph pattern matching by supporting a designated output node, which we treat as “query focus”. We then introduce algorithms for computing the top-k relevant matches w.r.t. the output node for both acyclic and cyclic pattern graphs, respectively, with early termination property. Furthermore, we investigate the diversified top-k matching problem, and develop an approximation algorithm with performance guarantee and a heuristic algorithm with early termination property. Finally, we introduce an expert search system, called ExpFinder, for large and dynamic social networks. ExpFinder identifies top-k experts in social networks by graph pattern matching, and copes with the sheer size of real-life social networks by integrating incremental graph pattern matching, query preserving compression and top-k matching computation. In particular, we also introduce bounded (resp. unbounded) incremental algorithms to maintain the weighted landmark vectors which are used for incremental maintenance for cached results.
Published: 2013
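
Illustration (not part of the catalog record): the abstract above refers to graph simulation as one of the matching semantics used for pattern queries. The following is a minimal sketch, assuming the standard definition: the maximum relation S such that (u, v) in S implies equal labels and, for every pattern edge (u, u'), some graph edge (v, v') with (u', v') in S. It uses a naive fixpoint refinement and is not the quadratic-time algorithm the abstract alludes to. The function name graph_simulation and the label/edge encodings are hypothetical.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Compute the maximum simulation of pattern Q in graph G.

    q_labels / g_labels: dict mapping node -> label.
    q_edges / g_edges: iterables of directed (source, target) pairs.
    Returns a dict pattern_node -> set of matching graph nodes, or an
    empty dict when some pattern node has no match.
    """
    succ = {v: set() for v in g_labels}
    for a, b in g_edges:
        succ[a].add(b)

    # Start from all label-compatible candidates.
    sim = {
        u: {v for v, label in g_labels.items() if label == q_labels[u]}
        for u in q_labels
    }

    # Remove candidates that cannot mirror a pattern edge, until stable.
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    if any(not matches for matches in sim.values()):
        return {}
    return sim
```

Unlike subgraph isomorphism, simulation does not require a one-to-one mapping, so one graph node may match several pattern nodes and vice versa.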

617. I/O-Efficient Planar Range Skyline and Attrition Priority Queues

Author: Jeonghun Yoon, Yufei Tao, Kostas Tsichlas, Konstantinos Tsakalidis, Casper Kejlberg-Rasmussen, Hull, Richard, and Fan, Wenfei
Subjects: Skyline, FOS: Computer and information sciences, Matching (graph theory), Rank (linear algebra), Linear space, external memory, Data structure, range reporting, priority queues, Upper and lower bounds, Combinatorics, data structures, F.2.2, H.3.1, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Rectangle, Priority queue, skyline, Mathematics
Abstract: In the planar range skyline reporting problem, we store a set P of n 2D points in a structure such that, given a query rectangle Q = [a_1, a_2] x [b_1, b_2], the maxima (a.k.a. skyline) of P \cap Q can be reported efficiently. The query is 3-sided if an edge of Q is grounded, giving rise to two variants: top-open (b_2 = \infty) and left-open (a_1 = -\infty) queries. All our results are in external memory under the O(n/B) space budget, for both the static and dynamic settings:
* For static P, we give structures that answer top-open queries in O(log_B n + k/B), O(loglog_B U + k/B), and O(1 + k/B) I/Os when the universe is R^2, a U x U grid, and a rank space grid [O(n)]^2, respectively (where k is the number of reported points). The query complexity is optimal in all cases.
* We show that the left-open case is harder, such that any linear-size structure must incur \Omega((n/B)^e + k/B) I/Os for a query. We show that this case is as difficult as the general 4-sided queries, for which we give a static structure with the optimal query cost O((n/B)^e + k/B).
* We give a dynamic structure that supports top-open queries in O(log_{2B^e} (n/B) + k/B^{1-e}) I/Os, and updates in O(log_{2B^e} (n/B)) I/Os, for any e satisfying 0 \le e \le 1. This leads to a dynamic structure for 4-sided queries with optimal query cost O((n/B)^e + k/B), and amortized update cost O(log (n/B)).
As a contribution of independent interest, we propose an I/O-efficient version of the fundamental structure priority queue with attrition (PQA). Our PQA supports FindMin, DeleteMin, and InsertAndAttrite all in O(1) worst case I/Os, and O(1/B) amortized I/Os per operation. We also add the new CatenateAndAttrite operation that catenates two PQAs in O(1) worst case and O(1/B) amortized I/Os. This operation is a non-trivial extension to the classic PQA of Sundar, even in internal memory.
Comment: Appeared at PODS 2013, New York, 19 pages, 10 figures. arXiv admin note: text overlap with arXiv:1208.4511, arXiv:1207.2341
Published: 2013
Full Text: View/download PDF
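
Note: the PQA operations named in the abstract above can be illustrated with a minimal in-memory sketch. It assumes the usual attrition semantics (inserting e discards every stored element greater than or equal to e) and is not the paper's I/O-efficient structure; the class name AttritionQueue and its method names are hypothetical, and the costs here are amortized rather than worst case.

```python
from collections import deque


class AttritionQueue:
    """Illustrative in-memory priority queue with attrition (not I/O-efficient)."""

    def __init__(self):
        # Invariant: elements are strictly increasing from front to back.
        self._items = deque()

    def find_min(self):
        # The minimum is always at the front; None if the queue is empty.
        return self._items[0] if self._items else None

    def delete_min(self):
        # Remove and return the minimum; None if the queue is empty.
        return self._items.popleft() if self._items else None

    def insert_and_attrite(self, e):
        # Attrition: discard every stored element >= e, then append e.
        while self._items and self._items[-1] >= e:
            self._items.pop()
        self._items.append(e)

    def __len__(self):
        return len(self._items)
```

Because each element is appended once and removed at most once, a sequence of m operations costs O(m) in total in this sketch.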

618. A Calculus of Chemical Systems

Author: Gordon Plotkin, Tannen, Val, Wong, Limsoon, Libkin, Leonid, Fan, Wenfei, Tan, Wang-Chiew, and Fourman, Michael
Subjects: Computer science, Semantics (computer science), Principle of compositionality, MathematicsofComputing_NUMERICALANALYSIS, Ode, Petri net, TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS, Computer Science::Logic in Computer Science, Ordinary differential equation, ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION, Calculus, Homomorphism, Variety (universal algebra), Calculus of communicating systems
Abstract: We present the Calculus of Chemical Systems for the modular presentation of systems of chemical equations; it is intended to be a core calculus for rule-based modelling in systems biology. The calculus is loosely modelled after Milner’s Calculus of Communicating Systems, but with communication replaced by chemical reactions. We give a variety of compositional semantics for qualitative and quantitative versions of our calculus, employing a commutative monoid semantical framework. These semantics include (qualitative and quantitative) Petri nets, transition relations, ordinary differential equations (ODEs), and stochastic matrices. Standard semantics of Petri nets, whether of transition relations, ODEs, or stochastic matrices, fit within the framework as commutative monoid homomorphisms. We give complete equational axiomatisations and normal forms for all the semantics, and full abstraction results for the ODE and stochastic semantics. Definability can be characterised in some cases, as was already known for ODEs; other cases, including the stochastic one, remain open.
Published: 2013

619. Leveraging Wikipedia concept and category information to enhance contextual advertising

Author: Zhiwen Hu, Guandong Xu, Yanchun Zhang, Jianfeng Lu, Zongda Wu, Rong Pan, Berendt, Bettina, Vries, Arjen de, Fan, Wenfei, Macdonald, Craig, Ounis, Iadh, and Ruthven, Ian
Subjects: Information retrieval, Computer science, Web page, Similarity (psychology), Metric (mathematics), Selection (linguistics), Contextual advertising, Similarity measure, contextual advertising, similarity measure, wikipedia
Abstract: As a prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant ads into a Web page, so as to increase the number of ad-clicks. However, problems such as homonymy, polysemy, and low intersection of keywords can lead to the selection of irrelevant ads for a page. In this paper, we present a new contextual advertising approach to overcome these problems, which uses Wikipedia concept and category information to enrich the content representation of an ad (or a page). First, we map each ad and page into a keyword vector, a concept vector and a category vector. Next, we select the relevant ads for a given page based on a similarity metric that combines the above three feature vectors. Last, we evaluate our approach by using real ads and pages, as well as a great number of Wikipedia concepts and categories. Experimental results show that our approach can effectively improve the precision of ad selection. © 2011 ACM.
Published: 2011
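
Note: the abstract above does not state how the three feature vectors are combined. The sketch below assumes one plausible form, a weighted sum of cosine similarities over sparse vectors; the function names, dictionary keys and equal default weights are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def combined_similarity(ad, page, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of keyword, concept and category cosine similarities.

    `ad` and `page` are dicts with the (assumed) keys below, each mapping
    to a sparse vector. The weights are illustrative defaults.
    """
    parts = ("keyword", "concept", "category")
    return sum(w * cosine(ad[p], page[p]) for w, p in zip(weights, parts))
```

Ranking the candidate ads for a page then amounts to sorting them by combined_similarity in descending order.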

620. Proceedings of the 20th ACM International Conference on Information and Knowledge Management

Author: Berendt, Bettina, de Vries, Arjen, Fan, Wenfei, Macdonald, Craig, Ounis, Iadh, and Ruthven, Ian
Published: 2011

621. Hybrid models for future event prediction

Author: Roi Blanco, Giuseppe Amodeo, Ulf Brefeld, Berendt, Bettina, de Vries, Arjen, Fan, Wenfei, Macdonald, Craig, Ounis, Iadh, and Ruthven, Ian
Subjects: Informatics, Time series, Web searches, Computer science, Knowledge management, Machine learning, computer.software_genre, Information retrieval, Autoregressive integrated moving average, Baseline (configuration management), Event prediction, Event (probability theory), business.industry, Computer Science::Information Retrieval, Autocorrelation, Probabilistic logic, Business informatics, Statistical model, World Wide Web, Artificial intelligence, Data mining, business, Regression analysis, computer, Forecasting
Abstract: We present a hybrid method to turn off-the-shelf information retrieval (IR) systems into future event predictors. Given a query, a time series model is trained on the publication dates of the retrieved documents to capture trends and periodicity of the associated events. The periodicity of historic data is used to estimate a probabilistic model to predict future bursts. Finally, a hybrid model is obtained by intertwining the probabilistic and the time-series model. Our empirical results on the New York Times corpus show that autocorrelation functions of time-series suffice to classify queries accurately and that our hybrid models lead to more accurate future event predictions than baseline competitors.
Published: 2011

622. From Relations to XML: Cleaning, Integrating and Securing Data

Author: Jia, Xibei and Fan, Wenfei
Subjects: Laboratory for Foundations of Computer Science, Informatics, Computer Science
Abstract: While relational databases are still the preferred approach for storing data, XML is emerging as the primary standard for representing and exchanging data. Consequently, it has become increasingly important to provide a uniform XML interface to various data sources (integration), and critical to protect sensitive and confidential information in XML data (access control). Moreover, it is preferable to first detect and repair the inconsistencies in the data to avoid the propagation of errors to other data processing steps. In response to these challenges, this thesis presents an integrated framework for cleaning, integrating and securing data. The framework contains three parts. First, the data cleaning sub-framework makes use of a new class of constraints specially designed for improving data quality, referred to as conditional functional dependencies (CFDs), to detect and remove inconsistencies in relational data. Both batch and incremental techniques are developed for detecting CFD violations by SQL efficiently and repairing them based on a cost model. The cleaned relational data, together with other non-XML data, is then converted to XML format by using widely deployed XML publishing facilities. Second, the data integration sub-framework uses a novel formalism, XML integration grammars (XIGs), to integrate multi-source XML data which is either native or published from traditional databases. XIGs automatically support conformance to a target DTD, and allow one to build a large, complex integration via composition of component XIGs. To efficiently materialize the integrated data, algorithms are developed for merging XML queries in XIGs and for scheduling them. Third, to protect sensitive information in the integrated XML data, the data security sub-framework allows users to access the data only through authorized views. User queries posed on these views need to be rewritten into equivalent queries on the underlying document to avoid the prohibitive cost of materializing and maintaining a large number of views. Two algorithms are proposed to support virtual XML views: a rewriting algorithm that characterizes the rewritten queries as a new form of automata, and an evaluation algorithm to execute the automata-represented queries. They allow the security sub-framework to answer queries on views in linear time. Using both relational and XML technologies, this framework provides a uniform approach to clean, integrate and secure data. The algorithms and techniques in the framework have been implemented, and the experimental study verifies their effectiveness and efficiency.
Published: 2008
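
Note: the thesis detects CFD violations with SQL. As a rough illustration of what such a check computes, the sketch below does the same in plain Python for a CFD with a constant pattern on some attributes and a variable right-hand side: among rows matching the pattern, rows that agree on the left-hand side must agree on the right-hand side. The function name, the argument layout and the sample attributes (cc, zip, street) are assumptions made for this example, and CFDs with constant right-hand sides are not covered.

```python
from collections import defaultdict


def cfd_violations(rows, lhs, rhs, pattern):
    """Return groups of rows that violate the CFD (lhs -> rhs, pattern).

    rows:    list of dicts, one per tuple
    lhs/rhs: lists of attribute names
    pattern: {attribute: constant} condition restricting where the FD applies
    """
    groups = defaultdict(list)
    for row in rows:
        # Only rows matching the constant pattern are subject to the dependency.
        if all(row[attr] == value for attr, value in pattern.items()):
            groups[tuple(row[a] for a in lhs)].append(row)
    # A group violates the CFD if it carries more than one distinct rhs value.
    return [
        group
        for group in groups.values()
        if len({tuple(r[a] for a in rhs) for r in group}) > 1
    ]
```

For example, with pattern {"cc": "44"}, lhs ["cc", "zip"] and rhs ["street"], two rows with cc 44 and the same zip but different streets are reported, while the same conflict among rows with another cc value is ignored.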

623. Making graphs compact by lossless contraction.

Author: Fan W, Li Y, Liu M, and Lu C
Abstract: This paper proposes a scheme to reduce big graphs to small graphs. It contracts obsolete parts and regular structures into supernodes. The supernodes carry a synopsis S_Q for each query class Q in use, to abstract key features of the contracted parts for answering queries of Q. Moreover, for various types of graphs, we identify regular structures to contract. The contraction scheme provides a compact graph representation and prioritizes up-to-date data. Better still, it is generic and lossless. We show that the same contracted graph is able to support multiple query classes at the same time, no matter whether their queries are label based or not, local or non-local. Moreover, existing algorithms for these queries can be readily adapted to compute exact answers by using the synopses when possible and decontracting the supernodes only when necessary. As a proof of concept, we show how to adapt existing algorithms for subgraph isomorphism, triangle counting, shortest distance, connected component and clique decision to contracted graphs. We also provide a bounded incremental contraction algorithm in response to updates, such that its cost is determined by the size of areas affected by the updates alone, not by the entire graphs. We experimentally verify that on average, the contraction scheme reduces graphs by 71.9% and improves the evaluation of these queries by 1.69, 1.44, 1.47, 2.24 and 1.37 times, respectively. (© The Author(s) 2022.)
Published: 2023
Full Text: View/download PDF

624. Development of a Novel Highly Spontaneous Metastatic Model of Esophageal Squamous Cell Carcinoma Using Renal Capsule Technology.

Author: Gao P, Liu H, Yang Z, Hui Y, Shi Z, Yang Z, Song M, Yao M, Fan W, Yang J, Hao Y, and Fan T
Abstract: Purpose: Increasing evidence has demonstrated that animal models are imperative to investigate the potential molecular mechanism of metastasis and discover anti-metastasis drugs; however, efficient animal models to unveil the underlying mechanisms of metastasis in esophageal squamous cell carcinoma (ESCC) are limited. Methods: ESCC cell EC9706 with high invasiveness was screened by repeated Transwell assays. Its biological characteristics were identified by flow cytometry as well as by the wound healing and CCK-8 assays. Besides, the levels of epithelial-mesenchymal transition-related markers were examined using Western blotting. Parental (EC9706-I0) and subpopulation (EC9706-I3) cells were employed to establish the renal capsule model. Next, the tumor growth was detected by a live animal imaging system, and hematoxylin and eosin staining was applied to evaluate the metastatic status in ESCC. Results: EC9706-I3 cells showed rapid proliferation ability, S phase abundance, and high invasive ability; obvious upregulation in N-cadherin, Snail, Vimentin, and Bit1; and downregulation in E-cadherin. EC9706-I3 cells were less sensitive to the chemotherapy drug 5-fluorouracil than EC9706-I0 cells; however, both cell lines reached a tumorigenesis rate of 100% in the renal capsule model. The live animal imaging system revealed that the tumors derived from EC9706-I0 cells grew more slowly than those from EC9706-I3 cells at weeks 3-14. The EC9706-I3 xenograft model displayed spontaneous metastatic sites, including kidney, heart, liver, lung, pancreas, and spleen, with a distant metastatic rate of 80%. Conclusion: Our data suggested that the metastatic model was successfully established, providing a novel platform for further exploring the molecular mechanisms of metastasis in ESCC patients. Competing Interests: The authors declared no competing conflicts of interest with respect to the study, authorship, and/or publication of this article. (© 2021 Gao et al.)
Published: 2021
Full Text: View/download PDF

625. Making big data small.

Author: Fan W
Abstract: Big data analytics is often prohibitively costly and is typically conducted by parallel processing with a cluster of machines. Is big data analytics beyond the reach of small companies that can only afford limited resources? This paper tackles this question by presenting Boundedly EvAluable SQL (BEAS), a system for querying big relations with constrained resources. The idea is to make big data small. To answer a query posed on a dataset, it often suffices to access a small fraction of the data no matter how big the dataset is. In light of this, BEAS answers queries on big data by identifying and fetching a small set of the data needed. Under available resources, it computes exact answers whenever possible and otherwise approximate answers with accuracy guarantees. Underlying BEAS are principled approaches of bounded evaluation and data-driven approximation, the focus of this paper. Competing Interests: I declare I have no competing interests.
Published: 2019
Full Text: View/download PDF
