40 results for '"Barzan A"'
Search Results
2. Big Data: Finding Frequencies of Faulty Multimedia Data
- Author
-
Barzan Abdalla, Hemn, primary, Mustafa, Nasser, additional, and Ihnaini, Baha, additional
- Published
- 2021
- Full Text
- View/download PDF
3. Energy-Efficient AI over a Virtualized Cloud Fog Network
- Author
-
Yosuf, Barzan A., primary, Mohamed, Sanaa H., additional, Alenazi, Mohammed M., additional, El-Gorashi, Taisir E. H., additional, and Elmirghani, Jaafar M. H., additional
- Published
- 2021
- Full Text
- View/download PDF
4. QuickSel: Quick Selectivity Learning with Mixture Models
- Author
-
Yongjoo Park, Shucheng Zhong, and Barzan Mozafari
- Subjects
Computer science, Databases (cs.DB), Pattern recognition, Query optimization, Mixture model, Artificial intelligence - Abstract
Estimating the selectivity of a query is a key step in almost any cost-based query optimizer. Most of today's databases rely on histograms or samples that are periodically refreshed by re-scanning the data as the underlying data changes. Since frequent scans are costly, these statistics are often stale and lead to poor selectivity estimates. As an alternative to scans, query-driven histograms have been proposed, which refine the histograms based on the actual selectivities of the observed queries. Unfortunately, these approaches are either too costly to use in practice---i.e., require an exponential number of buckets---or quickly lose their advantage as they observe more queries. In this paper, we propose a selectivity learning framework, called QuickSel, which falls into the query-driven paradigm but does not use histograms. Instead, it builds an internal model of the underlying data, which can be refined significantly faster (e.g., only 1.9 milliseconds for 300 queries). This fast refinement allows QuickSel to continuously learn from each query and yield increasingly accurate selectivity estimates over time. Unlike query-driven histograms, QuickSel relies on a mixture model and a new optimization algorithm for training its model. Our extensive experiments on two real-world datasets confirm that, given the same target accuracy, QuickSel is 34.0x--179.4x faster than state-of-the-art query-driven histograms, including ISOMER and STHoles. Further, given the same space budget, QuickSel is 26.8%--91.8% more accurate than periodically updated histograms and samples, respectively.
- Published
- 2020
- Full Text
- View/download PDF
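- Code sketch
To make the mixture-model idea above concrete, here is a minimal, hypothetical sketch: approximate the data distribution as a mixture of uniform components and fit nonnegative weights so that the model's predicted selectivities match those observed for past queries. QuickSel's actual formulation (a quadratic program whose components are derived from the queries themselves) is more sophisticated; the fixed component boxes, the NNLS solver, and the soft sum-to-one row below are illustrative choices, not the paper's algorithm.
```python
import numpy as np
from scipy.optimize import nnls  # nonnegative least squares

def overlap_fraction(comp, box):
    """Fraction of a uniform component's probability mass that falls inside
    a query's hyper-rectangle. Both arguments are (lo, hi) pairs of
    d-dimensional numpy arrays."""
    (clo, chi), (qlo, qhi) = comp, box
    inter = np.clip(np.minimum(chi, qhi) - np.maximum(clo, qlo), 0.0, None)
    vol = np.prod(chi - clo)
    return float(np.prod(inter) / vol) if vol > 0 else 0.0

def fit_weights(components, boxes, selectivities, lam=10.0):
    """Fit mixture weights w >= 0 so that, for each observed query box,
    sum_j w_j * overlap_fraction(comp_j, box) matches the observed
    selectivity. A heavily weighted extra row softly enforces sum(w) == 1."""
    A = np.array([[overlap_fraction(c, q) for c in components] for q in boxes])
    A = np.vstack([A, lam * np.ones(len(components))])
    b = np.append(np.asarray(selectivities, dtype=float), lam)
    w, _residual = nnls(A, b)
    return w

def estimate_selectivity(components, w, box):
    return float(sum(wi * overlap_fraction(c, box) for wi, c in zip(w, components)))

# 1-D toy example: three uniform components over [0, 10].
comps = [(np.array([0.0]), np.array([4.0])),
         (np.array([4.0]), np.array([7.0])),
         (np.array([7.0]), np.array([10.0]))]
observed = [((np.array([0.0]), np.array([5.0])), 0.7),
            ((np.array([5.0]), np.array([10.0])), 0.3)]
w = fit_weights(comps, [b for b, _ in observed], [s for _, s in observed])
print(estimate_selectivity(comps, w, (np.array([2.0]), np.array([6.0]))))  # estimate for box [2, 6]
```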
5. QuickSel: Quick Selectivity Learning with Mixture Models
- Author
-
Park, Yongjoo, primary, Zhong, Shucheng, additional, and Mozafari, Barzan, additional
- Published
- 2020
- Full Text
- View/download PDF
6. Demonstration of VerdictDB, the Platform-Independent AQP System
- Author
-
Idris Hanafi, Wen He, Yongjoo Park, Barzan Mozafari, and Jacob Yatvitskiy
- Subjects
SQL, Information retrieval, Computer science, Middleware (distributed applications), Rewriting - Abstract
We demonstrate VerdictDB, the first platform-independent approximate query processing (AQP) system. Unlike existing AQP systems that are tightly integrated into a specific database, VerdictDB operates at the driver level, acting as middleware between users and off-the-shelf database systems. In other words, VerdictDB requires no modifications to the database internals; it simply relies on rewriting incoming queries such that the standard execution of the rewritten queries under relational semantics yields approximate answers to the original queries. VerdictDB exploits a novel technique for error estimation called variational subsampling, which is amenable to efficient computation via SQL. In this demonstration, we showcase VerdictDB's performance benefits (up to two orders of magnitude) compared to the same queries issued directly to existing query engines. We also illustrate that the approximate answers returned by VerdictDB are nearly identical to the exact answers. We use Apache Spark SQL and Amazon Redshift as two examples of modern distributed query platforms. We allow the audience to explore VerdictDB using a web-based interface (e.g., Hue or Apache Zeppelin) to issue queries and visualize their answers. VerdictDB is currently open-sourced and available under the Apache License (V2).
- Published
- 2018
- Full Text
- View/download PDF
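- Code sketch
A toy, hypothetical rendering of the driver-level rewriting described above: intercept a simple COUNT query, redirect it to a pre-built uniform sample table, and scale the answer, all in standard SQL that any off-the-shelf engine can execute unchanged. The table names and sample registry are invented; VerdictDB's real rewriter handles general aggregates, joins, and variational-subsampling error estimation.
```python
import re

# Hypothetical registry: base table -> (pre-built sample table, sampling ratio).
SAMPLES = {"orders": ("orders_sample_1pct", 0.01)}

def rewrite_count(sql: str) -> str:
    """Rewrite 'SELECT COUNT(*) FROM <t> ...' to run against a uniform
    sample of <t>, scaling the count by 1/ratio. A toy string rewrite,
    not VerdictDB's actual parser."""
    m = re.match(r"\s*SELECT\s+COUNT\(\*\)\s+FROM\s+(\w+)(.*)", sql,
                 re.IGNORECASE | re.DOTALL)
    if not m:
        return sql                       # pass through queries we don't handle
    table, rest = m.group(1), m.group(2)
    if table.lower() not in SAMPLES:
        return sql
    sample, ratio = SAMPLES[table.lower()]
    return f"SELECT CAST(COUNT(*) / {ratio} AS BIGINT) FROM {sample}{rest}"

print(rewrite_count("SELECT COUNT(*) FROM orders WHERE price > 100"))
# -> SELECT CAST(COUNT(*) / 0.01 AS BIGINT) FROM orders_sample_1pct WHERE price > 100
```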
7. Distributed Lock Management with RDMA
- Author
-
Dong Young Yoon, Barzan Mozafari, and Mosharaf Chowdhury
- Subjects
Remote direct memory access, Lamport's bakery algorithm, Computer science, InfiniBand, Distributed lock manager, Lock (computer science), Scheduling (computing), Serializability, Latency (engineering), Queue, Computer network - Abstract
Lock managers are a crucial component of modern distributed systems. However, with the increasing availability of fast RDMA-enabled networks, traditional lock managers can no longer keep up with the latency and throughput requirements of modern systems. Centralized lock managers can ensure fairness and prevent starvation using global knowledge of the system, but are themselves single points of contention and failure. Consequently, they fall short in leveraging the full potential of RDMA networks. On the other hand, decentralized (RDMA-based) lock managers either completely sacrifice global knowledge to achieve higher throughput at the risk of starvation and higher tail latencies, or they resort to costly communications in order to maintain global knowledge, which can result in significantly lower throughput. In this paper, we show that it is possible for a lock manager to be fully decentralized and yet exchange the partial knowledge necessary for preventing starvation and thereby reducing tail latencies. Our main observation is that we can design a lock manager primarily using RDMA's fetch-and-add (FA) operations, which always succeed, rather than compare-and-swap (CAS) operations, which only succeed if a given condition is satisfied. While this requires us to rethink the locking mechanism from the ground up, it enables us to sidestep the performance drawbacks of the previous CAS-based proposals that relied solely on blind retries upon lock conflicts. Specifically, we present DSLR (Decentralized and Starvation-free Lock management with RDMA), a decentralized lock manager that targets distributed systems running on RDMA-enabled networks. We demonstrate that, despite being fully decentralized, DSLR prevents starvation and blind retries by guaranteeing first-come-first-serve (FCFS) scheduling without maintaining explicit queues. We adapt Lamport's bakery algorithm to an RDMA-enabled environment with multiple bakers, utilizing only one-sided READ and atomic FA operations. Our experiments show that, on average, DSLR delivers 1.8x (and up to 2.8x) higher throughput than all existing RDMA-based lock managers, while reducing their mean and 99.9th percentile latencies by 2.0x and 18.3x (and up to 2.5x and 47x), respectively.
- Published
- 2018
- Full Text
- View/download PDF
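- Code sketch
A single-machine sketch of the fetch-and-add ticket idea above (Lamport's bakery with one counter word): acquiring the lock is one atomic increment that always succeeds, and FCFS ordering falls out of the ticket numbers. A threading.Lock stands in for RDMA's one-sided FA here; the real DSLR packs shared and exclusive counters into a single 64-bit word in the lock holder's memory and adds lease and timeout handling.
```python
import threading

class TicketLock:
    """FCFS ticket lock in the spirit of DSLR's bakery scheme. acquire()
    performs a single fetch-and-add (which always succeeds, unlike CAS)
    to draw a ticket, then waits until 'serving' reaches that ticket."""
    def __init__(self):
        self._next_ticket = 0
        self._serving = 0
        self._fa = threading.Lock()      # stand-in for RDMA fetch-and-add

    def acquire(self):
        with self._fa:                   # one simulated one-sided FA
            ticket = self._next_ticket
            self._next_ticket += 1
        while self._serving != ticket:   # simulated one-sided READs
            pass                         # a real client would pace/bound this spin

    def release(self):
        with self._fa:
            self._serving += 1
```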
8. BlinkML
- Author
-
Park, Yongjoo, primary, Qing, Jingyi, additional, Shen, Xiaoyang, additional, and Mozafari, Barzan, additional
- Published
- 2019
- Full Text
- View/download PDF
9. Huron: hybrid false sharing detection and repair
- Author
-
Khan, Tanvir Ahmed, primary, Zhao, Yifan, additional, Pokam, Gilles, additional, Mozafari, Barzan, additional, and Kasikci, Baris, additional
- Published
- 2019
- Full Text
- View/download PDF
10. Approximate Query Engines
- Author
-
Barzan Mozafari
- Subjects
Computer science, Data science, Research opportunities, Reuse, Software deployment, Analytics - Abstract
Recent years have witnessed a surge of interest in Approximate Query Processing (AQP) solutions, both in academia and the commercial world. In addition to well-known open problems in this area, there are many new research challenges that have surfaced as a result of the first interaction of AQP technology with commercial and real-world customers. We categorize these into deployment, planning, and interface challenges. At the same time, AQP settings introduce many interesting opportunities that would not be possible in a database with precise answers. These opportunities create hopes for overcoming some of the major limitations of traditional database systems. For example, we discuss how a database can reuse its past work in a generic way, and become smarter as it answers new queries. Our goal in this talk is to suggest some of the exciting research directions in this field that are worth pursuing.
- Published
- 2017
- Full Text
- View/download PDF
11. A Top-Down Approach to Achieving Performance Predictability in Database Systems
- Author
-
Grant Schoenebeck, Thomas F. Wenisch, Jiamin Huang, and Barzan Mozafari
- Subjects
Source code, Database, Computer science, Transaction processing, Distributed computing, Lock (computer science), Scheduling (computing), Distributed transaction, Online transaction processing, Latency (engineering), Database transaction - Abstract
While much of the research on transaction processing has focused on improving overall performance in terms of throughput and mean latency, surprisingly little attention has been given to performance predictability: how often individual transactions exhibit execution latency far from the mean. Performance predictability is increasingly important when transactions lie on the critical path of latency-sensitive applications, enterprise software, or interactive web services. In this paper, we focus on understanding and mitigating the sources of performance unpredictability in today's transactional databases. We conduct the first quantitative study of major sources of variance in MySQL and Postgres (two of the largest and most popular open-source products on the market), and VoltDB (a non-conventional database). We carry out our study with a tool called TProfiler that, given the source code of a database system and programmer annotations indicating the start and end of a transaction, is able to identify the dominant sources of variance in transaction latency. Based on our findings, we investigate alternative algorithms, implementations, and tuning strategies to reduce latency variance without compromising mean latency or throughput. Most notably, we propose a new lock scheduling algorithm, called Variance-Aware Transaction Scheduling (VATS), and a lazy buffer pool replacement policy. In particular, our modified MySQL reduces latency variance and 99th percentile latency by up to 5.6× and 6.3×, respectively. Our proposal has been welcomed by the open-source community: our VATS algorithm has been adopted as of MySQL's 5.7.17 release and made the default scheduling policy in MariaDB.
- Published
- 2017
- Full Text
- View/download PDF
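- Code sketch
The core scheduling change behind VATS fits in a few lines; the sketch below is a hypothetical rendering, not MySQL's implementation. When a lock is released, instead of granting it in FIFO arrival order at that particular lock, grant it to the eligible waiter whose transaction started earliest, which shrinks the latency tail of long-running transactions.
```python
def grant_vats(waiters):
    """Variance-aware choice in miniature: among the transactions waiting
    on a just-released lock, pick the eldest one (smallest start timestamp)
    rather than the first to arrive at this lock.
    waiters: list of (txn_start_ts, txn_id) tuples."""
    return min(waiters)                  # eldest-transaction-first

def grant_fifo(waiters_in_arrival_order):
    """The conventional policy that VATS replaces."""
    return waiters_in_arrival_order[0]

# Txn 7 started earlier (ts=3) than txn 9 (ts=5), so VATS grants it the
# lock even though txn 9 queued up at this lock first.
print(grant_vats([(5, 9), (3, 7)]))      # -> (3, 7)
```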
12. Database Learning
- Author
-
Ahmad Shahab Tajik, Yongjoo Park, Barzan Mozafari, and Michael Cafarella
- Subjects
SQL, Database, Computer science, Principle of maximum entropy, Databases (cs.DB), Artificial Intelligence (cs.AI), Raw data - Abstract
In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query, because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea---learning from past query answers---Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems. (This manuscript is an extended report of the work published at the ACM SIGMOD 2017 conference.)
- Published
- 2017
- Full Text
- View/download PDF
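- Code sketch
A toy stand-in for the idea that past answers tighten new ones. Verdict combines a fresh sample estimate with inferences from past answers via the maximum-entropy principle; the sketch below substitutes simple inverse-variance (precision-weighted) pooling, which is not the paper's method but conveys the flavor: the more past knowledge available, the tighter the combined answer.
```python
def pool(est_sample, var_sample, est_past, var_past):
    """Precision-weighted combination of a fresh sample-based estimate with
    a prediction derived from past query answers. Returns the combined
    estimate and its (smaller) variance."""
    w1, w2 = 1.0 / var_sample, 1.0 / var_past
    est = (w1 * est_sample + w2 * est_past) / (w1 + w2)
    return est, 1.0 / (w1 + w2)

# A fresh sample says AVG(price) ~ 103 with variance 16; overlapping past
# answers imply ~ 99 with variance 9. The pooled answer is tighter than either.
print(pool(103.0, 16.0, 99.0, 9.0))      # -> (~100.4, ~5.8)
```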
13. Demonstration of VerdictDB, the Platform-Independent AQP System
- Author
-
He, Wen, primary, Park, Yongjoo, additional, Hanafi, Idris, additional, Yatvitskiy, Jacob, additional, and Mozafari, Barzan, additional
- Published
- 2018
- Full Text
- View/download PDF
14. Session details: Industry 2: Real-time Analytics
- Author
-
Mozafari, Barzan, primary
- Published
- 2018
- Full Text
- View/download PDF
15. Distributed Lock Management with RDMA
- Author
-
Yoon, Dong Young, primary, Chowdhury, Mosharaf, additional, and Mozafari, Barzan, additional
- Published
- 2018
- Full Text
- View/download PDF
16. VerdictDB
- Author
-
Park, Yongjoo, primary, Mozafari, Barzan, additional, Sorenson, Joseph, additional, and Wang, Junhao, additional
- Published
- 2018
- Full Text
- View/download PDF
17. SnappyData
- Author
-
Soubhik Chakraborty, Neeraj Kumar, Rishitesh Mishra, Yogesh Mahajan, Barzan Mozafari, Sumedh Wale, Hemant Bhanawat, Jags Ramnarayan, Kishor Bachhav, and Sudhir Menon
- Subjects
SQL, Database, Computer science, Online analytical processing, Online transaction processing, Big data, In-memory database, Analytics, Use case - Abstract
In recent years, our customers have expressed frustration with the traditional approach of using a combination of disparate products to handle their streaming, transactional, and analytical needs. The common practice of stitching heterogeneous environments together in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP, and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., answers within a couple of seconds), even when faced with large data volumes or high-velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HAC), to communicate their accuracy requirements to SnappyData in an intuitive fashion.
- Published
- 2016
- Full Text
- View/download PDF
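- Code sketch
A hypothetical sketch of how a high-level accuracy contract might translate into a required sample size: under the normal approximation for an AVG query, the sample size follows from the column's coefficient of variation, the target relative error, and the confidence level. SnappyData's actual HAC machinery is richer (stratified synopses, behavior when a contract cannot be met, etc.); the function name and formula choice here are illustrative.
```python
import math

def sample_size_for_contract(cv, rel_err, z=1.96):
    """Smallest uniform-sample size whose CLT-based relative error for an
    AVG falls below rel_err at ~95% confidence. cv = stddev/mean of the
    aggregated column; z is the normal quantile for the confidence level."""
    return math.ceil((z * cv / rel_err) ** 2)

# Contract: answer AVG within 2% at 95% confidence, on a column with cv = 1.5.
print(sample_size_for_contract(cv=1.5, rel_err=0.02))   # about 21,609 rows
```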
18. DBSherlock
- Author
-
Ning Niu, Dong Young Yoon, and Barzan Mozafari
- Subjects
Database, Computer science, Database administrator, Database tuning, Online transaction processing, Data mining - Abstract
Running an online transaction processing (OLTP) system is one of the most daunting tasks required of database administrators (DBAs). As businesses rely on OLTP databases to support their mission-critical and real-time applications, poor database performance directly impacts their revenue and user experience. As a result, DBAs constantly monitor, diagnose, and rectify any performance decays. Unfortunately, the manual process of debugging and diagnosing OLTP performance problems is extremely tedious and non-trivial. Rather than being caused by a single slow query, performance problems in OLTP databases are often due to a large number of concurrent and competing transactions adding up to compounded, non-linear effects that are difficult to isolate. Sudden changes in request volume, transactional patterns, network traffic, or data distribution can cause previously abundant resources to become scarce, and the performance to plummet. This paper presents a practical tool for assisting DBAs in quickly and reliably diagnosing performance problems in an OLTP database. By analyzing hundreds of statistics and configurations collected over the lifetime of the system, our algorithm quickly identifies a small set of potential causes and presents them to the DBA. The root-cause established by the DBA is reincorporated into our algorithm as a new causal model to improve future diagnoses. Our experiments show that this algorithm is substantially more accurate than the state-of-the-art algorithm in finding correct explanations.
- Published
- 2016
- Full Text
- View/download PDF
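- Code sketch
A miniature, hypothetical version of the diagnosis step described above: for each collected statistic, measure how far the user-flagged abnormal region sits from the normal region and emit threshold predicates for the well-separated metrics. DBSherlock's actual partitioning, filtering, and causal-model construction are considerably more involved; the separation measure and data layout below are invented for illustration.
```python
import numpy as np

def candidate_predicates(normal, abnormal, min_sep=1.0):
    """For each metric, test how well a threshold separates the abnormal
    region from the normal one (separation in pooled-stddev units) and
    emit 'metric > t' / 'metric < t' predicates for the best-separated
    metrics. normal/abnormal: dict of metric name -> sampled values."""
    predicates = []
    for name in normal:
        a = np.asarray(normal[name], dtype=float)
        b = np.asarray(abnormal[name], dtype=float)
        pooled = (a.std() + b.std()) / 2 or 1e-9
        sep = (b.mean() - a.mean()) / pooled
        if abs(sep) >= min_sep:
            threshold = (a.mean() + b.mean()) / 2
            predicates.append(f"{name} {'>' if sep > 0 else '<'} {threshold:.3g}")
    return predicates

normal   = {"cpu": [20, 25, 22], "lock_waits": [3, 4, 2]}
abnormal = {"cpu": [21, 24, 23], "lock_waits": [40, 55, 48]}
print(candidate_predicates(normal, abnormal))   # -> ['lock_waits > 25.3']
```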
19. Database Learning
- Author
-
Park, Yongjoo, primary, Tajik, Ahmad Shahab, additional, Cafarella, Michael, additional, and Mozafari, Barzan, additional
- Published
- 2017
- Full Text
- View/download PDF
20. A Top-Down Approach to Achieving Performance Predictability in Database Systems
- Author
-
Huang, Jiamin, primary, Mozafari, Barzan, additional, Schoenebeck, Grant, additional, and Wenisch, Thomas F., additional
- Published
- 2017
- Full Text
- View/download PDF
21. Approximate Query Engines
- Author
-
Mozafari, Barzan, primary
- Published
- 2017
- Full Text
- View/download PDF
22. Statistical Analysis of Latency Through Semantic Profiling
- Author
-
Huang, Jiamin, primary, Mozafari, Barzan, additional, and Wenisch, Thomas F., additional
- Published
- 2017
- Full Text
- View/download PDF
23. BlinkDB
- Author
-
Barzan Mozafari, Henry Milner, Sameer Agarwal, Ion Stoica, Aurojit Panda, and Samuel Madden
- Subjects
SQL, Adaptive optimization, Computer science, Response time, Sample (statistics), Massively parallel, Data mining - Abstract
In this paper, we present BlinkDB, a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy or response time requirements. We evaluate BlinkDB against the well-known TPC-H benchmarks and a real-world analytic workload derived from Conviva Inc., a company that manages video distribution over the Internet. Our experiments on a 100-node cluster show that BlinkDB can answer queries on up to 17 TBs of data in less than 2 seconds (over 200x faster than Hive), within an error of 2-10%.
- Published
- 2013
- Full Text
- View/download PDF
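- Code sketch
A minimal sketch of the first of the two key ideas above, stratified sampling: keep at most a cap of rows per group (one pass, via per-group reservoir sampling) so rare groups survive in the sample, and remember each group's true size for unbiased scale-up at query time. BlinkDB's samples are multi-dimensional and maintained adaptively; this illustrates only the principle.
```python
import random
from collections import defaultdict

def stratified_sample(rows, key, cap):
    """One-pass stratified sample: at most `cap` rows kept per group via
    per-group reservoir sampling; `scale[g]` restores unbiased counts."""
    kept, seen = defaultdict(list), defaultdict(int)
    for row in rows:
        g = key(row)
        seen[g] += 1
        if len(kept[g]) < cap:
            kept[g].append(row)
        else:
            j = random.randrange(seen[g])   # classic reservoir step
            if j < cap:
                kept[g][j] = row
    scale = {g: seen[g] / len(kept[g]) for g in kept}
    return kept, scale

rows = [("US",)] * 100000 + [("IS",)] * 50      # the rare group survives
kept, scale = stratified_sample(rows, key=lambda r: r[0], cap=1000)
est_us = len(kept["US"]) * scale["US"]          # unbiased COUNT estimate
print(len(kept["IS"]), round(est_us))           # -> 50 100000
```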
24. CliffGuard
- Author
-
Barzan Mozafari, Eugene Zhen Ye Goh, and Dong Young Yoon
- Subjects
Database, Computer science, Robustness (computer science), Materialized view, Database tuning - Abstract
A fundamental problem in database systems is choosing the best physical design, i.e., a small set of auxiliary structures that enable the fastest execution of future queries. Almost all commercial databases come with designer tools that create a number of indices or materialized views (together comprising the physical design) that they exploit during query processing. Existing designers are what we call nominal; that is, they assume that their input parameters are precisely known and equal to some nominal values. For instance, since future workload is often not known a priori, it is common for these tools to optimize for past workloads in hopes that future queries and data will be similar. In practice, however, these parameters are often noisy or missing. Since nominal designers do not take the influence of such uncertainties into account, they find designs that are sub-optimal and remarkably brittle. Often, as soon as the future workload deviates from the past, their overall performance falls off a cliff, leading to customer discontent and expensive redesigns. Thus, we propose a new type of database designer that is robust against parameter uncertainties, so that overall performance degrades more gracefully when future workloads deviate from the past. Users express their risk tolerance by deciding how much nominal optimality they are willing to trade for attaining their desired level of robustness against uncertain situations. To the best of our knowledge, this paper is the first to adopt the recent breakthroughs in the theory of robust optimization to build a practical framework for solving some of the most fundamental problems in databases, replacing today's brittle designs with a principled world of robust designs that can guarantee predictable and consistent performance.
- Published
- 2015
- Full Text
- View/download PDF
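- Code sketch
The nominal-versus-robust distinction above, reduced to two toy functions: the nominal designer minimizes cost on the single past workload, while the robust designer minimizes worst-case cost over an uncertainty set of plausible future workloads. CliffGuard solves the latter with non-convex robust optimization over real physical designs; the brute-force minimax below only shows the objective being optimized, and all names and costs are invented.
```python
COSTS = {"idx_a": {"w_past": 1, "w_shift": 9},
         "idx_b": {"w_past": 2, "w_shift": 3}}
cost = lambda design, workload: COSTS[design][workload]

def nominal_design(designs, past_workload, cost):
    """Optimize for yesterday's workload and hope tomorrow looks the same."""
    return min(designs, key=lambda d: cost(d, past_workload))

def robust_design(designs, uncertainty_set, cost):
    """Minimize worst-case cost over all workloads in the uncertainty set
    (e.g., every workload within some distance of the past one)."""
    return min(designs, key=lambda d: max(cost(d, w) for w in uncertainty_set))

print(nominal_design(COSTS, "w_past", cost))               # -> idx_a (brittle)
print(robust_design(COSTS, ["w_past", "w_shift"], cost))   # -> idx_b (graceful)
```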
25. The analytical bootstrap
- Author
-
Barzan Mozafari, Shi Gao, Carlo Zaniolo, and Kai Zeng
- Subjects
SQL, Computer science, Semantics (computer science), Computation, Big data, Sampling (statistics), Analytics, Data mining, Probabilistic relational model - Abstract
Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research that is now made more critical by the need for timely and cost-effective analytics over "Big Data". Assessing the quality (i.e., estimating the error) of approximate answers is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second is quite general but suffers from its high computational overhead. In this paper, we introduce a probabilistic relational model for the bootstrap process, along with rigorous semantics and a unified error model, which bridges the gap between these two traditional approaches. Based on our probabilistic framework, we develop efficient algorithms to predict the distribution of the approximation results. These enable the computation of any bootstrap-based quality measure for a large class of SQL queries via a single-round evaluation of a slightly modified query. Extensive experiments on both synthetic and real-world datasets show that our method has superior prediction accuracy for bootstrap-based quality measures, and is several orders of magnitude faster than bootstrap.
- Published
- 2014
- Full Text
- View/download PDF
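- Code sketch
For contrast, here is the standard simulation bootstrap that the analytical method replaces: hundreds of resampled re-executions just to put an error bar on one approximate answer. The Analytical Bootstrap predicts this replicate distribution in a single round through its probabilistic relational model; the code below is the expensive baseline, not the paper's technique.
```python
import numpy as np

def bootstrap_error(sample, stat=np.mean, reps=1000, alpha=0.05):
    """Simulation bootstrap: re-execute the statistic on `reps` resamples
    (drawn with replacement) and read off a percentile confidence interval.
    Cost grows linearly with reps, which is what makes it expensive."""
    sample = np.asarray(sample, dtype=float)
    replicates = [stat(np.random.choice(sample, size=sample.size, replace=True))
                  for _ in range(reps)]
    lo, hi = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

data = np.random.default_rng(0).exponential(scale=10.0, size=500)
print(bootstrap_error(data))   # e.g. roughly (9.1, 10.9) around the true mean 10
```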
26. ABS
- Author
-
Shi Gao, Kai Zeng, Carlo Zaniolo, Barzan Mozafari, and Jiaqi Gu
- Subjects
SQL, Theoretical computer science, Computer science, Computation, Big data, Probabilistic logic, Relational algebra, Analytics, Scalability, Probabilistic relational model - Abstract
Approximate Query Processing (AQP) based on sampling is critical for supporting timely and cost-effective analytics over big data. To be applied successfully, AQP must be accompanied by reliable estimates on the quality of sample-produced approximate answers; the two main techniques used in the past for this purpose are (i) closed-form analytic error estimation, and (ii) the bootstrap method. Approach (i) is extremely efficient but lacks generality, whereas (ii) is general but suffers from high computational overhead. Our recently introduced Analytical Bootstrap method combines the strengths of both approaches and provides the basis for our ABS system, which will be demonstrated at the conference. The ABS system models bootstrap by a probabilistic relational model, and extends relational algebra with operations on probabilistic relations to predict the distributions of the AQP results. Thus, ABS entails a very fast computation of bootstrap-based quality measures for a general class of SQL queries, which is several orders of magnitude faster than the standard simulation-based bootstrap. In this demo, we will demonstrate the generality, automaticity, and ease of use of the ABS system, and its superior performance over the traditional approaches described above.
- Published
- 2014
- Full Text
- View/download PDF
27. DBSherlock
- Author
-
Yoon, Dong Young, primary, Niu, Ning, additional, and Mozafari, Barzan, additional
- Published
- 2016
- Full Text
- View/download PDF
28. SnappyData
- Author
-
Ramnarayan, Jags, primary, Mozafari, Barzan, additional, Wale, Sumedh, additional, Menon, Sudhir, additional, Kumar, Neeraj, additional, Bhanawat, Hemant, additional, Chakraborty, Soubhik, additional, Mahajan, Yogesh, additional, Mishra, Rishitesh, additional, and Bachhav, Kishor, additional
- Published
- 2016
- Full Text
- View/download PDF
29. High-performance complex event processing over XML streams
- Author
-
Barzan Mozafari, Kai Zeng, and Carlo Zaniolo
- Subjects
Computer science, Programming language, RSS, Distributed computing, Pushdown automaton, Complex event processing, Data exchange, XML, XPath - Abstract
Much research attention has been given to delivering high-performance systems that are capable of complex event processing (CEP) in a wide range of applications. However, many current CEP systems focus on efficiently processing data with a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. Yet XML streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial records, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient implementation. XSeq is designed to take full advantage of recent advances in the theory of Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising efficiency (whereas such amenability to efficient implementation was not demonstrated in previously proposed XPath extensions). We illustrate XSeq's power for CEP applications through examples from different domains, and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.
- Published
- 2012
- Full Text
- View/download PDF
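- Code sketch
A small illustration of why XML streams call for (visibly) pushdown power rather than plain finite automata: matching open and close tags requires a stack, yet the stack action is dictated entirely by the kind of input event, which is the visibly-pushdown restriction that lets XSeq remain efficient. The SAX-like event encoding below is invented for the example.
```python
def check_nesting(events):
    """Consume a SAX-like stream of ('open', tag), ('close', tag), and
    ('text', s) tuples. Push on every open, pop on every close: the input
    symbol alone dictates the stack action, exactly as in a VPA. Returns
    the maximum nesting depth and rejects ill-nested streams."""
    stack, depth = [], 0
    for kind, value in events:
        if kind == "open":
            stack.append(value)
            depth = max(depth, len(stack))
        elif kind == "close":
            if not stack or stack[-1] != value:
                raise ValueError(f"ill-nested stream at </{value}>")
            stack.pop()
    if stack:
        raise ValueError(f"unclosed tags: {stack}")
    return depth

stream = [("open", "rss"), ("open", "item"), ("text", "hello"),
          ("close", "item"), ("open", "item"), ("close", "item"),
          ("close", "rss")]
print(check_nesting(stream))   # -> 2
```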
30. K*SQL
- Author
-
Kai Zeng, Barzan Mozafari, and Carlo Zaniolo
- Subjects
Data stream, SQL, Programming language, Computer science, XML, XPath - Abstract
A strong interest is emerging in SQL extensions for sequence patterns using Kleene-closure expressions. This burst of interest from both the research community and the commercial world is due to the many database and data stream applications made possible by these extensions, including financial services, RFID-based inventory management, and electronic health systems. In this demo, we present the K*SQL system, which represents a major step forward in this area. K*SQL supports a more expressive language that allows for generalized Kleene-closure queries and also achieves the expressive power of the nested word model, which greatly expands the application domain to include XML queries, software trace analysis, and genomics. We first introduce the core features of our language in expressing complex pattern queries over both relational and XML data. We overview the architecture of our unifying engine and its user-friendly interfaces. We also present several K*SQL queries from stock market, XML, software trace analysis, and genomic applications.
- Published
- 2010
- Full Text
- View/download PDF
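- Code sketch
A hypothetical illustration of the kind of Kleene-closure sequence pattern such SQL extensions express declaratively: the classic falling-then-rising "V" over a stock price stream (pattern A B+ C+), coded imperatively here so the matching semantics are explicit. K*SQL states this as a query, and also runs such patterns over nested, XML-like data.
```python
def v_patterns(prices):
    """Find maximal matches of the pattern A B+ C+ : a start row, one or
    more strictly falling rows, then one or more strictly rising rows.
    Returns (start, bottom, end) index triples."""
    matches, i, n = [], 0, len(prices)
    while i < n - 1:
        j = i
        while j + 1 < n and prices[j + 1] < prices[j]:
            j += 1                              # B+ : falling run
        if j > i:
            k = j
            while k + 1 < n and prices[k + 1] > prices[k]:
                k += 1                          # C+ : rising run
            if k > j:
                matches.append((i, j, k))
                i = k
                continue
        i += 1
    return matches

print(v_patterns([10, 8, 7, 9, 12, 11, 13]))   # -> [(0, 2, 4), (4, 5, 6)]
```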
31. CliffGuard
- Author
-
Mozafari, Barzan, primary, Goh, Eugene Zhen Ye, additional, and Yoon, Dong Young, additional
- Published
- 2015
- Full Text
- View/download PDF
32. Designing an inductive data stream management system
- Author
-
Hetal Thakkar, Carlo Zaniolo, and Barzan Mozafari
- Subjects
Data stream management system, Data stream mining, Computer science, Pattern matching, Data mining, Query language, Cluster analysis, Extensibility - Abstract
There has been much recent interest in on-line data mining. Existing mining algorithms designed for stored data are either not applicable or not effective on data streams, where real-time response is often needed and data characteristics change frequently. Therefore, researchers have been focusing on designing new and improved algorithms for on-line mining tasks, such as classification, clustering, frequent itemset mining, and pattern matching. Relatively little attention has been paid to designing DSMSs that facilitate and integrate the task of mining data streams---i.e., stream systems that provide inductive functionalities analogous to those provided by Weka and MS OLE DB for stored data. In this paper, we propose the notion of an Inductive DSMS---a system that, besides providing a rich library of inter-operable functions to support the whole mining process, also supports the essentials of a DSMS, including optimization of continuous queries, load shedding, synoptic constructs, and non-stop computing. Ease of use and extensibility are additional desiderata for the proposed Inductive DSMS. We first review the many challenges involved in realizing such a system and then present our approach of extending the Stream Mill DSMS toward that goal. Our system features (i) a powerful query language where mining methods are expressed via aggregates for generic streams and arbitrary windows, (ii) a library of fast and light mining algorithms, and (iii) an architecture that makes it easy to customize and extend existing mining methods and introduce new ones.
- Published
- 2008
- Full Text
- View/download PDF
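- Code sketch
A sketch of the "mining method as window aggregate" idea described above: a user-defined aggregate that maintains frequent-item counts incrementally over a sliding window, so each arriving tuple costs O(1) instead of re-mining the window from scratch. The class and method names are invented; Stream Mill expresses such aggregates in its own query language rather than in Python.
```python
from collections import Counter, deque

class WindowedFrequentItems:
    """Incremental frequent-items aggregate over a sliding window of the
    last `window` tuples: counts are updated as tuples enter and leave,
    never recomputed from scratch."""
    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.counts = Counter()

    def add(self, item):
        self.buf.append(item)
        self.counts[item] += 1
        if len(self.buf) > self.window:
            old = self.buf.popleft()        # tuple leaving the window
            self.counts[old] -= 1
            if not self.counts[old]:
                del self.counts[old]

    def frequent(self, min_support):
        return [x for x, c in self.counts.items()
                if c >= min_support * len(self.buf)]

wfi = WindowedFrequentItems(window=4)
for item in ["a", "b", "a", "a", "c", "a"]:
    wfi.add(item)
print(wfi.frequent(min_support=0.5))   # -> ['a']
```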
33. Knowing when you're wrong
- Author
-
Agarwal, Sameer, primary, Milner, Henry, additional, Kleiner, Ariel, additional, Talwalkar, Ameet, additional, Jordan, Michael, additional, Madden, Samuel, additional, Mozafari, Barzan, additional, and Stoica, Ion, additional
- Published
- 2014
- Full Text
- View/download PDF
34. The analytical bootstrap
- Author
-
Zeng, Kai, primary, Gao, Shi, additional, Mozafari, Barzan, additional, and Zaniolo, Carlo, additional
- Published
- 2014
- Full Text
- View/download PDF
35. ABS
- Author
-
Zeng, Kai, primary, Gao, Shi, additional, Gu, Jiaqi, additional, Mozafari, Barzan, additional, and Zaniolo, Carlo, additional
- Published
- 2014
- Full Text
- View/download PDF
36. Performance and resource modeling in highly-concurrent OLTP workloads
- Author
-
Mozafari, Barzan, primary, Curino, Carlo, additional, Jindal, Alekh, additional, and Madden, Samuel, additional
- Published
- 2013
- Full Text
- View/download PDF
37. BlinkDB
- Author
-
Agarwal, Sameer, primary, Mozafari, Barzan, additional, Panda, Aurojit, additional, Milner, Henry, additional, Madden, Samuel, additional, and Stoica, Ion, additional
- Published
- 2013
- Full Text
- View/download PDF
38. High-performance complex event processing over XML streams
- Author
-
Mozafari, Barzan, primary, Zeng, Kai, additional, and Zaniolo, Carlo, additional
- Published
- 2012
- Full Text
- View/download PDF
39. K*SQL
- Author
-
Mozafari, Barzan, primary, Zeng, Kai, additional, and Zaniolo, Carlo, additional
- Published
- 2010
- Full Text
- View/download PDF
40. Designing an inductive data stream management system
- Author
-
Thakkar, Hetal, primary, Mozafari, Barzan, additional, and Zaniolo, Carlo, additional
- Published
- 2008
- Full Text
- View/download PDF