40 results for '"Barzan A"'
Search Results
2. Big Data: Finding Frequencies of Faulty Multimedia Data
- Author
-
Barzan Abdalla, Hemn, primary, Mustafa, Nasser, additional, and Ihnaini, Baha, additional
- Published
- 2021
- Full Text
- View/download PDF
3. Energy-Efficient AI over a Virtualized Cloud Fog Network
- Author
-
Yosuf, Barzan A., primary, Mohamed, Sanaa H., additional, Alenazi, Mohammed M., additional, El-Gorashi, Taisir E. H., additional, and Elmirghani, Jaafar M. H., additional
- Published
- 2021
- Full Text
- View/download PDF
4. QuickSel: Quick Selectivity Learning with Mixture Models
- Author
-
Yongjoo Park, Shucheng Zhong, and Barzan Mozafari
- Subjects
Computer science, Databases (cs.DB), Pattern recognition, Query optimization, Mixture model, Artificial intelligence - Abstract
Estimating the selectivity of a query is a key step in almost any cost-based query optimizer. Most of today's databases rely on histograms or samples that are periodically refreshed by re-scanning the data as the underlying data changes. Since frequent scans are costly, these statistics are often stale and lead to poor selectivity estimates. As an alternative to scans, query-driven histograms have been proposed, which refine the histograms based on the actual selectivities of the observed queries. Unfortunately, these approaches are either too costly to use in practice---i.e., require an exponential number of buckets---or quickly lose their advantage as they observe more queries. In this paper, we propose a selectivity learning framework, called QuickSel, which falls into the query-driven paradigm but does not use histograms. Instead, it builds an internal model of the underlying data, which can be refined significantly faster (e.g., only 1.9 milliseconds for 300 queries). This fast refinement allows QuickSel to continuously learn from each query and yield increasingly accurate selectivity estimates over time. Unlike query-driven histograms, QuickSel relies on a mixture model and a new optimization algorithm for training its model. Our extensive experiments on two real-world datasets confirm that, given the same target accuracy, QuickSel is 34.0x--179.4x faster than state-of-the-art query-driven histograms, including ISOMER and STHoles. Further, given the same space budget, QuickSel is 26.8%--91.8% more accurate than periodically updated histograms and samples, respectively.
- Published
- 2020
- Full Text
- View/download PDF
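- Code sketch
To make the mixture-model idea above concrete, here is a minimal, hypothetical sketch: approximate the data distribution as a mixture of uniform components and fit nonnegative weights so that the model's predicted selectivities match those observed for past queries. QuickSel's actual formulation (a quadratic program whose components are derived from the queries themselves) is more sophisticated; the fixed component boxes, the NNLS solver, and the soft sum-to-one row below are illustrative choices, not the paper's algorithm.
```python
import numpy as np
from scipy.optimize import nnls  # nonnegative least squares

def overlap_fraction(comp, box):
    """Fraction of a uniform component's probability mass that falls inside
    a query's hyper-rectangle. Both arguments are (lo, hi) pairs of
    d-dimensional numpy arrays."""
    (clo, chi), (qlo, qhi) = comp, box
    inter = np.clip(np.minimum(chi, qhi) - np.maximum(clo, qlo), 0.0, None)
    vol = np.prod(chi - clo)
    return float(np.prod(inter) / vol) if vol > 0 else 0.0

def fit_weights(components, boxes, selectivities, lam=10.0):
    """Fit mixture weights w >= 0 so that, for each observed query box,
    sum_j w_j * overlap_fraction(comp_j, box) matches the observed
    selectivity. A heavily weighted extra row softly enforces sum(w) == 1."""
    A = np.array([[overlap_fraction(c, q) for c in components] for q in boxes])
    A = np.vstack([A, lam * np.ones(len(components))])
    b = np.append(np.asarray(selectivities, dtype=float), lam)
    w, _residual = nnls(A, b)
    return w

def estimate_selectivity(components, w, box):
    return float(sum(wi * overlap_fraction(c, box) for wi, c in zip(w, components)))

# 1-D toy example: three uniform components over [0, 10].
comps = [(np.array([0.0]), np.array([4.0])),
         (np.array([4.0]), np.array([7.0])),
         (np.array([7.0]), np.array([10.0]))]
observed = [((np.array([0.0]), np.array([5.0])), 0.7),
            ((np.array([5.0]), np.array([10.0])), 0.3)]
w = fit_weights(comps, [b for b, _ in observed], [s for _, s in observed])
print(estimate_selectivity(comps, w, (np.array([2.0]), np.array([6.0]))))  # estimate for box [2, 6]
```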
5. QuickSel: Quick Selectivity Learning with Mixture Models
- Author
-
Park, Yongjoo, primary, Zhong, Shucheng, additional, and Mozafari, Barzan, additional
- Published
- 2020
- Full Text
- View/download PDF
6. Demonstration of VerdictDB, the Platform-Independent AQP System
- Author
-
Idris Hanafi, Wen He, Yongjoo Park, Barzan Mozafari, and Jacob Yatvitskiy
- Subjects
SQL, Information retrieval, Computer science, Middleware (distributed applications), Rewriting - Abstract
We demonstrate VerdictDB, the first platform-independent approximate query processing (AQP) system. Unlike existing AQP systems that are tightly integrated into a specific database, VerdictDB operates at the driver level, acting as middleware between users and off-the-shelf database systems. In other words, VerdictDB requires no modifications to the database internals; it simply relies on rewriting incoming queries such that the standard execution of the rewritten queries under relational semantics yields approximate answers to the original queries. VerdictDB exploits a novel technique for error estimation called variational subsampling, which is amenable to efficient computation via SQL. In this demonstration, we showcase VerdictDB's performance benefits (up to two orders of magnitude) compared to the same queries issued directly to existing query engines. We also illustrate that the approximate answers returned by VerdictDB are nearly identical to the exact answers. We use Apache Spark SQL and Amazon Redshift as two examples of modern distributed query platforms. We allow the audience to explore VerdictDB using a web-based interface (e.g., Hue or Apache Zeppelin) to issue queries and visualize their answers. VerdictDB is currently open-sourced and available under the Apache License (V2).
- Published
- 2018
- Full Text
- View/download PDF
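- Code sketch
A toy, hypothetical rendering of the driver-level rewriting described above: intercept a simple COUNT query, redirect it to a pre-built uniform sample table, and scale the answer, all in standard SQL that any off-the-shelf engine can execute unchanged. The table names and sample registry are invented; VerdictDB's real rewriter handles general aggregates, joins, and variational-subsampling error estimation.
```python
import re

# Hypothetical registry: base table -> (pre-built sample table, sampling ratio).
SAMPLES = {"orders": ("orders_sample_1pct", 0.01)}

def rewrite_count(sql: str) -> str:
    """Rewrite 'SELECT COUNT(*) FROM <t> ...' to run against a uniform
    sample of <t>, scaling the count by 1/ratio. A toy string rewrite,
    not VerdictDB's actual parser."""
    m = re.match(r"\s*SELECT\s+COUNT\(\*\)\s+FROM\s+(\w+)(.*)", sql,
                 re.IGNORECASE | re.DOTALL)
    if not m:
        return sql                       # pass through queries we don't handle
    table, rest = m.group(1), m.group(2)
    if table.lower() not in SAMPLES:
        return sql
    sample, ratio = SAMPLES[table.lower()]
    return f"SELECT CAST(COUNT(*) / {ratio} AS BIGINT) FROM {sample}{rest}"

print(rewrite_count("SELECT COUNT(*) FROM orders WHERE price > 100"))
# -> SELECT CAST(COUNT(*) / 0.01 AS BIGINT) FROM orders_sample_1pct WHERE price > 100
```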
7. Distributed Lock Management with RDMA
- Author
-
Dong Young Yoon, Barzan Mozafari, and Mosharaf Chowdhury
- Subjects
Remote direct memory access, Lamport's bakery algorithm, Computer science, InfiniBand, Distributed lock manager, Lock (computer science), Scheduling (computing), Serializability, Latency (engineering), Queue, Computer network - Abstract
Lock managers are a crucial component of modern distributed systems. However, with the increasing availability of fast RDMA-enabled networks, traditional lock managers can no longer keep up with the latency and throughput requirements of modern systems. Centralized lock managers can ensure fairness and prevent starvation using global knowledge of the system, but are themselves single points of contention and failure. Consequently, they fall short in leveraging the full potential of RDMA networks. On the other hand, decentralized (RDMA-based) lock managers either completely sacrifice global knowledge to achieve higher throughput at the risk of starvation and higher tail latencies, or they resort to costly communications in order to maintain global knowledge, which can result in significantly lower throughput. In this paper, we show that it is possible for a lock manager to be fully decentralized and yet exchange the partial knowledge necessary for preventing starvation and thereby reducing tail latencies. Our main observation is that we can design a lock manager primarily using RDMA's fetch-and-add (FA) operations, which always succeed, rather than compare-and-swap (CAS) operations, which only succeed if a given condition is satisfied. While this requires us to rethink the locking mechanism from the ground up, it enables us to sidestep the performance drawbacks of the previous CAS-based proposals that relied solely on blind retries upon lock conflicts. Specifically, we present DSLR (Decentralized and Starvation-free Lock management with RDMA), a decentralized lock manager that targets distributed systems running on RDMA-enabled networks. We demonstrate that, despite being fully decentralized, DSLR prevents starvation and blind retries by guaranteeing first-come-first-serve (FCFS) scheduling without maintaining explicit queues. We adapt Lamport's bakery algorithm to an RDMA-enabled environment with multiple bakers, utilizing only one-sided READ and atomic FA operations. Our experiments show that, on average, DSLR delivers 1.8x (and up to 2.8x) higher throughput than all existing RDMA-based lock managers, while reducing their mean and 99.9th percentile latencies by 2.0x and 18.3x (and up to 2.5x and 47x), respectively.
- Published
- 2018
- Full Text
- View/download PDF
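- Code sketch
A single-machine sketch of the fetch-and-add ticket idea above (Lamport's bakery with one counter word): acquiring the lock is one atomic increment that always succeeds, and FCFS ordering falls out of the ticket numbers. A threading.Lock stands in for RDMA's one-sided FA here; the real DSLR packs shared and exclusive counters into a single 64-bit word in the lock holder's memory and adds lease and timeout handling.
```python
import threading

class TicketLock:
    """FCFS ticket lock in the spirit of DSLR's bakery scheme. acquire()
    performs a single fetch-and-add (which always succeeds, unlike CAS)
    to draw a ticket, then waits until 'serving' reaches that ticket."""
    def __init__(self):
        self._next_ticket = 0
        self._serving = 0
        self._fa = threading.Lock()      # stand-in for RDMA fetch-and-add

    def acquire(self):
        with self._fa:                   # one simulated one-sided FA
            ticket = self._next_ticket
            self._next_ticket += 1
        while self._serving != ticket:   # simulated one-sided READs
            pass                         # a real client would pace/bound this spin

    def release(self):
        with self._fa:
            self._serving += 1
```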
8. BlinkML
- Author
-
Park, Yongjoo, primary, Qing, Jingyi, additional, Shen, Xiaoyang, additional, and Mozafari, Barzan, additional
- Published
- 2019
- Full Text
- View/download PDF
9. Huron: hybrid false sharing detection and repair
- Author
-
Khan, Tanvir Ahmed, primary, Zhao, Yifan, additional, Pokam, Gilles, additional, Mozafari, Barzan, additional, and Kasikci, Baris, additional
- Published
- 2019
- Full Text
- View/download PDF
10. Approximate Query Engines
- Author
-
Barzan Mozafari
- Subjects
Computer science, Data science, Research opportunities, Reuse, Software deployment, Analytics - Abstract
Recent years have witnessed a surge of interest in Approximate Query Processing (AQP) solutions, both in academia and the commercial world. In addition to well-known open problems in this area, there are many new research challenges that have surfaced as a result of the first interaction of AQP technology with commercial and real-world customers. We categorize these into deployment, planning, and interface challenges. At the same time, AQP settings introduce many interesting opportunities that would not be possible in a database with precise answers. These opportunities create hopes for overcoming some of the major limitations of traditional database systems. For example, we discuss how a database can reuse its past work in a generic way, and become smarter as it answers new queries. Our goal in this talk is to suggest some of the exciting research directions in this field that are worth pursuing.
- Published
- 2017
- Full Text
- View/download PDF
11. A Top-Down Approach to Achieving Performance Predictability in Database Systems
- Author
-
Grant Schoenebeck, Thomas F. Wenisch, Jiamin Huang, and Barzan Mozafari
- Subjects
Source code, Database, Computer science, Transaction processing, Distributed computing, Lock (computer science), Scheduling (computing), Distributed transaction, Online transaction processing, Latency (engineering), Database transaction - Abstract
While much of the research on transaction processing has focused on improving overall performance in terms of throughput and mean latency, surprisingly little attention has been given to performance predictability: how often individual transactions exhibit execution latency far from the mean. Performance predictability is increasingly important when transactions lie on the critical path of latency-sensitive applications, enterprise software, or interactive web services. In this paper, we focus on understanding and mitigating the sources of performance unpredictability in today's transactional databases. We conduct the first quantitative study of major sources of variance in MySQL and Postgres (two of the largest and most popular open-source products on the market), and VoltDB (a non-conventional database). We carry out our study with a tool called TProfiler that, given the source code of a database system and programmer annotations indicating the start and end of a transaction, is able to identify the dominant sources of variance in transaction latency. Based on our findings, we investigate alternative algorithms, implementations, and tuning strategies to reduce latency variance without compromising mean latency or throughput. Most notably, we propose a new lock scheduling algorithm, called Variance-Aware Transaction Scheduling (VATS), and a lazy buffer pool replacement policy. In particular, our modified MySQL reduces latency variance and 99th percentile latency by up to 5.6× and 6.3×, respectively. Our proposal has been welcomed by the open-source community: our VATS algorithm has been adopted as of MySQL's 5.7.17 release and made the default scheduling policy in MariaDB.
- Published
- 2017
- Full Text
- View/download PDF
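- Code sketch
The core scheduling change behind VATS fits in a few lines; the sketch below is a hypothetical rendering, not MySQL's implementation. When a lock is released, instead of granting it in FIFO arrival order at that particular lock, grant it to the eligible waiter whose transaction started earliest, which shrinks the latency tail of long-running transactions.
```python
def grant_vats(waiters):
    """Variance-aware choice in miniature: among the transactions waiting
    on a just-released lock, pick the eldest one (smallest start timestamp)
    rather than the first to arrive at this lock.
    waiters: list of (txn_start_ts, txn_id) tuples."""
    return min(waiters)                  # eldest-transaction-first

def grant_fifo(waiters_in_arrival_order):
    """The conventional policy that VATS replaces."""
    return waiters_in_arrival_order[0]

# Txn 7 started earlier (ts=3) than txn 9 (ts=5), so VATS grants it the
# lock even though txn 9 queued up at this lock first.
print(grant_vats([(5, 9), (3, 7)]))      # -> (3, 7)
```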
12. Database Learning
- Author
-
Ahmad Shahab Tajik, Yongjoo Park, Barzan Mozafari, and Michael Cafarella
- Subjects
SQL, Database, Computer science, Principle of maximum entropy, Databases (cs.DB), Artificial Intelligence (cs.AI), Raw data - Abstract
In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query, because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea---learning from past query answers---Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems. (This manuscript is an extended report of the work published at the ACM SIGMOD 2017 conference.)
- Published
- 2017
- Full Text
- View/download PDF
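- Code sketch
A toy stand-in for the idea that past answers tighten new ones. Verdict combines a fresh sample estimate with inferences from past answers via the maximum-entropy principle; the sketch below substitutes simple inverse-variance (precision-weighted) pooling, which is not the paper's method but conveys the flavor: the more past knowledge available, the tighter the combined answer.
```python
def pool(est_sample, var_sample, est_past, var_past):
    """Precision-weighted combination of a fresh sample-based estimate with
    a prediction derived from past query answers. Returns the combined
    estimate and its (smaller) variance."""
    w1, w2 = 1.0 / var_sample, 1.0 / var_past
    est = (w1 * est_sample + w2 * est_past) / (w1 + w2)
    return est, 1.0 / (w1 + w2)

# A fresh sample says AVG(price) ~ 103 with variance 16; overlapping past
# answers imply ~ 99 with variance 9. The pooled answer is tighter than either.
print(pool(103.0, 16.0, 99.0, 9.0))      # -> (~100.4, ~5.8)
```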
13. Demonstration of VerdictDB, the Platform-Independent AQP System
- Author
-
He, Wen, primary, Park, Yongjoo, additional, Hanafi, Idris, additional, Yatvitskiy, Jacob, additional, and Mozafari, Barzan, additional
- Published
- 2018
- Full Text
- View/download PDF
14. Session details: Industry 2: Real-time Analytics
- Author
-
Mozafari, Barzan, primary
- Published
- 2018
- Full Text
- View/download PDF
15. Distributed Lock Management with RDMA
- Author
-
Yoon, Dong Young, primary, Chowdhury, Mosharaf, additional, and Mozafari, Barzan, additional
- Published
- 2018
- Full Text
- View/download PDF
16. VerdictDB
- Author
-
Park, Yongjoo, primary, Mozafari, Barzan, additional, Sorenson, Joseph, additional, and Wang, Junhao, additional
- Published
- 2018
- Full Text
- View/download PDF
17. SnappyData
- Author
-
Soubhik Chakraborty, Neeraj Kumar, Rishitesh Mishra, Yogesh Mahajan, Barzan Mozafari, Sumedh Wale, Hemant Bhanawat, Jags Ramnarayan, Kishor Bachhav, and Sudhir Menon
- Subjects
SQL, Database, Computer science, Online analytical processing, Online transaction processing, Big data, In-memory database, Analytics, Use case - Abstract
In recent years, our customers have expressed frustration with the traditional approach of using a combination of disparate products to handle their streaming, transactional, and analytical needs. The common practice of stitching heterogeneous environments together in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP, and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., answers within a couple of seconds), even when faced with large data volumes or high-velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HAC), to communicate their accuracy requirements to SnappyData in an intuitive fashion.
- Published
- 2016
- Full Text
- View/download PDF
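- Code sketch
A hypothetical sketch of how a high-level accuracy contract might translate into a required sample size: under the normal approximation for an AVG query, the sample size follows from the column's coefficient of variation, the target relative error, and the confidence level. SnappyData's actual HAC machinery is richer (stratified synopses, behavior when a contract cannot be met, etc.); the function name and formula choice here are illustrative.
```python
import math

def sample_size_for_contract(cv, rel_err, z=1.96):
    """Smallest uniform-sample size whose CLT-based relative error for an
    AVG falls below rel_err at ~95% confidence. cv = stddev/mean of the
    aggregated column; z is the normal quantile for the confidence level."""
    return math.ceil((z * cv / rel_err) ** 2)

# Contract: answer AVG within 2% at 95% confidence, on a column with cv = 1.5.
print(sample_size_for_contract(cv=1.5, rel_err=0.02))   # about 21,609 rows
```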
18. DBSherlock
- Author
-
Ning Niu, Dong Young Yoon, and Barzan Mozafari
- Subjects
Database, Computer science, Database administrator, Database tuning, Online transaction processing, Data mining - Abstract
Running an online transaction processing (OLTP) system is one of the most daunting tasks required of database administrators (DBAs). As businesses rely on OLTP databases to support their mission-critical and real-time applications, poor database performance directly impacts their revenue and user experience. As a result, DBAs constantly monitor, diagnose, and rectify any performance decays. Unfortunately, the manual process of debugging and diagnosing OLTP performance problems is extremely tedious and non-trivial. Rather than being caused by a single slow query, performance problems in OLTP databases are often due to a large number of concurrent and competing transactions adding up to compounded, non-linear effects that are difficult to isolate. Sudden changes in request volume, transactional patterns, network traffic, or data distribution can cause previously abundant resources to become scarce, and the performance to plummet. This paper presents a practical tool for assisting DBAs in quickly and reliably diagnosing performance problems in an OLTP database. By analyzing hundreds of statistics and configurations collected over the lifetime of the system, our algorithm quickly identifies a small set of potential causes and presents them to the DBA. The root-cause established by the DBA is reincorporated into our algorithm as a new causal model to improve future diagnoses. Our experiments show that this algorithm is substantially more accurate than the state-of-the-art algorithm in finding correct explanations.
- Published
- 2016
- Full Text
- View/download PDF
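- Code sketch
A miniature, hypothetical version of the diagnosis step described above: for each collected statistic, measure how far the user-flagged abnormal region sits from the normal region and emit threshold predicates for the well-separated metrics. DBSherlock's actual partitioning, filtering, and causal-model construction are considerably more involved; the separation measure and data layout below are invented for illustration.
```python
import numpy as np

def candidate_predicates(normal, abnormal, min_sep=1.0):
    """For each metric, test how well a threshold separates the abnormal
    region from the normal one (separation in pooled-stddev units) and
    emit 'metric > t' / 'metric < t' predicates for the best-separated
    metrics. normal/abnormal: dict of metric name -> sampled values."""
    predicates = []
    for name in normal:
        a = np.asarray(normal[name], dtype=float)
        b = np.asarray(abnormal[name], dtype=float)
        pooled = (a.std() + b.std()) / 2 or 1e-9
        sep = (b.mean() - a.mean()) / pooled
        if abs(sep) >= min_sep:
            threshold = (a.mean() + b.mean()) / 2
            predicates.append(f"{name} {'>' if sep > 0 else '<'} {threshold:.3g}")
    return predicates

normal   = {"cpu": [20, 25, 22], "lock_waits": [3, 4, 2]}
abnormal = {"cpu": [21, 24, 23], "lock_waits": [40, 55, 48]}
print(candidate_predicates(normal, abnormal))   # -> ['lock_waits > 25.3']
```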
19. Database Learning
- Author
-
Park, Yongjoo, primary, Tajik, Ahmad Shahab, additional, Cafarella, Michael, additional, and Mozafari, Barzan, additional
- Published
- 2017
- Full Text
- View/download PDF
20. A Top-Down Approach to Achieving Performance Predictability in Database Systems
- Author
-
Huang, Jiamin, primary, Mozafari, Barzan, additional, Schoenebeck, Grant, additional, and Wenisch, Thomas F., additional
- Published
- 2017
- Full Text
- View/download PDF
21. Approximate Query Engines
- Author
-
Mozafari, Barzan, primary
- Published
- 2017
- Full Text
- View/download PDF
22. Statistical Analysis of Latency Through Semantic Profiling
- Author
-
Huang, Jiamin, primary, Mozafari, Barzan, additional, and Wenisch, Thomas F., additional
- Published
- 2017
- Full Text
- View/download PDF
23. BlinkDB
- Author
-
Barzan Mozafari, Henry Milner, Sameer Agarwal, Ion Stoica, Aurojit Panda, and Samuel Madden
- Subjects
SQL, Adaptive optimization, Computer science, Response time, Sample (statistics), Massively parallel, Data mining - Abstract
In this paper, we present BlinkDB, a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy or response time requirements. We evaluate BlinkDB against the well-known TPC-H benchmarks and a real-world analytic workload derived from Conviva Inc., a company that manages video distribution over the Internet. Our experiments on a 100-node cluster show that BlinkDB can answer queries on up to 17 TBs of data in less than 2 seconds (over 200x faster than Hive), within an error of 2-10%.
- Published
- 2013
- Full Text
- View/download PDF
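- Code sketch
A minimal sketch of the first of the two key ideas above, stratified sampling: keep at most a cap of rows per group (one pass, via per-group reservoir sampling) so rare groups survive in the sample, and remember each group's true size for unbiased scale-up at query time. BlinkDB's samples are multi-dimensional and maintained adaptively; this illustrates only the principle.
```python
import random
from collections import defaultdict

def stratified_sample(rows, key, cap):
    """One-pass stratified sample: at most `cap` rows kept per group via
    per-group reservoir sampling; `scale[g]` restores unbiased counts."""
    kept, seen = defaultdict(list), defaultdict(int)
    for row in rows:
        g = key(row)
        seen[g] += 1
        if len(kept[g]) < cap:
            kept[g].append(row)
        else:
            j = random.randrange(seen[g])   # classic reservoir step
            if j < cap:
                kept[g][j] = row
    scale = {g: seen[g] / len(kept[g]) for g in kept}
    return kept, scale

rows = [("US",)] * 100000 + [("IS",)] * 50      # the rare group survives
kept, scale = stratified_sample(rows, key=lambda r: r[0], cap=1000)
est_us = len(kept["US"]) * scale["US"]          # unbiased COUNT estimate
print(len(kept["IS"]), round(est_us))           # -> 50 100000
```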
24. CliffGuard
- Author
-
Barzan Mozafari, Eugene Zhen Ye Goh, and Dong Young Yoon
- Subjects
Database, Computer science, Robustness (computer science), Materialized view, Database tuning - Abstract
A fundamental problem in database systems is choosing the best physical design, i.e., a small set of auxiliary structures that enable the fastest execution of future queries. Almost all commercial databases come with designer tools that create a number of indices or materialized views (together comprising the physical design) that they exploit during query processing. Existing designers are what we call nominal; that is, they assume that their input parameters are precisely known and equal to some nominal values. For instance, since future workload is often not known a priori, it is common for these tools to optimize for past workloads in hopes that future queries and data will be similar. In practice, however, these parameters are often noisy or missing. Since nominal designers do not take the influence of such uncertainties into account, they find designs that are sub-optimal and remarkably brittle. Often, as soon as the future workload deviates from the past, their overall performance falls off a cliff, leading to customer discontent and expensive redesigns. Thus, we propose a new type of database designer that is robust against parameter uncertainties, so that overall performance degrades more gracefully when future workloads deviate from the past. Users express their risk tolerance by deciding how much nominal optimality they are willing to trade for attaining their desired level of robustness against uncertain situations. To the best of our knowledge, this paper is the first to adopt the recent breakthroughs in the theory of robust optimization to build a practical framework for solving some of the most fundamental problems in databases, replacing today's brittle designs with a principled world of robust designs that can guarantee predictable and consistent performance.
- Published
- 2015
- Full Text
- View/download PDF
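- Code sketch
The nominal-versus-robust distinction above, reduced to two toy functions: the nominal designer minimizes cost on the single past workload, while the robust designer minimizes worst-case cost over an uncertainty set of plausible future workloads. CliffGuard solves the latter with non-convex robust optimization over real physical designs; the brute-force minimax below only shows the objective being optimized, and all names and costs are invented.
```python
COSTS = {"idx_a": {"w_past": 1, "w_shift": 9},
         "idx_b": {"w_past": 2, "w_shift": 3}}
cost = lambda design, workload: COSTS[design][workload]

def nominal_design(designs, past_workload, cost):
    """Optimize for yesterday's workload and hope tomorrow looks the same."""
    return min(designs, key=lambda d: cost(d, past_workload))

def robust_design(designs, uncertainty_set, cost):
    """Minimize worst-case cost over all workloads in the uncertainty set
    (e.g., every workload within some distance of the past one)."""
    return min(designs, key=lambda d: max(cost(d, w) for w in uncertainty_set))

print(nominal_design(COSTS, "w_past", cost))               # -> idx_a (brittle)
print(robust_design(COSTS, ["w_past", "w_shift"], cost))   # -> idx_b (graceful)
```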
25. The analytical bootstrap
- Author
-
Barzan Mozafari, Shi Gao, Carlo Zaniolo, and Kai Zeng
- Subjects
SQL, Computer science, Semantics (computer science), Computation, Big data, Sampling (statistics), Analytics, Data mining, Probabilistic relational model - Abstract
Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research that is now made more critical by the need for timely and cost-effective analytics over "Big Data". Assessing the quality (i.e., estimating the error) of approximate answers is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second is quite general but suffers from its high computational overhead. In this paper, we introduce a probabilistic relational model for the bootstrap process, along with rigorous semantics and a unified error model, which bridges the gap between these two traditional approaches. Based on our probabilistic framework, we develop efficient algorithms to predict the distribution of the approximation results. These enable the computation of any bootstrap-based quality measure for a large class of SQL queries via a single-round evaluation of a slightly modified query. Extensive experiments on both synthetic and real-world datasets show that our method has superior prediction accuracy for bootstrap-based quality measures, and is several orders of magnitude faster than bootstrap.
- Published
- 2014
- Full Text
- View/download PDF
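- Code sketch
For contrast, here is the standard simulation bootstrap that the analytical method replaces: hundreds of resampled re-executions just to put an error bar on one approximate answer. The Analytical Bootstrap predicts this replicate distribution in a single round through its probabilistic relational model; the code below is the expensive baseline, not the paper's technique.
```python
import numpy as np

def bootstrap_error(sample, stat=np.mean, reps=1000, alpha=0.05):
    """Simulation bootstrap: re-execute the statistic on `reps` resamples
    (drawn with replacement) and read off a percentile confidence interval.
    Cost grows linearly with reps, which is what makes it expensive."""
    sample = np.asarray(sample, dtype=float)
    replicates = [stat(np.random.choice(sample, size=sample.size, replace=True))
                  for _ in range(reps)]
    lo, hi = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

data = np.random.default_rng(0).exponential(scale=10.0, size=500)
print(bootstrap_error(data))   # e.g. roughly (9.1, 10.9) around the true mean 10
```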
26. ABS
- Author
-
Shi Gao, Kai Zeng, Carlo Zaniolo, Barzan Mozafari, and Jiaqi Gu
- Subjects
SQL, Theoretical computer science, Computer science, Computation, Big data, Probabilistic logic, Relational algebra, Analytics, Scalability, Probabilistic relational model - Abstract
Approximate Query Processing (AQP) based on sampling is critical for supporting timely and cost-effective analytics over big data. To be applied successfully, AQP must be accompanied by reliable estimates on the quality of sample-produced approximate answers; the two main techniques used in the past for this purpose are (i) closed-form analytic error estimation, and (ii) the bootstrap method. Approach (i) is extremely efficient but lacks generality, whereas (ii) is general but suffers from high computational overhead. Our recently introduced Analytical Bootstrap method combines the strengths of both approaches and provides the basis for our ABS system, which will be demonstrated at the conference. The ABS system models bootstrap by a probabilistic relational model, and extends relational algebra with operations on probabilistic relations to predict the distributions of the AQP results. Thus, ABS entails a very fast computation of bootstrap-based quality measures for a general class of SQL queries, which is several orders of magnitude faster than the standard simulation-based bootstrap. In this demo, we will demonstrate the generality, automaticity, and ease of use of the ABS system, and its superior performance over the traditional approaches described above.
- Published
- 2014
- Full Text
- View/download PDF
27. DBSherlock
- Author
-
Yoon, Dong Young, primary, Niu, Ning, additional, and Mozafari, Barzan, additional
- Published
- 2016
- Full Text
- View/download PDF
28. SnappyData
- Author
-
Ramnarayan, Jags, primary, Mozafari, Barzan, additional, Wale, Sumedh, additional, Menon, Sudhir, additional, Kumar, Neeraj, additional, Bhanawat, Hemant, additional, Chakraborty, Soubhik, additional, Mahajan, Yogesh, additional, Mishra, Rishitesh, additional, and Bachhav, Kishor, additional
- Published
- 2016
- Full Text
- View/download PDF
29. High-performance complex event processing over XML streams
- Author
-
Barzan Mozafari, Kai Zeng, and Carlo Zaniolo
- Subjects
Computer science, Programming language, RSS, Distributed computing, Pushdown automaton, Complex event processing, Data exchange, XML, XPath - Abstract
Much research attention has been given to delivering high-performance systems that are capable of complex event processing (CEP) in a wide range of applications. However, many current CEP systems focus on efficiently processing data with a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. Yet XML streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial records, configuration files, and similar applications requiring advanced CEP queries. In this paper, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient implementation. XSeq is designed to take full advantage of recent advances in the theory of Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising efficiency (whereas such amenability to efficient implementation was not demonstrated in previously proposed XPath extensions). We illustrate XSeq's power for CEP applications through examples from different domains, and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.
- Published
- 2012
- Full Text
- View/download PDF
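- Code sketch
A small illustration of why XML streams call for (visibly) pushdown power rather than plain finite automata: matching open and close tags requires a stack, yet the stack action is dictated entirely by the kind of input event, which is the visibly-pushdown restriction that lets XSeq remain efficient. The SAX-like event encoding below is invented for the example.
```python
def check_nesting(events):
    """Consume a SAX-like stream of ('open', tag), ('close', tag), and
    ('text', s) tuples. Push on every open, pop on every close: the input
    symbol alone dictates the stack action, exactly as in a VPA. Returns
    the maximum nesting depth and rejects ill-nested streams."""
    stack, depth = [], 0
    for kind, value in events:
        if kind == "open":
            stack.append(value)
            depth = max(depth, len(stack))
        elif kind == "close":
            if not stack or stack[-1] != value:
                raise ValueError(f"ill-nested stream at </{value}>")
            stack.pop()
    if stack:
        raise ValueError(f"unclosed tags: {stack}")
    return depth

stream = [("open", "rss"), ("open", "item"), ("text", "hello"),
          ("close", "item"), ("open", "item"), ("close", "item"),
          ("close", "rss")]
print(check_nesting(stream))   # -> 2
```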
30. K*SQL
- Author
-
Kai Zeng, Barzan Mozafari, and Carlo Zaniolo
- Subjects
Data stream, SQL, Programming language, Computer science, XML, XPath - Abstract
A strong interest is emerging in SQL extensions for sequence patterns using Kleene-closure expressions. This burst of interest from both the research community and the commercial world is due to the many database and data stream applications made possible by these extensions, including financial services, RFID-based inventory management, and electronic health systems. In this demo, we present the K*SQL system, which represents a major step forward in this area. K*SQL supports a more expressive language that allows for generalized Kleene-closure queries and also achieves the expressive power of the nested word model, which greatly expands the application domain to include XML queries, software trace analysis, and genomics. We first introduce the core features of our language in expressing complex pattern queries over both relational and XML data. We overview the architecture of our unifying engine and its user-friendly interfaces. We also present several K*SQL queries from stock market, XML, software trace analysis, and genomic applications.
- Published
- 2010
- Full Text
- View/download PDF
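- Code sketch
A hypothetical illustration of the kind of Kleene-closure sequence pattern such SQL extensions express declaratively: the classic falling-then-rising "V" over a stock price stream (pattern A B+ C+), coded imperatively here so the matching semantics are explicit. K*SQL states this as a query, and also runs such patterns over nested, XML-like data.
```python
def v_patterns(prices):
    """Find maximal matches of the pattern A B+ C+ : a start row, one or
    more strictly falling rows, then one or more strictly rising rows.
    Returns (start, bottom, end) index triples."""
    matches, i, n = [], 0, len(prices)
    while i < n - 1:
        j = i
        while j + 1 < n and prices[j + 1] < prices[j]:
            j += 1                              # B+ : falling run
        if j > i:
            k = j
            while k + 1 < n and prices[k + 1] > prices[k]:
                k += 1                          # C+ : rising run
            if k > j:
                matches.append((i, j, k))
                i = k
                continue
        i += 1
    return matches

print(v_patterns([10, 8, 7, 9, 12, 11, 13]))   # -> [(0, 2, 4), (4, 5, 6)]
```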
31. CliffGuard
- Author
-
Mozafari, Barzan, primary, Goh, Eugene Zhen Ye, additional, and Yoon, Dong Young, additional
- Published
- 2015
- Full Text
- View/download PDF
32. Designing an inductive data stream management system
- Author
-
Hetal Thakkar, Carlo Zaniolo, and Barzan Mozafari
- Subjects
Data stream management system, Data stream mining, Computer science, Pattern matching, Data mining, Query language, Cluster analysis, Extensibility - Abstract
There has been much recent interest in on-line data mining. Existing mining algorithms designed for stored data are either not applicable or not effective on data streams, where real-time response is often needed and data characteristics change frequently. Therefore, researchers have been focusing on designing new and improved algorithms for on-line mining tasks, such as classification, clustering, frequent itemset mining, and pattern matching. Relatively little attention has been paid to designing DSMSs that facilitate and integrate the task of mining data streams---i.e., stream systems that provide inductive functionalities analogous to those provided by Weka and MS OLE DB for stored data. In this paper, we propose the notion of an Inductive DSMS---a system that, besides providing a rich library of inter-operable functions to support the whole mining process, also supports the essentials of a DSMS, including optimization of continuous queries, load shedding, synoptic constructs, and non-stop computing. Ease of use and extensibility are additional desiderata for the proposed Inductive DSMS. We first review the many challenges involved in realizing such a system and then present our approach of extending the Stream Mill DSMS toward that goal. Our system features (i) a powerful query language where mining methods are expressed via aggregates for generic streams and arbitrary windows, (ii) a library of fast and light mining algorithms, and (iii) an architecture that makes it easy to customize and extend existing mining methods and introduce new ones.
- Published
- 2008
- Full Text
- View/download PDF
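- Code sketch
A sketch of the "mining method as window aggregate" idea described above: a user-defined aggregate that maintains frequent-item counts incrementally over a sliding window, so each arriving tuple costs O(1) instead of re-mining the window from scratch. The class and method names are invented; Stream Mill expresses such aggregates in its own query language rather than in Python.
```python
from collections import Counter, deque

class WindowedFrequentItems:
    """Incremental frequent-items aggregate over a sliding window of the
    last `window` tuples: counts are updated as tuples enter and leave,
    never recomputed from scratch."""
    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.counts = Counter()

    def add(self, item):
        self.buf.append(item)
        self.counts[item] += 1
        if len(self.buf) > self.window:
            old = self.buf.popleft()        # tuple leaving the window
            self.counts[old] -= 1
            if not self.counts[old]:
                del self.counts[old]

    def frequent(self, min_support):
        return [x for x, c in self.counts.items()
                if c >= min_support * len(self.buf)]

wfi = WindowedFrequentItems(window=4)
for item in ["a", "b", "a", "a", "c", "a"]:
    wfi.add(item)
print(wfi.frequent(min_support=0.5))   # -> ['a']
```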
33. Knowing when you're wrong
- Author
-
Agarwal, Sameer, primary, Milner, Henry, additional, Kleiner, Ariel, additional, Talwalkar, Ameet, additional, Jordan, Michael, additional, Madden, Samuel, additional, Mozafari, Barzan, additional, and Stoica, Ion, additional
- Published
- 2014
- Full Text
- View/download PDF
34. The analytical bootstrap
- Author
-
Zeng, Kai, primary, Gao, Shi, additional, Mozafari, Barzan, additional, and Zaniolo, Carlo, additional
- Published
- 2014
- Full Text
- View/download PDF
35. ABS
- Author
-
Zeng, Kai, primary, Gao, Shi, additional, Gu, Jiaqi, additional, Mozafari, Barzan, additional, and Zaniolo, Carlo, additional
- Published
- 2014
- Full Text
- View/download PDF
36. Performance and resource modeling in highly-concurrent OLTP workloads
- Author
-
Mozafari, Barzan, primary, Curino, Carlo, additional, Jindal, Alekh, additional, and Madden, Samuel, additional
- Published
- 2013
- Full Text
- View/download PDF
37. BlinkDB
- Author
-
Agarwal, Sameer, primary, Mozafari, Barzan, additional, Panda, Aurojit, additional, Milner, Henry, additional, Madden, Samuel, additional, and Stoica, Ion, additional
- Published
- 2013
- Full Text
- View/download PDF
38. High-performance complex event processing over XML streams
- Author
-
Mozafari, Barzan, primary, Zeng, Kai, additional, and Zaniolo, Carlo, additional
- Published
- 2012
- Full Text
- View/download PDF
39. K*SQL
- Author
-
Mozafari, Barzan, primary, Zeng, Kai, additional, and Zaniolo, Carlo, additional
- Published
- 2010
- Full Text
- View/download PDF
40. Designing an inductive data stream management system
- Author
-
Thakkar, Hetal, primary, Mozafari, Barzan, additional, and Zaniolo, Carlo, additional
- Published
- 2008
- Full Text
- View/download PDF