44 results for "Per-Ake Larson"
Search Results
2. Qd-tree: Learning Data Layouts for Big Data Analytics
- Author
-
Chi Wang, Rajeev Acharya, Zongheng Yang, Badrish Chandramouli, Umar Farooq Minhas, Yinan Li, Per-Ake Larson, Johannes Gehrke, and Donald Kossmann
- Subjects
Computer Science - Machine Learning, Computer Science - Databases, Computer Science - Data Structures and Algorithms, Computer science, Online analytical processing, Search engine indexing, Hash function, Big data, Partition (database), Data analysis, Reinforcement learning, Data mining - Abstract
Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar block-based data organization and compression have become standard practice in most commercial database systems. However, the problem of best assigning records to data blocks on storage is still open. For example, today's systems usually partition data by arrival time into row groups, or range/hash partition the data based on selected fields. For a given workload, however, such techniques are unable to optimize for the important metric of the number of blocks accessed by a query. This metric directly relates to the I/O cost, and therefore performance, of most analytical queries. Further, they are unable to exploit additional available storage to drive this metric down further. In this paper, we propose a new framework called a query-data routing tree, or qd-tree, to address this problem, and propose two algorithms for its construction based on greedy and deep reinforcement learning techniques. Experiments over benchmark and real workloads show that a qd-tree can provide physical speedups of more than an order of magnitude compared to current blocking schemes, and can reach within 2X of the lower bound for data skipping based on selectivity, while providing complete semantic descriptions of created blocks. (ACM SIGMOD 2020)
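A minimal sketch of the qd-tree idea, for orientation only (all names below are hypothetical, not the paper's implementation): internal nodes hold cut predicates that route each record to a leaf, every leaf is a data block with a complete semantic description, and a query skips blocks whose routing predicates contradict it.

    # Minimal qd-tree sketch (illustrative; names are hypothetical).
    class Node:
        def __init__(self, pred=None, left=None, right=None):
            self.pred, self.left, self.right = pred, left, right  # pred: record -> bool
            self.records = []                                     # filled at leaves

        def route(self, rec):
            if self.pred is None:                 # leaf = one data block
                self.records.append(rec)
            elif self.pred(rec):
                self.left.route(rec)
            else:
                self.right.route(rec)

    # Cut on column 'a' < 100, then on 'b' < 50 for the true side.
    root = Node(lambda r: r["a"] < 100,
                left=Node(lambda r: r["b"] < 50, left=Node(), right=Node()),
                right=Node())

    for rec in [{"a": 5, "b": 7}, {"a": 5, "b": 90}, {"a": 200, "b": 1}]:
        root.route(rec)

    # A query with predicate a >= 100 only needs the right subtree's block;
    # the other blocks are skipped without being read.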
- Published
- 2020
- Full Text
- View/download PDF
3. Bztree
- Author
-
Joy Arulraj, Justin Levandoski, Umar Farooq Minhas, and Per-Ake Larson
- Subjects
General Engineering - Published
- 2018
- Full Text
- View/download PDF
4. Easy Lock-Free Indexing in Non-Volatile Memory
- Author
-
Per-Ake Larson, Tianzheng Wang, and Justin J. Levandoski
- Subjects
Skip list, Computer science, Concurrency, Non-volatile memory, Embedded system, Key (cryptography), Non-blocking algorithm, Overhead (computing), Non-volatile random-access memory, Auxiliary memory, DRAM - Abstract
Large non-volatile memories (NVRAM) will change the durability and recovery mechanisms of main-memory database systems. Today, these systems make operations durable through logging and checkpointing to secondary storage, and recover by rebuilding the in-memory database (records and indexes) from on-disk state. A main-memory database stored in NVRAM, however, can potentially recover instantly after a power failure. Modern main-memory databases typically use lock-free index structures to enable a high degree of concurrency. Thus NVRAM-resident databases need indexes that are lock-free, persistent, and able to recover (almost) instantly after a crash. In this paper, we show how to easily build such index structures. A key enabling component of our scheme is a multi-word compare-and-swap operation, PMwCAS, that is lock-free, persistent, and efficient. PMwCAS significantly reduces the complexity of building lock-free indexes, which we illustrate by implementing both doubly-linked skip lists and the Bw-tree lock-free B+-tree for NVRAM. Experimental results show that PMwCAS's runtime overhead is very low (~4–6% under realistic workloads). This overhead is sufficiently low that the same implementation can be used for both DRAM and NVRAM resident indexes.
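Python cannot express lock-freedom or persistence, but the shape of a descriptor-based multi-word compare-and-swap can be sketched. Everything below is an assumption for illustration: the class and method names are invented, and the atomicity is merely simulated with a lock to show the API a PMwCAS-style operation presents to index code.

    import threading

    # Descriptor-based multi-word CAS, in the spirit of PMwCAS (illustrative).
    # A real implementation is lock-free and persists the descriptor to NVRAM;
    # here a lock stands in for the lock-free protocol.
    class MwCASDescriptor:
        def __init__(self):
            self.entries = []            # (array, index, expected, new)
            self._lock = threading.Lock()

        def add(self, arr, idx, expected, new):
            self.entries.append((arr, idx, expected, new))

        def execute(self):
            with self._lock:             # simulated atomicity, not the real protocol
                if any(arr[idx] != exp for arr, idx, exp, _ in self.entries):
                    return False         # some word changed; the whole operation fails
                for arr, idx, _, new in self.entries:
                    arr[idx] = new       # all words change together
                return True

    # Atomically swing two "pointers", e.g. unlinking a node from a doubly-linked list.
    words = [10, 20]
    d = MwCASDescriptor()
    d.add(words, 0, 10, 11)
    d.add(words, 1, 20, 21)
    assert d.execute() and words == [11, 21]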
- Published
- 2018
- Full Text
- View/download PDF
5. Main Memory Database Systems
- Author
-
Frans Faerber, Alfons Kemper, Per-Åke Larson, Justin Levandoski, Thomas Neumann, and Andrew Pavlo
- Published
- 2017
6. Trekking through Siberia
- Author
-
Ahmed Eldawy, Justin J. Levandoski, and Per-Ake Larson
- Subjects
Database, Computer science, Interface (computing), General Engineering, Cold storage, Data access, Online transaction processing, Throughput, Auxiliary memory, Database engine - Abstract
Main memories are becoming sufficiently large that most OLTP databases can be stored entirely in main memory, but this may not be the best solution. OLTP workloads typically exhibit skewed access patterns where some records are hot (frequently accessed) but many records are cold (infrequently or never accessed). It is still more economical to store the coldest records on secondary storage such as flash. This paper introduces Siberia, a framework for managing cold data in the Microsoft Hekaton main-memory database engine. We discuss how to migrate cold data to secondary storage while providing an interface to the user to manipulate both hot and cold data that hides the actual data location. We describe how queries of different isolation levels can read and modify data stored in both hot and cold stores without restriction while minimizing the number of accesses to cold storage. We also show how records can be migrated between hot and cold stores while the DBMS is online and active. Experiments reveal that for cold data access rates appropriate for main-memory optimized databases, we incur an acceptable 7-14% throughput loss.
- Published
- 2014
- Full Text
- View/download PDF
7. Adaptive range filters for cold data
- Author
-
Karolina Alexiou, Per-Ake Larson, and Donald Kossmann
- Subjects
Range query (data structures), Computer science, General Engineering, Bloom filter, Data mining, Data structure - Abstract
Bloom filters are a great technique to test whether a key is not in a set of keys. This paper presents a novel data structure called ARF. In a nutshell, ARFs are for range queries what Bloom filters are for point queries. That is, an ARF can determine whether a set of keys does not contain any keys that are part of a specific range. This paper describes the principles and methods for efficient implementation of ARFs and presents the results of comprehensive experiments that assess the precision, space, and latency of ARFs. Furthermore, this paper shows how ARFs can be applied to a commercial database system that partitions data into hot and cold regions to optimize queries that involve only hot data.
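A toy illustration of the range-filter semantics the abstract describes, under stated assumptions: the structure below is a binary trie over a fixed integer domain, truncated at a depth budget so that space is bounded and false positives can occur; the adaptive part of ARFs (learning from query feedback which regions to refine) is omitted, and all names are invented.

    # Illustrative range filter over the domain [0, 2**BITS) (not the paper's ARF).
    # A node is "occupied" if any key falls in its range; the trie is truncated at
    # MAX_DEPTH, trading space for false positives: it may answer "maybe" for an
    # empty range, but never "empty" for a non-empty one.
    BITS, MAX_DEPTH = 16, 6

    def build(keys, lo=0, hi=2**BITS, depth=0):
        occupied = any(lo <= k < hi for k in keys)
        if not occupied or depth == MAX_DEPTH:
            return {"lo": lo, "hi": hi, "occupied": occupied, "kids": None}
        mid = (lo + hi) // 2
        return {"lo": lo, "hi": hi, "occupied": True,
                "kids": (build(keys, lo, mid, depth + 1),
                         build(keys, mid, hi, depth + 1))}

    def may_contain(node, lo, hi):
        if hi <= node["lo"] or lo >= node["hi"]:
            return False                       # no overlap with this subtree
        if node["kids"] is None:
            return node["occupied"]            # truncated: answer conservatively
        return any(may_contain(k, lo, hi) for k in node["kids"])

    arf = build([3, 4000, 60000])
    print(may_contain(arf, 0, 10))         # True: key 3 lies in the range
    print(may_contain(arf, 30000, 30010))  # False: region provably empty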
- Published
- 2013
- Full Text
- View/download PDF
8. Similarity queries: their conceptual evaluation, transformations, and processing
- Author
-
Per-Ake Larson, Spencer S. Pearson, Yasin N. Silva, Mohamed Ali, and Walid G. Aref
- Subjects
Set (abstract data type), Identification (information), Information retrieval, Semantic similarity, Similarity (network science), Hardware and Architecture, Computer science, Similarity heuristic, Selection (linguistics), Semantics, Query optimization, Information Systems - Abstract
Many application scenarios can significantly benefit from the identification and processing of similarities in the data. Even though some work has been done to extend the semantics of some operators, for example join and selection, to be aware of data similarities, there has not been much study on the role and implementation of similarity-aware operations as first-class database operators. Furthermore, very little work has addressed the problem of evaluating and optimizing queries that combine several similarity operations. The focus of this paper is the study of similarity queries that contain one or multiple first-class similarity database operators such as Similarity Selection, Similarity Join, and Similarity Group-by. Particularly, we analyze the implementation techniques of several similarity operators, introduce a consistent and comprehensive conceptual evaluation model for similarity queries, and present a rich set of transformation rules to extend cost-based query optimization to the case of similarity queries.
- Published
- 2012
- Full Text
- View/download PDF
9. SCOPE: parallel databases meet MapReduce
- Author
-
Darren A. Shakib, Ming-Chuan Wu, Jingren Zhou, Ronnie Chaiken, Nicolas Bruno, and Per-Ake Larson
- Subjects
Schedule, Database, Computer science, Distributed computing, Fault tolerance, Directed acyclic graph, Query optimization, Hardware and Architecture, Scripting language, Scalability, Graph (abstract data type), Explicit parallelism, Data as a service, Information Systems - Abstract
Companies providing cloud-scale data services have increasing needs to store and analyze massive data sets, such as search logs, click streams, and web graph data. For cost and performance reasons, processing is typically done on large clusters of tens of thousands of commodity machines. Such massive data analysis on large clusters presents new opportunities and challenges for developing a highly scalable and efficient distributed computation system that is easy to program and supports complex system optimization to maximize performance and reliability. In this paper, we describe a distributed computation system, Structured Computations Optimized for Parallel Execution (Scope), targeted for this type of massive data analysis. Scope combines benefits from both traditional parallel databases and MapReduce execution engines to allow easy programmability and deliver massive scalability and high performance through advanced optimization. Similar to parallel databases, the system has a SQL-like declarative scripting language with no explicit parallelism, while being amenable to efficient parallel execution on large clusters. An optimizer is responsible for converting scripts into efficient execution plans for the distributed computation engine. A physical execution plan consists of a directed acyclic graph of vertices. Execution of the plan is orchestrated by a job manager that schedules execution on available machines and provides fault tolerance and recovery, much like MapReduce systems. Scope is being used daily for a variety of data analysis and data mining applications over tens of thousands of machines at Microsoft, powering Bing and other online services.
- Published
- 2012
- Full Text
- View/download PDF
10. SCOPE
- Author
-
Bob Jenkins, Darren A. Shakib, Simon Weaver, Jingren Zhou, Bill Ramsey, Per-Ake Larson, and Ronnie Chaiken
- Subjects
Statement (computer science), SQL, Programming language, Computer science, General Engineering, Joins, Parallel computing, Parallel processing, Scripting language, Programming paradigm, Explicit parallelism, Scope (computer science) - Abstract
Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, processing is typically done on large clusters of shared-nothing commodity machines. It is imperative to develop a programming model that hides the complexity of the underlying system but provides flexibility by allowing users to extend functionality to meet a variety of requirements. In this paper, we present a new declarative and extensible scripting language, SCOPE (Structured Computations Optimized for Parallel Execution), targeted for this type of massive data analysis. The language is designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters. SCOPE borrows several features from SQL. Data is modeled as sets of rows composed of typed columns. The select statement is retained with inner joins, outer joins, and aggregation allowed. Users can easily define their own functions and implement their own versions of operators: extractors (parsing and constructing rows from a file), processors (row-wise processing), reducers (group-wise processing), and combiners (combining rows from two inputs). SCOPE supports nesting of expressions but also allows a computation to be specified as a series of steps, in a manner often preferred by programmers. We also describe how scripts are compiled into efficient, parallel execution plans and executed on large clusters.
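The abstract names four user extension points: extractors, processors, reducers, and combiners. A rough sketch of that programming model, assuming a toy log format and invented function names (the two-input combiner is omitted for brevity; the real system compiles scripts into parallel plans rather than running Python generators):

    # SCOPE-style extension points sketched as Python generators over rows (dicts).
    from itertools import groupby

    def extractor(lines):                       # parse and construct rows from a file
        for line in lines:
            user, ms = line.split(",")
            yield {"user": user, "ms": int(ms)}

    def processor(rows):                        # row-wise processing
        for r in rows:
            if r["ms"] > 100:
                yield r

    def reducer(rows, key):                     # group-wise processing
        for k, group in groupby(sorted(rows, key=key), key=key):
            yield {"user": k, "total_ms": sum(r["ms"] for r in group)}

    log = ["alice,250", "bob,50", "alice,90", "alice,300"]
    for row in reducer(processor(extractor(log)), key=lambda r: r["user"]):
        print(row)                              # {'user': 'alice', 'total_ms': 550}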
- Published
- 2008
- Full Text
- View/download PDF
11. Supporting views in data stream management systems
- Author
-
Per-Ake Larson, Ahmed K. Elmagarmid, Thanaa M. Ghanem, and Walid G. Aref
- Subjects
Query expansion, Theoretical computer science, Data stream management system, Computer science, Web query classification, View, Sargable, Query optimization, Query language, Information Systems, RDF query language - Abstract
In relational database management systems, views supplement basic query constructs to cope with the demand for “higher-level” views of data. Moreover, in traditional query optimization, answering a query using a set of existing materialized views can yield a more efficient query execution plan. Due to their effectiveness, views are attractive to data stream management systems. In order to support views over streams, a data stream management system should employ a closed (or composable) continuous query language. A closed query language is a language in which query inputs and outputs are interpreted in the same way, hence allowing query composition. This article introduces the Synchronized SQL (or SyncSQL) query language that defines a data stream as a sequence of modify operations against a relation. SyncSQL enables query composition through the unified interpretation of query inputs and outputs. An important issue in continuous queries over data streams is the frequency by which the answer gets refreshed and the conditions that trigger the refresh. Coarser periodic refresh requirements are typically expressed as sliding windows. In this article, the sliding window approach is generalized by introducing the synchronization principle that empowers SyncSQL with a formal mechanism to express queries with arbitrary refresh conditions. After introducing the semantics and syntax, we lay the algebraic foundation for SyncSQL and propose a query-matching algorithm for deciding containment of SyncSQL expressions. Then, the article introduces the Nile-SyncSQL prototype to support SyncSQL queries. Nile-SyncSQL employs a pipelined incremental evaluation paradigm in which the query pipeline consists of a set of differential operators. A cost model is developed to estimate the cost of SyncSQL query execution pipelines and to choose the best execution plan from a set of different plans for the same query. An experimental study is conducted to evaluate the performance of Nile-SyncSQL. The experimental results illustrate the effectiveness of Nile-SyncSQL and the significant performance gains when views are enabled in data stream management systems.
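The central definition above, a stream as a sequence of modify operations against a relation, can be made concrete with a small sketch. The names and the refresh rule below are assumptions for illustration: a continuous COUNT view is maintained incrementally from insert/delete operations, and the answer is refreshed only when a simple synchronization condition fires.

    # A stream as modify operations against a relation, with a continuous COUNT
    # view maintained incrementally (illustrative of the SyncSQL model only).
    relation = set()
    view_count = 0
    pending = 0          # net change since the last refresh
    REFRESH_EVERY = 2    # toy synchronization condition: refresh every 2 ops

    def apply_op(op, key):
        global pending
        if op == "insert" and key not in relation:
            relation.add(key); pending += 1
        elif op == "delete" and key in relation:
            relation.remove(key); pending -= 1

    stream = [("insert", 1), ("insert", 2), ("delete", 1), ("insert", 3)]
    for i, (op, key) in enumerate(stream, 1):
        apply_op(op, key)
        if i % REFRESH_EVERY == 0:       # the answer refreshes only when triggered
            view_count += pending; pending = 0
            print("view refreshed:", view_count)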
- Published
- 2008
- Full Text
- View/download PDF
12. View matching for outer-join views
- Author
-
Per-Ake Larson and Jingren Zhou
- Subjects
SQL, Theoretical computer science, Matching (graph theory), Selection (relational algebra), Computer science, Materialized view, Joins, Projection (mathematics), Hardware and Architecture, Algorithm, Information Systems, Foreign key - Abstract
Prior work on computing queries from materialized views has focused on views defined by expressions consisting of selection, projection, and inner joins, with an optional aggregation on top (SPJG views). This paper provides a view matching algorithm for views that may also contain outer joins (SPOJG views). The algorithm relies on a normal form for outer-join expressions and is not based on bottom-up syntactic matching of expressions. It handles any combination of inner and outer joins, deals correctly with SQL bag semantics, and exploits not-null constraints, uniqueness constraints and foreign key constraints.
- Published
- 2006
- Full Text
- View/download PDF
13. Hash-based labeling techniques for storage scaling
- Author
-
Per-Ake Larson, Cyrus Shahabi, and D. Yao
- Subjects
Hardware and Architecture, Computer science, Computer data storage, Hash function, Scalability, Parallel computing, Load balancing (computing), Scaling, Double hashing, Information Systems - Abstract
Scalable storage architectures allow for the addition or removal of storage devices to increase storage capacity and bandwidth or retire older devices. Assuming random placement of data objects across multiple storage devices of a storage pool, our optimization objective is to redistribute a minimum number of objects after scaling the pool. In addition, a uniform distribution, and hence a balanced load, should be ensured after redistribution. Moreover, the redistributed objects should be retrieved efficiently during the normal mode of operation: in one I/O access and with low complexity computation. To achieve this, we propose an algorithm called random disk labeling (RDL), based on double hashing, where storage can be added or removed without any increase in complexity. We compare RDL with other proposed techniques and demonstrate its effectiveness through experimentation.
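A sketch of the double-hashing placement the abstract describes, under stated assumptions (the constants and names are invented, not the paper's): a prime number of labeled slots, some of which hold disks; an object probes slots h1, h1+h2, h1+2*h2, ... modulo the prime and lands on the first slot that holds a disk, so adding a disk relocates only the objects whose probe sequence now reaches the new slot earlier.

    # Random-disk-labeling-style placement via double hashing (illustrative).
    P = 13                                    # prime >= maximum number of disks
    disks = {0, 3, 7}                         # slots currently holding disks

    def place(obj):
        h1 = hash(("h1", obj)) % P
        h2 = hash(("h2", obj)) % (P - 1) + 1  # step in [1, P-1]: probes every slot
        slot = h1
        while slot not in disks:              # skip labels with no disk
            slot = (slot + h2) % P
        return slot

    objs = range(100)
    before = {o: place(o) for o in objs}
    disks.add(5)                              # scale up: a new disk in slot 5
    moved = sum(before[o] != place(o) for o in objs)
    print(f"{moved} of {len(before)} objects moved")   # only a fraction relocate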
- Published
- 2005
- Full Text
- View/download PDF
14. Evolutionary techniques for updating query cost models in a dynamic multidatabase environment
- Author
-
Per-Ake Larson, Qiang Zhu, and Amira Rahal
- Subjects
Hardware and Architecture, Computer science, Evolutionary algorithm, Sample (statistics), Data mining, Query optimization, Information Systems, Block (data storage) - Abstract
Deriving local cost models for query optimization in a dynamic multidatabase system (MDBS) is a challenging issue. In this paper, we study how to evolve a query cost model to capture a slowly-changing dynamic MDBS environment so that the cost model is kept up-to-date all the time. Two novel evolutionary techniques, i.e., the shifting method and the block-moving method, are proposed. The former updates a cost model by taking up-to-date information from a new sample query into consideration at each step, while the latter considers a block (batch) of new sample queries at each step. The relevant issues, including derivation of recurrence updating formulas, development of efficient algorithms, analysis and comparison of complexities, and design of an integrated scheme to apply the two methods adaptively, are studied. Our theoretical and experimental results demonstrate that the proposed techniques are quite promising in maintaining accurate cost models efficiently for a slowly changing dynamic MDBS environment. Besides the application to MDBSs, the proposed techniques can also be applied to the automatic maintenance of cost models in self-managing database systems.
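A minimal sketch of the shifting idea, assuming a toy cost model: the paper derives recurrence formulas for the regression update; the sketch below simply maintains the least-squares sufficient statistics over a sliding window, adding the newest sample query and dropping the oldest, then re-solving for the coefficients. All names and constants are invented.

    import numpy as np

    # Sliding-window least squares in the spirit of the shifting method.
    d = 2                                    # features: e.g. [1, result_size]
    XtX, Xty = np.zeros((d, d)), np.zeros(d)
    window = []                              # samples currently in the model
    CAPACITY = 50

    def shift_in(x, y):
        global XtX, Xty
        x = np.asarray(x, float)
        XtX += np.outer(x, x); Xty += y * x          # add the newest sample
        window.append((x, y))
        if len(window) > CAPACITY:
            xo, yo = window.pop(0)
            XtX -= np.outer(xo, xo); Xty -= yo * xo  # remove the oldest
        return np.linalg.solve(XtX, Xty) if len(window) >= d else None

    rng = np.random.default_rng(0)
    for _ in range(200):                      # true cost ~ 1.0 + 0.5 * size
        size = rng.uniform(1, 100)
        cost = 1.0 + 0.5 * size + rng.normal(0, 0.1)
        coeffs = shift_in([1.0, size], cost)
    print(coeffs)                             # approximately [1.0, 0.5]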
- Published
- 2004
- Full Text
- View/download PDF
15. Modern main-memory database systems
- Author
-
Justin J. Levandoski and Per-Ake Larson
- Subjects
Database, Computer science, General Engineering, Database design, Concurrency control, Relational database management system, SAP HANA, Analytics, High availability, Computer data storage, Database theory - Abstract
This tutorial provides an overview of recent developments in main-memory database systems. With growing memory sizes and memory prices dropping by a factor of 10 every 5 years, data having a "primary home" in memory is now a reality. Main-memory databases eschew many of the traditional architectural tenets of relational database systems that optimized for disk-resident data. Innovative approaches to fundamental issues such as concurrency control and query processing are required to unleash the full performance potential of main-memory databases. The tutorial is focused around design issues and architectural choices that must be made when building a high performance database system optimized for main-memory: data storage and indexing, concurrency control, durability and recovery techniques, query processing and compilation, support for high availability, and ability to support hybrid transactional and analytics workloads. This will be illustrated by example solutions drawn from four state-of-the-art systems: H-Store/VoltDB, Hekaton, HyPer, and SAP HANA. The tutorial will also cover current and future research trends.
- Published
- 2016
- Full Text
- View/download PDF
16. CLASSIFYING LOCAL QUERIES FOR GLOBAL QUERY OPTIMIZATION IN MULTIDATABASE SYSTEMS
- Author
-
Qiang Zhu and Per-Ake Larson
- Subjects
Information retrieval, Computer science, Query optimization, Query language, Multidatabase systems, Computer Science Applications, Spatial query, Query expansion, Web query classification, Sargable, Data mining, Boolean conjunctive query, Information Systems - Abstract
A multidatabase system (MDBS) integrates information from multiple pre-existing local databases. A major challenge for global query optimization in an MDBS is that some required local information about local database systems such as local cost models may not be available at the global level due to local autonomy. A feasible method to tackle this challenge is to group local queries on a local database system into classes and then use the costs of sample queries from each query class to derive a cost formula for the class via regression analysis. This paper discusses the issues on how to classify local queries so that a good cost formula can be derived for each query class. Two classification approaches, i.e. bottom-up and top-down, are suggested. The relationship between these two approaches is discussed. Classification rules that can be used in the approaches are identified. Problems regarding composition and redundancy of classification rules are studied. Classification algorithms are given. To test the membership of a query in a class, an efficient algorithm based on ranks is introduced. In addition, a hybrid classification approach that combines the bottom-up and top-down ones is also suggested. Experimental results demonstrate that the suggested query classification techniques can be used to derive good local cost formulas for global query optimization in an MDBS.
- Published
- 2000
- Full Text
- View/download PDF
17. [Untitled]
- Author
-
Per-Ake Larson and Qiang Zhu
- Subjects
Information Systems and Management, Cost estimate, Computer science, Online aggregation, Query optimization, Data structure, Spatial query, Query expansion, Hardware and Architecture, Web query classification, Sargable, Data mining, Software, Information Systems - Abstract
To meet users' growing needs for accessing pre-existing heterogeneous databases, multidatabase systems (MDBSs), which integrate multiple databases, have recently attracted many researchers. A key feature of an MDBS is local autonomy. For a query retrieving data from multiple databases, global query optimization should be performed to achieve good system performance. There are a number of new challenges for global query optimization in an MDBS. Among them, a major one is that some local optimization information, such as local cost parameters, may not be available at the global level because of local autonomy. It creates difficulties for finding a good decomposition of a global query during query optimization. To tackle this challenge, a new query sampling method is proposed in this paper. The idea is to group component queries into homogeneous classes, draw a sample of queries from each class, and use observed costs of sample queries to derive a cost formula for each class by multiple regression. The derived formulas can be used to estimate the cost of a query during query optimization. The relevant issues, such as query classification rules, sampling procedures, and cost model development and validation, are explored in this paper. To verify the feasibility of the method, experiments were conducted on three commercial database management systems supported in an MDBS. Experimental results demonstrate that the proposed method is quite promising in estimating local cost parameters in an MDBS.
- Published
- 1998
- Full Text
- View/download PDF
18. A Fuzzy Query Optimization Approach for Multidatabase Systems
- Author
-
Per-Ake Larson and Qiang Zhu
- Subjects
Fuzzy classification, Neuro-fuzzy, Computer science, Query optimization, Defuzzification, Fuzzy logic, Fuzzy transportation, Artificial Intelligence, Control and Systems Engineering, Fuzzy set operations, Fuzzy number, Data mining, Software, Information Systems - Abstract
A crucial challenge for global query optimization in a multidatabase system (MDBS) is that some local optimization information, such as local cost parameters, may not be accurately known at the global level because of local autonomy. Traditional query optimization techniques using a crisp cost model may not be suitable for an MDBS because precise information is required. In this paper we present a new approach that performs global query optimization using a fuzzy cost model that allows fuzzy information. We suggest methods for establishing a fuzzy cost model and introduce a fuzzy optimization criterion that can be used with a fuzzy cost model. We discuss the relationship between the fuzzy optimization approach and the traditional (crisp) optimization approach and show that the former has a better chance to find a good execution strategy for a query in an MDBS environment, but its complexity may grow exponentially compared with the complexity of the latter. To reduce the complexity, we suggest using so-called k-approximate fuzzy values to approximate all fuzzy values during fuzzy query optimization. It is proven that the improved fuzzy approach has the same order of complexity as the crisp approach.
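As a simplified stand-in for the fuzzy cost model (the paper's model and its k-approximate fuzzy values are more general, and the names below are invented): represent an uncertain cost as a triangular fuzzy number, add fuzzy costs along a plan, and compare plans by defuzzifying to a centroid.

    # A fuzzy cost (lo, mid, hi): at least lo, most plausibly mid, at most hi.
    def f_add(a, b):                       # cost of two steps in sequence
        return tuple(x + y for x, y in zip(a, b))

    def defuzzify(a):                      # centroid of a triangular number
        return sum(a) / 3.0

    # Two candidate plans with uncertain per-operator costs:
    plan1 = f_add((10, 12, 20), (5, 6, 9))     # total (15, 18, 29)
    plan2 = f_add((8, 16, 24), (4, 5, 6))      # total (12, 21, 30)

    best = min([plan1, plan2], key=defuzzify)
    print(best, defuzzify(best))               # plan1 wins: centroid ~20.7 < 21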
- Published
- 1997
- Full Text
- View/download PDF
19. Hekaton
- Author
-
Cristian Diaconu, Michael James Zwilling, Ryan L. Stonecipher, Per-Ake Larson, Pravin Mittal, Erik Ismert, Craig Steven Freedman, and Nitin Verma
- Subjects
Database, Computer science, Multiversion concurrency control, Data structure, Operating system, Table (database), Online transaction processing, Stored procedure, Optimistic concurrency control, Database transaction, Database engine - Abstract
Hekaton is a new database engine optimized for memory resident data and OLTP workloads. Hekaton is fully integrated into SQL Server; it is not a separate system. To take advantage of Hekaton, a user simply declares a table memory optimized. Hekaton tables are fully transactional and durable and accessed using T-SQL in the same way as regular SQL Server tables. A query can reference both Hekaton tables and regular tables and a transaction can update data in both types of tables. T-SQL stored procedures that reference only Hekaton tables can be compiled into machine code for further performance improvements. The engine is designed for high concurrency. To achieve this it uses only latch-free data structures and a new optimistic, multiversion concurrency control technique. This paper gives an overview of the design of the Hekaton engine and reports some experimental results.
- Published
- 2013
- Full Text
- View/download PDF
20. LHlf
- Author
-
Donghui Zhang and Per-Ake Larson
- Subjects
Universal hashing, Computer science, Dynamic perfect hashing, Hash function, Parallel computing, Linear hashing, 2-choice hashing, Computer Graphics and Computer-Aided Design, Hash table, K-independent hashing, Locality-sensitive hashing, Hopscotch hashing, Cuckoo hashing, Open addressing, Pearson hashing, Coalesced hashing, SUHA, Feature hashing, Consistent hashing, Extendible hashing, Perfect hash function, Software, Double hashing, Tabulation hashing - Abstract
LHlf is a new hash table designed to allow very high levels of concurrency. The table is lock free and grows and shrinks automatically according to the number of items in the table. Insertions, lookups and deletions are never blocked. LHlf is based on linear hashing but adopts recursive split-ordering of the items within a bucket to be able to split and merge lists in a lock free manner. LHlf is as fast as the best previous lock-free design and in addition it offers stable performance, uses less space, and supports both expansions and contractions.
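The recursive split-ordering trick mentioned above (due to Shalev and Shavit) can be sketched numerically: items live in one list sorted by the bit-reversed hash, so when a bucket splits, the new bucket's items already form a contiguous run and splitting only inserts a new sentinel ("dummy") entry; no item ever moves. The widths and keys below are assumptions for illustration.

    # Split-order keys: bit-reverse the hash; regular items set the low bit,
    # bucket sentinels ("dummies") keep it clear. (Illustrative sketch only.)
    W = 8                                   # hash width in bits

    def reverse_bits(x, width=W):
        return int(format(x, f"0{width}b")[::-1], 2)

    def split_order_key(h):
        return reverse_bits(h) | 1          # regular item

    def dummy_key(bucket):
        return reverse_bits(bucket)         # bucket sentinel: even key

    items = [split_order_key(h) for h in [5, 13, 21, 2, 10]]
    table = sorted(items + [dummy_key(b) for b in (0, 1)])   # 2 buckets
    # Doubling to 4 buckets only adds dummies for buckets 2 and 3; every
    # item with h % 4 == 2 already sits contiguously after dummy_key(2).
    table = sorted(table + [dummy_key(b) for b in (2, 3)])
    print([f"{k:08b}" for k in table])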
- Published
- 2012
- Full Text
- View/download PDF
21. High-Performance Concurrency Control Mechanisms for Main-Memory Databases
- Author
-
Spyros Blanas, Craig Steven Freedman, Cristian Diaconu, Jignesh M. Patel, Per-Ake Larson, and Michael James Zwilling
- Subjects
Computer Science - Databases, Non-lock concurrency control, Computer science, Distributed computing, Multiversion concurrency control, Commit, Concurrency control, Distributed transaction, Isolation (database systems), Atomicity, Schedule (computer science), Database, Compensating transaction, Distributed concurrency control, General Engineering, Timestamp-based concurrency control, Serializability, Two-phase locking, Online transaction processing, Snapshot isolation, Database transaction, Optimistic concurrency control, Rollback - Abstract
A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper we introduce two efficient concurrency control methods specifically designed for main-memory databases. Both use multiversioning to isolate read-only transactions from updates but differ in how atomicity is ensured: one is optimistic and one is pessimistic. To avoid expensive context switching, transactions never block during normal processing but they may have to wait before commit to ensure correct serialization ordering. We also implemented a main-memory optimized version of single-version locking. Experimental results show that while single-version locking works well when transactions are short and contention is low, performance degrades under more demanding conditions. The multiversion schemes have higher overhead but are much less sensitive to hotspots and the presence of long-running transactions. (VLDB 2012)
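The multiversioning idea that both methods share, isolating read-only transactions from updates, is easy to sketch. The sketch below only shows version visibility; the paper's optimistic and pessimistic schemes add validation and locking on top, and all names here are invented.

    # Each version carries [begin, end) timestamps; a reader with read
    # timestamp rt sees the version with begin <= rt < end. (Illustrative.)
    INF = float("inf")
    versions = {}        # key -> list of (begin_ts, end_ts, value)

    def write(key, value, commit_ts):
        chain = versions.setdefault(key, [])
        if chain:
            b, _, v = chain[-1]
            chain[-1] = (b, commit_ts, v)        # close the current version
        chain.append((commit_ts, INF, value))

    def read(key, read_ts):
        for begin, end, value in versions.get(key, []):
            if begin <= read_ts < end:
                return value
        return None

    write("x", 10, commit_ts=100)
    write("x", 20, commit_ts=200)
    print(read("x", 150))   # 10: a reader at ts=150 is unaffected by the update
    print(read("x", 250))   # 20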
- Published
- 2011
22. SQL server column store indexes
- Author
-
Eric N. Hanson, Cipri Clinciu, Artem A. Oks, Aleksandras Surna, Susan Price, Srikumar Rangarajan, Qingqing Zhou, and Per-Ake Larson
- Subjects
Web search query, Database, Computer science, Online analytical processing, Query optimization, Column (database), Data warehouse, Spatial query, Query expansion, Sargable, Query by Example, Data mining, Row - Abstract
The SQL Server 11 release (code named "Denali") introduces a new data warehouse query acceleration feature based on a new index type called a column store index. The new index type combined with new query operators processing batches of rows greatly improves data warehouse query performance: in some cases by hundreds of times and routinely a tenfold speedup for a broad range of decision support queries. Column store indexes are fully integrated with the rest of the system, including query processing and optimization. This paper gives an overview of the design and implementation of column store indexes including enhancements to query processing and query optimization to take full advantage of the new indexes. The resulting performance improvements are illustrated by a number of example queries.
- Published
- 2011
- Full Text
- View/download PDF
23. SimDB
- Author
-
Yasin N. Silva, Ahmed M. Aly, Per-Ake Larson, and Walid G. Aref
- Subjects
Database, Computer science, View, Database schema, Online aggregation, Query optimization, Database design, Database tuning, Database testing, Information system, Database theory - Abstract
The identification and processing of similarities in the data play a key role in multiple application scenarios. Several types of similarity-aware operations have been studied in the literature. However, in most of the previous work, similarity-aware operations are studied in isolation from other regular or similarity-aware operations. Furthermore, most of the previous research in the area considers a standalone implementation, i.e., without any integration with a database system. In this demonstration we present SimDB, a similarity-aware database management system. SimDB supports multiple similarity-aware operations as first-class database operators. We describe the architectural changes to implement the similarity-aware operators. In particular, we present the way conventional operators' implementation machinery is extended to support similarity-aware operators. We also show how these operators interact with other similarity-aware and regular operators. In particular, we show the effectiveness of multiple equivalence rules that can be used to extend cost-based query optimization to the case of similarity-aware operations.
- Published
- 2010
- Full Text
- View/download PDF
24. Incorporating partitioning and parallel plans into the SCOPE optimizer
- Author
-
Per-Ake Larson, Ronnie Chaiken, and Jingren Zhou
- Subjects
SQL, Scope (computer science), Relational database, Computer science, Distributed computing, Sorting, Query optimization, Relational operator, Parallel processing, Scripting language, Data mining - Abstract
Massive data analysis on large clusters presents new opportunities and challenges for query optimization. Data partitioning is crucial to performance in this environment. However, data repartitioning is a very expensive operation so minimizing the number of such operations can yield very significant performance improvements. A query optimizer for this environment must therefore be able to reason about data partitioning including its interaction with sorting and grouping. SCOPE is a SQL-like scripting language used at Microsoft for massive data analysis. A transformation-based optimizer is responsible for converting scripts into efficient execution plans for the Cosmos distributed computing platform. In this paper, we describe how reasoning about data partitioning is incorporated into the SCOPE optimizer. We show how relational operators affect partitioning, sorting and grouping properties and describe how the optimizer reasons about and exploits such properties to avoid unnecessary operations. In most optimizers, consideration of parallel plans is an afterthought done in a postprocessing step. Reasoning about partitioning enables the SCOPE optimizer to fully integrate consideration of parallel, serial and mixed plans into the cost-based optimization. The benefits are illustrated by showing the variety of plans enabled by our approach.
- Published
- 2010
- Full Text
- View/download PDF
25. Efficient exploitation of similar subexpressions for query processing
- Author
-
Wolfgang Lehner, Johann-Christoph Freytag, Per-Ake Larson, and Jingren Zhou
- Subjects
Web search query, Theoretical computer science, Computer science, Materialized view, Similar subexpressions, Query processing, Query language, Query optimization, Spatial query, Query plan, Query expansion, Web query classification, Sargable, Data mining - Abstract
Complex queries often contain common or similar subexpressions, either within a single query or among multiple queries submitted as a batch. If so, query execution time can be improved by evaluating a common subexpression once and reusing the result in multiple places. However, current query optimizers do not recognize and exploit similar subexpressions, even within the same query. We present an efficient, scalable, and principled solution to this long-standing optimization problem. We introduce a light-weight and effective mechanism to detect potential sharing opportunities among expressions. Candidate covering subexpressions are constructed and optimization is resumed to determine which, if any, such subexpressions to include in the final query plan. The chosen subexpression(s) are computed only once and the results are reused to answer other parts of queries. Our solution automatically applies to optimization of query batches, nested queries, and maintenance of multiple materialized views. It is the first comprehensive solution covering all aspects of the problem: detection, construction, and cost-based optimization. Experiments on Microsoft SQL Server show significant performance improvements with minimal overhead.
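A light-weight way to picture the detection step (a simplified stand-in for the paper's mechanism; the representation and names are invented): compute a canonical signature for every operator subtree bottom-up, and treat subtrees with equal signatures as candidates to evaluate once and reuse.

    # Detect repeated subexpressions by hashing operator trees bottom-up.
    from collections import defaultdict

    def signature(node, seen):
        if isinstance(node, str):
            return node
        op, *children = node
        sig = (op,) + tuple(signature(c, seen) for c in children)
        if op != "scan":                    # only operator subtrees are interesting
            seen[sig].append(node)
        return sig

    # ((A join B) join C) unioned with a block that re-computes (A join B):
    q = ("union",
         ("join", ("join", ("scan", "A"), ("scan", "B")), ("scan", "C")),
         ("select", ("join", ("scan", "A"), ("scan", "B"))))

    seen = defaultdict(list)
    signature(q, seen)
    for sig, nodes in seen.items():
        if len(nodes) > 1:
            print("candidate shared subexpression:", sig)
    # The (A join B) subtree appears twice -> evaluate once, reuse the result.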
- Published
- 2007
- Full Text
- View/download PDF
26. Cardinality estimation using sample views with quality assurance
- Author
-
Jingren Zhou, Peter Zabback, Per-Ake Larson, and Wolfgang Lehner
- Subjects
Estimation, Theoretical computer science, Computer science, Materialized view, Sample (statistics), Query optimization, Cardinality estimation, Sample views, Sequential sampling, Statistical quality control, Cardinality (SQL statements), Data mining - Abstract
Accurate cardinality estimation is critically important to high-quality query optimization. It is well known that conventional cardinality estimation based on histograms or similar statistics may produce extremely poor estimates in a variety of situations, for example, queries with complex predicates, correlation among columns, or predicates containing user-defined functions. In this paper, we propose a new, general cardinality estimation technique that combines random sampling and materialized view technology to produce accurate estimates even in these situations. As a major innovation, we exploit feedback information from query execution and process control techniques to assure that estimates remain statistically valid when the underlying data changes. Experimental results based on a prototype implementation in Microsoft SQL Server demonstrate the practicality of the approach and illustrate the dramatic effects improved cardinality estimates may have.
- Published
- 2007
- Full Text
- View/download PDF
27. Efficient Maintenance of Materialized Outer-Join Views
- Author
-
Jingren Zhou and Per-Ake Larson
- Subjects
Database, Computer science, View, Materialized view, Joins, Data warehouse, Projection (relational algebra), Overhead (computing) - Abstract
Queries containing outer joins are common in data warehousing applications. Materialized outer-join views could greatly speed up many such queries but most database systems do not allow outer joins in materialized views. In part, this is because outer-join views could not previously be maintained efficiently when base tables are updated. In this paper we show how to efficiently maintain general outer-join views, that is, views composed of selection, projection, inner and outer joins. Foreign-key constraints are exploited to reduce maintenance overhead. Experimental results show that maintaining an outer-join view need not be more expensive than maintaining an inner-join view.
- Published
- 2007
- Full Text
- View/download PDF
28. Dynamic Materialized Views
- Author
-
Luping Ding, Jonathan Goldstein, Per-Ake Larson, and Jingren Zhou
- Subjects
Set (abstract data type), Database, Computer science, Materialized view, Cache, Storage space, Row - Abstract
A conventional materialized view blindly materializes and maintains all rows of a view, even rows that are never accessed. We propose a more flexible materialization strategy aimed at reducing storage space and view maintenance costs. A dynamic materialized view selectively materializes only a subset of rows, for example, the most frequently accessed rows. One or more control tables are associated with the view and define which rows are currently materialized. The set of materialized rows can be changed dynamically, either manually or automatically by an internal cache manager using a feedback loop. Dynamic execution plans are generated to decide whether the view is applicable at run time. Experimental results in Microsoft SQL Server show that compared with conventional materialized views, dynamic materialized views greatly reduce storage requirements and maintenance costs while achieving better query performance with improved buffer pool efficiency.
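A minimal sketch of the control-table mechanism described above, under stated assumptions (the data and names are invented; in the paper the membership check is compiled into dynamic execution plans inside SQL Server): only rows whose key appears in the control set are materialized, lookups branch between the view and the base data, and a cache manager can admit rows dynamically.

    # A dynamic materialized view: materialize only rows named by a control set.
    base = {k: {"key": k, "val": k * k} for k in range(1000)}   # "base tables"
    control = {1, 2, 3}                                         # rows to cache
    view = {k: base[k] for k in control}                        # partial view

    def lookup(key):
        if key in control:                 # plan branch 1: answer from the view
            return view[key], "view"
        return base[key], "base"           # plan branch 2: recompute from base

    def cache(key):                        # cache manager admits a hot row
        control.add(key); view[key] = base[key]

    print(lookup(2))     # served from the materialized subset
    print(lookup(500))   # falls back to the base tables
    cache(500)           # after admission, later lookups hit the view
    print(lookup(500))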
- Published
- 2007
- Full Text
- View/download PDF
29. Exploiting self-monitoring sample views for cardinality estimation
- Author
-
Wolfgang Lehner, Peter Zabback, Per-Ake Larson, and Jingren Zhou
- Subjects
Estimation, Theoretical computer science, Computer science, Sample (statistics), Query optimization, Cardinality estimation, Sample views, Sequential sampling, Statistical quality control, Self-monitoring, Cardinality (SQL statements), Sargable, Data mining - Abstract
Good cardinality estimates are critical for generating good execution plans during query optimization. Complex predicates, correlations between columns, and user-defined functions are extremely hard to handle when using the traditional histogram approach. This demo illustrates the use of sample views for cardinality estimations as prototyped in Microsoft SQL Server. We show the creation of sample views, discuss how they are exploited during query optimization, and explain their potential effect on query plans. In addition, we also show our implementation of maintenance policies using statistical quality control techniques based on query feedback.
- Published
- 2007
30. Stacked indexed views in microsoft SQL server
- Author
-
David DeHaan, Per-Ake Larson, and Jingren Zhou
- Subjects
Matching (statistics), Class (computer programming), Information retrieval, Speedup, Database, Computer science, Materialized view, Information schema, Signature (logic) - Abstract
Appropriately selected materialized views (also called indexed views) can speed up query execution by orders of magnitude. Most database systems limit support for materialized views to select-project-join expressions, possibly with a group-by, over base tables because this class of views can be efficiently maintained incrementally and thus kept up to date with the underlying source tables. However, limiting views to reference only base tables restricts the class of queries that can be supported by materialized views. View stacking (also called views on views) relaxes one restriction by allowing a materialized view to reference both base tables and other materialized views. This extends materialized view support to additional types of queries. This paper describes a prototype implementation of stacked views within Microsoft SQL Server and explains which classes of queries can be supported. To support view matching for stacked views, a signature mechanism was added to the optimizer. This mechanism turned out to be beneficial also for regular views by significantly speeding up view matching.
- Published
- 2005
- Full Text
- View/download PDF
31. MTCache: transparent mid-tier database caching in SQL server
- Author
-
Jonathan Goldstein, Per-Ake Larson, and Jingren Zhou
- Subjects
Autocommit, SQL, Database caching, Relational database, Computer science, Language Integrated Query, Client–server model, In-Memory Processing, Server, Query by Example, Database server, Application server, Materialized view, Data Transformation Services, User-defined function, Server farm, Scalability, Operating system, Cache, Log shipping - Abstract
Many applications today run in a multitier environment with browser-based clients, midtier (application) servers and a backend database server. Midtier database caching attempts to improve system throughput and scalability by offloading part of the database workload to intermediate database servers that partially replicate data from the backend server. The fact that some queries are offloaded to an intermediate server should be completely transparent to applications - one of the key distinctions between caching and replication. MTCache is a prototype midtier database caching solution for SQL Server that achieves this transparency. It builds on SQL Server's support for materialized views, distributed queries and replication. We describe MTCache and report experimental results on the TPC-W benchmark. The experiments show that a significant part of the query workload can be offloaded to cache servers, resulting in greatly improved scale-out on the read-dominated workloads of the benchmark. Replication overhead was small with an average replication delay of less than two seconds.
- Published
- 2004
- Full Text
- View/download PDF
32. Support for relaxed currency and consistency constraints in MTCache
- Author
-
Raghu Ramakrishnan, Per-Ake Larson, Jonathan Goldstein, and Hongfei Guo
- Subjects
Computer science, Currency, Data mining - Published
- 2004
- Full Text
- View/download PDF
33. Relaxed currency and consistency
- Author
-
Raghu Ramakrishnan, Per-Ake Larson, Jonathan Goldstein, and Hongfei Guo
- Subjects
SQL, Syntax (programming languages), Computer science, Programming language, Replica, Query optimization, Consistency (database systems), Asynchronous communication, Scalability, Cache, Compile time - Abstract
Despite the widespread and growing use of asynchronous copies to improve scalability, performance and availability, this practice still lacks a firm semantic foundation. Applications are written with some understanding of which queries can use data that is not entirely current and which copies are "good enough"; however, there are neither explicit requirements nor guarantees. We propose to make this knowledge available to the DBMS through explicit currency and consistency (C&C) constraints in queries and develop techniques so the DBMS can guarantee that the constraints are satisfied. In this paper we describe our model for expressing C&C constraints, define their semantics, and propose SQL syntax. We explain how C&C constraints are enforced in MTCache, our prototype mid-tier database cache, including how constraints and replica update policies are elegantly integrated into the cost-based query optimizer. Consistency constraints are enforced at compile time while currency constraints are enforced at run time by dynamic plans that check the currency of each local replica before use and select sub-plans accordingly. This approach makes optimal use of the cache DBMS while at the same time guaranteeing that applications always get data that is "good enough" for their purpose.
- Published
- 2004
- Full Text
- View/download PDF
34. Transparent mid-tier database caching in SQL server
- Author
-
Jingren Zhou, Per-Ake Larson, and Jonathan Goldstein
- Subjects
Database caching ,Information retrieval ,Automatic image annotation ,Database ,Computer science ,Sql server ,Data definition language ,Data Transformation Services ,computer.software_genre ,computer ,computer.programming_language ,Database index - Published
- 2003
- Full Text
- View/download PDF
35. Performing group-by before join [query processing]
- Author
-
Per-Ake Larson and W.P. Yan
- Subjects
SQL, Theoretical computer science, Semantics (computer science), Computer science, Joins, Query optimization, Transformation (function), Null (SQL), Data mining - Abstract
Assume that we have an SQL query containing joins and a group-by. The standard way of evaluating this type of query is to first perform all the joins and then the group-by operation. However, it may be possible to perform the group-by early, that is, to push the group-by operation past one or more joins. Early grouping may reduce the query processing cost by reducing the amount of data participating in joins. We formally define the problem, adhering strictly to the semantics of NULL and duplicate elimination in SQL2, and prove necessary and sufficient conditions for deciding when this transformation is valid. In practice, it may be expensive or even impossible to test whether the conditions are satisfied. Therefore, we also present a more practical algorithm that tests a simpler, sufficient condition. This algorithm is fast and detects a large subclass of transformable queries.
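The transformation can be seen on toy data (an invented example; the general validity conditions are exactly what the paper characterizes): computing SUM(amount) per customer by joining first and grouping second gives the same answer as grouping first and joining second, because here the grouping columns include the join key.

    # Group-by pushdown, worked on toy data.
    orders = [("c1", 10), ("c1", 20), ("c2", 5)]          # (cust, amount)
    customers = {"c1": "Ann", "c2": "Bob"}                # cust -> name

    # Plan A: join first, then group.
    plan_a = {}
    for c, amt in orders:
        k = (c, customers[c])
        plan_a[k] = plan_a.get(k, 0) + amt

    # Plan B: group first (early aggregation shrinks the join input), then join.
    partial = {}
    for c, amt in orders:
        partial[c] = partial.get(c, 0) + amt
    plan_b = {(c, customers[c]): total for c, total in partial.items()}

    assert plan_a == plan_b
    print(plan_b)        # {('c1', 'Ann'): 30, ('c2', 'Bob'): 5}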
- Published
- 2002
- Full Text
- View/download PDF
36. A query sampling method for estimating local cost parameters in a multidatabase system
- Author
-
Qiang Zhu and Per-Ake Larson
- Subjects
Spatial query, Query expansion, Computer science, View, Online aggregation, Sargable, Sample (statistics), Data mining, Query optimization - Abstract
In a multidatabase system (MDBS), some query optimization information related to local database systems may not be available at the global level because of local autonomy. To perform global query optimization, a method is required to derive the necessary local information. This paper presents a new method that employs a query sampling technique to estimate the cost parameters of an autonomous local database system. We introduce a classification for grouping local queries and suggest a cost estimation formula for the queries in each class. We present a procedure to draw a sample of queries from each class and use the observed costs of sample queries to determine the cost parameters by multiple regression. Experimental results indicate that the method is quite promising for estimating the cost of local queries in an MDBS.
- Published
- 2002
- Full Text
- View/download PDF
37. Exploiting uniqueness in query optimization
- Author
-
Per-Ake Larson and G. N. Paulley
- Subjects
SQL, Theoretical computer science, Information retrieval, View, Computer science, Relational database, Online aggregation, Range query (database), Query language, Query optimization, Sort, Query by Example, Sargable, Uniqueness, Boolean conjunctive query - Abstract
Consider an SQL query that specifies duplicate elimination via a DISTINCT clause. Because duplicate elimination often requires an expensive sort of the query result, it is often worthwhile to identify situations where the DISTINCT clause is unnecessary, to avoid the sort altogether. We prove a necessary and sufficient condition for deciding if a query requires duplicate elimination. The condition exploits knowledge about keys, table constraints, and query predicates. Because the condition cannot always be tested efficiently, we offer a practical algorithm that tests a simpler, sufficient condition. We consider applications of this condition for various types of queries, and show that we can exploit this condition in both relational and nonrelational database systems.
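The practical test can be pictured with a simplified sufficient check (an invented sketch, much weaker than the paper's condition, which also exploits table constraints and query predicates): if the selected columns contain a set of columns known to be unique in the query result, DISTINCT is redundant and the dedup sort can be skipped.

    # Simplified sufficient check for skipping DISTINCT.
    def distinct_needed(select_cols, result_keys):
        """result_keys: sets of columns, each known unique in the query result."""
        return not any(key <= set(select_cols) for key in result_keys)

    # SELECT DISTINCT emp_id, name FROM emp: emp_id is a key of emp, so the
    # sort/dedup for DISTINCT can be skipped entirely.
    print(distinct_needed(["emp_id", "name"], [{"emp_id"}]))   # False: skip dedup

    # SELECT DISTINCT dept FROM emp: dept is not unique, dedup is required.
    print(distinct_needed(["dept"], [{"emp_id"}]))             # True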
- Published
- 2002
- Full Text
- View/download PDF
38. Developing Evolutionary Cost Models for Query Optimization in a Dynamic Multidatabase Environment
- Author
-
Per-Ake Larson, Amira Rahal, and Qiang Zhu
- Subjects
Dynamic programming, Computer science, Distributed computing, Evolutionary algorithm, Sample (statistics), Artificial intelligence, Query optimization - Abstract
Deriving local cost models for query optimization in a multidatabase system (MDBS) is a challenging issue due to local autonomy. It becomes even more difficult when dynamic environmental factors are taken into consideration. In this paper, we study how to evolve a cost model to capture a slowly-changing dynamic MDBS environment so that the cost model is kept up-to-date all the time. We propose a novel evolutionary technique, called the shifting method, to tackle this issue. The key idea is to adjust a cost model by adding the up-to-date performance information of a new sample query into and, in the meantime, removing the out-of-date information of the oldest sample query from consideration at each step. It is shown that this method is more efficient than the direct re-building approach. The relevant issues, including derivation of recurrence update formulas, development of an efficient algorithm, analysis of complexities, and some aspects of implementation, are studied. Our theoretical and experimental results demonstrate that the proposed shifting method is quite promising in deriving accurate evolutionary cost models for a slowly-changing dynamic MDBS environment.
- Published
- 2002
- Full Text
- View/download PDF
39. Optimizing queries using materialized views
- Author
-
Per-Ake Larson and Jonathan Goldstein
- Subjects
Transformation (function), Information retrieval, Theoretical computer science, Heuristic (computer science), Computer science, View, Materialized view, Scalability, Joins, Query optimization - Abstract
Materialized views can provide massive improvements in query processing time, especially for aggregation queries over large tables. To realize this potential, the query optimizer must know how and when to exploit materialized views. This paper presents a fast and scalable algorithm for determining whether part or all of a query can be computed from materialized views and describes how it can be incorporated in transformation-based optimizers. The current version handles views composed of selections, joins and a final group-by. Optimization remains fully cost based, that is, a single “best” rewrite is not selected by heuristic rules but multiple rewrites are generated and the optimizer chooses the best alternative in the normal way. Experimental results based on an implementation in Microsoft SQL Server show outstanding performance and scalability. Optimization time increases slowly with the number of views but remains low even up to a thousand.
- Published
- 2001
- Full Text
- View/download PDF
40. Memory allocation for long-running server applications
- Author
-
Murali R. Krishnan and Per-Ake Larson
- Subjects
Computer science, C dynamic memory allocation, Registered memory, Thread (computing), Static memory allocation, Memory map, Lock (computer science), Allocator, Memory management, Shared memory, Server, Operating system, Interleaved memory, Cache - Abstract
Prior work on dynamic memory allocation has largely neglected long-running server applications, for example, web servers and mail servers. Their requirements differ from those of one-shot applications like compilers or text editors. We investigated how to build an allocator that is not only fast and memory efficient but also scales well on SMP machines. We found that it is not sufficient to focus on reducing lock contention - higher speedups require a reduction in cache misses and bus traffic. We then designed and prototyped a new allocator, called LKmalloc, targeted for both traditional applications and server applications. LKmalloc uses several subheaps, each one with a separate set of free lists and memory arena. A thread always allocates from the same subheap but can free a block belonging to any subheap. A thread is assigned to a subheap by hashing on its thread ID. We compared its performance with several other allocators on a server-like, simulated workload and found that it indeed scales well and is quite fast but uses memory more efficiently.
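The structure the abstract describes can be sketched directly, with the obvious caveats: a real allocator manages raw memory, avoids false sharing of cache lines, and does far more than this; the names and the subheap count below are invented.

    import threading

    # LKmalloc's shape: several subheaps, each with its own lock and free list;
    # a thread always allocates from the subheap chosen by hashing its thread
    # id, but may free a block belonging to any subheap. (Illustrative only.)
    N_SUBHEAPS = 4

    class Subheap:
        def __init__(self):
            self.lock = threading.Lock()
            self.free_list = []

    subheaps = [Subheap() for _ in range(N_SUBHEAPS)]

    def my_subheap():
        return subheaps[hash(threading.get_ident()) % N_SUBHEAPS]

    def allocate(size):
        heap = my_subheap()                  # contention spread across subheaps
        with heap.lock:
            block = heap.free_list.pop() if heap.free_list else bytearray(size)
        return heap, block

    def free(heap, block):
        with heap.lock:                      # returned to its owning subheap
            heap.free_list.append(block)

    heap, block = allocate(64)
    free(heap, block)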
- Published
- 1998
- Full Text
- View/download PDF
41. Dynamic hashing
- Author
-
Per-Ake Larson
- Subjects
Binary tree, Computer Networks and Communications, Computer science, Applied Mathematics, Dynamic perfect hashing, Hash function, Binary number, Data structure, Computational Mathematics, Algorithm, Extendible hashing, Software, Auxiliary memory - Abstract
A new file organisation called dynamic hashing is presented. The organisation is based on normal hashing, but the allocated storage space can easily be increased and decreased without reorganising the file, according to the number of records actually stored in the file. The expected storage utilisation is analysed and is shown to be approximately 69% all the time. Algorithms for inserting and deleting a record are presented and analysed. Retrieval of a record is fast, requiring only one access to secondary storage. There are no overflow records. The proposed scheme necessitates maintenance of a relatively small index structured as a forest of binary trees or slightly modified binary tries. The expected size of the index is analysed and a compact representation of the index is suggested.
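A simplified sketch of the core mechanism, under stated assumptions (the paper's index is a forest of binary trees or modified binary tries and its buckets live on secondary storage; the structure and names below are invented for illustration): a bucket that overflows splits by consuming one more bit of the key's hash, so the file grows gradually and a record is found by one index walk plus a single bucket read.

    # Dynamic-hashing-style growth by bucket splitting (illustrative).
    CAPACITY = 2          # records per bucket

    class Bucket:
        def __init__(self, depth):
            self.depth, self.keys, self.kids = depth, [], None   # leaf node

        def insert(self, key):
            node = self
            while node.kids:                              # walk the binary trie
                node = node.kids[(hash(key) >> node.depth) & 1]
            node.keys.append(key)
            if len(node.keys) > CAPACITY:                 # split: use one more bit
                node.kids = (Bucket(node.depth + 1), Bucket(node.depth + 1))
                for k in node.keys:
                    node.kids[(hash(k) >> node.depth) & 1].keys.append(k)
                node.keys = []

        def lookup(self, key):
            node = self
            while node.kids:
                node = node.kids[(hash(key) >> node.depth) & 1]
            return key in node.keys                       # one "disk access"

    root = Bucket(depth=0)
    for k in range(20):
        root.insert(k)
    assert all(root.lookup(k) for k in range(20))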
- Published
- 1978
- Full Text
- View/download PDF
42. Analysis of hashing with chaining in the prime area
- Author
-
Per-Ake Larson
- Subjects
Computational Mathematics, Random search, Control and Optimization, Computational Theory and Mathematics, Hash function, Chaining, Storage area, 2-choice hashing, Algorithm, Mathematics - Abstract
The expected performance of hashing with chaining in the prime area is analyzed. The method analyzed is briefly characterized as hashing with chaining of overflow records in the prime storage area, using one or several noncoalescing chains per bucket, and with random search for empty space. The analysis is asymptotic, and it is assumed that deletions do not occur. Simple closed formulas cannot be obtained, except in the case of bucket size one, but numerical results are readily computed. The expected performance compares favorably with that of other methods for handling overflow records.
- Published
- 1984
- Full Text
- View/download PDF
43. File organization using composite perfect hashing
- Author
-
Per-Ake Larson and M.V. Ramakrishna
- Subjects
Universal hashing, Computer science, Dynamic perfect hashing, 2-choice hashing, Linear hashing, Consistent hashing, Algorithm, Perfect hash function, Hash table, Information Systems, K-independent hashing - Abstract
Perfect hashing refers to hashing with no overflows. We propose and analyze a composite perfect hashing scheme for large external files. The scheme guarantees retrieval of any record in a single disk access. Insertions and deletions are simple, and the file size may vary considerably without adversely affecting the performance. A simple variant of the scheme supports efficient range searches in addition to being a completely dynamic file organization scheme. These advantages are achieved at the cost of a small amount of additional internal storage and increased cost of insertions.
- Published
- 1989
- Full Text
- View/download PDF
44. Updating derived relations: detecting irrelevant and autonomously computable updates
- Author
-
José A. Blakeley, Per-Ake Larson, and Neil Coburn
- Subjects
Set (abstract data type), Relation (database), Selection (relational algebra), Computer science, Relational algebra, Tuple, Algorithm, Information Systems - Abstract
Consider a database containing not only base relations but also stored derived relations (also called materialized or concrete views). When a base relation is updated, it may also be necessary to update some of the derived relations. This paper gives sufficient and necessary conditions for detecting when an update of a base relation cannot affect a derived relation (an irrelevant update), and for detecting when a derived relation can be correctly updated using no data other than the derived relation itself and the given update operation (an autonomously computable update). The class of derived relations considered is restricted to those defined by PSJ-expressions, that is, any relational algebra expressions constructed from an arbitrary number of project, select and join operations (but containing no self-joins). The class of update operations consists of insertions, deletions, and modifications, where the set of tuples to be deleted or modified is specified by a selection condition on attributes of the relation being updated.
- Published
- 1989
- Full Text
- View/download PDF