232 results for "ELECTRONIC file management"
Search Results
2. Activity-Centric Computing Systems.
- Author
-
BARDRAM, JAKOB E., JEURIS, STEVEN, TELL, PAOLO, HOUBEN, STEVEN, and VOIDA, STEPHEN
- Subjects
- *
COMPUTER science , *RESEARCH & development , *CONCEPTUAL models , *TEAMS in the workplace , *ELECTRONIC file management - Abstract
The article discusses the notion of Activity-Centric Computing (ACC) in relation to application-centric computing. Topics include the history of ACC systems in relation to research and development, the relation of ACC to conceptual models, and the relation of ACC to collaborative work and file or resource management.
- Published
- 2019
- Full Text
- View/download PDF
3. Fast and Powerful Hashing Using Tabulation.
- Author
-
Thorup, Mikkel
- Subjects
- *
HASHING , *MATHEMATICAL bounds , *INDEPENDENCE (Mathematics) , *ELECTRONIC file management , *COMPUTER logic - Abstract
Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here, we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist (A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin). Keys are viewed as consisting of c characters and we have precomputed character tables h1, ..., hc mapping characters to random hash values. A key x = (x1, ..., xc) is hashed to h1[x1] ⊕ h2[x2] ⊕ ... ⊕ hc[xc]. This scheme is very fast with character tables in cache. Although simple tabulation is not even four-independent, it does provide many of the guarantees that are normally obtained via higher independence, for example, for linear probing and Cuckoo hashing. Next, we consider twisted tabulation, where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-style tail bounds and a very small bias for minwise hashing. This also yields an extremely fast pseudorandom number generator that is provably good for many classic randomized algorithms and data structures. Finally, we consider double tabulation, where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Wegman and Carter [26]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. Although these tabulation schemes are all easy to implement and use, their analysis is not. (An illustrative code sketch follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
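A minimal sketch of the simple tabulation scheme described in the abstract above, assuming 32-bit keys viewed as c = 4 byte-characters (the key width, seed, and table size are illustrative choices, not taken from the article):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

C = 4  # view keys as c = 4 characters (the bytes of a 32-bit key)
# One table of 256 random 32-bit values per character position.
TABLES = [[random.getrandbits(32) for _ in range(256)] for _ in range(C)]

def simple_tabulation(key: int) -> int:
    """Hash a 32-bit key as h1[x1] XOR h2[x2] XOR h3[x3] XOR h4[x4]."""
    h = 0
    for i in range(C):
        char = (key >> (8 * i)) & 0xFF   # extract character x_i
        h ^= TABLES[i][char]             # XOR the precomputed random value
    return h

print(hex(simple_tabulation(0xDEADBEEF)))
```

With the tables resident in cache, each hash costs only a few lookups and XORs, which is the speed argument the survey makes.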
4. Certifying a File System Using Crash Hoare Logic: Correctness in the Presence of Crashes.
- Author
-
Chajed, Tej, Chen, Haogang, Chlipala, Adam, Kaashoek, M. Frans, Zeldovich, Nickolai, and Ziegler, Daniel
- Subjects
- *
ELECTRONIC file management , *HOARE logic , *COMPUTER system failures , *INFORMATION storage & retrieval systems , *AUTOMATION , *ERRORS - Abstract
FSCQ is the first file system with a machine-checkable proof that its implementation meets a specification, even in the presence of fail-stop crashes. FSCQ provably avoids bugs that have plagued previous file systems, such as performing disk writes without sufficient barriers or forgetting to zero out directory blocks. If a crash happens at an inopportune time, these bugs can lead to data loss. FSCQ's theorems prove that, under any sequence of crashes followed by reboots, FSCQ will recover its state correctly without losing data. To state FSCQ's theorems, this paper introduces the Crash Hoare logic (CHL), which extends traditional Hoare logic with a crash condition, a recovery procedure, and logical address spaces for specifying disk states at different abstraction levels. CHL also reduces the proof effort for developers through proof automation. Using CHL, we developed, specified, and proved the correctness of the FSCQ file system. Although FSCQ's design is relatively simple, experiments with FSCQ as a user-level file system show that it is sufficient to run Unix applications with usable performance. FSCQ's specifications and proofs required significantly more work than the implementation, but the work was manageable even for a small team of a few researchers. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
5. Crash Consistency.
- Author
-
PILLAI, T. S., CHIDAMBARAM, V., ALAGAPPAN, R., AL-KISWANY, S., ARPACI-DUSSEAU, A. C., and ARPACI-DUSSEAU, R. H.
- Subjects
- *
ELECTRONIC file management , *COMPUTER software , *VON Neumann architecture (Computers) , *COMPUTER system failures , *SOFTWARE architecture , *COMPUTER files - Abstract
The article focuses on computer crash mitigation in Von Neumann computer architecture. The authors propose reexamining the way the file system's fundamental abstractions are set up. Topics include their research on the problems that crashes pose in the real world in terms of data corruption and data loss. The article looks at the example of a database management system that stores data in one file. The authors discuss their study, which found 30 vulnerabilities in the 11 software applications they examined. (An illustrative code sketch follows this entry.) INSETS: Try It Yourself!; The Unspoken Agreement; Best Practices for Application Developers.
- Published
- 2015
- Full Text
- View/download PDF
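The vulnerabilities studied above arise from update protocols that leave a window where a crash exposes a partially written file. As one concrete illustration (a standard write-to-temp/fsync/rename pattern on POSIX, my sketch rather than anything prescribed in the article):

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either the old or the
    new contents, never a torn mixture (POSIX rename is atomic)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # force the new contents to disk first
        os.rename(tmp, path)          # atomically switch names
        dirfd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dirfd)           # persist the directory entry as well
        finally:
            os.close(dirfd)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

atomic_write("config.dat", b"version=2\n")
```

Skipping either fsync reintroduces exactly the kind of ordering vulnerability the authors' study counts.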
6. Research for Practice: Vigorous Public Debates in Academic Computer Science.
- Author
-
REGEHR, JOHN
- Subjects
- *
COMPUTER science research , *COMPUTER software development , *ELECTRONIC file management , *COMPUTER security , *COMPUTER systems - Abstract
The article discusses various issues related to academic computer science research. Topics include debates on software-development methodology in relation to independent faults, a debate on the evaluation of file system performance, and the use of software-based attestation (SWATT) as a protocol for checking memory images from remote systems.
- Published
- 2017
- Full Text
- View/download PDF
7. Proving the Correctness of Nonblocking Data Structures.
- Author
-
DESNOYERS, MATHIEU
- Subjects
- *
DATA structures , *ELECTRONIC data processing , *COMPUTER programming , *ELECTRONIC file management , *SCALABILITY , *SYSTEMS design - Abstract
The article discusses the correctness of nonblocking data structures. Topics covered include language and architecture memory models as considerations when programming nonblocking data structures, the unaligned word-sized memory accesses as a good example of a problem with nonblocking data-structure programming, and how reordering can be conducted at various levels for performance reasons. Also mentioned are the considerations of throughput and scalability tied to atomic accesses and memory barriers, the use of counterexamples to illustrate problems in nonblocking data-structure design and implementation, and execution-based model checking.
- Published
- 2013
- Full Text
- View/download PDF
8. A File System All Its Own.
- Author
-
LEVENTHAL, ADAM H.
- Subjects
- *
FLASH memory , *SOLID state electronics , *ELECTRONIC file management , *TECHNOLOGICAL forecasting , *COMPUTER performance , *HARD disks , *COMPUTER network resources - Abstract
The article examines the development of flash memory in computing, focusing on how solid-state devices (SSDs) are experiencing improvements in performance, reliability, and cost. It discusses the construction of SSDs as a replacement for hard drives as well as the use of NAND flash memory. Particular attention is given to flash-optimized file systems that could be created by software companies building SSDs. Forecasts from experts on the likelihood of flash memory's continued technological relevance are also covered.
- Published
- 2013
- Full Text
- View/download PDF
9. Theory and Applications of b-Bit Minwise Hashing.
- Subjects
- *
HASHING , *ELECTRONIC file management , *INFORMATION retrieval research , *INFORMATION resources management , *COMPUTER science research , *COMPUTATIONAL mathematics - Abstract
The article presents computer science research on hashing. A variation on minwise hashing termed b-bit minwise hashing is offered for the approximate computation of set similarity in very large datasets. An algorithm related to the technique is examined in terms of the improvements it provides in storage requirements and computational overhead in information retrieval and database management. A comparison is made with other alternatives to minwise hashing, and it is noted that b-bit minwise hashing involves only minor modifications to existing minwise hashing schemes. (An illustrative code sketch follows this entry.)
- Published
- 2011
- Full Text
- View/download PDF
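A minimal sketch of the b-bit idea described above: keep only the lowest b bits of each minhash value and correct the collision rate for accidental b-bit matches. The hash-family construction and all constants are illustrative assumptions, not the paper's:

```python
import random

random.seed(7)
K = 256                      # number of independent hash functions
B = 2                        # keep only b = 2 bits per minhash value
MASK64 = (1 << 64) - 1
# Random parameters for K universal-style hashes (a stand-in for the
# minwise-independent permutations the theory assumes).
PARAMS = [(random.getrandbits(64) | 1, random.getrandbits(64)) for _ in range(K)]

def b_bit_signature(items):
    sig = []
    for mult, add in PARAMS:
        m = min(((mult * hash(x) + add) & MASK64) for x in items)
        sig.append(m & ((1 << B) - 1))   # truncate to the lowest B bits
    return sig

def estimate_jaccard(sa, sb):
    matches = sum(x == y for x, y in zip(b_bit_signature(sa), b_bit_signature(sb)))
    c = matches / K                      # collision rate of the b-bit values
    # Unequal minhashes still agree on b bits with prob ~2^-B, so
    # E[c] = J + (1 - J) * 2^-B  =>  J = (c - 2^-B) / (1 - 2^-B).
    r = 2.0 ** -B
    return max(0.0, (c - r) / (1 - r))

s1 = set(range(0, 80))
s2 = set(range(20, 100))
print(round(estimate_jaccard(s1, s2), 3))   # true Jaccard = 60/100 = 0.6
```

Storing 2 bits instead of 64 per hash value is where the storage saving claimed in the abstract comes from.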
10. API Design Matters.
- Author
-
HENNING, MICHI
- Subjects
- *
APPLICATION program interfaces , *SOFTWARE engineering , *APPLICATION software , *COMPUTER operating systems , *ELECTRONIC file management , *COMPUTER science - Abstract
The article discusses application programming interfaces (API) used in the software engineering industry. An API is a set of routines, data structures, object classes or protocols provided by libraries and operating system services in order to support the building of applications. A good API, the author states, works without friction and deals correctly with boundary conditions. Topics include the different ways one can design an API incorrectly, the inherent difficulty in designing a good API, and the collateral damage that occurs when minor, isolated design flaws interact with each other.
- Published
- 2009
- Full Text
- View/download PDF
11. Understanding Ontological Engineering.
- Author
-
DEVEDŽIĆ, VLADAN
- Subjects
- *
ONTOLOGIES (Information retrieval) , *DATA structures , *INTERDISCIPLINARY approach to knowledge , *INTERDISCIPLINARY research , *COMPUTER engineering , *ELECTRONIC data processing , *ELECTRONIC file management , *INFORMATION retrieval - Abstract
The article discusses the field of ontological engineering with a focus on its application for knowledge-based systems, natural language processing, intelligent information retrieval on the Internet, virtual organizations, and simulation and modeling. Topics include the creation of large-scale ontologies, the representation of ontological knowledge in computer science, and interdisciplinary relations between ontological engineering and other fields. Ontologies are defined as explicit representations of domain concepts for data structures.
- Published
- 2002
- Full Text
- View/download PDF
12. PRACTICAL MINIMAL PERFECT HASH FUNCTIONS FOR LARGE DATABASES.
- Author
-
Fox, Edward A., Heath, Lenwood S., Chen, Qi Fan, and Daoud, Amjad M.
- Subjects
- *
HASHING , *ELECTRONIC file management , *SPACETIME , *COMPUTER algorithms , *HYPERTEXT systems , *DATABASE management , *INFORMATION retrieval - Abstract
The article focuses on solutions that improve current hashing techniques by eliminating the problems of wasted space and time. A less extensive literature has grown up during the last decade dealing with perfect hash functions; mappings to integers and existence proofs for such hash functions have been described. There are several general strategies for finding perfect hash functions. The simplest is to select a class of functions that is likely to include a number of minimal perfect hash functions (MPHFs), and then to search for an MPHF in that class by assigning different values to each of the parameters characterizing the class; these parameters concern randomness, vertex degree distribution, and fitting into a disk. A variety of experimental tests have been run, and the results suggest that as the algorithms process large key sets, there is some variation in processing time because of the probabilistic nature of the operations. MPHFs can be used in hypertext, hypermedia, semantic networks, file managers, database management systems, object managers, information retrieval systems, and compilers.
- Published
- 1992
- Full Text
- View/download PDF
13. Concurrent Operations on Extendible Hashing and its Performance.
- Author
-
Kumar, Vijay and Sibley, Edgar H.
- Subjects
- *
HASHING , *ALGORITHMS , *COMPUTER science , *COMPUTER programming , *COMPUTER storage devices , *ELECTRONIC file management - Abstract
Extendible hashing is a dynamic data structure which accommodates expansion and contraction of any stored data efficiently. In this article, an algorithm is developed for managing concurrent operations on extendible hashing that achieves optimal memory utilization by supporting directory expansion and contraction, page split, and page merge. The results of this study have been encouraging in the sense that the algorithm appears to provide a higher degree of concurrency than other algorithms on an extendible hash file. (An illustrative code sketch of the underlying structure follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 1990
- Full Text
- View/download PDF
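A toy sketch of the extendible hashing structure the entry above builds on: a directory of 2^d pointers indexed by the low d hash bits, with page splits and directory doubling. Merging, contraction, and the paper's concurrency control are all omitted; page size and names are illustrative:

```python
class Page:
    def __init__(self, depth):
        self.depth = depth        # local depth: hash bits this page "owns"
        self.items = {}           # key -> value

class ExtendibleHash:
    PAGE_SIZE = 4                 # max records per page (toy value)

    def __init__(self):
        self.global_depth = 0
        self.directory = [Page(0)]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)  # low-order bits

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        self.directory[self._index(key)].items[key] = value
        page = self.directory[self._index(key)]
        while len(page.items) > self.PAGE_SIZE:
            self._split(page)
            page = self.directory[self._index(key)]  # re-check key's page only;
            # a sibling may stay overfull until its next insert (fine for a sketch)

    def _split(self, page):
        if page.depth == self.global_depth:      # directory must double first
            self.directory += self.directory
            self.global_depth += 1
        page.depth += 1
        sibling = Page(page.depth)
        bit = 1 << (page.depth - 1)
        # Redistribute the page's records on the newly significant bit.
        moved = {k: v for k, v in page.items.items() if hash(k) & bit}
        for k in moved:
            del page.items[k]
        sibling.items = moved
        # Repoint directory slots whose new bit is set from page to sibling.
        for i, p in enumerate(self.directory):
            if p is page and i & bit:
                self.directory[i] = sibling

h = ExtendibleHash()
for i in range(32):
    h.put(f"key{i}", i)
print(h.get("key7"), h.global_depth)
```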
14. Fast Hashing of Variable-Length Text Strings.
- Author
-
Pearson, Peter K. and Sibley, Edgar H.
- Subjects
- *
HASHING , *ELECTRONIC file management , *ALGORITHMS , *MICROPROCESSORS , *ECONOMIC forecasting , *COMPUTER science - Abstract
This article focuses on hashing techniques. The relatively rare articles on hashing functions themselves tend to discuss algorithms that operate on values of predetermined length or that make heavy use of operations (multiplication, division, or shifts of long bit strings) that are absent from the instruction sets of smaller microprocessors. Two desirable properties of this algorithm for hashing variable-length strings derive from the technique of cryptographic checksums, or message authentication codes, from which it is adapted. First, a good cryptographic checksum ensures that small changes to the data result in large and seemingly random changes to the checksum. In the hashing adaptation, this results in good separation of very similar strings. The purpose of any text hashing function is to take text strings, even very similar text strings, and map them onto integers that are spread as uniformly as possible over the intended range of output values. In the absence of prior knowledge about the strings being hashed, a perfectly uniform output distribution cannot be expected. (An illustrative code sketch follows this entry.)
- Published
- 1990
- Full Text
- View/download PDF
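A minimal sketch in the spirit of the scheme described above: a one-byte state fed through a fixed pseudorandom permutation table, using only XOR and a table lookup per character (the particular table and seed are illustrative):

```python
import random

rng = random.Random(1)
T = list(range(256))
rng.shuffle(T)                 # a fixed pseudorandom permutation of 0..255

def pearson_hash(s: str) -> int:
    """One-byte hash of a variable-length string using only XOR and a
    table lookup -- operations even small microprocessors have."""
    h = 0
    for byte in s.encode("utf-8"):
        h = T[h ^ byte]        # feed each character through the permutation
    return h

for w in ("hello", "hellp", "hello "):
    print(w, pearson_hash(w))  # very similar strings separate well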
15. Efficient Decoding of Prefix Codes.
- Author
-
Hirschberg, Daniel S., Lelewer, Debra A., and Sibley, Edgar H.
- Subjects
- *
DATA compression , *CODING theory , *COMPUTER programming , *ELECTRONIC file management , *COMPUTER science , *DIGITAL electronics - Abstract
The article discusses problems associated with data compression. Four methods of decoding prefix codes in limited space are presented. The methods are partitioned into two categories based on the data-structuring strategy employed. Tables comparing time and space requirements of the four methods expose the relevant parameters. The methods the authors describe define only the decoding phase of a data compression system. The choice of a fixed encoding dictionary is the most critical factor in determining the performance of a system based on their methods. While the representation of the code contributes to the compression ratio attained, most of the compression is achieved by selecting a dictionary that is well suited to the data being compressed. With an advantageous choice of dictionary, the authors' methods can attain compression performance comparable to state-of-the-art techniques such as the Unix utility compress. Defining a dictionary that guarantees good compression is, however, a difficult task. (An illustrative code sketch follows this entry.)
- Published
- 1990
- Full Text
- View/download PDF
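For intuition about what "decoding a prefix code" means, a baseline bit-at-a-time decoder is sketched below. This is only the naive approach that the article's four space-efficient methods improve upon; the code dictionary is an illustrative example, not from the paper:

```python
# Example fixed prefix code (an illustrative dictionary, not the paper's).
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(text: str) -> str:
    return "".join(CODE[ch] for ch in text)

def decode(bits: str) -> str:
    """Decode a prefix code bit by bit: because no codeword is a prefix of
    another, the first dictionary match is always the right one."""
    inverse = {v: k for k, v in CODE.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:        # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    assert not buf, "input ended mid-codeword"
    return "".join(out)

msg = "abacad"
assert decode(encode(msg)) == msg
print(encode(msg))   # -> 01001100111
```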
16. Dynamic File Migration in Distributed Computer Systems.
- Author
-
Gavish, Bezalel, Liu Sheng, Olivia R., and Zmud, Robert
- Subjects
- *
ELECTRONIC file management , *DATABASE management , *ELECTRONIC data processing , *INFORMATION storage & retrieval systems , *DATA transmission systems , *BUSINESS information services - Abstract
The article summarizes accomplishments of researchers on file migration. The importance of file migration is increasing because of its potential to improve the performance of distributed office, manufacturing and hospital information systems. Due to large geographical regions of business operations and external competitive pressures, information services are expected to collect, store, retrieve, process, aggregate and distribute timely information from data generated at geographically remote and dispersed sites. The traditional mode of providing computerized information services in large organizations has been to operate large centralized computer systems, with most of the processing, storage and retrieval of data performed at central sites. Such systems have the disadvantages of high cost and high loads on the centralized computing and communication devices. Consequently, both communication with and processing at the central site have become system bottlenecks, access speeds have slowed down, and the availability and reliability of information services has deteriorated.
- Published
- 1990
- Full Text
- View/download PDF
17. Technical Correspondence.
- Author
-
Suchenek, Marek A.
- Subjects
- *
COMPUTER storage devices , *INFORMATION retrieval , *ELECTRONIC file management , *CACHE memory , *ALGORITHMS , *DISK access (Computer science) - Abstract
This article presents correspondence from computer scientists related to technical problems. One paper presents commentary on the movement of large amounts of information, in the form of data, and of commands, in the form of instructions, into and out of the computer's processing module. The cache store unit is an optional circuit that improves the effective access and cycle time of store operations. Cache takes advantage of the general programming characteristic of locality of reference. Improvements in disk access times have also failed to keep up with improvements in processor times. Cache memory has been used to compensate for some of the differences in performance between the processor and its data storage facilities. Therefore, how cache is implemented can have a significant impact on system performance and literally can make or break a particular system. The importance of cache between the processor and main memory can easily be demonstrated by a simple examination of current technology: the current generation of general-purpose high-speed computers has processor cycle times of around 15 to 25 nanoseconds.
- Published
- 1989
18. Data Compression with Finite Windows.
- Author
-
Fiala, Edward R. and Greene, Daniel H.
- Subjects
- *
DATA compression , *CODING theory , *DATA transmission systems , *DATABASE management , *ELECTRONIC file management , *COMPUTER programming - Abstract
Several methods are presented for adaptive, invertible data compression in the style of Lempel's and Ziv's first textual substitution proposal. For the first two methods, the article describes modifications of McCreight's suffix tree data structure that support cyclic maintenance of a window on the most recent source characters. A percolating update is used to keep node positions within the window, and the updating process is shown to have constant amortized cost. Other methods explore the tradeoffs between compression time, expansion time, data structure size, and amount of compression achieved. The article includes a graph-theoretic analysis of the compression penalty incurred by the authors' codeword selection policy in comparison with an optimal policy, and it includes empirical studies of the performance of various adaptive compressors from the literature. (An illustrative code sketch follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 1989
- Full Text
- View/download PDF
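A minimal sketch of windowed textual substitution as described above: emit either a literal byte or a (distance, length) reference into recent history. The search here is a naive O(n·W) scan; the paper's contribution is precisely the suffix-tree window maintenance that makes this fast. Window and match parameters are illustrative:

```python
WINDOW = 4096   # how far back matches may reach
MIN_MATCH = 3   # shorter matches are cheaper as literals

def compress(data: bytes):
    """Greedy textual substitution over a finite window."""
    out, i = [], 0
    while i < len(data):
        start = max(0, i - WINDOW)
        best_len, best_dist = 0, 0
        for j in range(start, i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
                if j + k >= i:   # keep the source entirely in history
                    break
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= MIN_MATCH:
            out.append(("ref", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def decompress(tokens) -> bytes:
    buf = bytearray()
    for t in tokens:
        if t[0] == "lit":
            buf.append(t[1])
        else:
            _, dist, length = t
            for _ in range(length):
                buf.append(buf[-dist])
    return bytes(buf)

text = b"abracadabra abracadabra"
tokens = compress(text)
assert decompress(tokens) == text
print(tokens)
```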
19. CACHING AND OTHER DISK ACCESS AVOIDANCE TECHNIQUES ON PERSONAL COMPUTERS.
- Author
-
Jalics, Paul J., McIntyre, David R., and Sibley, Edgar H.
- Subjects
- *
COMPUTER storage devices , *CACHE memory , *COMPUTER operating systems , *ELECTRONIC file management , *DATA disk drives , *HARD disks - Abstract
The article presents information about an experiment examining the performance of three common disk-caching systems, in an effort to determine a technique for reducing the gap between disk access speed and central processing unit speed. Both the hardware and software cache programs tested had options to tune their performance for specific situations. Initial experiments were run with a hardware cache of 3.5 megabytes and a software cache of up to 4 megabytes. It was decided to reduce the cache size to 512 kilobytes. With a maximum of 20 megabytes of hard disk storage, it was not practical to deal with files of more than 14 megabytes. All the tests were carried out one at a time, since multiprogramming is not possible under the DOS operating system. Thus, in a typical test there is only one file in use; the entire file fits into the cache, and processing time is directly proportional to file size. Therefore, the results in percent savings are typically applicable to files smaller than 256 kilobytes.
- Published
- 1989
- Full Text
- View/download PDF
20. AN ADAPTIVE DEPENDENCY SOURCE MODEL FOR DATA COMPRESSION.
- Author
-
Abrahamson, David M. and Sibley, Edgar H.
- Subjects
- *
EMAIL systems , *DATA compression , *CODING theory , *DATA transmission systems , *DATABASE management , *ELECTRONIC file management - Abstract
The adaptive dependency source model, as described in this article, provides a simple way to benefit from the interdependency between the successive characters in a text. For most files, when compared with either the adaptive arithmetic coding model by Witten et al., or Gallager's adaptive Huffman encoding model, the adaptive dependency model produces a 12 to 15 percent increase in compression efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 1989
- Full Text
- View/download PDF
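To make the "interdependency between successive characters" concrete, the sketch below compares the code length an adaptive order-0 model would assign a text against an order-1 model conditioned on the previous character. This only measures model cost (it omits the actual entropy coder, and it is not the article's exact model):

```python
import math
from collections import defaultdict

def adaptive_bits(text: str, order1: bool) -> float:
    """Total code length (bits) under an adaptive, Laplace-smoothed model,
    updating counts after each character. With order1=True the prediction
    is conditioned on the previous character, capturing the dependency
    the article exploits."""
    alphabet = sorted(set(text))
    counts = defaultdict(lambda: defaultdict(int))
    bits, prev = 0.0, ""
    for ch in text:
        ctx = prev if order1 else ""
        total = sum(counts[ctx].values()) + len(alphabet)   # +1 smoothing
        p = (counts[ctx][ch] + 1) / total
        bits += -math.log2(p)
        counts[ctx][ch] += 1
        prev = ch
    return bits

text = "the theory of the thing is that the thread threads through" * 20
print("order-0 bits/char:", round(adaptive_bits(text, False) / len(text), 3))
print("order-1 bits/char:", round(adaptive_bits(text, True) / len(text), 3))
```

The order-1 figure comes out noticeably lower, which is the effect behind the 12 to 15 percent gain reported in the abstract.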
21. Dynamic Hash Tables.
- Author
-
Sleator, Daniel and Larson, Per-Ake
- Subjects
- *
HASHING , *ELECTRONIC file management , *INFORMATION storage & retrieval systems - Abstract
Demonstrates how to adapt linear hashing and spiral storage for hash tables stored in main memory. Dynamic hashing schemes originally designed for external files; Necessary data structures and algorithms; Actual execution times; Expected performance; Simple unbalanced binary tree; Double hashing with periodic rehashing into a larger table.
- Published
- 1988
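A toy sketch of linear hashing, one of the two schemes the entry above adapts to main memory (spiral storage is omitted; the load threshold and names are illustrative). Buckets split one at a time, in address order, so no directory is needed:

```python
class LinearHash:
    """Toy linear hashing: buckets split in order as the table grows,
    keeping the directory-free address computation O(1)."""
    LOAD = 3                       # average bucket size that triggers a split

    def __init__(self):
        self.n0 = 1                # buckets at the start of the current round
        self.split = 0             # next bucket to split
        self.buckets = [[]]
        self.count = 0

    def _addr(self, key):
        h = hash(key)
        a = h % self.n0            # address within the current round...
        if a < self.split:         # ...unless that bucket already split,
            a = h % (2 * self.n0)  # then use the next round's function
        return a

    def insert(self, key, value):
        self.buckets[self._addr(key)].append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.LOAD:
            self._split_next()

    def lookup(self, key):
        for k, v in self.buckets[self._addr(key)]:
            if k == key:
                return v
        return None

    def _split_next(self):
        old = self.buckets[self.split]
        self.buckets.append([])
        self.buckets[self.split] = []
        self.split += 1
        if self.split == self.n0:  # a full round: every bucket has split
            self.n0 *= 2
            self.split = 0
        for k, v in old:           # re-address records of the split bucket
            self.buckets[self._addr(k)].append((k, v))

t = LinearHash()
for i in range(100):
    t.insert(i, i * i)
print(t.lookup(81), len(t.buckets))
```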
22. Literate programming.
- Author
-
Van Wyk, Christopher J. and Denning, Peter J.
- Subjects
- *
DATA structures , *PROGRAMMING languages , *ABSTRACT data types (Computer science) , *ELECTRONIC file management , *COMPUTER programming - Abstract
Presents information about literate programming. Use of program codes segments in text; Benefits of literate programs; Features of a system for literate programming; Discussion on programming style and data structures.
- Published
- 1987
- Full Text
- View/download PDF
23. DATA COMPRESSION ON A DATABASE SYSTEM.
- Author
-
Cormack, Gordon V.
- Subjects
- *
DATA compression , *DATABASE management , *CODING theory , *DATABASES , *DATA transmission systems , *ELECTRONIC file management - Abstract
The article describes data compression on a database system. Compressing information in a database system is attractive for two major reasons: storage saving and performance improvement. Storage saving is a direct and obvious benefit, whereas performance improvement derives from the fact that less physical data need be moved for any particular operation on the database. Data in an Information Management System (IMS) database are stored as a set of segments (records). The IMS system makes provision for a data compression exit to be invoked whenever a segment is stored, and for a complementary expansion exit to be called whenever a segment is retrieved. Thus, compression and expansion exits become an integral part of the IMS system and are therefore heavily constrained. In this scenario, it is inconvenient for exit routines to have to determine any information about the format of a segment being processed, such as where the field boundaries are or what type of data the fields contain. Finally, because the exit routines become part of the IMS system and are loaded into common storage, a proliferation of different exits is undesirable: One pair of exits must apply to a very wide variety of data. Indeed, each exit should be able to handle all possible character sequences.
- Published
- 1985
- Full Text
- View/download PDF
24. A Probability Model for Overflow Sufficiency in Small Hash Tables.
- Author
-
Horowitz, Ellis, Norton, R.M., and Yeager, D.P.
- Subjects
- *
HASHING , *MATHEMATICAL models , *PROBABILITY theory , *ELECTRONIC file management , *COMPUTER science , *COMPUTER storage devices - Abstract
The construction of a probabilistic model for table sufficiency in small hash tables makes it possible to accurately predict the results of loading such tables and to design those tables for optimum use of time and storage space. Charts like those in Tables V-VII can be much more effective ways of making decisions about the configuration of small tables than the Poisson estimates now in use. Those estimates use as parameters the bucket size and the primary load factor; they ignore two very important factors: the total number of buckets and the ratio of primary buckets to overflow buckets. Moreover, when using the Poisson estimates, the only available tool for estimating the amount of overflow space needed is the expected value of the number Mo of overflow records. The approach taken in this article provides an accurate and reliable measure of the amount of overflow space that will probably be needed for a hash table whose parameters fall within the domain of validity. Two clear directions remain: to extend the domain of validity for the computations in this model, and to construct a model of a more approximate nature that will not be susceptible to the computational difficulties of this one. A complete model that is not subject to size limitations is needed. [ABSTRACT FROM AUTHOR]
- Published
- 1985
25. Data Structures for Quadtree Approximation and Compression.
- Author
-
Samet, Hanan and Haralick, Robert M.
- Subjects
- *
DATA structures , *IMAGE processing , *COMPUTER programming , *ELECTRONIC file management , *DATA compression , *COMPUTER systems - Abstract
A number of data structures for representing images by quadtrees without pointers are discussed. The image is treated as a collection of leaf nodes. Each leaf node is represented by a locational code corresponding to a sequence of directional codes that locate the leaf along a path from the root of the tree. Somewhat related is the concept of a forest, which is a representation consisting of a collection of maximal blocks. It is reviewed and refined to enable the representation of a quadtree as a sequence of approximations. In essence, all BLACK and WHITE nodes are said to be of type GB and GW, respectively. GRAY nodes are of type GB if at least two of their sons are of type GB; otherwise, they are of type GW. Sequences of approximations using various combinations of locational codes of GB and GW nodes are proposed and shown to be superior to approximation methods based on truncation of nodes below a certain level in the tree. These approximations have two important properties. First, they are progressive in the sense that as more of the image is transmitted, the receiving device can construct a better approximation (contrast with facsimile methods, which transmit the image one line at a time). Second, they are proved to lead to compression in the sense that they never require more than MIN(B, W) nodes, where B and W correspond to the number of BLACK and WHITE nodes in the original quadtree. Algorithms are given for constructing the approximation sequences as well as decoding them to rebuild the original quadtree. [ABSTRACT FROM AUTHOR]
- Published
- 1985
- Full Text
- View/download PDF
26. DATA COMPRESSION USING STATIC HUFFMAN CODE-DECODE TABLES.
- Author
-
McIntyre, David R., Pechura, Michael A., and Sibley, Edgar H.
- Subjects
- *
COMPUTER software , *DATA compression , *COMPUTER storage devices , *DATABASE management , *WORD processors (Machinery) , *ELECTRONIC file management - Abstract
This article examines data compression using static Huffman code-decode tables. Pioneering work in the construction of minimum redundancy codes was developed in 1952. In the Huffman scheme, the encoding is tailored specifically to each file, and therefore a decode table must be prefixed to the encoding in storage. Depending upon the number of distinct characters in the file and the size of the compressed file, this additional overhead can be quite significant. To overcome this storage drawback, it is possible to remove the individual decode tables, corresponding to a class of files, from in front of each compressed file and replace them with only one fixed static decode table for the entire set of compressed files. Files with and without trailing blanks were investigated to study the effectiveness of Huffman encodings both without and with simple preprocessing. As might be expected, static Huffman codes for general program categories of the same language with and without trailing blanks were identical.
- Published
- 1985
- Full Text
- View/download PDF
27. A Polynomial Time Generator for Minimal Perfect Hash Functions.
- Author
-
Sager, Thomas J.
- Subjects
- *
HASHING , *ELECTRONIC file management , *GENERATORS (Computer programs) , *AUTOMATIC programming (Computer science) , *COMPUTER software , *SYSTEMS software , *COMPUTER programming , *COMPUTER algorithms , *ELECTRONIC data processing - Abstract
A perfect hash function (PHF) is an injection F from a set W of M objects into the set consisting of the first N nonnegative integers, where N >= M. If N = M, then F is a minimal perfect hash function (MPHF). PHFs are useful for the compact storage and fast retrieval of frequently used objects such as reserved words in a programming language or commonly employed words in a natural language. The mincycle algorithm for finding PHFs executes with an expected time complexity that is polynomial in M and has been used successfully on sets of cardinality up to 512. Given three pseudorandom functions h0, h1, and h2, the mincycle algorithm searches for a function g such that F(w) = (h0(w) + g(h1(w)) + g(h2(w))) mod N is a PHF. (An illustrative code sketch follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 1985
- Full Text
- View/download PDF
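To make the class of functions concrete, the sketch below searches for a table g making F(w) = (h0(w) + g(h1(w)) + g(h2(w))) mod N a bijection on a tiny key set. Plain random search is a brute-force stand-in for Sager's guided mincycle search, and the hash constructions are illustrative assumptions:

```python
import random

WORDS = ["if", "else", "while", "for", "return", "break"]
N = len(WORDS)          # minimal: table size equals the number of keys
R = 2 * N               # size of the auxiliary table g

def make_h(seed):
    rng = random.Random(seed)
    coeffs = [rng.randrange(1, 10**9) for _ in range(8)]
    def h(w):
        x = 0
        for i, ch in enumerate(w.encode()):
            x = (x * 31 + ch * coeffs[i % 8]) % (10**9 + 7)
        return x
    return h

h0, h1, h2 = make_h(0), make_h(1), make_h(2)

def find_mphf(max_tries=200000):
    """Random search over tables g; mincycle instead orders the search
    cleverly, which is what makes it polynomial in expectation."""
    rng = random.Random(99)
    for _ in range(max_tries):
        g = [rng.randrange(N) for _ in range(R)]
        F = lambda w: (h0(w) + g[h1(w) % R] + g[h2(w) % R]) % N
        if len({F(w) for w in WORDS}) == N:
            return g, F
    raise RuntimeError("no MPHF found; try more iterations")

g, F = find_mphf()
for w in WORDS:
    print(w, F(w))      # each word gets a distinct slot in 0..N-1
```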
28. Faster Methods for Random Sampling.
- Author
-
Vitter, Jeffrey Scott
- Subjects
- *
ELECTRONIC file management , *STATISTICAL sampling , *ALGORITHMS - Abstract
Presents several methods for selecting computer records or files. Manner by which algorithms under the methods select files; Characteristics of the algorithms; Effect of the methods on large mainframe computers.
- Published
- 1984
- Full Text
- View/download PDF
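For context, the classic one-pass selection-sampling baseline is sketched below; Vitter's faster methods instead compute how many records to skip and touch only the chosen ones. This is the baseline, not the paper's algorithm:

```python
import random

def select_sample(records, n, rng=None):
    """Knuth-style selection sampling: scan the file once and keep each
    record with probability (still needed)/(still remaining), which makes
    every n-subset equally likely."""
    rng = rng or random.Random(0)
    N = len(records)
    sample, seen = [], 0
    for rec in records:
        remaining = N - seen
        needed = n - len(sample)
        if rng.random() < needed / remaining:
            sample.append(rec)
        seen += 1
        if len(sample) == n:
            break
    return sample

print(select_sample(list(range(1000)), 5))
```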
29. FILE ORGANIZATION: IMPLEMENTATION OF A METHOD GUARANTEEING RETRIEVAL IN ONE ACCESS.
- Author
-
Larson, Per-Ake and Kajla, Ajay
- Subjects
- *
ELECTRONIC file management , *HASHING , *INFORMATION storage & retrieval systems - Abstract
Focuses on a file organization method based on hashing that guarantees record retrieval in one access. Comparison of the method with other modes of information retrieval; Algorithms used for the implementation of the required hashing and signature functions.
- Published
- 1984
- Full Text
- View/download PDF
30. Optimal Pagination of B-Trees with Variable-Length Items.
- Author
-
Diehr, George and Faaland, Bruce
- Subjects
- *
ALGORITHMS , *DYNAMIC programming , *MATHEMATICAL optimization , *ELECTRONIC file management , *PAGING (Computer science) , *COMPUTER programming , *TREE graphs , *DECISION trees , *SYSTEMS engineering - Abstract
Two algorithms are developed for the optimal organization of B-trees (and variations) with variable-length items. The first algorithm solves a problem posed by McCreight, that of finding a pagination of n items that minimizes the sum of key lengths promoted to the next higher level of the tree. The algorithm requires O(n log n) time and O(n) space. The second algorithm constructs the minimum depth tree in O(n³ log n) time from the n items. Both methods rely on dynamic programming arguments and can be interpreted as shortest-path problems. Practical approaches for implementing the algorithms are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 1984
- Full Text
- View/download PDF
31. Reducing the Retrieval Time of Hashing Method by Using Predictors.
- Author
-
Nishihara, Seiichi and Ikeda, Katsuo
- Subjects
- *
HASHING , *COMPUTER programming , *ELECTRONIC file management , *COMPUTER algorithms , *COMPUTER science - Abstract
Many methods for resolving collisions in hashing techniques have been proposed. They are classified into two main categories: open addressing and chaining. In this paper, other methods are presented that are intermediate between the two categories. The basic idea of our methods is the use of one or more predictors reserved per cell instead of a link field as in the chaining method. The predictors are used to maintain loose synonym chains. After describing the methods, the efficiencies are estimated theoretically and verified experimentally. In comparison with the chaining method, we prove that our methods significantly reduce the average number of probes necessary to retrieve a key without expending extra space. [ABSTRACT FROM AUTHOR]
- Published
- 1983
- Full Text
- View/download PDF
32. A Constructive Approach to the Design of Algorithms and Their Data Structures.
- Author
-
Gonnet, Gaston H., Tompa, Frank W., and Horowitz, Ellis
- Subjects
- *
DATA structures , *ELECTRONIC data processing , *COMPUTER programming , *ELECTRONIC file management , *ABSTRACT data types (Computer science) , *INFORMATION storage & retrieval systems - Abstract
A framework for describing specific algorithms and their data structures is introduced so that designs can be presented in a uniform style that is suitable for discovering new designs as well as documenting known ones. Data objects are described in terms of a formal grammar, and most data manipulation is characterized as composition of a few simple algorithms. Descriptions for several standard algorithms are included to illustrate the process, and a few newly designed structures are introduced. Such an approach is expected to be extremely useful in the construction of software libraries. [ABSTRACT FROM AUTHOR]
- Published
- 1983
- Full Text
- View/download PDF
33. On-the-Fly Optimization of Data Structures.
- Author
-
Kessels, J. L. W. and Horowitz, Ellis
- Subjects
- *
DATA structures , *ELECTRONIC data processing , *COMPUTER programming , *ELECTRONIC file management , *MATHEMATICAL optimization , *MATHEMATICAL analysis - Abstract
A technique is proposed for keeping a graphlike data structure near its optimum form; such global optimization is obtained as the cumulative result of simple local operations. Each operation involves only a small number of neighboring vertices and preserves the consistency of the data structure. The technique is demonstrated by means of two examples: the recording of equivalence classes and the construction of a binary search tree. In the examples the optimizing operations are executed "on the fly" while the data structure is being accessed. Hence optimization of the data structure is obtained as a side effect of its normal use. The derived solutions are comparable in efficiency to known solutions, but they allow concurrent access to the data structure. [ABSTRACT FROM AUTHOR]
- Published
- 1983
- Full Text
- View/download PDF
34. File Archival Techniques Using Data Compression.
- Author
-
Pechura, Michael
- Subjects
- *
DATA compression , *ELECTRONIC file management , *DATABASE management , *COMPUTER programming , *COMPUTER industry , *COMPUTER systems - Abstract
The performance of most small computer systems is determined, to a large extent, by the characteristics of their mass storage devices. Data compression can expand the storage capacity of such devices and also slightly increase their speed. Here, a summary of one method of data compression using Huffman variable length coding is presented, along with statistics of effectiveness for various file types and practical techniques for integrating such methods into a small computer system. [ABSTRACT FROM AUTHOR]
- Published
- 1982
- Full Text
- View/download PDF
35. Designing a Bloom Filter for Differential File Access.
- Author
-
Gremillion, Lee L.
- Subjects
- *
ELECTRONIC file management , *DATABASES , *COMPUTER files , *COMPUTERS , *INFORMATION technology , *COMPUTER systems - Abstract
The use of a differential file for a database update can yield integrity and performance benefits, but it can also present problems in providing current data to subsequent accessing transactions. A mechanism known as a Bloom filter can solve these problems by preventing most unnecessary searches of the differential file. Here, the design process for a Bloom filter for an online student database is described, and it is shown that a very effective filter can be constructed with a modest expenditure of system resources. (An illustrative code sketch follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 1982
- Full Text
- View/download PDF
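A minimal Bloom filter sketch in the role described above (filter size, probe count, and key format are illustrative choices, not the paper's design):

```python
import hashlib

class BloomFilter:
    """k hash probes into an m-bit array; a 'no' answer is always right,
    so most fruitless searches of the differential file are avoided, at
    the cost of rare false positives."""
    def __init__(self, m_bits=8192, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _probes(self, key: str):
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, key: str):
        for p in self._probes(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(key))

updated = BloomFilter()
updated.add("student:1042")                 # record changed in the differential file
print(updated.may_contain("student:1042"))  # True  -> search the differential file
print(updated.may_contain("student:7"))     # False -> go straight to the main file
```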
36. Estimating Block Accesses and Number of Records in File Management.
- Author
-
To-Yat Cheung and Schwetman, Herbert D.
- Subjects
- *
ELECTRONIC file management , *RECORDS management , *DATABASE management , *INFORMATION resources management , *ESTIMATION theory , *NUMERICAL analysis - Abstract
We consider the problems of estimating the number of secondary storage blocks and the number of distinct records accessed when a transaction consisting of possibly duplicate requested records is presented to a file management system. Our main results include (1) a new formula for block access estimation for the case where the requested records may have duplications and their ordering is immaterial and (2) a simple formula for estimating the number of distinct records in the transaction. [ABSTRACT FROM AUTHOR]
- Published
- 1982
- Full Text
- View/download PDF
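To make the estimated quantity concrete, the sketch below computes the classic estimate (often attributed to Yao) of distinct blocks touched when k distinct records are requested. This baseline handles only the simple case; the article's new formula generalizes to duplicated, unordered requests:

```python
from math import comb

def expected_blocks(n_records, m_blocks, k_requested):
    """Expected distinct blocks touched when k distinct records are drawn
    at random from n records packed n/m per block (classic baseline,
    shown only for intuition)."""
    per_block = n_records // m_blocks
    p_untouched = comb(n_records - per_block, k_requested) / comb(n_records, k_requested)
    return m_blocks * (1 - p_untouched)

print(round(expected_blocks(10000, 100, 50), 2))  # ~39 of 100 blocks touched
```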
37. Reciprocal Hashing: A Method for Generating Minimal Perfect Hashing Functions.
- Author
-
Jaeschke, G. and McIlroy, D.
- Subjects
- *
HASHING , *ELECTRONIC file management , *RECIPROCALS (Mathematics) , *MATHEMATICAL functions , *COMPUTER programming , *COMPUTER algorithms - Abstract
A method is presented for building minimal perfect hash functions, i.e., functions which allow single probe retrieval from minimally sized tables of identifier sets. A proof of existence for minimal perfect hash functions of a special type (reciprocal type) is given. Two algorithms for determining hash functions of reciprocal type are presented and their practical limitations are discussed. Further some application results are given and compared with those of earlier approaches for perfect hashing. [ABSTRACT FROM AUTHOR]
- Published
- 1981
- Full Text
- View/download PDF
38. Comparison of Synonym Handling and Bucket Organization Methods.
- Author
-
Quittner, P., Csoka, S., Halasz, S., Kotsis, D., and Varnai, K.
- Subjects
- *
ELECTRONIC file management - Abstract
Focuses on the comparison between synonym handling and bucket organization methods of files. Guidelines for determining bucket sizes; Optimal bucket size; Reduction on the number of synonyms.
- Published
- 1981
- Full Text
- View/download PDF
39. Technical correspondence.
- Author
-
Pollitzer, Gustavo A., Peterson, Wesley, and Dwyer, Barry
- Subjects
- *
LETTERS to the editor , *ELECTRONIC file management , *ALGORITHMS , *STRUCTURED programming , *ELECTRONIC data processing , *COMPUTER programming - Abstract
Presents letters to the editor referencing articles and topics published in earlier issues of the journal "Communications of the ACM." Focuses on issues relating to the updating of master files; Information on algorithms used in the updating process; Introduction to structured programming in everyday activities; Alternatives used in processing the different file records.
- Published
- 1981
40. Long Term File Migration: Development and Evaluation of Algorithms.
- Author
-
Smith, Alan Jay and Hayes, J. P.
- Subjects
- *
ALGORITHMS , *ELECTRONIC file management , *COMPUTER storage devices , *DATABASE management , *COMPUTER programming - Abstract
The steady increase in the power and complexity of modern computer systems has encouraged the implementation of automatic file migration systems which move files dynamically between mass storage devices and disk in response to user reference patterns. Using information describing 13 months of user disk data set file references, we develop and evaluate (replacement) algorithms for the selection of files to be moved from disk to mass storage. Our approach is general and demonstrates a general methodology for this type of problem. We find that algorithms based on both the file size and the time since the file was last used work well. The best realizable algorithms tested condition on the empirical distribution of the times between file references. Acceptable results are also obtained by selecting for replacement that file whose size times time to most recent reference is maximal. Comparisons are made with a number of standard algorithms developed for paging, such as Working Set, VMIN, and GOPT. Sufficient information (parameter values, fitted equations) is provided so that our algorithms may be easily implemented on other systems. [ABSTRACT FROM AUTHOR]
- Published
- 1981
- Full Text
- View/download PDF
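The abstract above names one policy explicitly: replace the file whose size times time-since-last-reference is maximal. A minimal sketch of that space-time-product selection (file names, fields, and units are illustrative assumptions):

```python
import heapq

def select_for_migration(files, now, bytes_needed):
    """Pick files to move to mass storage, evicting first the file whose
    size * time-since-last-reference is largest -- the space-time-product
    policy the study found acceptable."""
    scored = [(-size * (now - last_ref), name, size)
              for name, (size, last_ref) in files.items()]
    heapq.heapify(scored)                 # largest space-time product first
    chosen, freed = [], 0
    while scored and freed < bytes_needed:
        _, name, size = heapq.heappop(scored)
        chosen.append(name)
        freed += size
    return chosen

files = {
    "results.dat": (40_000_000, 100),   # big, untouched for ages
    "notes.txt":   (2_000,      980),   # tiny and recent
    "build.log":   (9_000_000,  400),
}
print(select_for_migration(files, now=1000, bytes_needed=30_000_000))
```

The paper's best algorithms refine this by conditioning on the empirical inter-reference distribution rather than the raw product.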
41. Partial-Match Retrieval Using Indexed Descriptor Files.
- Author
-
Pfaltz, John L., Berman, William J., Cagley, Edgar M., and McIlroy, M. D.
- Subjects
- *
INFORMATION retrieval , *DATABASES , *COMPUTER files , *ELECTRONIC information resources , *ELECTRONIC file management , *ELECTRONIC data processing - Abstract
In this paper we describe a practical method of partial-match retrieval in very large data files. A binary code word, called a descriptor, is associated with each record of the file. These record descriptors are then used to form a derived descriptor for a block of several records, which will serve as an index for the block as a whole; hence the name "indexed descriptor files." First the structure of these files is described and a simple, efficient retrieval algorithm is presented. Then its expected behavior, in terms of storage accesses, is analyzed in detail. Two different file creation procedures are sketched, and a number of ways in which the file organization can be "tuned" to a particular application are suggested. (An illustrative code sketch follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 1980
- Full Text
- View/download PDF
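A minimal sketch of the descriptor idea described above: OR together a few hashed bits per attribute value to form record descriptors, OR those into a block descriptor, and skip any block whose descriptor does not cover the query's. Bit counts, hash choice, and the sample data are illustrative:

```python
import hashlib

DESC_BITS = 64

def descriptor(values) -> int:
    """Superimposed code word: set a few bits per attribute value and OR
    them together (bit choices here are illustrative)."""
    d = 0
    for v in values:
        h = hashlib.md5(str(v).encode()).digest()
        for byte in h[:3]:                 # 3 bits per value
            d |= 1 << (byte % DESC_BITS)
    return d

records = [("smith", "ohio", 1980), ("jones", "utah", 1975),
           ("brown", "ohio", 1975), ("davis", "iowa", 1990)]
BLOCK = 2                                  # records per block
blocks = [records[i:i + BLOCK] for i in range(0, len(records), BLOCK)]
block_desc = [descriptor(v for rec in blk for v in rec) for blk in blocks]

def partial_match(query_values):
    """Only blocks whose descriptor covers the query descriptor can hold
    matches; a block may still pass spuriously (a 'false drop'), so
    candidate blocks are scanned record by record."""
    q = descriptor(query_values)
    hits = []
    for blk, d in zip(blocks, block_desc):
        if d & q == q:                     # every query bit is present
            hits.extend(rec for rec in blk if all(v in rec for v in query_values))
    return hits

print(partial_match(["ohio", 1975]))       # -> [('brown', 'ohio', 1975)]
```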
42. Hierarchical Binary Search.
- Author
-
Gill, Arthur
- Subjects
- *
DATA structures , *ELECTRONIC file management - Abstract
Evaluates the hierarchical binary search for data structures. Assessment on the performance of hierarchical file organization; Division of data structures of hierarchical search into substructures; Application of two-stage hierarchy techniques; Derivation of the general properties of hierarchical search.
- Published
- 1980
- Full Text
- View/download PDF
43. Multidimensional Divide-and-Conquer.
- Author
-
Bentley, Jon Louis
- Subjects
- *
COMPUTER algorithms , *DATA structures , *ALGORITHMS , *ELECTRONIC file management , *COMPUTER programming , *ELECTRONIC data processing - Abstract
Most results in the field of algorithm design are single algorithms that solve single problems. In this paper we discuss multidimensional divide-and-conquer, an algorithmic paradigm that can be instantiated in many different ways to yield a number of algorithms and data structures for multidimensional problems. We use this paradigm to give best-known solutions to such problems as the ECDF, maxima, range searching, closest pair, and all nearest neighbor problems. The contributions of the paper are on two levels. On the first level are the particular algorithms and data structures given by applying the paradigm. On the second level is the more novel contribution of this paper: a detailed study of an algorithmic paradigm that is specific enough to be described precisely yet general enough to solve a wide variety of problems. (An illustrative code sketch follows this entry.) [ABSTRACT FROM AUTHOR]
- Published
- 1980
- Full Text
- View/download PDF
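One instantiation of the paradigm, sketched for the planar ECDF (dominance counting) problem: split on the median x, solve each half recursively, then account for left-right pairs with a sorted sweep on y. This simplified cut runs in O(n log² n) and assumes distinct coordinates; the paper's version is tighter:

```python
import bisect

def ecdf_counts(points):
    """For each point, count the points it dominates (both coordinates
    smaller), by multidimensional divide-and-conquer."""
    counts = {p: 0 for p in points}

    def rec(pts):
        if len(pts) <= 1:
            return
        pts.sort(key=lambda p: p[0])       # re-sorting keeps the sketch short
        mid = len(pts) // 2
        left, right = pts[:mid], pts[mid:]
        rec(left)
        rec(right)
        ys = sorted(p[1] for p in left)    # left points all have smaller x,
        for r in right:                    # so only y decides dominance
            counts[r] += bisect.bisect_left(ys, r[1])

    rec(list(points))
    return counts

pts = [(1, 1), (2, 3), (3, 2), (4, 4)]
print(ecdf_counts(pts))   # (4,4) dominates the other three, etc.
```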
44. The Use of Normal Multiplication Tables for Information Storage and Retrieval.
- Author
-
Motzkin, Dalia and Montgomery, C. A.
- Subjects
- *
INFORMATION storage & retrieval systems , *CHARACTER sets (Data processing) , *ELECTRONIC file management , *COMPUTER algorithms , *COMPUTER storage devices ,MULTIPLICATION tables - Abstract
This paper describes a method for the organization and retrieval of attribute-based information systems, using the normal multiplication table as a directory for the information system. Algorithms for the organization and retrieval of information are described. This method is particularly suitable for queries requesting a group of information items, all of which possess a particular set of attributes (and possibly some other attributes as well). Several examples are given; the results with respect to the number of disk accesses and disk space are compared to other common approaches. Algorithms evaluating the appropriateness of the above approach to a given information system are described. For a certain class of information systems, the normal multiplication table method yields far more rapid retrieval with a more economical space requirement than conventional systems. Moreover, this method incorporates an improved modification of the inverted file technique. [ABSTRACT FROM AUTHOR]
- Published
- 1979
- Full Text
- View/download PDF
45. Power Trees.
- Author
-
Luccio, Fabrizio and Pagli, Linda
- Subjects
- *
TREE graphs , *ELECTRONIC file management - Abstract
Examines the introduction of a class of families of binary trees where height balance is maintained only for given sets of nodes. Presentation of a procedure for node insertion; Application of binary search trees for ordered file organization; Importance of height-balanced trees.
- Published
- 1978
- Full Text
- View/download PDF
46. Pseudochaining in Hash Tables.
- Author
-
Halatsis, Constantine, Philokyprou, George, Graham, S. L., and Rivest, R. L.
- Subjects
- *
HASHING , *ALGORITHMS , *CHARTS, diagrams, etc. , *ELECTRONIC file management , *FOUNDATIONS of arithmetic , *STATISTICS - Abstract
This paper presents pseudochaining as a new collision-resolution method. Pseudochaining is halfway between open addressing and chaining. It owes its name to the fact that link fields are present in each cell of the hash table, which permits "chaining" of the first overflow items in the table. The efficiency of the method is derived and a tradeoff analysis is given. [ABSTRACT FROM AUTHOR]
- Published
- 1978
- Full Text
- View/download PDF
47. A Data Structure for Manipulating Priority Queues.
- Author
-
Vuillemin, Jean
- Subjects
- *
DATA structures , *ELECTRONIC data processing , *COMPUTER programming , *ELECTRONIC file management , *QUEUING theory , *BINOMIAL distribution - Abstract
The article presents a data structure for manipulating priority queues. A variety of applications directly require using priority queues--job scheduling, discrete simulation languages where labels represent the time at which events are to occur, as well as various sorting problems. Priority queues also play a central role in several good algorithms, such as optimal code constructions. The author describes the underlying combinatorial structure, called binomial trees. Binomial trees and forests appear in various data manipulation problems. Although the discussion in the article is an adequate presentation of the priority queue primitives for a very abstract machine model, it does not indicate how to actually code these algorithms on a digital computer. One still has to solve some problems, the first of which concerns the machine representation of labeled binomial forests. The data structure seems to be interesting in itself because of its conceptual simplicity and of the connections it establishes among various data manipulation problems.
- Published
- 1978
- Full Text
- View/download PDF
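A compact rendering of the binomial-forest idea described above (the paper's own presentation covers more operations and the machine representation): a forest holds at most one tree per rank, and melding two forests works like binary addition, with equal-rank trees linked into a tree of the next rank:

```python
class Node:
    """A binomial tree node: a tree of rank r has r children of ranks
    r-1, ..., 0, and 2^r nodes in total."""
    def __init__(self, key):
        self.key, self.rank, self.children = key, 0, []

def link(a, b):
    """Combine two trees of equal rank into one of rank+1, smaller key on top."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    a.rank += 1
    return a

def meld(f1, f2):
    """Meld two forests (lists indexed by rank), like binary addition with
    a tree of rank r playing the role of the r-th digit."""
    out, carry = [], None
    for i in range(max(len(f1), len(f2)) + 1):
        trees = [t for t in (f1[i] if i < len(f1) else None,
                             f2[i] if i < len(f2) else None, carry) if t]
        carry = None
        if len(trees) >= 2:
            carry = link(trees.pop(), trees.pop())
        out.append(trees[0] if trees else None)
    while out and out[-1] is None:
        out.pop()
    return out

def insert(forest, key):
    return meld(forest, [Node(key)])   # a single node is a rank-0 forest

def find_min(forest):
    return min(t.key for t in forest if t)

q = []
for k in [7, 2, 9, 4, 1]:
    q = insert(q, k)
print(find_min(q))   # -> 1
```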
48. List Processing in Real Time on a Serial Computer.
- Author
-
Baker, Jr., Henry G.
- Subjects
- *
LIST processing (Electronic computers) , *ELECTRONIC data processing , *COMPUTER programming , *ELECTRONIC file management , *DATABASE management , *REAL-time programming - Abstract
This article focuses on problems related to classical list processing techniques. Using the method given in the article, a computer could have list processing primitives built in as machine instructions and the programmer would still be assured that each instruction would finish in a reasonable amount of time. A real-time list processing system has the property that the time required by each of the elementary operations is bounded by a constant independent of the number of cells in use. This property does not guarantee that the constant will be small enough for a particular application on a particular computer, and hence has been called "pseudo-real-time" by some. In all but the most demanding applications, the proper choice of hardware can reduce the constants to acceptable values. The model described in the article consists of a memory, that is, a one-dimensional array of words, each of which is large enough to hold the representation of a non-negative integer which is an index into that array; and a central processing unit, which has a small fixed number of registers the size of a word.
- Published
- 1978
- Full Text
- View/download PDF
49. A Technique for Isolating Differences Between Files.
- Author
-
Heckel, Paul
- Subjects
- *
COMPUTER algorithms , *TEXT files , *HASHING , *ELECTRONIC file management , *ELECTRONIC records , *COMPUTER files - Abstract
The article presents a computer algorithm for isolating the differences between two text files. The technique described is prone to detecting false differences. It can be modified by using a hashcode as a surrogate for the characters in a line. Hashcoding buys greater speed, program simplicity and storage efficiency at the cost of letting some changes go undetected. A good hashcoding algorithm with a large number of potential hashcodes will keep the number of undetected changes small. This technique can be used on very large files. The symbol table is the only randomly accessed data structure that grows with the size of the files. The size of each symbol can be reduced to two bits. One seems to face a choice between a hashcode size having a large number of bits and minimizing undetected changes and one that has a small number and allows direct addressing of the hash table entry. However, one can get the advantages of both by using a double hashcode technique. The number of extra entries required for such a symbol table depends on the number of differences in the files being compared.
- Published
- 1978
- Full Text
- View/download PDF
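A sketch of the hashcode idea from the entry above, limited to the first pass of the technique: lines whose hash occurs exactly once in each file almost certainly denote the same unmodified line, giving anchors from which matched regions are then grown (the growing passes are omitted; hash width and sample data are illustrative):

```python
import hashlib
from collections import Counter

def line_hash(line: str) -> str:
    """Fixed-size surrogate for the line's text; cheap to compare, at the
    small risk of an undetected change the entry mentions."""
    return hashlib.blake2b(line.encode(), digest_size=8).hexdigest()

def unique_anchors(old_lines, new_lines):
    """Pair up lines whose hash occurs exactly once in each file."""
    old_h = [line_hash(l) for l in old_lines]
    new_h = [line_hash(l) for l in new_lines]
    old_ct, new_ct = Counter(old_h), Counter(new_h)
    return [(old_h.index(h), new_h.index(h), old_lines[old_h.index(h)])
            for h in old_ct
            if old_ct[h] == 1 and new_ct.get(h) == 1]

old = ["def f():", "    x = 1", "    return x", ""]
new = ["def f():", "    x = 2", "    return x", ""]
for i, j, text in unique_anchors(old, new):
    print(f"old:{i} <-> new:{j}  {text!r}")
```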
50. Self-Assessment Procedure IV.
- Subjects
- *
EVALUATION , *EXAMINATIONS , *ELECTRONIC file management , *DECISION logic tables , *PROGRAM development (Education) , *DATABASE management , *DATA protection , *ELECTRONIC data processing - Abstract
The article presents the Self-Assessment Procedure IV. The procedure deals with tools and methods of program development, decision tables, data integrity, file organization, and processing. This is the fourth self-assessment procedure published by the ACM. The self-assessment procedure is devised to help a person appraise and develop his knowledge about a specific topic. It is also intended as an educational experience. A list of further readings is provided along with the answers to the appraisal questions. The self-assessment procedure is presented and comments from participants are welcome.
- Published
- 1978
- Full Text
- View/download PDF