Descriptor: "Cache efficient" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Cache efficient"' showing total 9 results

Start Over Descriptor "Cache efficient"

9 results on '"Cache efficient"'

1. String correction using the Damerau-Levenshtein distance

Author: Chunchun Zhao and Sartaj Sahni
Subjects: Edit distance, Damerau-Levenshtein distance, Cache efficient, String correction, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Abstract Background In the string correction problem, we are to transform one string into another using a set of prescribed edit operations. In string correction using the Damerau-Levenshtein (DL) distance, the permissible edit operations are: substitution, insertion, deletion and transposition. Several algorithms for string correction using the DL distance have been proposed. The fastest and most space efficient of these algorithms is due to Lowrance and Wagner. It computes the DL distance between strings of length m and n, respectively, in O(m n) time and O(m n) space. In this paper, we focus on the development of algorithms whose asymptotic space complexity is less and whose actual runtime and energy consumption are less than those of the algorithm of Lowrance and Wagner. Results We develop space- and cache-efficient algorithms to compute the Damerau-Levenshtein (DL) distance between two strings as well as to find a sequence of edit operations of length equal to the DL distance. Our algorithms require O(s min{m,n}+m+n) space, where s is the size of the alphabet and m and n are, respectively, the lengths of the two strings. Previously known algorithms require O(m n) space. The space- and cache-efficient algorithms of this paper are demonstrated, experimentally, to be superior to earlier algorithms for the DL distance problem on time, space, and enery metrics using three different computational platforms. Conclusion Our benchmarking shows that, our algorithms are able to handle much larger sequences than earlier algorithms due to the reduction in space requirements. On a single core, we are able to compute the DL distance and an optimal edit sequence faster than known algorithms by as much as 73.1% and 63.5%, respectively. Further, we reduce energy consumption by as much as 68.5%. Multicore versions of our algorithms achieve a speedup of 23.2 on 24 cores.
Published: 2019
Full Text: View/download PDF

2. Cache and energy efficient algorithms for Nussinov’s RNA Folding

Author: Chunchun Zhao and Sartaj Sahni
Subjects: RNA Folding, Nussinov’s algorithm, Cache efficient, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Abstract Background An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov’s classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses. Results Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov’s RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms–Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2–using two programming languages–C and Java–show that our cache efficient algorithms are also efficient in terms of run time and energy. Conclusion Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox give best run time and energy performance. The C version of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
Published: 2017
Full Text: View/download PDF

3. String correction using the Damerau-Levenshtein distance.

Author: Zhao, Chunchun and Sahni, Sartaj
Subjects: *DELETION mutation, *DNA insertion elements, *CHROMOSOMAL translocation, *ALGORITHMS, *ENERGY consumption
Abstract: Background: In the string correction problem, we are to transform one string into another using a set of prescribed edit operations. In string correction using the Damerau-Levenshtein (DL) distance, the permissible edit operations are: substitution, insertion, deletion and transposition. Several algorithms for string correction using the DL distance have been proposed. The fastest and most space efficient of these algorithms is due to Lowrance and Wagner. It computes the DL distance between strings of length m and n, respectively, in O(mn) time and O(mn) space. In this paper, we focus on the development of algorithms whose asymptotic space complexity is less and whose actual runtime and energy consumption are less than those of the algorithm of Lowrance and Wagner. Results: We develop space- and cache-efficient algorithms to compute the Damerau-Levenshtein (DL) distance between two strings as well as to find a sequence of edit operations of length equal to the DL distance. Our algorithms require O(s min{m,n}+m+n) space, where s is the size of the alphabet and m and n are, respectively, the lengths of the two strings. Previously known algorithms require O(mn) space. The space- and cache-efficient algorithms of this paper are demonstrated, experimentally, to be superior to earlier algorithms for the DL distance problem on time, space, and enery metrics using three different computational platforms. Conclusion: Our benchmarking shows that, our algorithms are able to handle much larger sequences than earlier algorithms due to the reduction in space requirements. On a single core, we are able to compute the DL distance and an optimal edit sequence faster than known algorithms by as much as 73.1% and 63.5%, respectively. Further, we reduce energy consumption by as much as 68.5%. Multicore versions of our algorithms achieve a speedup of 23.2 on 24 cores. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

4. ACME: A scalable parallel system for extracting frequent patterns from a very long sequence.

Author: Sahli, Majed, Mansour, Essam, and Kalnis, Panos
Abstract: Modern applications, including bioinformatics, time series, and web log analysis, require the extraction of frequent patterns, called motifs, from one very long (i.e., several gigabytes) sequence. Existing approaches are either heuristics that are error-prone, or exact (also called combinatorial) methods that are extremely slow, therefore, applicable only to very small sequences (i.e., in the order of megabytes). This paper presents ACME, a combinatorial approach that scales to gigabyte-long sequences and is the first to support supermaximal motifs. ACME is a versatile parallel system that can be deployed on desktop multi-core systems, or on thousands of CPUs in the cloud. However, merely using more compute nodes does not guarantee efficiency, because of the related overheads. To this end, ACME introduces an automatic tuning mechanism that suggests the appropriate number of CPUs to utilize, in order to meet the user constraints in terms of run time, while minimizing the financial cost of cloud resources. Our experiments show that, compared to the state of the art, ACME supports three orders of magnitude longer sequences (e.g., DNA for the entire human genome); handles large alphabets (e.g., English alphabet for Wikipedia); scales out to 16,384 CPUs on a supercomputer; and supports elastic deployment in the cloud. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

5. Dynamic Programming for Improving Cache Efficiency.

Author: Korde, P. S. and Khanale, P. B.
Subjects: DYNAMIC programming, CACHE memory, MATHEMATICAL optimization, MATHEMATICAL programming, ALGORITHMS
Abstract: A dynamic programming is a solution for special type of application such as context free language recognition matrix chain multiplication and optimal binary trees. Unfortunately these implementation exhibit poor cache performances. The purpose of this paper is to design, implement empirically evaluates divide and conquer algorithm that exhibit cache performance problem. [ABSTRACT FROM AUTHOR]
Published: 2012

6. Cache and energy efficient algorithms for Nussinov’s RNA Folding

Author: Sartaj Sahni and Chunchun Zhao
Subjects: 0301 basic medicine, RNA Folding, Computer science, PowerPC, Nussinov’s algorithm, 02 engineering and technology, Parallel computing, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Cache efficient, 03 medical and health sciences, Structural Biology, Transpose, 0202 electrical engineering, electronic engineering, information engineering, Molecular Biology, Cache algorithms, lcsh:QH301-705.5, Research, Applied Mathematics, Computational Biology, 020206 networking & telecommunications, Energy consumption, Folding (DSP implementation), Computer Science Applications, 030104 developmental biology, lcsh:Biology (General), Nucleic Acid Conformation, RNA, lcsh:R858-859.7, Cache, Algorithms, Energy (signal processing), Efficient energy use
Abstract: Background An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov’s classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses. Results Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov’s RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms–Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2–using two programming languages–C and Java–show that our cache efficient algorithms are also efficient in terms of run time and energy. Conclusion Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox give best run time and energy performance. The C version of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
Published: 2017

7. String correction using the Damerau-Levenshtein distance

Author: Sartaj Sahni and Chunchun Zhao
Subjects: Speedup, Time Factors, String correction, Edit distance, Space (mathematics), lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Reduction (complexity), Cache efficient, 03 medical and health sciences, 0302 clinical medicine, Structural Biology, Damerau–Levenshtein distance, Damerau-Levenshtein distance, Molecular Biology, lcsh:QH301-705.5, 030304 developmental biology, Mathematics, Discrete mathematics, 0303 health sciences, Sequence, Base Sequence, Applied Mathematics, Research, String (computer science), Computational Biology, Computer Science Applications, lcsh:Biology (General), 030220 oncology & carcinogenesis, lcsh:R858-859.7, Single-core, Algorithms
Abstract: Background In the string correction problem, we are to transform one string into another using a set of prescribed edit operations. In string correction using the Damerau-Levenshtein (DL) distance, the permissible edit operations are: substitution, insertion, deletion and transposition. Several algorithms for string correction using the DL distance have been proposed. The fastest and most space efficient of these algorithms is due to Lowrance and Wagner. It computes the DL distance between strings of length m and n, respectively, in O(m n) time and O(m n) space. In this paper, we focus on the development of algorithms whose asymptotic space complexity is less and whose actual runtime and energy consumption are less than those of the algorithm of Lowrance and Wagner. Results We develop space- and cache-efficient algorithms to compute the Damerau-Levenshtein (DL) distance between two strings as well as to find a sequence of edit operations of length equal to the DL distance. Our algorithms require O(s min{m,n}+m+n) space, where s is the size of the alphabet and m and n are, respectively, the lengths of the two strings. Previously known algorithms require O(m n) space. The space- and cache-efficient algorithms of this paper are demonstrated, experimentally, to be superior to earlier algorithms for the DL distance problem on time, space, and enery metrics using three different computational platforms. Conclusion Our benchmarking shows that, our algorithms are able to handle much larger sequences than earlier algorithms due to the reduction in space requirements. On a single core, we are able to compute the DL distance and an optimal edit sequence faster than known algorithms by as much as 73.1% and 63.5%, respectively. Further, we reduce energy consumption by as much as 68.5%. Multicore versions of our algorithms achieve a speedup of 23.2 on 24 cores.
Published: 2019

8. Simulation de la dynamique des dislocations à très grande échelle

Author: Etcheverry, Arnaud, Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université de Bordeaux, and Olivier Coulaud
Subjects: Dynamique des dislocations, Memory hierarchie, [INFO.INFO-OH]Computer Science [cs]/Other [cs.OH], Parallélisme hybride, Structure de données, Cache efficient, Problème à N-corps, Shared memory, Méthode multipôle rapide, Scalabilité, N-body problem, Mémoire distribuée, Mémoire partagée, Dislocation dynamics, Fast Multipol method, OpenMP task, Scalability, Parallelism, OpenMP, Hybrid, Distributed memory, Data structure, MPI, Simulation, Hiérarchie mémoire, 3D
Abstract: This research work focuses on bringing performances in 3D dislocation dynamics simulation, to run efficiently on modern computers. First of all, we introduce some algorithmic technics, to reduce the complexity in order to target large scale simulations. Second of all, we focus on data structure to take into account both memory hierachie and algorithmic data access. On one side we build this adaptive data structure to handle dynamism of data and on the other side we use an Octree to combine hierachie decompostion and data locality in order to face intensive arithmetics with force field computation and collision detection. Finnaly, we introduce some parallel aspects of our simulation. We propose a classical hybrid parallelism, with task based openMP threads and domain decomposition technics for MPI.; Le travail réalisé durant cette thèse vise à offrir à un code de simulation en dynamique des dislocations les composantes essentielles pour permettre le passage à l’échelle sur les calculateurs modernes. Nous abordons plusieurs aspects de la simulation numérique avec tout d’abord des considérations algorithmiques. Pour permettre de réaliser des simulations efficaces en terme de complexité algorithmique pour des grandes simulations, nous explorons les contraintes des différentes étapes de la simulation en offrant une analyse et des améliorations aux algorithmes. Ensuite, une considération particulière est apportée aux structures de données. En prenant en compte les nouveaux algorithmes, nous proposons une structure de données pour bénéficier d’accès performants à travers la hiérarchie mémoire. Cette structure est modulaire pour faire face à deux types d’algorithmes, avec d’un côté la gestion du maillage nécessitant une gestion dynamique de la mémoire et de l’autre les phases de calcul intensifs avec des accès rapides. Pour cela cette structure modulaire est complétée par un octree pour gérer la décomposition de domaine et aussi les algorithmes hiérarchiques comme le calcul du champ de contrainte et la détection des collisions. Enfin nous présentons les aspects parallèles du code. Pour cela nous introduisons une approche hybride, avec un parallélisme à grain fin à base de threads, et un parallélisme à gros grain de type MPI nécessitant une décomposition de domaine et un équilibrage de charge.Finalement, ces contributions sont testées pour valider les apports pour la simulation numérique. Deux cas d’étude sont présentés pour observer et analyser le comportement des différentes briques de la simulation. Tout d’abord une simulation extrêmement dynamique, composée de sources de Frank-Read dans un cristal de zirconium est utilisée, avant de présenter quelques résultats sur une simulation cible contenant une forte densité de défauts d’irradiation.
Published: 2015

9. Cache and energy efficient algorithms for Nussinov's RNA Folding.

Author: Zhao C and Sahni S
Subjects: Nucleic Acid Conformation, Algorithms, Computational Biology methods, RNA chemistry, RNA metabolism, RNA Folding
Abstract: Background: An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov's classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses., Results: Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov's RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms-Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2-using two programming languages-C and Java-show that our cache efficient algorithms are also efficient in terms of run time and energy., Conclusion: Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox give best run time and energy performance. The C version of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
Published: 2017
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

9 results on '"Cache efficient"'

1. String correction using the Damerau-Levenshtein distance

2. Cache and energy efficient algorithms for Nussinov’s RNA Folding

3. String correction using the Damerau-Levenshtein distance.

4. ACME: A scalable parallel system for extracting frequent patterns from a very long sequence.

5. Dynamic Programming for Improving Cache Efficiency.

6. Cache and energy efficient algorithms for Nussinov’s RNA Folding

7. String correction using the Damerau-Levenshtein distance

8. Simulation de la dynamique des dislocations à très grande échelle

9. Cache and energy efficient algorithms for Nussinov's RNA Folding.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

9 results on '"Cache efficient"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources