Author: "Joan-Manuel Parcerisa" / Publisher: Institute of Electrical and Electronics Engineers (IEEE) - Searchworks@Jio Institute Digital Library Search Results

Search keyword '"Joan-Manuel Parcerisa"': 4 results in total.

Search constraints: Author "Joan-Manuel Parcerisa"; Publisher: Institute of Electrical and Electronics Engineers (IEEE)


1. DTexL: Decoupled raster pipeline for texture locality

Author: Diya Joseph, Juan L. Aragon, Joan-Manuel Parcerisa, Antonio Gonzalez, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, and Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
Subjects: Scheduling, Texture locality, Cache memory, Low-power, GPU, Graphics, Caches, Computer science::Computer architecture [UPC subject areas], Graphics processing units
Abstract: Contemporary GPU architectures have multiple shader cores and a scheduler that distributes work (threads) among them, focusing on load balancing. These load-balancing techniques favor thread distributions that are detrimental to texture memory locality in the L1 texture caches for graphics applications. Texture memory accesses make up the majority of the traffic to the memory hierarchy in typical low-power graphics architectures. This paper aims to improve L1 texture cache locality with a new workload scheduler, exploring various methods to group threads, assign the groups to shader cores, and reorder threads without violating the correctness of the pipeline. To overcome the resulting load imbalance, we also propose a minor modification to the GPU architecture that helps translate the improvement in cache locality into an improvement in GPU performance. We propose DTexL, which encompasses these ideas, and evaluate it over a benchmark suite of ten commercial games, obtaining a 46.8% decrease in L2 accesses, a 19.3% increase in performance, and a 6.3% decrease in total GPU energy, all with negligible overhead.
Funding: This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program, and the AGAUR grant 2020-FISDU-00287.
Rights: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: http://dx.doi.org/10.1109/MICRO56248.2022.00028
Published: 2022
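The locality versus load-balancing trade-off described in the DTexL abstract can be illustrated with a toy model. This is a hypothetical sketch, not DTexL's actual scheduler: it assumes screen tiles on a grid are assigned to shader cores either round-robin (interleaved) or in whole 2x2 blocks, and it uses the number of adjacent tile pairs landing on different cores as a rough proxy for texels that must be fetched into more than one L1 texture cache. The function names and the proxy metric are illustrative assumptions.

```python
# Hypothetical illustration only: NOT the DTexL scheduler from the paper.
# Tiles (tx, ty) on a width x height grid are assigned to shader cores.

def interleaved_core(tx: int, ty: int, width: int, cores: int) -> int:
    """Row-major round-robin: horizontally adjacent tiles go to different cores."""
    return (ty * width + tx) % cores


def grouped_core(tx: int, ty: int, width: int, cores: int, group: int = 2) -> int:
    """Assign whole group x group blocks of tiles to one core, blocks round-robin."""
    groups_per_row = width // group
    gx, gy = tx // group, ty // group
    return (gy * groups_per_row + gx) % cores


def split_neighbour_pairs(assign, width: int, height: int, cores: int) -> int:
    """Count horizontally/vertically adjacent tile pairs mapped to different cores.

    Adjacent tiles tend to sample overlapping texels, so a split pair is used
    here as a rough proxy for duplicated texture fetches across L1 caches.
    """
    split = 0
    for ty in range(height):
        for tx in range(width):
            core = assign(tx, ty, width, cores)
            if tx + 1 < width and assign(tx + 1, ty, width, cores) != core:
                split += 1
            if ty + 1 < height and assign(tx, ty + 1, width, cores) != core:
                split += 1
    return split


def tiles_per_core(assign, width: int, height: int, cores: int) -> list[int]:
    """Count how many tiles each core receives (a crude load-balance check)."""
    counts = [0] * cores
    for ty in range(height):
        for tx in range(width):
            counts[assign(tx, ty, width, cores)] += 1
    return counts


if __name__ == "__main__":
    for name, fn in [("interleaved", interleaved_core), ("grouped 2x2", grouped_core)]:
        print(name, split_neighbour_pairs(fn, 8, 8, 4), tiles_per_core(fn, 8, 8, 4))
```

On an 8 x 8 tile grid with 4 cores, the interleaved mapping splits 56 neighbouring pairs and the grouped mapping splits 24, while both give each core 16 tiles. Equal tile counts do not imply equal work when per-tile cost varies, which is one way grouping can cause the load imbalance the abstract says DTexL must compensate for.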

2. Leveraging Register Windows to Reduce Physical Registers to the Bare Minimum

Author: Joan-Manuel Parcerisa, Antonio González, Eduardo Quinones, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, and Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
Subjects: Instruction register, Memory buffer register, Computer science, Computer science::Software engineering [UPC subject areas], File organization (Computer science), Register file, Parallel computing, Theoretical Computer Science, Early register release, Memory address register, Control register, Hardware register, Out-of-order execution, Processor register, Software architecture, FLAGS register, Register renaming, Register window, Stack register, Microarchitecture, Physical register file, Computational Theory and Mathematics, Hardware and Architecture, Status register, Operating system, Software -- Design, Memory data register, Software, Register windows, Register allocation
Abstract: Register windows are an architectural technique that reduces the memory operations required to save and restore registers across procedure calls. Their effectiveness depends on the size of the register file. Out-of-order execution normally increases these register requirements, because it needs registers for in-flight instructions in addition to the architectural ones. However, a large register file has a significant cost in area and power and may even affect the cycle time. In this paper, we propose a software/hardware early register release technique that leverages register windows to drastically reduce register requirements and, hence, the register file cost. Contrary to the common belief that out-of-order processors with register windows need a large physical register file, this paper shows that, with this novel microarchitecture, the physical register file can be reduced to the bare minimum. Moreover, our proposal has much lower hardware complexity than previous approaches and requires minimal changes to a conventional register window scheme. Performance studies show that the proposed technique can reduce the number of physical registers to the number of logical registers plus one (the minimum needed to guarantee forward progress) and still achieve almost the same performance as an unbounded register file.
Published: 2010
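Why "logical registers plus one" is the floor for forward progress can be seen in a minimal renaming model. This is a hypothetical sketch of a conventional renamer that frees a destination's previous physical register only at commit; it does not model register windows or the paper's early-release mechanism. The class and method names are illustrative assumptions.

```python
from collections import deque


class Renamer:
    """Minimal register renamer with conventional release at commit.

    Hypothetical sketch: it does NOT model register windows or the paper's
    early-release technique; it only shows why P >= L + 1 is needed.
    """

    def __init__(self, num_logical: int, num_physical: int) -> None:
        if num_physical < num_logical:
            raise ValueError("need at least one physical register per logical register")
        self.map = list(range(num_logical))                   # logical -> physical
        self.free = deque(range(num_logical, num_physical))   # unmapped physical registers
        self.in_flight = deque()                              # (logical, previous physical), in program order

    def rename(self, dest: int):
        """Allocate a new physical register for `dest`; return None if rename must stall."""
        if not self.free:
            return None
        new = self.free.popleft()
        self.in_flight.append((dest, self.map[dest]))
        self.map[dest] = new
        return new

    def commit(self) -> None:
        """Retire the oldest instruction: its destination's previous mapping becomes free."""
        _, previous = self.in_flight.popleft()
        self.free.append(previous)
```

With as many physical as logical registers, every physical register holds an architectural value and `rename` can never succeed. With one extra register, the machine can always rename, execute, and commit one instruction at a time, although with almost no overlap; the abstract's claim is that early release lets a register file of this minimum size still perform almost like an unbounded one.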

3. Memory bank predictors

Author: Antonio González, S. Bieschewski, Joan-Manuel Parcerisa, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, and Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
Subjects: Computer science, Cache coloring, CPU cache, Cache memory, Pipeline burst cache, Cache pollution, Cache-oblivious algorithm, Non-uniform memory access, Clustered microarchitectures, Write-once, Cache invalidation, Superscalar, Cache algorithms, Computer science::Computer architecture [UPC subject areas], Snoopy cache, Parallel processing (Electronic computers), MESI protocol, Cache-only memory architecture, Uniform memory access, MESIF protocol, Distributed cache, Microarchitecture, Smart Cache, Memory bank, Memory bank prediction, Computer architecture, Bus sniffing, Hit rate, Page cache, Cache, Computer network
Abstract: Cache memories are commonly implemented with multiple memory banks to improve bandwidth and latency. Early knowledge of the data cache bank that an instruction will access can help improve performance in several ways. One scenario likely to become increasingly important is clustered microprocessors with a distributed cache. This work presents a study of different cache bank predictors and shows that effective bank predictors can be implemented at relatively low cost. For instance, a predictor of approximately 4 KB achieves an average hit rate of 78% on SPECint2000 when predicting accesses to an 8-bank cache memory in a contemporary superscalar processor. We also show how a predictor can be used to reduce the communication latency caused by memory accesses in a clustered microarchitecture with a distributed cache design.
Published: 2005
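To make the idea of a low-cost bank predictor concrete, the following is a hypothetical sketch, not the design evaluated in the paper. It assumes a table indexed by load PC, where each entry stores the last bank accessed (3 bits for 8 banks) plus a 1-bit confidence flag; with 8192 entries this comes to 4096 bytes, matching the "approximately 4 KB" budget only as an illustration. Line-interleaved banking, 32-byte lines, and 4-byte instructions are also assumptions.

```python
class LastBankPredictor:
    """Hypothetical PC-indexed cache bank predictor (not the paper's exact design).

    Each entry holds the last bank seen (3 bits for 8 banks) plus a 1-bit
    confidence flag, i.e. 4 bits per entry; 8192 entries -> 4096 bytes.
    """

    def __init__(self, entries: int = 8192, num_banks: int = 8, line_size: int = 32) -> None:
        self.entries = entries
        self.num_banks = num_banks
        self.line_size = line_size
        self.last_bank = [0] * entries
        self.confident = [False] * entries

    def _index(self, pc: int) -> int:
        # Drop the low 2 bits (assumed 4-byte instructions), then fold into the table.
        return (pc >> 2) % self.entries

    def bank_of(self, addr: int) -> int:
        # Assumed line-interleaved banking: consecutive cache lines map to consecutive banks.
        return (addr // self.line_size) % self.num_banks

    def predict(self, pc: int):
        """Return the predicted bank, or None when the entry is not confident."""
        i = self._index(pc)
        return self.last_bank[i] if self.confident[i] else None

    def update(self, pc: int, addr: int) -> int:
        """Train with the address actually accessed; return the actual bank."""
        i = self._index(pc)
        actual = self.bank_of(addr)
        # Confident only if this load hit the same bank as its previous execution.
        self.confident[i] = self.last_bank[i] == actual
        self.last_bank[i] = actual
        return actual

    def storage_bytes(self) -> int:
        bank_bits = (self.num_banks - 1).bit_length()  # 3 bits for 8 banks
        return self.entries * (bank_bits + 1) // 8     # +1 confidence bit per entry
```

A `None` prediction would mean falling back to the unpredicted path. In a clustered design, a confident prediction could be used early in the pipeline to steer the load toward the cluster that holds the predicted bank, which is the kind of latency reduction the abstract refers to.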

4. Improving branch prediction and predicated execution in out-of-order processors

Author: Antonio González, Eduardo Quinones, Joan-Manuel Parcerisa, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, and Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
Subjects: Computer science, Branch, Parallel computing, Out of order, Instruction set, Branch predication, Instruction sets, Degradation, Hardware, Proposals, Computer science::Computer architecture [UPC subject areas], Accuracy, Compilers (Computer programs), Pipelines, Out-of-order execution, Registers, Branch predictor, Predicate (grammar), Costs, Branch target predictor, Computer aided instruction, Compiler, Algorithm
Abstract: If-conversion is a compiler technique that reduces the misprediction penalties caused by hard-to-predict branches by transforming control dependencies into data dependencies. Although it is beneficial overall, it has a negative side effect: removing branches eliminates correlation information that conventional branch predictors rely on, so the remaining branches may become harder to predict. However, in predicated ISAs with a compare-branch model, the correlation information resides not only in branches but also in the compare instructions that compute their guarding predicates. When a branch is removed, its correlation information is still available in its compare instruction. We propose a branch prediction scheme based on predicate prediction. It has three advantages. First, since the prediction is made per predicate definition rather than per branch, removing branches through if-conversion does not lose any correlation information, so accuracy is not degraded. Second, the proposed mechanism can use the computed value of the branch predicate when it is available instead of the predicted value, effectively achieving 100% accuracy on such early-resolved branches. Third, as shown in previous work, selective predicate prediction is a very effective technique for implementing if-conversion on out-of-order processors, since it avoids the problem of multiple register definitions and reduces the unnecessary resource consumption of nullified instructions. Hence, our approach enables a very efficient implementation of if-conversion for an out-of-order processor with almost no additional hardware cost, because the same hardware is used both to predict the predicates of if-converted code and to predict branches without accuracy degradation.
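The first two advantages listed in the abstract (predicting per predicate definition, and using the computed predicate when it is already available) can be sketched as follows. This is a hypothetical, simplified model, not the paper's mechanism: it assumes per-PC 2-bit saturating counters indexed by the compare instruction, omits the history-based correlation the paper relies on, and does not model misprediction recovery. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PredicateEntry:
    value: bool      # predicted or computed predicate value
    computed: bool   # True once the defining compare has executed
    define_pc: int   # PC of the compare instruction that defines this predicate


class PredicatePredictor:
    """Hypothetical sketch of branch prediction via predicate prediction.

    Predictions are made per compare (predicate-define) instruction using
    2-bit saturating counters; a branch reads its guarding predicate and
    uses the computed value instead of the prediction when available.
    """

    def __init__(self, entries: int = 1024) -> None:
        self.entries = entries
        self.counters = [1] * entries  # start in the weakly "false" state
        self.predicates: dict[str, PredicateEntry] = {}

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries

    def fetch_compare(self, pc: int, dest: str) -> None:
        """At fetch of a compare, record a predicted value for its destination predicate."""
        predicted = self.counters[self._index(pc)] >= 2
        self.predicates[dest] = PredicateEntry(predicted, False, pc)

    def execute_compare(self, dest: str, actual: bool) -> None:
        """When the compare executes, train its counter and publish the real value."""
        entry = self.predicates[dest]
        i = self._index(entry.define_pc)
        if actual:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        entry.value, entry.computed = actual, True

    def predict_branch(self, guard: str) -> tuple[bool, bool]:
        """Return (taken, is_exact) for a branch guarded by predicate `guard`."""
        entry = self.predicates[guard]
        return entry.value, entry.computed
```

Because the counters belong to compare instructions rather than branches, deleting a branch during if-conversion leaves the predictor state untouched, and the same predicted predicate can guard if-converted instructions. When the compare has already executed, `predict_branch` returns the exact value, which corresponds to the "early-resolved branches" case in the abstract.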
