Author: "Perais, Arthur" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Perais, Arthur"' showing total 33 results

1. Branch Target Buffer Organizations

Author: Perais, Arthur, primary and Sheikh, Rami, additional
Published: 2023
Full Text: View/download PDF

2. Rebasing Microarchitectural Research with Industry Traces

Author: Feliu, Josué, primary, Perais, Arthur, additional, Jiménez, Daniel A., additional, and Ros, Alberto, additional
Published: 2023
Full Text: View/download PDF

3. Toward Practical 128-Bit General Purpose Microarchitectures

Author: Deshpande, Chandana S., primary, Perais, Arthur, additional, and Pétrot, Frédéric, additional
Published: 2023
Full Text: View/download PDF

4. 128-bit Addresses for the Masses (of Memory and Devices)

Author: Bacou, Mathieu, Chader, Adam, Deshpande, Chandana S., Fabre, Christian, Fuguet, César, Michaud, Pierre, Perais, Arthur, Pétrot, Frédéric, Thomas, Gaël, Tomasi Ribeiro, Eduardo, Télécom SudParis (TSP), System Level Synthesis (TIMA-SLS), Techniques de l'Informatique et de la Microélectronique pour l'Architecture des systèmes intégrés (TIMA), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Département Systèmes et Circuits Intégrés Numériques (DSCIN), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), and Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)
Subjects: PACS 85.42, [SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics
Abstract: The ever-growing storage and memory needs of computer infrastructures make 128-bit addresses a possible long-term solution for accessing vast swaths of data uniformly. In this abstract, we give our thoughts on what this would entail from a hardware/software perspective.
Published: 2023

5. We had 64-bit, yes. What about second 64-bit?

Author: Bacou, Mathieu, Chader, Adam, Deshpande, Chandana S., Fabre, Christian, Fuguet, César, Michaud, Pierre, Perais, Arthur, Pétrot, Frédéric, Thomas, Gaël, Tomasi Ribeiro, Eduardo, Télécom SudParis (TSP), System Level Synthesis (TIMA-SLS), Techniques de l'Informatique et de la Microélectronique pour l'Architecture des systèmes intégrés (TIMA), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Département Systèmes et Circuits Intégrés Numériques (DSCIN), Laboratoire d'Intégration des Systèmes et des Technologies (LIST (CEA)), Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Technologique (CEA) (DRT (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), and Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)
Subjects: PACS 85.42, [SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics
Abstract: High-performance architectures are increasingly heterogeneous and often incorporate specialized hardware. GPUs first became widespread in the most powerful machines, followed by FPGAs, and now by many other accelerators such as Tensor Processing Units (TPUs) for deep neural networks, or variable-precision FPUs. Recent hardware manufacturing trends make it very likely that specialization will not only persist, but increase. Manually managing this heterogeneity is complex and not maintainable. We therefore propose to revisit how we design both hardware and OS in order to better hide the heterogeneity. To ensure the long-term viability of our proposal, we propose to consider the use of 128-bit addressing.
Published: 2023

6. Exploring Instruction Fusion Opportunities in General Purpose Processors

Author: Singh, Sawan, primary, Perais, Arthur, additional, Jimborean, Alexandra, additional, and Ros, Alberto, additional
Published: 2022
Full Text: View/download PDF

7. Free atomics

Author: Asgharzadeh, Ashkan, primary, Cebrian, Juan M., additional, Perais, Arthur, additional, Kaxiras, Stefanos, additional, and Ros, Alberto, additional
Published: 2022
Full Text: View/download PDF

8. Free Atomics : Hardware Atomic Operations without Fences

Author: Asgharzadeh, Ashkan, Cebrian, Juan M., Perais, Arthur, Kaxiras, Stefanos, and Ros, Alberto
Abstract: Atomic Read-Modify-Write (RMW) instructions are primitive synchronization operations implemented in hardware that provide the building blocks for higher-abstraction synchronization mechanisms to programmers. According to publicly available documentation, current x86 implementations serialize atomic RMW operations, i.e., the store buffer is drained before issuing atomic RMWs and subsequent memory operations are stalled until the atomic RMW commits. This serialization, carried out by memory fences, incurs a performance cost which is expected to increase with deeper pipelines. This work proposes Free atomics, a lightweight, speculative, deadlock-free implementation of atomic operations that removes the need for memory fences, thus improving performance, while preserving atomicity and consistency. Free atomics is, to the best of our knowledge, the first proposal to enable store-to-load forwarding for atomic RMWs. Free atomics only requires simple modifications and incurs a small area overhead (15 bytes). Our evaluation using gem5-20 shows that, for a 32-core configuration, Free atomics improves performance by 12.5%, on average, for a large range of parallel workloads and 25.2%, on average, for atomic-intensive parallel workloads over a fenced atomic RMW implementation.
Published: 2022
Full Text: View/download PDF

9. Leveraging Targeted Value Prediction to Unlock New Hardware Strength Reduction Potential

Author: Perais, Arthur, primary
Published: 2021
Full Text: View/download PDF

10. A Case for Speculative Strength Reduction

Author: Perais, Arthur, primary
Published: 2021
Full Text: View/download PDF

11. Take A Way: Exploring the Security Implications of AMD's Cache Way Predictors

Author: Lipp, Moritz, primary, Hadžić, Vedad, additional, Schwarz, Michael, additional, Perais, Arthur, additional, Maurice, Clémentine, additional, and Gruss, Daniel, additional
Published: 2020
Full Text: View/download PDF

12. Elastic Instruction Fetching

Author: Perais, Arthur, primary, Sheikh, Rami, additional, Yen, Luke, additional, McIlvaine, Michael, additional, and Clancy, Robert D., additional
Published: 2019
Full Text: View/download PDF

13. Cost effective speculation with the omnipredictor

Author: Perais, Arthur, primary and Seznec, André, additional
Published: 2018
Full Text: View/download PDF

14. SPF: Selective Pipeline Flush

Author: Kothinti Naresh, Vignyan Reddy, primary, Sheikh, Rami, additional, Perais, Arthur, additional, and Cain, Harold W., additional
Published: 2018
Full Text: View/download PDF

15. Storage-Free Memory Dependency Prediction

Author: Perais, Arthur, primary and Seznec, Andre, additional
Published: 2017
Full Text: View/download PDF

16. On the Interactions Between Value Prediction and Compiler Optimizations in the Context of EOLE

Author: Endo, Fernando A., primary, Perais, Arthur, additional, and Seznec, André, additional
Published: 2017
Full Text: View/download PDF

17. La prédiction de valeurs comme moyen d'augmenter la performance des processeurs superscalaires

Author: Perais, Arthur, Pushing Architecture and Compilation for Application Performance (PACAP), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-ARCHITECTURE (IRISA-D3), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Université de Rennes, André Seznec, ARCHITECTURE (IRISA-D3), CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences 
Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), and Université Rennes 1
Subjects: Exécution dans le désordre, [INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR], High performance processors, Exécution spéculative, Value prediction, Superscalar processors, Out-Of-Order execution, Architecture des processeurs, Processeurs à hautes performances, Processeurs superscalaires, Speculative execution, Prédiction de valeurs, Processor architecture
Abstract: Although currently available general purpose microprocessors feature more than 10 cores, many programs remain mostly sequential. This can be due to an inherent property of the algorithm used by the program, to the program being old and written during the uniprocessor era, or simply to time-to-market constraints, as writing and validating parallel code is known to be hard. Moreover, even for parallel programs, the performance of the sequential part quickly becomes the limiting improvement factor as more cores are made available to the application, as expressed by Amdahl's Law. Consequently, increasing sequential performance remains a valid approach in the multi-core era. Unfortunately, conventional means to do so - increasing the out-of-order window size and issue width - are major contributors to the complexity and power consumption of the chip. In this thesis, we revisit a previously proposed technique that aimed to improve performance in an orthogonal fashion: Value Prediction (VP). Instead of increasing the aggressiveness of the execution engine, VP improves the utilization of existing resources by increasing the available Instruction Level Parallelism. In particular, we address the three main issues preventing VP from being implemented. First, we propose to remove validation and recovery from the execution engine and to perform them in order at Commit. Second, we propose a new execution model that executes some instructions in order either before or after the out-of-order engine. This reduces pressure on said engine and allows its aggressiveness to be reduced. As a result, the port requirements on the Physical Register File and the overall complexity decrease. Third, we propose a prediction scheme that mimics the instruction fetch scheme: Block Based Prediction. This allows several instructions to be predicted per cycle with a single read, hence a single port on the predictor array. These three propositions form a possible implementation of Value Prediction that is both realistic and efficient.
Published: 2015

18. Register sharing for equality prediction

Author: Perais, Arthur, primary, Endo, Fernando A., additional, and Seznec, Andre, additional
Published: 2016
Full Text: View/download PDF

19. EOLE

Author: Perais, Arthur, primary and Seznec, André, additional
Published: 2016
Full Text: View/download PDF

20. Cost effective physical register sharing

Author: Perais, Arthur, primary and Seznec, Andre, additional
Published: 2016
Full Text: View/download PDF

21. Long Term Parking (LTP) : Criticality-aware Resource Allocation in OOO Processors

Author: Sembrant, Andreas, Carlson, Trevor E., Hagersten, Erik, Black-Schaffer, David, Perais, Arthur, Seznec, André, and Michaud, Pierre
Abstract: Modern processors employ large structures (IQ, LSQ, register file, etc.) to expose instruction-level parallelism (ILP) and memory-level parallelism (MLP). These resources are typically allocated to instructions in program order. This wastes resources by allocating resources to instructions that are not yet ready to be executed and by eagerly allocating resources to instructions that are not part of the application’s critical path. This work explores the possibility of allocating pipeline resources only when needed to expose MLP, and thereby enabling a processor design with significantly smaller structures, without sacrificing performance. First we identify the classes of instructions that should not reserve resources in program order and evaluate the potential performance gains we could achieve by delaying their allocations. We then use this information to “park” such instructions in a simpler, and therefore more efficient, Long Term Parking (LTP) structure. The LTP stores instructions until they are ready to execute, without allocating pipeline resources, and thereby keeps the pipeline available for instructions that can generate further MLP. LTP can accurately and rapidly identify which instructions to park, park them before they execute, wake them when needed to preserve performance, and do so using a simple queue instead of a complex IQ. We show that even a very simple queue-based LTP design allows us to significantly reduce IQ (64→32) and register file (128→96) sizes while retaining MLP performance and improving energy efficiency.
Published: 2015
Full Text: View/download PDF

22. Cost-effective speculative scheduling in high performance processors

Author: Perais, Arthur, Seznec, André, Michaud, Pierre, Sembrant, Andreas, and Hagersten, Erik
Abstract: To maximize performance, out-of-order execution processors sometimes issue instructions without having the guarantee that operands will be available in time; e.g. loads are typically assumed to hit in the L1 cache and dependent instructions are issued accordingly. This form of speculation - that we refer to as speculative scheduling - has been used for two decades in real processors, but has received little attention from the research community. In particular, as pipeline depth grows, and the distance between the Issue and the Execute stages increases, it becomes critical to issue instructions dependent on variable-latency instructions as soon as possible rather than wait for the actual cycle at which the result becomes available. Unfortunately, due to the uncertain nature of speculative scheduling, the scheduler may wrongly issue an instruction that will not have its source(s) available on the bypass network when it reaches the Execute stage. In that event, the instruction is canceled and replayed, potentially impairing performance and increasing energy consumption. In this work, we do not present a new replay mechanism. Rather, we focus on ways to reduce the number of replays that are agnostic of the replay scheme. First, we propose an easily implementable, low-cost solution to reduce the number of replays caused by L1 bank conflicts. Schedule shifting always assumes that, given a dual-load issue capacity, the second load issued in a given cycle will be delayed because of a bank conflict. Its dependents are thus always issued with the corresponding delay. Second, we also improve on existing L1 hit/miss prediction schemes by taking into account instruction criticality. That is, for some criterion of criticality and for loads whose hit/miss behavior is hard to predict, we show that it is more cost-effective to stall dependents if the load is not predicted critical.
Published: 2015
Full Text: View/download PDF

23. Long term parking (LTP)

Author: Sembrant, Andreas, primary, Carlson, Trevor, additional, Hagersten, Erik, additional, Black-Schaffer, David, additional, Perais, Arthur, additional, Seznec, André, additional, and Michaud, Pierre, additional
Published: 2015
Full Text: View/download PDF

24. Exploiting Value Prediction With Quasi-Unlimited Resources

Author: Perais, Arthur, Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES), Amdahl's Law is Forever (ALF), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-ARCHITECTURE (IRISA-D3), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA), INRIA-IRISA Rennes Bretagne Atlantique, équipe ALF, and André Seznec
Subjects: [INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR], prédiction de valeurs, Architecture des ordinateurs
Abstract: Recent trends regarding general purpose microprocessors have focused on Thread-Level Parallelism (TLP), and in general, on parallel architectures such as multicores. However, due to Amdahl's law, the gain to be had from the parallelization of a program is limited since there will always be an incompressible sequential part in the program. The execution time of this part only depends on the sequential performance of the processor the program is executed on. Value Prediction was proposed in the late 90's as a way to improve sequential performance by predicting instruction results, allowing the hardware to break data dependencies between instructions and thus extract more Instruction Level Parallelism (ILP) from the code. In the meantime, very accurate geometric-length indirect branch target predictors such as ITTAGE were proposed. Indirect Branch Target Prediction and Value Prediction exhibit some similarities in concept, which is why we present a value predictor borrowing from both the geometric-length indirect branch target predictor ITTAGE and existing work in the field of Value Prediction. As transistor budget is not expected to be a problem for future microprocessors, we study the behavior of the Value TAGE (VTAGE) predictor for both finite and "infinite" sizes. We evaluate VTAGE performance on standard integer and floating-point workloads as well as on vectorized code.
Published: 2012
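
Note: the Amdahl's law argument in the abstract above can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the report; the function name `amdahl_speedup` and the 90% parallel fraction are assumptions chosen for the example. It uses the standard formulation, speedup = 1 / ((1 - p) + p / n), where p is the fraction of execution time that parallelizes perfectly and n is the number of cores.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law (hypothetical helper for illustration).

    parallel_fraction: share of execution time assumed to scale perfectly.
    cores: number of cores the parallel part is spread over.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    sequential_fraction = 1.0 - parallel_fraction
    # The sequential part is unaffected by the core count.
    return 1.0 / (sequential_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% of the time is parallelizable.
    for n in (1, 4, 16, 64, 1024):
        print(f"{n:5d} cores -> speedup {amdahl_speedup(0.9, n):.2f}")
```

With a 10% sequential part the speedup can never reach 10, however many cores are added, which is why the abstract argues that sequential performance still matters.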

25. Cost-effective speculative scheduling in high performance processors

Author: Perais, Arthur, primary, Seznec, André, additional, Michaud, Pierre, additional, Sembrant, Andreas, additional, and Hagersten, Erik, additional
Published: 2015
Full Text: View/download PDF

26. EOLE: Toward a Practical Implementation of Value Prediction

Author: Perais, Arthur, primary and Seznec, Andre, additional
Published: 2015
Full Text: View/download PDF

27. BeBoP: A cost effective predictor infrastructure for superscalar value prediction

Author: Perais, Arthur, primary and Seznec, Andre, additional
Published: 2015
Full Text: View/download PDF

28. EOLE

Author: Perais, Arthur, primary and Seznec, André, additional
Published: 2014
Full Text: View/download PDF

29. EOLE: Paving the way for an effective implementation of value prediction

Author: Perais, Arthur, primary and Seznec, Andre, additional
Published: 2014
Full Text: View/download PDF

30. Practical data value speculation for future high-end processors

Author: Perais, Arthur, primary and Seznec, Andre, additional
Published: 2014
Full Text: View/download PDF

31. EOLE.

Author: Perais, Arthur and Seznec, André
Published: 2014

32. Free atomics

Author: Asgharzadeh, Ashkan, Cebrian, Juan M., Perais, Arthur, Kaxiras, Stefanos, and Ros, Alberto
Full Text: View/download PDF

33. Revisiting Value Prediction

Author: Arthur Perais, Andre Seznec, Perais, Arthur, Amdahl's Law is Forever (ALF), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-ARCHITECTURE (IRISA-D3), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), ERC DAL, INRIA, CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des 
Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), and Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)
Subjects: [INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR], [INFO.INFO-AR] Computer Science [cs]/Hardware Architecture [cs.AR], Value Prediction, hybrid predictors, VTAGE
Abstract: Value prediction was proposed in the mid 90's to enhance the performance of high-end microprocessors. Unfortunately, to the best of our knowledge, there are no Value Prediction implementations available on the market. Moreover, the research on Value Prediction techniques almost vanished in the early 2000's as it was more effective to increase the number of cores than to dedicate silicon to Value Prediction. However, high-end processor chips currently feature 8-16 high-end cores and technology will allow 50-100 such cores to be implemented on a single die in the foreseeable future. Amdahl's law suggests that the performance of most workloads will not scale to that level. Therefore, dedicating more silicon area to a single high-end core will be considered worthwhile for future multicores, whether heterogeneous or homogeneous. In particular, it may be worth spending transistors on specialized, performance- and/or power-optimized units such as a value predictor. In this report, we first build on the concept of value prediction. We introduce a new value predictor, VTAGE, harnessing the global branch history. VTAGE directly inherits the structure of the indirect jump predictor ITTAGE. We show that VTAGE is able to predict with a very high accuracy many values that were not correctly predicted by previously proposed predictors, such as the FCM predictor and the stride predictor. Compared with these previously proposed solutions, VTAGE can accommodate very long prediction latencies. The introduction of VTAGE opens the path to the design of new hybrid predictors. Three sources of information can be harnessed by these predictors: the global branch history, the differences of successive values and the local history of values. We show that the predictor components using these three sources of information are all amenable to very high accuracy at the cost of some prediction coverage. Using SPEC 2006 benchmarks, our study shows that with a large hybrid predictor, on average 56.76% of the values can be predicted with a 99.48% accuracy, against respectively 55.50% and 98.62% without advanced confidence estimation and the VTAGE component.


33 results on '"Perais, Arthur"'

1. Branch Target Buffer Organizations

2. Rebasing Microarchitectural Research with Industry Traces

3. Toward Practical 128-Bit General Purpose Microarchitectures

4. 128-bit Addresses for the Masses (of Memory and Devices)

5. We had 64-bit, yes. What about second 64-bit?

6. Exploring Instruction Fusion Opportunities in General Purpose Processors

7. Free atomics

8. Free Atomics : Hardware Atomic Operations without Fences

9. Leveraging Targeted Value Prediction to Unlock New Hardware Strength Reduction Potential

10. A Case for Speculative Strength Reduction

11. Take A Way: Exploring the Security Implications of AMD's Cache Way Predictors

12. Elastic Instruction Fetching

13. Cost effective speculation with the omnipredictor

14. SPF: Selective Pipeline Flush

15. Storage-Free Memory Dependency Prediction

16. On the Interactions Between Value Prediction and Compiler Optimizations in the Context of EOLE

17. La prédiction de valeurs comme moyen d'augmenter la performance des processeurs superscalaires

18. Register sharing for equality prediction

19. EOLE

20. Cost effective physical register sharing

21. Long Term Parking (LTP) : Criticality-aware Resource Allocation in OOO Processors

22. Cost-effective speculative scheduling in high performance processors

23. Long term parking (LTP)

24. Exploiting Value Prediction With Quasi-Unlimited Resources

25. Cost-effective speculative scheduling in high performance processors

26. EOLE: Toward a Practical Implementation of Value Prediction

27. BeBoP: A cost effective predictor infrastructure for superscalar value prediction

28. EOLE

29. EOLE: Paving the way for an effective implementation of value prediction

30. Practical data value speculation for future high-end processors

31. EOLE.

32. Free atomics

33. Revisiting Value Prediction
