12 results on '"Soria Pardos, Víctor"'
Search Results
2. GenArchBench: A genomics benchmark suite for arm HPC processors
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. ALBCOM - Algorísmia, Bioinformàtica, Complexitat i Mètodes Formals, López Villellas, Lorien, Langarita Benítez, Rubén, Badouh, Asaf, Soria Pardos, Víctor, Aguado Puig, Quim, López Paradís, Guillem, Doblas Font, Max, Setoain, Javier, Kim, Chulho, Ono, Makoto, Armejach Sanosa, Adrià, Marco Sola, Santiago, Alastruey Benedé, Jesús, Ibáñez Marín, Pablo, and Moretó Planas, Miquel
- Abstract
Arm usage has substantially grown in the High-Performance Computing (HPC) community. Japanese supercomputer Fugaku, powered by Arm-based A64FX processors, held the top position on the Top500 list between June 2020 and June 2022, currently sitting in the fourth position. The recently released 7th generation of Amazon EC2 instances for compute-intensive workloads (C7g) is also powered by Arm Graviton3 processors. Projects like European Mont-Blanc and U.S. DOE/NNSA Astra are further examples of Arm's irruption into HPC. In parallel, over the last decade, the rapid improvement of genomic sequencing technologies and the exponential growth of sequencing data have placed a significant bottleneck on the computational side. While most genomics applications have been thoroughly tested and optimized for x86 systems, just a few are prepared to perform efficiently on Arm machines. Moreover, these applications do not exploit the newly introduced Scalable Vector Extensions (SVE). This paper presents GenArchBench, the first genome analysis benchmark suite targeting Arm architectures. We have selected computationally demanding kernels from the most widely used tools in genome data analysis and ported them to Arm-based A64FX and Graviton3 processors. Overall, the GenArch benchmark suite comprises 13 multi-core kernels from critical stages of widely-used genome analysis pipelines, including base-calling, read mapping, variant calling, and genome assembly. Our benchmark suite includes different input data sets per kernel (small and large), each with a corresponding regression test to verify the correctness of each execution automatically. Moreover, the porting features the usage of the novel Arm SVE instructions, algorithmic and code optimizations, and the exploitation of Arm-optimized libraries.
We present the optimizations implemented in each kernel and a detailed performance evaluation and comparison of their performance on four different HPC machines (i.e., A64FX, Graviton3, Intel Xeon, This work has been partially supported by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 (contracts PID2019-107255GB-C21, PID2019-105660RB-C21, PID2022136454NB-C22, and TED2021-132634A-I00), by the Generalitat de Catalunya, Spain (contract 2021-SGR-763), by the Gobierno de Aragón (T58_23R research group), by the European Union NextGenerationEU/ PRTR, and by Lenovo BSC Contract-Framework Contract (2020)., Peer Reviewed, Postprint (published version)
- Published
- 2024
3. A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors
- Author
-
Siracusa, Marco, primary, Soria-Pardos, Víctor, additional, Sgherzi, Francesco, additional, Randall, Joshua, additional, Joseph, Douglas J., additional, Moretó Planas, Miquel, additional, and Armejach, Adrià, additional
- Published
- 2023
- Full Text
- View/download PDF
4. DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations
- Author
-
Soria-Pardos, Víctor, primary, Armejach, Adrià, additional, Mück, Tiago, additional, Suárez-Gracia, Dario, additional, Joao, José, additional, Rico, Alejandro, additional, and Moretó, Miquel, additional
- Published
- 2023
- Full Text
- View/download PDF
5. A Tensor Marshaling Unit for sparse tensor algebra on general-purpose processors
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Siracusa, Marco, Soria Pardos, Víctor, Sgherzi, Francesco, Randall, Joshua, Joseph, Douglas J., Moretó Planas, Miquel, and Armejach Sanosa, Adrià
- Abstract
This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical operations of sparse tensor workloads running on today’s computing infrastructures. The TMU leverages a novel multi-lane design that enables parallel tensor loading and merging, which naturally produces vector operands that are marshaled into the core for efficient SIMD computation. The TMU supports all the necessary primitives to be tensor-format and tensor-algebra complete. We evaluate the TMU on a simulated multicore system using a broad set of tensor algebra workloads, achieving 3.6×, 2.8×, and 4.9× speedups over memory-intensive, compute-intensive, and merge-intensive vectorized software implementations, respectively., This work has been partially supported by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 (contract PID2019-107255GB-C21), the Generalitat of Catalunya (contract 2021-SGR-00763), the Arm-BSC Center of Excellence, the European HiPEAC Network of Excellence, and the European Processor Initiative (EPI), which is part of the European Union’s Horizon 2020 research and innovation program under grant agreement No. 826647. M. Siracusa has been supported through an FI fellowship [2022FI_B 00969] and V. Soria-Pardos through an FPU fellowship [FPU20-02132]. A. Armejach is a Serra Hunter Fellow., Peer Reviewed, Postprint (author's final draft)
- Published
- 2023
6. Sargantana: an academic SoC RISC-V processor in 22nm FDSOI technology
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. EFRICS - Efficient and Robust Integrated Circuits and Systems, Doblas Font, Max, Candón Arenas, Gerard, Carril Gil, Xavier, Dominguez de la Rocha, Marc, Erra, Enric, González Trejo, Alberto, Jiménez, Víctor, Kostalampros, Ioannis-Vatistas, Langarita Benítez, Rubén, Leyva Santes, Neiel, López Paradís, Guillem, Mendoza Escobar, Jonnatan, Oltra Oltra, Josep Angel, Pavón Rivera, Julián, Ramírez Lazo, Cristóbal, Rodas Quiroga, Narcís, Reggiani, Enrico, Rodriguez, Mario, Rojas Morales, Carlos, Ruiz Ramirez, Abraham Josafat, Safadi Figueroa, Hugo Ernesto, Soria Pardos, Víctor, Vargas Valdivieso, Iván, Arreza, Fernando, Figueras Bagué, Roger, Fontova Muste, Pau, Marimon Illana, Joan, Aragonès Cervera, Xavier, Cristal Kestelman, Adrián, Mateo Peña, Diego, Moll Echeto, Francisco de Borja, Moretó Planas, Miquel, Palomar Pérez, Óscar, Sonmez, Nehir, Unsal, Osman Sabri, and Valero Cortés, Mateo
- Abstract
This paper describes the Sargantana System on chip (SoC), a 64-bit RISC-V single core processor designed by a number of academic institutions and manufactured in 22 nm FDSOI technology: BSC, UPC, UB, UAB, CIC-IPN and IMB-CNM (CSIC). The SoC includes the processor as well as, among other components, a Phase Locked Loop (PLL) operating up to 2 GHz, interfaces to HyperRAM and a Serdes up to 8 Gbps. The processor has demonstrated experimental correct operation at 800 MHz., The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politécnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN)., Peer Reviewed, Article signed by 48 authors: Max Doblas∗, Gerard Candón∗, Xavier Carril∗, Marc Domínguez∗, Enric Erra∗, Alberto González∗, César Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Josep Oltra∗, Julián Pavón∗, Cristóbal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Hugo Safadi∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Fernando Arreza∗, Roger Figueras∗, Pau Fontova-Musté∗, Joan Marimon∗, Ricardo Martínez‡, Sergio Moreno¶, Jordi Sacristán‡, Oscar Alonso¶, Xavier Aragonés§, Adrián Cristal∗, Ángel Diéguez¶, Manuel López¶, Diego Mateo§, Francesc Moll∗§, Miquel Moretó∗§, Oscar Palomar∗, Marco A. Ramírez†, Francesc Serra-Graells∥‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luis Villa† / ∗Barcelona Supercomputing Center (BSC), Barcelona, Spain. 
Email: name.surname@bsc.es; †Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡Institut de Microelectrònica de Barcelona, IMB-CNM (CSIC), Spain. Email: name.surname@imb-cnm.csic.es; §Universitat Politècnica de Catalunya (UPC), Barcelona, Spain. Email: name.surname@upc.edu; ¶Universitat de Barcelona (UB), Barcelona, Spain. Email: name.surname@ub.edu; ∥Universitat Autònoma de Barcelona (UAB), Barcelona, Spain. Email: name.surname@uab.cat, Postprint (author's final draft)
- Published
- 2023
7. DynAMO: Improving parallelism through dynamic placement of atomic memory operations
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Soria Pardos, Víctor, Armejach Sanosa, Adrià, Mück, Tiago, Suárez Gracía, Dario, Joao, Jose A., Rico, Alejandro, and Moretó Planas, Miquel
- Abstract
With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols offer support for Atomic Memory Operations (AMOs) that can be executed near-core (near) or remotely in the on-chip memory hierarchy (far). This paper evaluates current available static AMO execution policies implemented in multi-core Systems-on-Chip (SoC) designs, which select AMOs' execution placement (near or far) based on the cache block coherence state. We propose three static policies and show that the performance of static policies is application dependent. Moreover, we show that one of our proposed static policies outperforms currently available implementations. Furthermore, we propose DynAMO, a predictor that selects the best location to execute the AMOs. DynAMO identifies the different locality patterns to make informed decisions, improving AMO latency and increasing overall throughput. DynAMO outperforms the best-performing static policy and provides geometric mean speed-ups of 1.09× across all workloads and 1.31× on AMO-intensive applications with respect to executing all AMOs near., This research was supported by the Spanish Ministry of Science and Innovation (MCIN) through contracts [PID2019-107255GB-C21], [TED2021-132634A-I00], and [PID2019-105660RB-C21]; the Generalitat of Catalunya through contract [2021-SGR-00763]; the Government of Aragon [T5820R]; the Arm-BSC Center of Excellence, and the European Processor Initiative (EPI) which is part of the European Union’s Horizon 2020 research and innovation program under grant agreement No. 826647. V. Soria-Pardos has been supported through an FPU fellowship [FPU20-02132]; A. Armejach is a Serra Hunter Fellow and has been partially supported by the Grant [IJCI-2017-33945] funded by MCIN/AEI/10.13039/501100011033; M. 
Moreto through a Ramón y Cajal fellowship [RYC-2016-21104]., Peer Reviewed, Postprint (author's final draft)
- Published
- 2023
8. DynAMO: Improving parallelism through dynamic placement of atomic memory operations
- Author
-
Soria Pardos, Víctor, Armejach Sanosa, Adrià, Mück, Tiago, Suárez Gracía, Dario, Joao, Jose A., Rico, Alejandro, Moreto Planas, Miquel, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, and Barcelona Supercomputing Center
- Subjects
Atomic memory operations, Parallel processing (Electronic computers), Processament en paral·lel (Ordinadors), Sistemes monoxip, Systems on a chip, Multi-core architectures, Data placement, Microarchitecture, Informàtica::Arquitectura de computadors::Arquitectures paral·leles [Àrees temàtiques de la UPC]
- Abstract
With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols offer support for Atomic Memory Operations (AMOs) that can be executed near-core (near) or remotely in the on-chip memory hierarchy (far). This paper evaluates current available static AMO execution policies implemented in multi-core Systems-on-Chip (SoC) designs, which select AMOs' execution placement (near or far) based on the cache block coherence state. We propose three static policies and show that the performance of static policies is application dependent. Moreover, we show that one of our proposed static policies outperforms currently available implementations. Furthermore, we propose DynAMO, a predictor that selects the best location to execute the AMOs. DynAMO identifies the different locality patterns to make informed decisions, improving AMO latency and increasing overall throughput. DynAMO outperforms the best-performing static policy and provides geometric mean speed-ups of 1.09× across all workloads and 1.31× on AMO-intensive applications with respect to executing all AMOs near. This research was supported by the Spanish Ministry of Science and Innovation (MCIN) through contracts [PID2019-107255GB-C21], [TED2021-132634A-I00], and [PID2019-105660RB-C21]; the Generalitat of Catalunya through contract [2021-SGR-00763]; the Government of Aragon [T5820R]; the Arm-BSC Center of Excellence, and the European Processor Initiative (EPI) which is part of the European Union’s Horizon 2020 research and innovation program under grant agreement No. 826647. V. Soria-Pardos has been supported through an FPU fellowship [FPU20-02132]; A. Armejach is a Serra Hunter Fellow and has been partially supported by the Grant [IJCI-2017-33945] funded by MCIN/AEI/10.13039/501100011033; M. Moreto through a Ramón y Cajal fellowship [RYC-2016-21104].
- Published
- 2023
9. DVINO: A RISC-V vector processor implemented in 65nm technology
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. EFRICS - Efficient and Robust Integrated Circuits and Systems, Cabo Pitarch, Guillem, Candon, Gerard, Carril, Xavier, Doblas Font, Max, Dominguez de la Rocha, Marc, González Trejo, Alberto, Hernández Calderón, César Alejandro, Jiménez Arador, Víctor, Kostalampros, Ioannis-Vatistas, Langarita Benítez, Rubén, Leyva Santes, Neiel Israel, López Paradís, Guillem, Mendoza Escobar, Jonnatan, Minervini Minervini, Francesco, Pavón Rivera, Julián, Ramírez Lazo, Cristóbal, Rodas, Narcis, Reggiani, Enrico, Rodriguez, Mario, Rojas Morales, Carlos, Ruíz Ramírez, Abraham Josafat, Soria Pardos, Víctor, Vargas Valdivieso, Iván, Figueras Bagué, Roger, Fontova, Pau, Marimon Illana, Joan, Montabes, Víctor, Cristal Kestelman, Adrián, Hernández Luz, Carles, Moretó Planas, Miquel, Moll Echeto, Francisco de Borja, Palomar Pérez, Óscar, Rubio Sola, Jose Antonio, Sonmez, Nehir, Unsal, Osman Sabri, and Valero Cortés, Mateo
- Abstract
This paper describes the design, verification, implementation and fabrication of the Drac Vector IN-Order (DVINO) processor, a RISC-V vector processor capable of booting Linux jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The DVINO processor includes an internally developed two-lane vector processor unit as well as a Phase Locked Loop (PLL) and an Analog-to-Digital Converter (ADC). The paper summarizes the design from architectural as well as logic synthesis and physical design in CMOS 65nm technology., The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN)., Peer Reviewed, Article signed by 43 authors: Guillem Cabo∗, Gerard Candón∗, Xavier Carril∗, Max Doblas∗, Marc Domínguez∗, Alberto González∗, Cesar Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Francesco Minervini∗, Julian Pavón∗, Cristobal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Roger Figueras∗, Pau Fontova∗, Joan Marimon∗, Víctor Montabes∗, Adrián Cristal∗, Carles Hernández∗, Ricardo Martínez‡, Miquel Moretó∗§, Francesc Moll∗§, Oscar Palomar∗§, Marco A. Ramírez†, Antonio Rubio§, Jordi Sacristán‡, Francesc Serra-Graells‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luís Villa† // ∗Barcelona Supercomputing Center (BSC), Barcelona, Spain. 
Email: name.surname@bsc.es; †Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡ Institut de Microelectronica de Barcelona, IMB-CNM (CSIC), Spain. Email: name.surname@imb-cnm.csic.es; §Universitat Politecnica de Catalunya (UPC), Barcelona, Spain. Email: name.surname@upc.edu, Postprint (author's final draft)
- Published
- 2022
10. Sargantana: A 1 GHz+ in-order RISC-V processor with SIMD vector extensions in 22nm FD-SOI
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Soria Pardos, Víctor, Doblas Font, Max, López Paradís, Guillem, Candón Arenas, Gerard, Rodas Quiroga, Narcís, Carril Gil, Xavier, Fontova Muste, Pau, Leyva Santes, Neiel Israel, Marco-Sola, Santiago, and Moretó Planas, Miquel
- Abstract
The RISC-V open Instruction Set Architecture (ISA) has proven to be a solid alternative to licensed ISAs. In the past 5 years, a plethora of industrial and academic cores and accelerators have been developed implementing this open ISA. In this paper, we present Sargantana, a 64-bit processor based on RISC-V that implements the RV64G ISA, a subset of the vector instructions extension (RVV 0.7.1), and custom application-specific instructions. Sargantana features a highly optimized 7-stage pipeline implementing out-of-order write-back, register renaming, and a non-blocking memory pipeline. Moreover, Sargantana features a Single Instruction Multiple Data (SIMD) unit that accelerates domain-specific applications. Sargantana achieves a 1.26 GHz frequency in the typical corner, and up to 1.69 GHz in the fast corner using 22nm FD-SOI commercial technology. As a result, Sargantana delivers a 1.77× higher Instructions Per Cycle (IPC) than our previous 5-stage in-order DVINO core, reaching 2.44 CoreMark/MHz. Our core design delivers comparable or even higher performance than other state-of-the-art academic cores performance under Autobench EEMBC benchmark suite. This way, Sargantana lays the foundations for future RISC-V based core designs able to meet industrial-class performance requirements for scientific, real-time, and high-performance computing applications., This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (contract PID2019-107255GB-C21), by the Generalitat de Catalunya (contract 2017-SGR-1328), by the European Union within the framework of the ERDF of Catalonia 2014-2020 under the DRAC project [001-P-001723], and by Lenovo-BSC Contract-Framework (2020). The Spanish Ministry of Economy, Industry and Competitiveness has partially supported M. Doblas and V. Soria-Pardos through a FPU fellowship no. FPU20-04076 and FPU20-02132 respectively. G. 
Lopez-Paradis has been supported by the Generalitat de Catalunya through a FI fellowship 2021FI-B00994. S. Marco-Sola was supported by Juan de la Cierva fellowship grant IJC2020-045916-I funded by MCIN/AEI/10.13039/501100011033 and by “European Union NextGenerationEU/PRTR”, and M. Moretó through a Ramon y Cajal fellowship no. RYC-2016-21104., Peer Reviewed, Postprint (author's final draft)
- Published
- 2022
11. Characterization and modeling of atomic memory operations in arm based architectures
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universidad de Zaragoza, Armejach Sanosa, Adrià, Moretó Planas, Miquel, Suárez, Darío, and Soria Pardos, Víctor
- Abstract
Efficient fine-grain synchronization is a classic computer architecture challenge that has been profusely addressed in the past. Load Link and Store Conditional (LL/SC) became one of the few solutions to this problem and today it is still part of the State-of-the-art. However, as the core count keeps growing many Instruction Set Architectures (ISA) start to support other synchronization instructions that scale better like Atomic Memory Operations (AMO). In this work we present a characterization of LL/SC and AMO instructions in two current Arm-based server machines. Furthermore, Arm has released its Network-on-Chip (NoC) specification enabling different hardware implementations of how AMO are executed in a multicore. Since the adoption of this new standard is still in its first stages, we have modeled six different AMO policies to explore the hardware design trade offs. We find out that there is no single implementation that outperforms the rest. Therefore, we have designed a hardware solution to dynamically select the best configuration obtaining up to 1.15x speed-ups on relevant benchmarks from the Splash-3 benchmark suite.
- Published
- 2022
12. Characterization and modeling of atomic memory operations in arm based architectures
- Author
-
Soria Pardos, Víctor, Armejach Sanosa, Adrià, Moreto Planas, Miquel, Suárez, Darío, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, and Universidad de Zaragoza
- Subjects
Predictors, Arm, Computer architecture, Synchronization, Multicores, Atomic, Informàtica::Arquitectura de computadors [Àrees temàtiques de la UPC], Arquitectura d'ordinadors
- Abstract
Efficient fine-grain synchronization is a classic computer architecture challenge that has been profusely addressed in the past. Load Link and Store Conditional (LL/SC) became one of the few solutions to this problem and today it is still part of the State-of-the-art. However, as the core count keeps growing many Instruction Set Architectures (ISA) start to support other synchronization instructions that scale better like Atomic Memory Operations (AMO). In this work we present a characterization of LL/SC and AMO instructions in two current Arm-based server machines. Furthermore, Arm has released its Network-on-Chip (NoC) specification enabling different hardware implementations of how AMO are executed in a multicore. Since the adoption of this new standard is still in its first stages, we have modeled six different AMO policies to explore the hardware design trade offs. We find out that there is no single implementation that outperforms the rest. Therefore, we have designed a hardware solution to dynamically select the best configuration obtaining up to 1.15x speed-ups on relevant benchmarks from the Splash-3 benchmark suite.
- Published
- 2022