Author: "Soria Pardos, Víctor" / Publication Year Range: Last 3 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Soria Pardos, Víctor"' showing total 12 results


1. GenArchBench: A genomics benchmark suite for arm HPC processors

Author: López-Villellas, Lorién, Langarita-Benítez, Rubén, Badouh, Asaf, Soria-Pardos, Víctor, Aguado-Puig, Quim, López-Paradís, Guillem, Doblas, Max, Setoain, Javier, Kim, Chulho, Ono, Makoto, Armejach, Adrià, Marco-Sola, Santiago, Alastruey-Benedé, Jesús, Ibáñez, Pablo, and Moretó, Miquel
Published: 2024

2. GenArchBench: A genomics benchmark suite for arm HPC processors

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. ALBCOM - Algorísmia, Bioinformàtica, Complexitat i Mètodes Formals, López Villellas, Lorien, Langarita Benítez, Rubén, Badouh, Asaf, Soria Pardos, Víctor, Aguado Puig, Quim, López Paradís, Guillem, Doblas Font, Max, Setoain, Javier, Kim, Chulho, Ono, Makoto, Armejach Sanosa, Adrià, Marco Sola, Santiago, Alastruey Benedé, Jesús, Ibáñez Marín, Pablo, and Moretó Planas, Miquel
Abstract: Arm usage has substantially grown in the High-Performance Computing (HPC) community. Japanese supercomputer Fugaku, powered by Arm-based A64FX processors, held the top position on the Top500 list between June 2020 and June 2022, currently sitting in the fourth position. The recently released 7th generation of Amazon EC2 instances for compute-intensive workloads (C7g) is also powered by Arm Graviton3 processors. Projects like European Mont-Blanc and U.S. DOE/NNSA Astra are further examples of Arm irruption in HPC. In parallel, over the last decade, the rapid improvement of genomic sequencing technologies and the exponential growth of sequencing data has placed a significant bottleneck on the computational side. While most genomics applications have been thoroughly tested and optimized for x86 systems, just a few are prepared to perform efficiently on Arm machines. Moreover, these applications do not exploit the newly introduced Scalable Vector Extensions (SVE). This paper presents GenArchBench, the first genome analysis benchmark suite targeting Arm architectures. We have selected computationally demanding kernels from the most widely used tools in genome data analysis and ported them to Arm-based A64FX and Graviton3 processors. Overall, the GenArch benchmark suite comprises 13 multi-core kernels from critical stages of widely-used genome analysis pipelines, including base-calling, read mapping, variant calling, and genome assembly. Our benchmark suite includes different input data sets per kernel (small and large), each with a corresponding regression test to verify the correctness of each execution automatically. Moreover, the porting features the usage of the novel Arm SVE instructions, algorithmic and code optimizations, and the exploitation of Arm-optimized libraries.
We present the optimizations implemented in each kernel and a detailed performance evaluation and comparison of their performance on four different HPC machines (i.e., A64FX, Graviton3, Intel Xeon, This work has been partially supported by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 (contracts PID2019-107255GB-C21, PID2019-105660RB-C21, PID2022136454NB-C22, and TED2021-132634A-I00), by the Generalitat de Catalunya, Spain (contract 2021-SGR-763), by the Gobierno de Aragón (T58_23R research group), by the European Union NextGenerationEU/ PRTR, and by Lenovo BSC Contract-Framework Contract (2020)., Peer Reviewed, Postprint (published version)
Published: 2024

3. A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors

Author: Siracusa, Marco, primary, Soria-Pardos, Víctor, additional, Sgherzi, Francesco, additional, Randall, Joshua, additional, Joseph, Douglas J., additional, Moretó Planas, Miquel, additional, and Armejach, Adrià, additional
Published: 2023

4. DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations

Author: Soria-Pardos, Víctor, primary, Armejach, Adrià, additional, Mück, Tiago, additional, Suárez-Gracia, Dario, additional, Joao, José, additional, Rico, Alejandro, additional, and Moretó, Miquel, additional
Published: 2023

5. A Tensor Marshaling Unit for sparse tensor algebra on general-purpose processors

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Siracusa, Marco, Soria Pardos, Víctor, Sgherzi, Francesco, Randall, Joshua, Joseph, Douglas J., Moretó Planas, Miquel, and Armejach Sanosa, Adrià
Abstract: This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical operations of sparse tensor workloads running on today’s computing infrastructures. The TMU leverages a novel multi-lane design that enables parallel tensor loading and merging, which naturally produces vector operands that are marshaled into the core for efficient SIMD computation. The TMU supports all the necessary primitives to be tensor-format and tensor-algebra complete. We evaluate the TMU on a simulated multicore system using a broad set of tensor algebra workloads, achieving 3.6×, 2.8×, and 4.9× speedups over memory-intensive, compute-intensive, and merge-intensive vectorized software implementations, respectively., This work has been partially supported by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 (contract PID2019-107255GB-C21), the Generalitat of Catalunya (contract 2021-SGR-00763), the Arm-BSC Center of Excellence, the European HiPEAC Network of Excellence, and the European Processor Initiative (EPI), which is part of the European Union’s Horizon 2020 research and innovation program under grant agreement No. 826647. M. Siracusa has been supported through an FI fellowship [2022FI_B 00969] and V. Soria-Pardos through an FPU fellowship [FPU20-02132]. A. Armejach is a Serra Hunter Fellow., Peer Reviewed, Postprint (author's final draft)
Published: 2023

6. Sargantana: an academic SoC RISC-V processor in 22nm FDSOI technology

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. EFRICS - Efficient and Robust Integrated Circuits and Systems, Doblas Font, Max, Candón Arenas, Gerard, Carril Gil, Xavier, Dominguez de la Rocha, Marc, Erra, Enric, González Trejo, Alberto, Jiménez, Víctor, Kostalampros, Ioannis-Vatistas, Langarita Benítez, Rubén, Leyva Santes, Neiel, López Paradís, Guillem, Mendoza Escobar, Jonnatan, Oltra Oltra, Josep Angel, Pavón Rivera, Julián, Ramírez Lazo, Cristóbal, Rodas Quiroga, Narcís, Reggiani, Enrico, Rodriguez, Mario, Rojas Morales, Carlos, Ruiz Ramirez, Abraham Josafat, Safadi Figueroa, Hugo Ernesto, Soria Pardos, Víctor, Vargas Valdivieso, Iván, Arreza, Fernando, Figueras Bagué, Roger, Fontova Muste, Pau, Marimon Illana, Joan, Aragonès Cervera, Xavier, Cristal Kestelman, Adrián, Mateo Peña, Diego, Moll Echeto, Francisco de Borja, Moretó Planas, Miquel, Palomar Pérez, Óscar, Sonmez, Nehir, Unsal, Osman Sabri, and Valero Cortés, Mateo
Abstract: This paper describes the Sargantana System on chip (SoC), a 64-bit RISC-V single core processor designed by a number of academic institutions and manufactured in 22 nm FDSOI technology: BSC, UPC, UB, UAB, CIC-IPN and IMB-CNM (CSIC). The SoC includes the processor as well as, among other components, a Phase Locked Loop (PLL) operating up to 2 GHz, interfaces to HyperRAM and a Serdes up to 8 Gbps. The processor has demonstrated experimental correct operation at 800 MHz., The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politécnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN)., Peer Reviewed, Article signed by 48 authors: Max Doblas∗, Gerard Candón∗, Xavier Carril∗, Marc Domínguez∗, Enric Erra∗, Alberto González∗, César Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Josep Oltra∗, Julián Pavón∗, Cristóbal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Hugo Safadi∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Fernando Arreza∗, Roger Figueras∗, Pau Fontova-Musté∗, Joan Marimon∗, Ricardo Martínez‡, Sergio Moreno¶, Jordi Sacristán‡, Oscar Alonso¶, Xavier Aragonés§, Adrián Cristal∗, Ángel Diéguez¶, Manuel López¶, Diego Mateo§, Francesc Moll∗§, Miquel Moretó∗§, Oscar Palomar∗, Marco A. Ramírez†, Francesc Serra-Graells∥‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luis Villa† / ∗Barcelona Supercomputing Center (BSC), Barcelona, Spain. Email: name.surname@bsc.es; †Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡Institut de Microelectrònica de Barcelona, IMB-CNM (CSIC), Spain. Email: name.surname@imb-cnm.csic.es; §Universitat Politècnica de Catalunya (UPC), Barcelona, Spain. Email: name.surname@upc.edu; ¶Universitat de Barcelona (UB), Barcelona, Spain. Email: name.surname@ub.edu; ∥Universitat Autònoma de Barcelona (UAB), Barcelona, Spain. Email: name.surname@uab.cat, Postprint (author's final draft)
Published: 2023

7. DynAMO: Improving parallelism through dynamic placement of atomic memory operations

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Soria Pardos, Víctor, Armejach Sanosa, Adrià, Mück, Tiago, Suárez Gracía, Dario, Joao, Jose A., Rico, Alejandro, and Moretó Planas, Miquel
Abstract: With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols offer support for Atomic Memory Operations (AMOs) that can be executed near-core (near) or remotely in the on-chip memory hierarchy (far). This paper evaluates current available static AMO execution policies implemented in multi-core Systems-on-Chip (SoC) designs, which select AMOs' execution placement (near or far) based on the cache block coherence state. We propose three static policies and show that the performance of static policies is application dependent. Moreover, we show that one of our proposed static policies outperforms currently available implementations. Furthermore, we propose DynAMO, a predictor that selects the best location to execute the AMOs. DynAMO identifies the different locality patterns to make informed decisions, improving AMO latency and increasing overall throughput. DynAMO outperforms the best-performing static policy and provides geometric mean speed-ups of 1.09× across all workloads and 1.31× on AMO-intensive applications with respect to executing all AMOs near., This research was supported by the Spanish Ministry of Science and Innovation (MCIN) through contracts [PID2019-107255GB-C21], [TED2021-132634A-I00], and [PID2019-105660RB-C21]; the Generalitat of Catalunya through contract [2021-SGR-00763]; the Government of Aragon [T5820R]; the Arm-BSC Center of Excellence, and the European Processor Initiative (EPI) which is part of the European Union’s Horizon 2020 research and innovation program under grant agreement No. 826647. V. Soria-Pardos has been supported through an FPU fellowship [FPU20-02132]; A. Armejach is a Serra Hunter Fellow and has been partially supported by the Grant [IJCI-2017-33945] funded by MCIN/AEI/10.13039/501100011033; M. 
Moreto through a Ramón y Cajal fellowship [RYC-2016-21104]., Peer Reviewed, Postprint (author's final draft)
Published: 2023

8. DynAMO: Improving parallelism through dynamic placement of atomic memory operations

Author: Soria Pardos, Víctor, Armejach Sanosa, Adrià, Mück, Tiago, Suárez Gracía, Dario, Joao, Jose A., Rico, Alejandro, Moreto Planas, Miquel, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, and Barcelona Supercomputing Center
Subjects: Atomic memory operations, Parallel processing (Electronic computers), Processament en paral·lel (Ordinadors), Sistemes monoxip, Systems on a chip, Multi-core architectures, Data placement, Microarchitecture, Informàtica::Arquitectura de computadors::Arquitectures paral·leles [Àrees temàtiques de la UPC]
Abstract: With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols offer support for Atomic Memory Operations (AMOs) that can be executed near-core (near) or remotely in the on-chip memory hierarchy (far). This paper evaluates current available static AMO execution policies implemented in multi-core Systems-on-Chip (SoC) designs, which select AMOs' execution placement (near or far) based on the cache block coherence state. We propose three static policies and show that the performance of static policies is application dependent. Moreover, we show that one of our proposed static policies outperforms currently available implementations. Furthermore, we propose DynAMO, a predictor that selects the best location to execute the AMOs. DynAMO identifies the different locality patterns to make informed decisions, improving AMO latency and increasing overall throughput. DynAMO outperforms the best-performing static policy and provides geometric mean speed-ups of 1.09× across all workloads and 1.31× on AMO-intensive applications with respect to executing all AMOs near. This research was supported by the Spanish Ministry of Science and Innovation (MCIN) through contracts [PID2019-107255GB-C21], [TED2021-132634A-I00], and [PID2019-105660RB-C21]; the Generalitat of Catalunya through contract [2021-SGR-00763]; the Government of Aragon [T5820R]; the Arm-BSC Center of Excellence, and the European Processor Initiative (EPI) which is part of the European Union’s Horizon 2020 research and innovation program under grant agreement No. 826647. V. Soria-Pardos has been supported through an FPU fellowship [FPU20-02132]; A. Armejach is a Serra Hunter Fellow and has been partially supported by the Grant [IJCI-2017-33945] funded by MCIN/AEI/10.13039/501100011033; M. Moreto through a Ramón y Cajal fellowship [RYC-2016-21104].
Published: 2023

9. DVINO: A RISC-V vector processor implemented in 65nm technology

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. EFRICS - Efficient and Robust Integrated Circuits and Systems, Cabo Pitarch, Guillem, Candon, Gerard, Carril, Xavier, Doblas Font, Max, Dominguez de la Rocha, Marc, González Trejo, Alberto, Hernández Calderón, César Alejandro, Jiménez Arador, Víctor, Kostalampros, Ioannis-Vatistas, Langarita Benítez, Rubén, Leyva Santes, Neiel Israel, López Paradís, Guillem, Mendoza Escobar, Jonnatan, Minervini Minervini, Francesco, Pavón Rivera, Julián, Ramírez Lazo, Cristóbal, Rodas, Narcis, Reggiani, Enrico, Rodriguez, Mario, Rojas Morales, Carlos, Ruíz Ramírez, Abraham Josafat, Soria Pardos, Víctor, Vargas Valdivieso, Iván, Figueras Bagué, Roger, Fontova, Pau, Marimon Illana, Joan, Montabes, Víctor, Cristal Kestelman, Adrián, Hernández Luz, Carles, Moretó Planas, Miquel, Moll Echeto, Francisco de Borja, Palomar Pérez, Óscar, Rubio Sola, Jose Antonio, Sonmez, Nehir, Unsal, Osman Sabri, and Valero Cortés, Mateo
Abstract: This paper describes the design, verification, implementation and fabrication of the Drac Vector IN-Order (DVINO) processor, a RISC-V vector processor capable of booting Linux jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The DVINO processor includes an internally developed two-lane vector processor unit as well as a Phase Locked Loop (PLL) and an Analog-to-Digital Converter (ADC). The paper summarizes the design from architectural as well as logic synthesis and physical design in CMOS 65nm technology., The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN)., Peer Reviewed, Article signed by 43 authors: Guillem Cabo∗, Gerard Candón∗, Xavier Carril∗, Max Doblas∗, Marc Domínguez∗, Alberto González∗, Cesar Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Francesco Minervini∗, Julian Pavón∗, Cristobal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Roger Figueras∗, Pau Fontova∗, Joan Marimon∗, Víctor Montabes∗, Adrián Cristal∗, Carles Hernández∗, Ricardo Martínez‡, Miquel Moretó∗§, Francesc Moll∗§, Oscar Palomar∗§, Marco A. Ramírez†, Antonio Rubio§, Jordi Sacristán‡, Francesc Serra-Graells‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luís Villa† / ∗Barcelona Supercomputing Center (BSC), Barcelona, Spain. Email: name.surname@bsc.es; †Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡Institut de Microelectronica de Barcelona, IMB-CNM (CSIC), Spain. Email: name.surname@imb-cnm.csic.es; §Universitat Politecnica de Catalunya (UPC), Barcelona, Spain. Email: name.surname@upc.edu, Postprint (author's final draft)
Published: 2022

10. Sargantana: A 1 GHz+ in-order RISC-V processor with SIMD vector extensions in 22nm FD-SOI

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Soria Pardos, Víctor, Doblas Font, Max, López Paradís, Guillem, Candón Arenas, Gerard, Rodas Quiroga, Narcís, Carril Gil, Xavier, Fontova Muste, Pau, Leyva Santes, Neiel Israel, Marco-Sola, Santiago, and Moretó Planas, Miquel
Abstract: The RISC-V open Instruction Set Architecture (ISA) has proven to be a solid alternative to licensed ISAs. In the past 5 years, a plethora of industrial and academic cores and accelerators have been developed implementing this open ISA. In this paper, we present Sargantana, a 64-bit processor based on RISC-V that implements the RV64G ISA, a subset of the vector instructions extension (RVV 0.7.1), and custom application-specific instructions. Sargantana features a highly optimized 7-stage pipeline implementing out-of-order write-back, register renaming, and a non-blocking memory pipeline. Moreover, Sargantana features a Single Instruction Multiple Data (SIMD) unit that accelerates domain-specific applications. Sargantana achieves a 1.26 GHz frequency in the typical corner, and up to 1.69 GHz in the fast corner using 22nm FD-SOI commercial technology. As a result, Sargantana delivers a 1.77× higher Instructions Per Cycle (IPC) than our previous 5-stage in-order DVINO core, reaching 2.44 CoreMark/MHz. Our core design delivers comparable or even higher performance than other state-of-the-art academic cores performance under Autobench EEMBC benchmark suite. This way, Sargantana lays the foundations for future RISC-V based core designs able to meet industrial-class performance requirements for scientific, real-time, and high-performance computing applications., This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (contract PID2019-107255GB-C21), by the Generalitat de Catalunya (contract 2017-SGR-1328), by the European Union within the framework of the ERDF of Catalonia 2014-2020 under the DRAC project [001-P-001723], and by Lenovo-BSC Contract-Framework (2020). The Spanish Ministry of Economy, Industry and Competitiveness has partially supported M. Doblas and V. Soria-Pardos through a FPU fellowship no. FPU20-04076 and FPU20-02132 respectively. G. Lopez-Paradis has been supported by the Generalitat de Catalunya through a FI fellowship 2021FI-B00994. S. Marco-Sola was supported by Juan de la Cierva fellowship grant IJC2020-045916-I funded by MCIN/AEI/10.13039/501100011033 and by “European Union NextGenerationEU/PRTR”, and M. Moretó through a Ramon y Cajal fellowship no. RYC-2016-21104., Peer Reviewed, Postprint (author's final draft)
Published: 2022

11. Characterization and modeling of atomic memory operations in arm based architectures

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universidad de Zaragoza, Armejach Sanosa, Adrià, Moretó Planas, Miquel, Suárez, Darío, and Soria Pardos, Víctor
Abstract: Efficient fine-grain synchronization is a classic computer architecture challenge that has been profusely addressed in the past. Load Link and Store Conditional (LL/SC) became one of the few solutions to this problem and today it is still part of the State-of-the-art. However, as the core count keeps growing many Instruction Set Architectures (ISA) start to support other synchronization instructions that scale better like Atomic Memory Operations (AMO). In this work we present a characterization of LL/SC and AMO instructions in two current Arm-based server machines. Furthermore, Arm has released its Network-on-Chip (NoC) specification enabling different hardware implementations of how AMO are executed in a multicore. Since the adoption of this new standard is still in its first stages, we have modeled six different AMO policies to explore the hardware design trade offs. We find out that there is no single implementation that outperforms the rest. Therefore, we have designed a hardware solution to dynamically select the best configuration obtaining up to 1.15x speed-ups on relevant benchmarks from the Splash-3 benchmark suite.
Published: 2022

12. Characterization and modeling of atomic memory operations in arm based architectures

Author: Soria Pardos, Víctor, Armejach Sanosa, Adrià, Moreto Planas, Miquel, Suárez, Darío, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, and Universidad de Zaragoza
Subjects: Predictors, Arm, Computer architecture, Synchronization, Multicores, Atomic, Informàtica::Arquitectura de computadors [Àrees temàtiques de la UPC], Arquitectura d'ordinadors
Abstract: Efficient fine-grain synchronization is a classic computer architecture challenge that has been profusely addressed in the past. Load Link and Store Conditional (LL/SC) became one of the few solutions to this problem and today it is still part of the State-of-the-art. However, as the core count keeps growing many Instruction Set Architectures (ISA) start to support other synchronization instructions that scale better like Atomic Memory Operations (AMO). In this work we present a characterization of LL/SC and AMO instructions in two current Arm-based server machines. Furthermore, Arm has released its Network-on-Chip (NoC) specification enabling different hardware implementations of how AMO are executed in a multicore. Since the adoption of this new standard is still in its first stages, we have modeled six different AMO policies to explore the hardware design trade offs. We find out that there is no single implementation that outperforms the rest. Therefore, we have designed a hardware solution to dynamically select the best configuration obtaining up to 1.15x speed-ups on relevant benchmarks from the Splash-3 benchmark suite.
Published: 2022
