Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Universitat Politècnica de Catalunya. HIPICS - Grup de Circuits i Sistemes Integrats d'Altes Prestacions, Pavón Rivera, Julián, Vargas Valdivieso, Iván, Barredo Ferreira, Adrián, Marimon Illana, Joan, Moretó Planas, Miquel, Moll Echeto, Francisco de Borja, Unsal, Osman Sabri, Valero Cortés, Mateo, Cristal Kestelman, Adrián, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Universitat Politècnica de Catalunya. HIPICS - Grup de Circuits i Sistemes Integrats d'Altes Prestacions, Pavón Rivera, Julián, Vargas Valdivieso, Iván, Barredo Ferreira, Adrián, Marimon Illana, Joan, Moretó Planas, Miquel, Moll Echeto, Francisco de Borja, Unsal, Osman Sabri, Valero Cortés, Mateo, and Cristal Kestelman, Adrián
Sparse matrix operations are critical kernels in multiple application domains such as High Performance Computing, artificial intelligence and big data. Vector processing is widely used to improve performance on mathematical kernels with dense matrices. Unfortunately, existing vector architectures do not cope well with sparse matrix computations, achieving much lower performance in comparison with their dense counterparts.To overcome this limitation, we present the Vector Indexed Architecture (VIA), a novel hardware vector architecture that accelerates applications with irregular memory access patterns such as sparse matrix computations. There are two main bottlenecks when computing with sparse matrices: irregular memory accesses and index matching. VIA addresses these two bottlenecks with a smart scratchpad that is tightly coupled to the Vector Functional Units within the core.Thanks to this structure, VIA improves locality for sparse-dense computations and improves the index matching search process for sparse computations. As a result, VIA achieves significant performance speedup over highly optimized state-of-the-art C++ algebra libraries. On average, VIA outperforms sparse matrix vector multiplication, sparse matrix addition and sparse matrix matrix multiplication kernels by 4.22 ×, 6.14 × and 6.00 ×, respectively, when evaluated over a thousand sparse matrices that arise in real applications. In addition, we prove the generality of VIA by showing that it can accelerate histogram and stencil applications by 4.5 × and 3.5 ×, respectively., This work has been supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328), and by the DRAC project, which is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. A. Barredo has been supported by the Spanish Government under Formation del Personal Investigador fellowship number BES-2017- 080635. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104., Peer Reviewed, Postprint (author's final draft)