17 results for "sparse matrix vector multiplication"
Search Results
2. A Recursive Hypergraph Bipartitioning Framework for Reducing Bandwidth and Latency Costs Simultaneously
- Author
- Selvitopi, Oguz, Acer, Seher, and Aykanat, Cevdet
- Subjects
- Communication cost, bandwidth, latency, partitioning, hypergraph, recursive bipartitioning, load balancing, sparse matrix vector multiplication, combinatorial scientific computing, Distributed Computing, Computer Software, Communications Technologies
- Published
- 2017
3. CSCC: Convolution Split Compression Calculation Algorithm for Deep Neural Network
- Author
- Shengyu Fan, Hui Yu, Dianjie Lu, Shuai Jiao, Weizhi Xu, Fangai Liu, and Zhiyong Liu
- Subjects
- Convolutional neural network, sparse matrix vector multiplication, neural networks, convolution, sparse matrices, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Convolutional Neural Networks (CNNs) have become one of the most successful machine learning techniques for image and video processing. The most computationally intensive part of a CNN is the convolutional layers, which operate on a multi-channel image with multiple kernels. However, due to network pruning and the ReLU activation function applied during training, numerous zero values are generated in the network. This paper proposes the convolution split compression calculation (CSCC) algorithm, which improves the performance of the convolution layer by exploiting the sparsity of the feature map. In the CSCC algorithm, first, the feature map is directly converted into a sparse matrix in compressed sparse row (CSR) format, which avoids expanding the feature map into an intermediate matrix and reduces memory consumption. Second, the convolution kernel is converted into a vector. Finally, the convolution result is obtained by sparse matrix vector multiplication (SpMV). The experimental results show that the CSCC algorithm has a clear advantage in computation speed and memory consumption compared with other convolution algorithms.
- Published
- 2019
- Full Text
- View/download PDF
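To make the CSR-plus-SpMV formulation of convolution described in the abstract above concrete, here is a minimal Python/SciPy sketch: it builds the sparse "unrolled" matrix directly from the nonzero feature-map entries (never materializing a dense intermediate), flattens the kernel into a vector, and obtains the output with one SpMV. The single channel, unit stride, and lack of padding are simplifying assumptions; this is an illustration of the idea, not the authors' implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix

def conv2d_via_sparse_spmv(fmap, kernel):
    """Valid (no-padding, stride-1) 2-D convolution computed as SpMV.

    Only the nonzero feature-map entries are stored: each nonzero F[y, x]
    contributes to the rows (output positions) whose receptive field covers
    (y, x), at the column given by the kernel element it is multiplied with.
    """
    H, W = fmap.shape
    R, S = kernel.shape
    OH, OW = H - R + 1, W - S + 1

    rows, cols, vals = [], [], []
    ys, xs = np.nonzero(fmap)                         # sparse traversal of the feature map
    for y, x in zip(ys, xs):
        v = fmap[y, x]
        # output positions (oy, ox) whose receptive field contains (y, x)
        for oy in range(max(0, y - R + 1), min(OH - 1, y) + 1):
            for ox in range(max(0, x - S + 1), min(OW - 1, x) + 1):
                rows.append(oy * OW + ox)             # one row per output position
                cols.append((y - oy) * S + (x - ox))  # flattened kernel index
                vals.append(v)

    A = csr_matrix((vals, (rows, cols)), shape=(OH * OW, R * S))
    out = A @ kernel.ravel()                          # the SpMV step
    return out.reshape(OH, OW)

if __name__ == "__main__":
    fmap = np.array([[0, 1, 0, 0],
                     [2, 0, 0, 3],
                     [0, 0, 4, 0],
                     [0, 5, 0, 0]], dtype=float)
    kernel = np.array([[1, 0], [0, -1]], dtype=float)
    # Cross-check against a direct dense computation.
    dense = np.array([[np.sum(fmap[i:i+2, j:j+2] * kernel) for j in range(3)]
                      for i in range(3)])
    assert np.allclose(conv2d_via_sparse_spmv(fmap, kernel), dense)
```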
4. Design and Implementation of Adaptive SpMV Library for Multicore and Many-Core Architecture.
- Author
- Tan, Guangming, Liu, Junhong, and Li, Jiajia
- Subjects
- MULTICORE processors, SPARSE matrix software, HIGH performance computing, KERNEL (Mathematics), MATHEMATICAL optimization
- Abstract
Sparse matrix vector multiplication (SpMV) is an important computational kernel in traditional high-performance computing and emerging data-intensive applications. Previous SpMV libraries are optimized by either application-specific or architecture-specific approaches but present difficulties for use in real applications. In this work, we develop an auto-tuning system (SMATER) to bridge the gap between specific optimizations and general-purpose use. SMATER provides programmers a unified interface based on the compressed sparse row (CSR) sparse matrix format by implicitly choosing the best format and fastest implementation for any input sparse matrix during runtime. SMATER leverages a machine-learning model and retargetable back-end library to quickly predict the optimal combination. Performance parameters are extracted from 2,386 matrices in the SuiteSparse matrix collection. The experiments show that SMATER achieves good performance (up to 10 times that of the Intel Math Kernel Library (MKL) on Intel E5-2680 v3) while being portable on state-of-the-art x86 multicore processors, NVIDIA GPUs, and Intel Xeon Phi accelerators. Compared with the Intel MKL library, SMATER runs faster by more than 2.5 times on average. We further demonstrate its adaptivity in an algebraic multigrid solver from the Hypre library and report greater than 20% performance improvement. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
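The runtime format selection behind a fixed CSR interface, which the SMATER abstract describes, can be illustrated with a deliberately tiny stand-in: compute a few cheap structural features of the matrix and pick a backend accordingly. The single hand-written rule and the CSR/BSR choice below are illustrative assumptions; SMATER itself uses a trained machine-learning model and a retargetable back-end library.

```python
import numpy as np
from scipy import sparse

def spmv_autotuned(A_csr, x, block=4):
    """Pick an SpMV backend from cheap structural features of A (CSR input).

    A real auto-tuner (as in the SMATER abstract) would feed such features to
    a trained model; here a single hand-written rule stands in for that model.
    """
    row_nnz = np.diff(A_csr.indptr)
    features = {
        "mean_nnz_per_row": row_nnz.mean(),
        "var_nnz_per_row": row_nnz.var(),
        "density": A_csr.nnz / (A_csr.shape[0] * A_csr.shape[1]),
    }
    # Illustrative rule: very regular rows -> try a blocked format (BSR),
    # otherwise stay with plain CSR.
    if (features["var_nnz_per_row"] <= 1.0
            and A_csr.shape[0] % block == 0
            and A_csr.shape[1] % block == 0):
        A_exec = sparse.bsr_matrix(A_csr, blocksize=(block, block))
    else:
        A_exec = A_csr
    return A_exec @ x, type(A_exec).__name__

if __name__ == "__main__":
    A = sparse.random(64, 64, density=0.05, format="csr", random_state=0)
    x = np.ones(64)
    y, backend = spmv_autotuned(A, x)
    print(backend, np.linalg.norm(y))
```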
5. Technique detection software for Sparse Matrices
- Author
- KHAN Muhammad Taimoor and USMAN Anila
- Subjects
- sparse matrices, sparse storage formats, sparse matrix vector multiplication, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Sparse storage formats are techniques for storing and processing sparse matrix data efficiently. The performance of these storage formats depends upon the distribution of non-zeros within the matrix in different dimensions. To obtain better results, we need the technique that best suits the organization of data in a particular matrix, so selecting the right technique is the main step towards improving the system's results; otherwise, efficiency can decrease. The purpose of this research is to help identify the storage format that yields reduced storage size and high processing efficiency for a given sparse matrix.
- Published
- 2009
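As a small illustration of why storage-format selection matters, the following sketch compares the approximate memory footprints of dense, COO, and CSR representations of the same matrix with SciPy. The matrix and density are arbitrary; the point is only that the best format depends on the nonzero structure.

```python
import numpy as np
from scipy import sparse

def format_footprints(A_dense):
    """Approximate bytes needed by dense, COO, and CSR representations."""
    coo = sparse.coo_matrix(A_dense)
    csr = coo.tocsr()
    return {
        "dense": A_dense.nbytes,
        "coo": coo.data.nbytes + coo.row.nbytes + coo.col.nbytes,
        "csr": csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes,
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((1000, 1000))
    A[A < 0.99] = 0.0            # keep roughly 1% of the entries
    print(format_footprints(A))  # CSR is typically the smallest here
```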
6. Heterogeneous sparse matrix–vector multiplication via compressed sparse row format.
- Author
- Lane, Phillip Allen and Booth, Joshua Dennis
- Subjects
- MATRICES (Mathematics), SPARSE matrices, FINITE differences, MULTIPLICATION, HETEROGENEOUS computing, MATRIX multiplications
- Abstract
Sparse matrix–vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet SpMV often suffers from poor performance on many devices. Because of this, SpMV normally requires special care to store and tune for a given device. Moreover, HPC is facing heterogeneous hardware containing multiple different compute units, e.g., many-core CPUs and GPUs. Therefore, an emerging goal has been to produce heterogeneous formats and methods that allow critical kernels, e.g., SpMV, to be executed on different devices with portable performance and minimal changes to format and method. This paper presents a heterogeneous format based on CSR, named CSR-k, that can be tuned quickly and outperforms the average performance of Intel MKL on Intel Xeon Platinum 838 and AMD Epyc 7742 CPUs, while still outperforming NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels on NVIDIA A100 and V100 for regular sparse matrices, i.e., sparse matrices where the number of nonzeros per row has a variance ≤ 10, such as those commonly generated from two- and three-dimensional finite difference and element problems. In particular, CSR-k achieves this by reordering and by grouping rows into a hierarchical structure of super-rows and super-super-rows that are represented by just a few extra arrays of pointers. Due to its simplicity, a model can be tuned for a device, and this model can be used to select super-row and super-super-row sizes in constant time. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
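The hierarchy described in the CSR-k abstract, rows grouped into super-rows addressed by a few extra pointer arrays, can be sketched as a thin layer over plain CSR. The fixed group size and the sequential NumPy loop below are simplifying assumptions (the real kernel parallelizes over super-rows and reorders rows first); they only show how little extra metadata the format needs.

```python
import numpy as np
from scipy import sparse

def build_super_rows(n_rows, rows_per_super):
    """One extra pointer array mapping super-rows to their first row.

    super_ptr[s] .. super_ptr[s+1] is the row range owned by super-row s,
    analogous to how indptr delimits each row's nonzeros in CSR.
    """
    starts = np.arange(0, n_rows, rows_per_super)
    return np.append(starts, n_rows)

def spmv_csrk(A_csr, x, super_ptr):
    """SpMV that walks super-rows; each super-row is a natural parallel task."""
    y = np.zeros(A_csr.shape[0])
    indptr, indices, data = A_csr.indptr, A_csr.indices, A_csr.data
    for s in range(len(super_ptr) - 1):                 # loop over super-rows
        for i in range(super_ptr[s], super_ptr[s + 1]):  # rows inside super-row s
            y[i] = data[indptr[i]:indptr[i + 1]] @ x[indices[indptr[i]:indptr[i + 1]]]
    return y

if __name__ == "__main__":
    A = sparse.random(16, 16, density=0.3, format="csr", random_state=1)
    x = np.ones(16)
    sp = build_super_rows(A.shape[0], rows_per_super=4)
    assert np.allclose(spmv_csrk(A, x, sp), A @ x)
```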
7. CSCC: Convolution Split Compression Calculation Algorithm for Deep Neural Network
- Author
- Hui Yu, Weizhi Xu, Dianjie Lu, Shengyu Fan, Fangai Liu, Zhiyong Liu, and Shuai Jiao
- Subjects
- General Computer Science, Artificial neural network, Computer science, sparse matrices, General Engineering, sparse matrix vector multiplication, Sparse matrix-vector multiplication, Convolutional neural network, Video processing, neural networks, 01 natural sciences, 010305 fluids & plasmas, Convolution, Matrix (mathematics), Kernel (image processing), Feature (computer vision), 0103 physical sciences, convolution, General Materials Science, lcsh:Electrical engineering. Electronics. Nuclear engineering, 010306 general physics, lcsh:TK1-9971, Algorithm, Sparse matrix
- Abstract
Convolutional Neural Networks (CNNs) have become one of the most successful machine learning techniques for image and video processing. The most computationally intensive part of a CNN is the convolutional layers, which operate on a multi-channel image with multiple kernels. However, due to network pruning and the ReLU activation function applied during training, numerous zero values are generated in the network. This paper proposes the convolution split compression calculation (CSCC) algorithm, which improves the performance of the convolution layer by exploiting the sparsity of the feature map. In the CSCC algorithm, first, the feature map is directly converted into a sparse matrix in compressed sparse row (CSR) format, which avoids expanding the feature map into an intermediate matrix and reduces memory consumption. Second, the convolution kernel is converted into a vector. Finally, the convolution result is obtained by sparse matrix vector multiplication (SpMV). The experimental results show that the CSCC algorithm has a clear advantage in computation speed and memory consumption compared with other convolution algorithms.
- Published
- 2019
8. A Novel Method for Scaling Iterative Solvers: Avoiding Latency Overhead of Parallel Sparse-Matrix Vector Multiplies.
- Author
- Selvitopi, R. Oguz, Ozdal, Muhammet Mustafa, and Aykanat, Cevdet
- Subjects
- ITERATIVE methods (Mathematics), PEER-to-peer architecture (Computer networks), SUPERCOMPUTERS, SCALABILITY, COMPUTER networks, HEURISTIC programming
- Abstract
In parallel linear iterative solvers, sparse matrix vector multiplication (SpMxV) incurs irregular point-to-point (P2P) communications, whereas inner product computations incur regular collective communications. These P2P communications cause an additional synchronization point with relatively high message latency costs due to small message sizes. In these solvers, each SpMxV is usually followed by an inner product computation that involves the output vector of SpMxV. Here, we exploit this property to propose a novel parallelization method that avoids the latency costs and synchronization overhead of P2P communications. Our method involves a computational and a communication rearrangement scheme. The computational rearrangement provides an alternative method for forming the input vector of SpMxV and allows P2P and collective communications to be performed in a single phase. The communication rearrangement realizes this opportunity by embedding P2P communications into global collective communication operations. The proposed method bounds the maximum number of messages communicated, regardless of the sparsity pattern of the matrix. The downside, however, is increased message volume and negligible redundant computation. We favor reducing the message latency costs at the expense of increasing message volume. Yet, we propose two iterative-improvement-based heuristics to alleviate the increase in volume through one-to-one task-to-processor mapping. Our experiments on two supercomputers, Cray XE6 and IBM BlueGene/Q, on up to 2,048 processors show that the proposed parallelization method exhibits superior scalable performance compared to the conventional parallelization method. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
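The single-phase idea in the abstract above, carrying the x-vector entries that SpMxV needs inside the collective that the inner product already requires, can be illustrated crudely with mpi4py. The sketch piggybacks the halo payload on an allgather rather than on the structured all-reduce-like collectives the paper uses, and the toy payload is made up; it shows only the embedding pattern, not the authors' method.

```python
# Run with: mpirun -n 2 python embed_p2p_sketch.py  (requires mpi4py)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Toy setting: each rank owns part of y = A x and needs a few x entries that
# live on other ranks ("halo" entries). Values here are stand-ins.
local_partial_dot = float(rank + 1)                  # stand-in for a partial inner product
halo_to_send = {rank: np.array([0.5 * (rank + 1)])}  # stand-in for x entries others need

# Conventional pattern: P2P sends for halo x entries, then a separate
# allreduce for the inner product -> two communication phases.
# Single-phase pattern: piggyback the halo payload on one collective.
payload = (local_partial_dot, halo_to_send)
gathered = comm.allgather(payload)                   # one collective serves both purposes

inner_product = sum(p for p, _ in gathered)          # the reduction part
received_halo = {r: h for _, halo in gathered for r, h in halo.items() if r != rank}

print(f"rank {rank}: inner product = {inner_product}, halo from others = {received_halo}")
```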
9. Performance of a Structure-Detecting SpMV Using the CSR Matrix Representation.
- Author
- Pabst, Hans, Bachmayer, Bev, and Klemm, Michael
- Abstract
Sparse matrix-vector multiplication (SpMV) is an important building block for many scientific applications. Various formats exist to store and represent sparse matrices in the computer's memory. The compressed row storage format (CRS or CSR) is typically the baseline against which new hybrid or improved sparse matrix representations are reported. In this paper, we describe the implementation and performance benefit of a structure-detecting SpMV algorithm using the CSR format. Our implementation detects contiguous rows in the sparse matrix representation to improve the performance of the computation by making better use of the available memory bandwidth. Applications with mixed or a priori unknown matrix structures can take advantage of the runtime structure detection. We show that the additional control flow needed does not degrade performance, but may deliver up to twice the performance of the traditional SpMV algorithm. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
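For reference, below is a plain CSR SpMV written out explicitly, extended with one simple notion of structure detection: runs of consecutive column indices within a row are served by a contiguous slice of x. Whether this matches the exact structure the authors detect is an assumption; the sketch is meant only to show where such a runtime check sits in the kernel.

```python
import numpy as np
from scipy import sparse

def spmv_csr_with_runs(A_csr, x):
    """CSR SpMV that serves runs of consecutive column indices with one
    contiguous slice of x (a simplified stand-in for runtime structure
    detection inside the kernel)."""
    y = np.zeros(A_csr.shape[0])
    indptr, indices, data = A_csr.indptr, A_csr.indices, A_csr.data
    for i in range(A_csr.shape[0]):
        k = indptr[i]
        acc = 0.0
        while k < indptr[i + 1]:
            run_end = k + 1
            # extend the run while column indices stay consecutive
            while run_end < indptr[i + 1] and indices[run_end] == indices[run_end - 1] + 1:
                run_end += 1
            c0 = indices[k]
            acc += data[k:run_end] @ x[c0:c0 + (run_end - k)]  # contiguous load of x
            k = run_end
        y[i] = acc
    return y

if __name__ == "__main__":
    A = sparse.random(32, 32, density=0.2, format="csr", random_state=2)
    x = np.arange(32, dtype=float)
    assert np.allclose(spmv_csr_with_runs(A, x), A @ x)
```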
10. Exploiting dense substructures for fast sparse matrix vector multiplication.
- Author
- Shantharam, Manu, Chatterjee, Anirban, and Raghavan, Padma
- Subjects
- SPARSE matrices, PARTIAL differential equations, COMPUTATIONAL fluid dynamics, STRUCTURAL analysis (Engineering), VECTOR analysis, KERNEL functions, OPL (Computer program language), HIGH performance computing
- Abstract
The execution time of many scientific computing applications is dominated by the time spent in performing sparse matrix vector multiplication (SMV; y ← A · x). We consider improving the performance of SMV on multicores by exploiting the dense substructures that are inherently present in many sparse matrices derived from partial differential equation models. First, we identify indistinguishable vertices, i.e., vertices with the same adjacency structure, in a graph representation of the sparse matrix (A) and group them into a supernode. Next, we identify effectively dense blocks within the matrix by grouping rows and columns in each supernode. Finally, by using a suitable data structure for this representation of the matrix, we reduce the number of load operations during SMV while exactly preserving the original sparsity structure of A. In addition, we use ordering techniques to enhance locality in accesses to the vector, x, to yield an SMV kernel that exploits the effectively dense substructures in the matrix. We evaluate our scheme on Intel Nehalem and AMD Shanghai processors. We observe that for larger matrices on the Intel Nehalem processor, our method improves performance on average by 37.35% compared with the traditional compressed sparse row scheme (a blocked compressed form improves performance on average by 30.27%). Benefits of our new format are similar for the AMD processor. More importantly, if we pick for each matrix the best among our method and the blocked compressed scheme, the average performance improvements increase to 40.85%. Additional results indicate that the best performing scheme varies depending on the matrix and the system. We therefore propose an effective density measure that could be used for method selection, thus adding to the variety of options for an auto-tuned optimized SMV kernel that can exploit sparse matrix properties and hardware attributes for high performance. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
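The first step in the abstract above, grouping rows with identical adjacency structure (indistinguishable vertices) into supernodes, can be sketched directly on the CSR index arrays. Hashing each row's column pattern, as below, is an illustrative shortcut rather than the authors' data structure.

```python
import numpy as np
from scipy import sparse

def find_supernodes(A_csr):
    """Group rows that have exactly the same set of column indices.

    Rows with identical adjacency structure ("indistinguishable vertices")
    end up in the same supernode; their x entries can then be loaded once and
    reused, which is the saving exploited in the abstract above.
    """
    groups = {}
    for i in range(A_csr.shape[0]):
        cols = tuple(A_csr.indices[A_csr.indptr[i]:A_csr.indptr[i + 1]])
        groups.setdefault(cols, []).append(i)
    return list(groups.values())

if __name__ == "__main__":
    A = sparse.csr_matrix(np.array([[1, 0, 2, 0],
                                    [3, 0, 4, 0],   # same pattern as row 0
                                    [0, 5, 0, 0],
                                    [0, 6, 0, 0]],  # same pattern as row 2
                                   dtype=float))
    print(find_supernodes(A))   # [[0, 1], [2, 3]]
```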
11. MPI-CUDA sparse matrix–vector multiplication for the conjugate gradient method with an approximate inverse preconditioner.
- Author
- Oyarzun, G., Borrell, R., Gorobets, A., and Oliva, A.
- Subjects
- MAGNETIC particle imaging, CUDA (Computer architecture), SPARSE matrices, MULTIPLICATION, CONJUGATE gradient methods, APPROXIMATION theory
- Abstract
Highlights:
- A hybrid MPI-CUDA parallelization strategy for the PCG method is presented.
- A two-stream concurrent model is used to overlap calculations and data transfer.
- A two-level partitioning is used to manage the work-flow between CPUs and GPUs.
- Up to 4.4X speedup has been achieved compared with a CPU-only implementation. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
12. Fast encoding of quasi‐cyclic low‐density parity‐check codes in IEEE 802.15.3c.
- Author
- Zhang, Peng, Du, Shuai, Liu, Changyin, and Jiang, Qianqian
- Abstract
A high‐speed encoder is proposed for quasi‐cyclic low‐density parity‐check codes. By merging some sub‐matrices of a parity‐check matrix H in an approximately lower triangular form, a compact encoding process is obtained, reducing pipeline stages from six to three. Moreover, well‐designed circuits are used to implement back‐substitution and sparse‐matrix–vector multiplication. Results for the low‐density parity‐check (672, 336) code in IEEE 802.15.3c show that the proposed encoder is easy to implement, runs fast, and requires no memory. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
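The sparse-matrix–vector multiplication inside such an encoder is carried out over GF(2), where multiplication reduces to AND and accumulation to XOR. The sketch below shows that operation for a binary matrix stored in CSR-like index arrays; the tiny matrix is an arbitrary example, not the (672, 336) code from the standard.

```python
import numpy as np

def spmv_gf2(indptr, indices, x_bits):
    """y = H x over GF(2) for a binary sparse matrix H in CSR-like form.

    Stored values are implicitly 1, so each output bit is just the XOR of the
    selected input bits (the same primitive underlies back-substitution).
    """
    y = np.zeros(len(indptr) - 1, dtype=np.uint8)
    for i in range(len(indptr) - 1):
        acc = 0
        for k in range(indptr[i], indptr[i + 1]):
            acc ^= x_bits[indices[k]]          # XOR-accumulate the selected bits
        y[i] = acc
    return y

if __name__ == "__main__":
    # 3x6 binary matrix with ones at the listed column positions per row.
    indptr = np.array([0, 3, 5, 8])
    indices = np.array([0, 2, 5, 1, 4, 0, 3, 4])
    x = np.array([1, 0, 1, 1, 0, 1], dtype=np.uint8)
    print(spmv_gf2(indptr, indices, x))        # prints [1 0 0]
```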
13. A Novel Method for Scaling Iterative Solvers: Avoiding Latency Overhead of Parallel Sparse-Matrix Vector Multiplies
- Author
- R. Oguz Selvitopi, Muhammet Mustafa Ozdal, and Cevdet Aykanat
- Subjects
- Iterative methods, Computer science, Iterative method, Inner product, Parallel algorithm, Parallel computing, Conjugate gradient method, Sparse matrix-vector multiplication, Matrix algebra, Point-to-point Communication, Iterative Improvement Heuristic, Matrix (mathematics), Sparse Matrix Vector Multiplication, Latency (engineering), Conjugate gradient, Sparse matrix, Message Latency Overhead, Iterative improvements, Inner Product Computation, Parallel processing systems, Iterative improvement heuristic, Vectors, Supercomputers, Costs, Computational Theory and Mathematics, Hardware and Architecture, Signal Processing, Scalability, Message latency, Hiding Latency, Iterative solvers, Collective Communication, Parallel Linear Iterative Solvers, Collective communications, Avoiding Latency
- Abstract
In parallel linear iterative solvers, sparse matrix vector multiplication (SpMxV) incurs irregular point-to-point (P2P) communications, whereas inner product computations incur regular collective communications. These P2P communications cause an additional synchronization point with relatively high message latency costs due to small message sizes. In these solvers, each SpMxV is usually followed by an inner product computation that involves the output vector of SpMxV. Here, we exploit this property to propose a novel parallelization method that avoids the latency costs and synchronization overhead of P2P communications. Our method involves a computational and a communication rearrangement scheme. The computational rearrangement provides an alternative method for forming the input vector of SpMxV and allows P2P and collective communications to be performed in a single phase. The communication rearrangement realizes this opportunity by embedding P2P communications into global collective communication operations. The proposed method bounds the maximum number of messages communicated, regardless of the sparsity pattern of the matrix. The downside, however, is increased message volume and negligible redundant computation. We favor reducing the message latency costs at the expense of increasing message volume. Yet, we propose two iterative-improvement-based heuristics to alleviate the increase in volume through one-to-one task-to-processor mapping. Our experiments on two supercomputers, Cray XE6 and IBM BlueGene/Q, on up to 2,048 processors show that the proposed parallelization method exhibits superior scalable performance compared to the conventional parallelization method.
- Published
- 2015
14. yInMem: A Parallel Distributed Indexed In-Memory Computation System for Big Data Analytics
- Author
- Huang, Yin
- Subjects
- In-memory computing, spectral clustering, big data, sparse matrix vector multiplication, yInMem
- Abstract
Cluster computing is experiencing a surge of interest in in-memory computing systems, driven by advances in hardware such as memory. However, the network has the smallest bandwidth compared to memory and disk in a typical cluster computing environment. In addition, the sparse nature of graph applications, such as social networks, imposes new challenges for in-memory computing systems. Examples of such challenges are data locality, workload balance, and memory management. As a result, fine control over data partitioning and data sharing plays a crucial role in improving the speed of large-scale data-parallel processing systems by reducing cross-node communication. In order to maximize performance, an in-memory computing system should offer optimized data throughput for parallel computation in large-scale data analytics. This dissertation presents yInMem: a parallel, distributed, indexed, in-memory computing system for big data analytics. With the goal of building an in-memory computing system that enables optimal data partitioning and improves the efficiency of iterative machine learning and graph algorithms, yInMem bridges the gap between HPC and Hadoop by parallelizing the computation with MPI while retaining the advantage of distributed data storage, such as a NoSQL database built on top of Hadoop. The novelty of yInMem results from introducing indexes, or associative arrays, to the in-memory computing system. Such a design offers fine control over data distribution together with parallel computation to maximize the use of computing resources in the cluster. By analyzing the linear algebra characteristics of iterative machine learning and graph algorithms, such as spectral clustering and PageRank, we find that yInMem is capable of maximizing the usage of computing resources in the cluster. Leveraging insights from Sparse Matrix-Vector Multiplication (SpMV), we also provide an optimal data partitioning algorithm on top of yInMem for load balance and data locality. In order to evaluate yInMem, we investigate iterative machine learning and graph algorithms using both synthetic benchmarks and real user applications. yInMem matches or exceeds the performance of existing specialized systems.
- Published
- 2017
- Full Text
- View/download PDF
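The load-balance and data-locality concern raised in the yInMem abstract can be illustrated with the simplest possible SpMV partitioner: split the rows into contiguous chunks holding roughly equal numbers of nonzeros. This prefix-sum split is a generic textbook heuristic, not the dissertation's partitioning algorithm.

```python
import numpy as np
from scipy import sparse

def balanced_row_partition(A_csr, n_parts):
    """Split rows into n_parts contiguous chunks with ~equal nonzero counts."""
    nnz_prefix = A_csr.indptr                 # CSR indptr is already the nonzero prefix sum
    total = nnz_prefix[-1]
    bounds = [0]
    for p in range(1, n_parts):
        target = total * p / n_parts
        bounds.append(int(np.searchsorted(nnz_prefix, target)))
    bounds.append(A_csr.shape[0])
    return bounds

if __name__ == "__main__":
    A = sparse.random(1000, 1000, density=0.02, format="csr", random_state=3)
    b = balanced_row_partition(A, 4)
    for p in range(4):
        nnz = A.indptr[b[p + 1]] - A.indptr[b[p]]
        print(f"part {p}: rows {b[p]}..{b[p+1]-1}, nnz {nnz}")
```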
15. A Recursive Hypergraph Bipartitioning Framework for Reducing Bandwidth and Latency Costs Simultaneously
- Author
- Oguz Selvitopi, Seher Acer, and Cevdet Aykanat
- Subjects
- bandwidth, Hypergraph, Computer science, load balancing, hypergraph, sparse matrix vector multiplication, 010103 numerical & computational mathematics, 02 engineering and technology, Parallel computing, Solid modeling, 01 natural sciences, Computer Software, partitioning, 0202 electrical engineering, electronic engineering, information engineering, 0101 mathematics, Latency (engineering), Resource allocation, latency, Sparse matrix, Communications Technologies, 020203 distributed computing, Sparse matrix-vector multiplication, Load balancing (computing), recursive bipartitioning, Computational Theory and Mathematics, Hardware and Architecture, Signal Processing, combinatorial scientific computing, Communication cost, Distributed Computing
- Abstract
Intelligent partitioning models are commonly used for efficient parallelization of irregular applications on distributed systems. These models usually aim to minimize a single communication cost metric, which is either related to communication volume or message count. However, both volume- and message-related metrics should be taken into account during partitioning for a more efficient parallelization. There are only a few works that consider both of them and they usually address each in separate phases of a two-phase approach. In this work, we propose a recursive hypergraph bipartitioning framework that reduces the total volume and total message count in a single phase. In this framework, the standard hypergraph models, nets of which already capture the bandwidth cost, are augmented with message nets. The message nets encode the message count so that minimizing conventional cutsize captures the minimization of bandwidth and latency costs together. Our model provides a more accurate representation of the overall communication cost by incorporating both the bandwidth and the latency components into the partitioning objective. The use of the widely-adopted successful recursive bipartitioning framework provides the flexibility of using any existing hypergraph partitioner. The experiments on instances from different domains show that our model on the average achieves up to 52 percent reduction in total message count and hence results in 29 percent reduction in parallel running time compared to the model that considers only the total volume.
- Published
- 2016
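The two cost metrics this framework reduces together, total communication volume and total message count, can be evaluated for any given partition. The sketch below computes both for a 1-D row partition of y = Ax, assuming a square matrix with x and y distributed conformally with the rows; it is a metric calculator under that simplifying assumption, not the hypergraph model itself.

```python
import numpy as np
from scipy import sparse

def communication_metrics(A_csr, row_owner):
    """Total volume (words) and message count for row-parallel y = A x.

    row_owner[i] gives the part that owns row i (and, by assumption, x[i]).
    Part p must receive x[j] once from owner(j) for every column j it touches
    but does not own; volume counts such x entries, messages count the
    distinct (sender, receiver) pairs.
    """
    needs = set()                                    # (receiver_part, column) pairs
    for i in range(A_csr.shape[0]):
        p = row_owner[i]
        for j in A_csr.indices[A_csr.indptr[i]:A_csr.indptr[i + 1]]:
            if row_owner[j] != p:
                needs.add((p, j))
    volume = len(needs)
    messages = len({(row_owner[j], p) for p, j in needs})
    return volume, messages

if __name__ == "__main__":
    A = sparse.random(200, 200, density=0.05, format="csr", random_state=4)
    owner = np.repeat(np.arange(4), 50)              # 4 parts, 50 rows each
    print(communication_metrics(A, owner))
```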
16. Technique detection software for Sparse Matrices
- Author
- KHAN Muhammad Taimoor and USMAN Anila
- Subjects
- FOS: Computer and information sciences, sparse storage formats, sparse matrices, sparse matrix vector multiplication, Computer Science - Mathematical Software, lcsh:Electronic computers. Computer science, Mathematical Software (cs.MS), lcsh:QA75.5-76.95
- Abstract
Sparse storage formats are techniques for storing and processing sparse matrix data efficiently. The performance of these storage formats depends upon the distribution of non-zeros within the matrix in different dimensions. To obtain better results, we need the technique that best suits the organization of data in a particular matrix, so selecting the right technique is the main step towards improving the system's results; otherwise, efficiency can decrease. The purpose of this research is to help identify the storage format that yields reduced storage size and high processing efficiency for a given sparse matrix.
- Published
- 2012
17. Minimizing communication through computational redundancy in parallel iterative solvers
- Author
- Torun, Fahreddin Şükrü, Aykanat, Cevdet, and others
- Subjects
- Parallel computing, Sparse matrix vector multiplication, Iterative methods (Mathematics), Matrices, Replication, Iterative solvers, QA188 .T67 2011, Sparse matrices--Data processing, Parallel, Computer Engineering and Computer Science and Control, Sparse matrix
- Abstract
Sparse matrix vector multiplication (SpMxV) of the form y = Ax is a kernel operation in iterative linear solvers used in scientific applications. In these solvers, the SpMxV operation is performed repeatedly with the same sparse matrix through iterations until convergence. Depending on the matrix and its decomposition, parallel SpMxV operation necessitates communication among processors in the parallel environment. The communication can be reduced by intelligent decomposition. However, we can further decrease the communication through data replication and redundant computation. The communication occurs due to the transfer of x-vector entries in row-parallel SpMxV computation. The input vector x of the next iteration is computed from the output vector of the current iteration through linear vector operations. Hence, a processor may compute a y-vector entry redundantly, which leads to an x-vector entry in the following iteration, instead of receiving that x-vector entry from another processor. Thus, redundant computation of that y-vector entry may lead to a reduction in communication. In this thesis, we devise a directed-graph-based model that correctly captures the computation and communication pattern for the above-mentioned iterative solvers. Moreover, we formulate the communication minimization by utilizing redundant computation of y-vector entries as a combinatorial problem on this directed graph model. We propose two heuristics to solve this combinatorial problem. Experimental results indicate that the communication reducing strategy by redundant computation is promising.
- Published
- 2011
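A two-worker toy version of the trade-off described in this thesis abstract: by replicating one row of A, a worker can recompute a boundary y entry (and hence the x entry derived from it) locally instead of receiving it every iteration. The partition and the replication choice below are fabricated for illustration; the thesis selects them with directed-graph-based heuristics.

```python
import numpy as np

# Toy 2-worker, row-parallel iteration x <- A x (stand-in for one solver step).
# Worker 0 owns rows/x-entries {0, 1}; worker 1 owns {2, 3}.
A = np.array([[2., 1., 0., 0.],
              [0., 3., 1., 0.],   # worker 0's row 1 references x[2], owned by worker 1
              [1., 1., 4., 0.],   # worker 1's row 2 references x[0], x[1], owned by worker 0
              [0., 0., 1., 5.]])
OWN = {0: (0, 1), 1: (2, 3)}

def messages_per_iteration(replicate_row2_on_worker0):
    msgs = 0
    for w in (0, 1):
        computed = set(OWN[w])
        if w == 0 and replicate_row2_on_worker0:
            computed.add(2)            # worker 0 redundantly computes y[2] = A[2] @ x,
                                       # possible because row 2 only reads x[0], x[1]
                                       # (local) and x[2] (also produced locally now)
        needed_x = {j for i in OWN[w] for j in np.nonzero(A[i])[0]}
        if needed_x - computed:        # anything not producible locally must be received
            msgs += 1                  # one message from the other worker per iteration
    return msgs

if __name__ == "__main__":
    print("messages/iteration, conventional:", messages_per_iteration(False))      # 2
    print("messages/iteration, with redundancy:", messages_per_iteration(True))    # 1
```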