770 results for "Enrique S. Quintana-Ortí"
Search Results
2. Hard SyDR: A Benchmarking Environment for Global Navigation Satellite System Algorithms
3. Solving Matrix Equations on Multi-Core and Many-Core Architectures
4. Balancing Energy and Performance in Dense Linear System Solvers for Hybrid ARM+GPU platforms
5. QAttn: Efficient GPU Kernels for Mixed-Precision Vision Transformers.
6. Inference with Transformer Encoders on ARM and RISC-V Multicore Processors.
7. Acceleration of the Pre-processing Stage of the MVS Workflow using Graphics Processors.
8. Trading Off Performance for Energy in Linear Algebra Operations with Applications in Control Theory
9. Experience-guided, mixed-precision matrix multiplication with Apache TVM for ARM processors.
10. Communication-Avoiding Fusion of GEMM-Based Convolutions for Deep Learning in the RISC-V GAP8 MCU.
11. Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.
12. Parallel GEMM-based convolution for deep learning on multicore RISC-V processors.
13. Automatic generation of ARM NEON micro-kernels for matrix multiplication.
14. Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
15. Tall-and-Skinny QR Factorization for Clusters of GPUs Using High-Performance Building Blocks.
16. Toward Matrix Multiplication for Deep Learning Inference on the Xilinx Versal.
17. Towards Benchmarking GNSS Algorithms on FPGA using SyDR.
18. Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors.
19. GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal.
20. Parallel Reduced Order Modeling for Digital Twins using High-Performance Computing Workflows.
21. Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors.
22. Performance Analysis of Matrix Multiplication for Deep Learning on the Edge.
23. Mapping Parallel Matrix Multiplication in GotoBLAS2 to the AMD Versal ACAP for Deep Learning.
24. Fast truncated SVD of sparse and dense matrices on graphics processors.
25. Compressed basis GMRES on high-performance graphics processing units.
26. Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks.
27. Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures.
28. Efficient and portable Winograd convolutions for multi-core processors.
29. Micro-kernels for portable and efficient matrix multiplication in deep learning.
30. Using Ginkgo's memory accessor for improving the accuracy of memory-bound low precision BLAS.
31. Anatomy of the BLIS Family of Algorithms for Matrix Multiplication.
32. Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP.
33. NUMA-Aware Dense Matrix Factorizations and Inversion with Look-Ahead on Multicore Processors.
34. Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor.
35. RED-SEA: Network Solution for Exascale Architectures.
36. QR Factorization Using Malleable BLAS on Multicore Processors.
37. Performance Analysis of Matrix Multiplication for Deep Learning on the Edge.
38. Performance Analysis of Convolution Algorithms for Deep Learning on Edge Processors.
39. RED-SEA Project: Towards a new-generation European interconnect.
40. Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures.
41. Resiliency in numerical algorithm design for extreme scale simulations.
42. Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors.
43. A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor.
44. Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing.
45. Enabling dynamic and intelligent workflows for HPC, data analytics, and AI convergence.
46. Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors.
47. GreenLightningAI: An Efficient AI System with Decoupled Structural and Quantitative Knowledge.
48. Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
49. Fine-grain task-parallel algorithms for matrix factorizations and inversion on many-threaded CPUs.
50. Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors.