90 results for "Igual, Francisco"
Search Results
2. QR Factorization Using Malleable BLAS on Multicore Processors
3. Automatic generation of ARM NEON micro-kernels for matrix multiplication
4. Algorithm XXX: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM
5. Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors
6. Automatic Generation of ARM NEON Micro-Kernels for Matrix Multiplication
7. Towards a Malleable Tensorflow Implementation
8. Detecting Time-Fragmented Cache Attacks Against AES Using Performance Monitoring Counters
9. Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures
10. Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures
11. Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors
12. Dynamic power budget redistribution under a power cap on multi-application environments
13. Micro-kernels for portable and efficient matrix multiplication in deep learning
14. NUMA-Aware Dense Matrix Factorizations and Inversion with Look-Ahead on Multicore Processors
15. Micro-Kernels for Portable and Efficient Matrix Multiplication in Deep Learning
16. Fine‐grain task‐parallel algorithms for matrix factorizations and inversion on many‐threaded CPUs
17. Anatomy of the BLIS Family of Algorithms for Matrix Multiplication
18. HeSP: A Simulation Framework for Solving the Task Scheduling-Partitioning Problem on Heterogeneous Architectures
19. Scalable Hybrid Loop- and Task-Parallel Matrix Inversion for Multicore Processors
20. Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors
21. Runtime Scheduling of the LU Factorization: Performance and Energy
22. An Efficient Implementation of GPU Virtualization in High Performance Clusters
23. Reduction to Condensed Forms for Symmetric Eigenvalue Problems on Multi-core Architectures
24. A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
25. An Extension of the StarSs Programming Model for Platforms with Multiple GPUs
26. A New Generation of Task-Parallel Algorithms for Matrix Inversion in Many-Threaded CPUs
27. Attaining High Performance in General-Purpose Computations on Current Graphics Processors
28. Solving Dense Linear Systems on Graphics Processors
29. Resource Management for Power-Constrained HEVC Transcoding Using Reinforcement Learning
30. Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding
31. Integration and exploitation of intra-routine malleability in BLIS
32. Portability Study of an OpenCL Algorithm for Automatic Target Detection in Hyperspectral Images
33. STEEL-RT: combining single task–single executor model and expanded scheduling to ease heterogeneity exploitation
34. Algorithm 994
35. Practical Considerations for Acoustic Source Localization in the IoT Era: Platforms, Energy Efficiency, and Performance
36. Programming parallel dense matrix factorizations with look-ahead and OpenMP
37. MAMUT: Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-User Video Transcoding
38. Variable intra-task threading for power-constrained performance and energy optimization in DAG scheduling
39. Acceleration and energy consumption optimization in cascading classifiers for face detection on low‐cost ARM big.LITTLE asymmetric architectures
40. Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL
41. Optimized Fundamental Signal Processing Operations For Energy Minimization on Heterogeneous Mobile Devices
42. Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors
43. Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations
44. Energy Efficiency Optimization of Task-Parallel Codes on Asymmetric Architectures
45. Performance-Power Evaluation of an OpenCL Implementation of the Simplex Growing Algorithm for Hyperspectral Unmixing
46. Performance and Scalability Study of FMM Kernels on Novel Multi- and Many-core Architectures
47. On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization
48. Solving Weighted Least Squares (WLS) problems on ARM-based architectures
49. Analytical Modeling Is Enough for High-Performance BLIS
50. Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors
Discovery Service for Jio Institute Digital Library