310 results for "Igual, Francisco"
Search Results
2. Energy efficiency optimization of task-parallel codes on asymmetric architectures
3. Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding
4. Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures
5. Experience-guided, mixed-precision matrix multiplication with Apache TVM for ARM processors
6. Balanced segmentation of CNNs for multi-TPU inference
7. Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM
8. Automatic generation of ARM NEON micro-kernels for matrix multiplication
9. Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors
10. Inference with Transformer Encoders on ARM and RISC-V Multicore Processors
11. Micro-kernels for portable and efficient matrix multiplication in deep learning
12. Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Multicomputers
13. Detecting time-fragmented cache attacks against AES using Performance Monitoring Counters
14. Dynamic power budget redistribution under a power cap on multi-application environments
15. QR Factorization Using Malleable BLAS on Multicore Processors
16. Applying Game-Learning Environments to Power Capping Scenarios via Reinforcement Learning
17. Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP
18. Fast Algorithms for the Computation of the Minimum Distance of a Random Linear Code
19. Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors
20. HeSP: a simulation framework for solving the task scheduling-partitioning problem on heterogeneous architectures
21. Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors
22. Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra
23. Performance and Energy Optimization of Matrix Multiplication on Asymmetric big.LITTLE Processors
24. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
25. Towards a Malleable Tensorflow Implementation
26. Scheduling Elastic Machine Learning Process through Containers / Coplanificacion de procesos maleables de aprendizaje automatico mediante contenedores
27. STEEL-RT: combining single task–single executor model and expanded scheduling to ease heterogeneity exploitation
28. Integration and exploitation of intra-routine malleability in BLIS
29. Programming parallel dense matrix factorizations with look-ahead and OpenMP
30. Solving Dense Generalized Eigenproblems on Multi-threaded Architectures
31. Algorithm XXX: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM
32. Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors
33. Variable intra-task threading for power-constrained performance and energy optimization in DAG scheduling
34. Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL
35. Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors
36. Automatic Generation of ARM NEON Micro-Kernels for Matrix Multiplication
37. Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations
38. Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
39. Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.
40. Performance and Scalability Study of FMM Kernels on Novel Multi- and Many-core Architectures
41. On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization
42. Detecting Time-Fragmented Cache Attacks Against AES Using Performance Monitoring Counters
43. Fine‐grain task‐parallel algorithms for matrix factorizations and inversion on many‐threaded CPUs.
44. Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures
45. Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures
46. Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors
47. Accelerating fluid–solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures
48. Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi
49. Non-negative Matrix Factorization on Low-Power Architectures and Accelerators: A Comparative Study
50. Time and energy modeling of high-performance Level-3 BLAS on x86 architectures
Discovery Service for Jio Institute Digital Library