Author: "Igual, Francisco" / Search Limiters: Full Text - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Igual, Francisco"' showing total 168 results

1. Energy efficiency optimization of task-parallel codes on asymmetric architectures

Author: Costero, Luis, Igual, Francisco D., Olcoz, Katzalin, and Tirado, Francisco
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: We present a family of policies that, integrated within a runtime task scheduler (Nanox), pursue the goal of improving the energy efficiency of task-parallel executions with no intervention from the programmer. The proposed policies tackle the problem by modifying the core operating frequency via DVFS mechanisms, or by enabling/disabling the mapping of tasks to specific cores at selected execution points, depending on the internal status of the scheduler. Experimental results on an asymmetric SoC (Exynos 5422) and for a specific operation (Cholesky factorization) reveal gains up to 29% in terms of energy efficiency and considerable reductions in average power.
Published: 2024
Full Text: View/download PDF

2. Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding

Author: Costero, Luis, Igual, Francisco D., Olcoz, Katzalin, and Tirado, Francisco
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The coexistence of parallel applications in shared computing nodes, each one featuring different Quality of Service (QoS) requirements, poses new challenges for improving resource occupation while keeping acceptable rates in terms of QoS. As more application-specific and system-wide metrics are included as QoS dimensions, or under situations in which resource-usage limits are strict, building and serving the most appropriate set of actions (application control knobs and system resource assignment) to concurrent applications in an automatic and optimal fashion becomes mandatory. In this paper, we propose strategies to build and serve this type of knowledge to concurrent applications by leveraging Reinforcement Learning techniques. Taking multi-user video transcoding as a driving example, our experimental results reveal an excellent adaptation of resource and knob management to heterogeneous QoS requests, and increases of up to 1.24x in the number of concurrently served users compared with alternative approaches considering homogeneous QoS requests.
Published: 2024
Full Text: View/download PDF

3. Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures

Author: Corpas, Alberto, Costero, Luis, Botella, Guillermo, Igual, Francisco D., García, Carlos, and Rodríguez, Manuel
Subjects: Computer Science - Performance
Abstract: This paper proposes a mechanism to accelerate and optimize the energy consumption of face detection software based on Haar-like cascading classifiers, taking advantage of the features of low-cost Asymmetric Multicore Processors (AMPs) with a limited power budget. A modelling and task scheduling/allocation approach is proposed in order to make efficient use of the existing features of big.LITTLE ARM processors, including: (I) source-code adaptation for parallel computing, which enables code acceleration by applying the OmpSs programming model, a task-based programming model that handles data dependencies between tasks in a transparent fashion; (II) different OmpSs task allocation policies which take into account the processor asymmetry and can dynamically set processing resources in a more efficient way based on their particular features. The proposed mechanism can be efficiently applied to take advantage of the processing elements existing on low-cost and low-energy multi-core embedded devices executing object detection algorithms based on cascading classifiers. Although these classifiers yield the best results for detection algorithms in the field of computer vision, their high computational requirements prevent them from being used on these devices under real-time requirements. Finally, we compare the energy efficiency of a heterogeneous architecture based on asymmetric multicore processors with suitable task scheduling against that of a homogeneous symmetric architecture.
Published: 2024
Full Text: View/download PDF

4. Experience-guided, mixed-precision matrix multiplication with Apache TVM for ARM processors

Author: Castelló, Adrián, Martínez, Héctor, Catalán, Sandra, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Published: 2025
Full Text: View/download PDF

5. Balanced segmentation of CNNs for multi-TPU inference

Author: Villarrubia, Jorge, Costero, Luis, Igual, Francisco D., and Olcoz, Katzalin
Published: 2025
Full Text: View/download PDF

6. Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

Author: Alaejos, Guillermo, Castelló, Adrián, Alonso-Jordá, Pedro, Igual, Francisco D., Martínez, Héctor, and Quintana-Ortí, Enrique S.
Subjects: Computer Science - Computation and Language
Abstract: We explore the utilization of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS and OpenBLAS, in order to obtain high-performance blocked formulations of the general matrix multiplication (GEMM). In addition, we fully automate the generation process, by also leveraging the Apache TVM framework to derive a complete variety of the processor-specific micro-kernels for GEMM. This is in contrast with the convention in high performance libraries, which hand-encode a single micro-kernel per architecture using Assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for GEMM 1) improves portability and maintainability and streamlines the software life cycle; 2) provides high flexibility to easily tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par (or even superior for specific matrix shapes) with that of hand-tuned libraries; and 3) features a small memory footprint.
Comment: 35 pages, 22 figures. Submitted to ACM TOMS
Published: 2023

7. Automatic generation of ARM NEON micro-kernels for matrix multiplication

Author: Alaejos, Guillermo, Martínez, Héctor, Castelló, Adrián, Dolz, Manuel F., Igual, Francisco D., Alonso-Jordá, Pedro, and Quintana-Ortí, Enrique S.
Published: 2024
Full Text: View/download PDF

8. Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

Author: Martínez, Héctor, Catalán, Sandra, Igual, Francisco D., Herrero, José R., Rodríguez-Sánchez, Rafael, and Quintana-Ortí, Enrique S.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level, architecture-dependent kernels in BLAS (Basic Linear Algebra Subprograms). Specifically, we propose customizing the GEMM (general matrix multiplication) kernel, which is invoked from the blocked algorithms for relevant matrix factorizations in LAPACK, to improve performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage an analytical model to dynamically adapt the cache configuration parameters of the GEMM to the shape of the matrix operands. Additionally, we accommodate a flexible development of architecture-specific micro-kernels that allow us to further improve the utilization of the cache hierarchy. Our experiments on two platforms, equipped with ARM (NVIDIA Carmel, Neon) and x86 (AMD EPYC, AVX2) multi-core processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. However, they also reveal the delicate balance between optimizing for multi-threaded parallelism versus cache usage.
Published: 2023

9. Micro-kernels for portable and efficient matrix multiplication in deep learning

Author: Alaejos, Guillermo, Castelló, Adrián, Martínez, Héctor, Alonso-Jordá, Pedro, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Published: 2023
Full Text: View/download PDF

10. Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Multicomputers

Author: Quintana-Ortí, Gregorio, Hernando, Fernando, and Igual, Francisco D.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Information Theory
Abstract: The minimum distance of a linear code is a key concept in information theory. Therefore, the time required by its computation is very important to many problems in this area. In this paper, we introduce a family of implementations of the Brouwer-Zimmermann algorithm for distributed-memory architectures for computing the minimum distance of a random linear code over $\mathbb{F}_{2}$. Both current commercial and public-domain software only work on either unicore architectures or shared-memory architectures, which are limited in the number of cores/processors employed in the computation. Our implementations focus on distributed-memory architectures, thus being able to employ hundreds or even thousands of cores in the computation of the minimum distance. Our experimental results show that our implementations are much faster, even up to several orders of magnitude, than current implementations widely used nowadays.
Published: 2019

11. Detecting time-fragmented cache attacks against AES using Performance Monitoring Counters

Author: Prada, Iván, Igual, Francisco D., and Olcoz, Katzalin
Subjects: Computer Science - Cryptography and Security
Abstract: Cache timing attacks use shared caches in multi-core processors as side channels to extract information from victim processes. These attacks are particularly dangerous in cloud infrastructures, in which the deployed countermeasures cause collateral effects in terms of performance loss and increase in energy consumption. We propose to monitor the victim process using an independent monitoring (detector) process, that continuously measures selected Performance Monitoring Counters (PMC) to detect the presence of an attack. Ad-hoc countermeasures can be applied only when such a risky situation arises. In our case, the victim process is the AES encryption algorithm and the attack is performed by means of random encryption requests. We demonstrate that PMCs are a feasible tool to detect the attack and that sampling PMCs at high frequencies is worse than sampling at lower frequencies in terms of detection capabilities, particularly when the attack is fragmented in time to try to be hidden from detection.
Published: 2019

12. Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP

Author: Catalán, Sandra, Castelló, Adrián, Igual, Francisco D., Rodríguez-Sánchez, Rafael, and Quintana-Ortí, Enrique S.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software
Abstract: We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multithreaded version of BLAS. This approach is also different from the more sophisticated runtime-assisted implementations, which decompose the operation into tasks and identify dependencies via directives and runtime support. Instead, our strategy attains high performance by explicitly embedding a static look-ahead technique into the DMF code, in order to overcome the performance bottleneck of the panel factorization, and realizing the trailing update via a cache-aware multi-threaded implementation of the BLAS. Although the parallel algorithms are specified with a high level of abstraction, the actual implementation can be easily derived from them, paving the road to deriving a high performance implementation of a considerable fraction of LAPACK functionality on any multicore platform with an OpenMP-like runtime.
Comment: 28 pages
Published: 2018

13. Fast Algorithms for the Computation of the Minimum Distance of a Random Linear Code

Author: Hernando, Fernando, Igual, Francisco D., and Quintana-Ortí, Gregorio
Subjects: Computer Science - Information Theory, 68Q30, 68Q25, 65F30, 11Y16
Abstract: The minimum distance of a code is an important concept in information theory. Hence, computing the minimum distance of a code with a minimum computational cost is a crucial process to many problems in this area. In this paper, we present and evaluate a family of algorithms and implementations to compute the minimum distance of a random linear code over $\mathbb{F}_{2}$ that are faster than different current implementations. In addition to the basic sequential implementations, we present parallel and vectorized implementations that render high performances on modern architectures. The attained performance results show the benefits of the developed optimized algorithms, which obtain remarkable performance improvements compared with state-of-the-art implementations widely used nowadays.
Published: 2016
Full Text: View/download PDF

14. Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors

Author: San Juan, Pablo, Rodríguez-Sánchez, Rafael, Igual, Francisco D., Alonso-Jordá, Pedro, and Quintana-Ortí, Enrique S.
Published: 2021
Full Text: View/download PDF

15. HeSP: a simulation framework for solving the task scheduling-partitioning problem on heterogeneous architectures

Author: Rey, Antón, Igual, Francisco D., and Prieto-Matías, Manuel
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: In this paper we describe HeSP, a complete simulation framework to study a general task scheduling-partitioning problem on heterogeneous architectures, which treats recursive task partitioning and scheduling decisions on equal footing. Considering recursive partitioning as an additional degree of freedom, tasks can be dynamically partitioned or merged at runtime for each available processor type, exposing additional or reduced degrees of parallelism as needed. Our simulations reveal that, for a specific class of dense linear algebra algorithms taken as a driving example, simultaneous decisions on task scheduling and partitioning yield significant performance gains on two different heterogeneous platforms: a highly heterogeneous CPU-GPU system and a low-power asymmetric big.LITTLE ARM platform. The insights extracted from the framework can be further applied to actual runtime task schedulers in order to improve performance on current or future architectures and for different task-parallel codes.
Published: 2016

16. Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

Author: Catalán, Sandra, Herrero, José R., Igual, Francisco D., Rodríguez-Sánchez, Rafael, and Quintana-Ortí, Enrique S.
Subjects: Computer Science - Mathematical Software, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK) functionality for many current multi-threaded architectures, the adaptation of these libraries for asymmetric multicore processors (AMPs) is still pending. In this paper we address this challenge by developing an asymmetry-aware implementation of the BLAS, based on the BLIS framework, and tailored for AMPs equipped with two types of cores: fast/power hungry versus slow/energy efficient. For this purpose, we integrate coarse-grain and fine-grain parallelization strategies into the library routines which, respectively, dynamically distribute the workload between the two core types and statically repartition this work among the cores of the same type. Our results on an ARM big.LITTLE processor embedded in the Exynos 5422 SoC, using the asymmetry-aware version of the BLAS and a plain migration of the legacy version of LAPACK, experimentally assess the benefits, limitations, and potential of this approach.
Published: 2015

17. Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra

Author: Costero, Luis, Igual, Francisco D., Olcoz, Katzalin, and Quintana-Ortí, Enrique S.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Dealing with asymmetry in the architecture opens a plethora of questions from the perspective of scheduling task-parallel applications, and there exist early attempts to address this problem via ad-hoc strategies embedded into a runtime framework. In this paper we take a different path, which consists in addressing the complexity of the problem at the library level, via a few asymmetry-aware fundamental kernels, hiding the architecture heterogeneity from the task scheduler. For the specific domain of dense linear algebra, we show that this is not only possible but delivers much higher performance than a naive approach based on an asymmetry-oblivious scheduler. Furthermore, this solution also outperforms an ad-hoc asymmetry-aware scheduler furnished with sophisticated scheduling techniques.
Published: 2015

18. Performance and Energy Optimization of Matrix Multiplication on Asymmetric big.LITTLE Processors

Author: Catalán, Sandra, Igual, Francisco D., Mayo, Rafael, Piñuel, Luis, Quintana-Ortí, Enrique S., and Rodríguez-Sánchez, Rafael
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Asymmetric processors have emerged as an appealing technology for severely energy-constrained environments, especially in the mobile market where heterogeneity in applications is mainstream. In addition, given the growing interest in ultra low-power architectures for high performance computing, this type of platform is also being investigated on the road towards the implementation of energy-efficient high-performance scientific applications. In this paper, we propose a first step towards a complete implementation of the BLAS interface adapted to asymmetric ARM big.LITTLE processors, analyzing the trade-offs between performance and energy efficiency when compared to existing homogeneous (symmetric) multi-threaded BLAS implementations. Our experimental results reveal important gains in performance while maintaining the energy efficiency of homogeneous solutions by efficiently exploiting all the resources of the asymmetric processor.
Comment: Presented at HiPEAC 2015, Amsterdam. Foundation of the Asymmetric BLIS implementation
Published: 2015

19. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

Author: Catalán, Sandra, Igual, Francisco D., Mayo, Rafael, Rodríguez-Sánchez, Rafael, and Quintana-Ortí, Enrique S.
Subjects: Computer Science - Performance, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software, Computer Science - Numerical Analysis
Abstract: Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.
Published: 2015

20. Scheduling Elastic Machine Learning Process through Containers / Coplanificación de procesos maleables de aprendizaje automático mediante contenedores

Author: Libutti, Leandro Ariel, Igual, Francisco, and de Giusti, Laura
Published: 2023
Full Text: View/download PDF

21. STEEL-RT: combining single task–single executor model and expanded scheduling to ease heterogeneity exploitation

Author: Rey, Antón, Igual, Francisco D., and Prieto-Matías, Manuel
Published: 2020
Full Text: View/download PDF

22. Integration and exploitation of intra-routine malleability in BLIS

Author: Rodríguez-Sánchez, Rafael, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Published: 2020
Full Text: View/download PDF

23. Solving Dense Generalized Eigenproblems on Multi-threaded Architectures

Author: Aliaga, José I., Bientinesi, Paolo, Davidović, Davor, Di Napoli, Edoardo, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Subjects: Computer Science - Performance, Condensed Matter - Materials Science, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software
Abstract: We compare two approaches to compute a portion of the spectrum of dense symmetric definite generalized eigenproblems: one is based on the reduction to tridiagonal form, and the other on the Krylov-subspace iteration. Two large-scale applications, arising in molecular dynamics and material science, are employed to investigate the contributions of the application, architecture, and parallelism of the method to the performance of the solvers. The experimental results on a state-of-the-art 8-core platform, equipped with a graphics processing unit (GPU), reveal that in real applications, iterative Krylov-subspace methods can be a competitive approach also for the solution of dense problems.
Comment: 5 tables and 4 figures. In press by Applied Mathematics and Computation. Accepted version
Published: 2011
Full Text: View/download PDF

24. Variable intra-task threading for power-constrained performance and energy optimization in DAG scheduling

Author: Rey, Antón, Igual, Francisco D., and Prieto-Matías, Manuel
Published: 2019
Full Text: View/download PDF

25. Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL

Author: Badía, Jose M., Belloch, Jose A., Cobos, Maximo, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Published: 2019
Full Text: View/download PDF

26. Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.

Author: Rodríguez-Sánchez, Rafael, Castelló, Adrián, Catalán, Sandra, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Subjects: MULTICORE processors, LINEAR algebra
Abstract: Malleability is defined as the ability to vary the degree of parallelism at runtime, and is regarded as a means to improve core occupation on state-of-the-art multicore processors that contain tens of computational cores per socket. This property is especially interesting for applications consisting of irregular workloads and/or divergent execution paths. The integration of malleability in high-performance instances of the Basic Linear Algebra Subprograms (BLAS) is currently nonexistent, and, in consequence, applications relying on these computational kernels cannot benefit from this capability. In response to this scenario, in this paper we demonstrate that significant performance benefits can be gathered via the exploitation of malleability in a framework designed to implement portable and high-performance BLAS-like operations. For this purpose, we integrate malleability within the BLIS library, and provide an experimental evaluation of the result on three different practical use cases.
Published: 2024
Full Text: View/download PDF

27. Performance and Scalability Study of FMM Kernels on Novel Multi- and Many-core Architectures

Author: Rey, Antón, Igual, Francisco D., Prieto-Matías, Manuel, and Prins, Jan F.
Published: 2017
Full Text: View/download PDF

28. On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization

Author: Belloch, Jose A., Badia, Jose M., Igual, Francisco D., Cobos, Maximo, and Quintana-Ortí, Enrique S.
Published: 2017
Full Text: View/download PDF

29. Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures

Author: Catalán, Sandra, Igual, Francisco D., Herrero, José R., Rodríguez-Sánchez, Rafael, and Quintana-Ortí, Enrique S.
Published: 2023
Full Text: View/download PDF

30. Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors

Author: Rodríguez-Sánchez, Rafael, Castelló, Adrián, Catalán, Sandra, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Published: 2023
Full Text: View/download PDF

31. Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi

Author: Dolz, Manuel F., Igual, Francisco D., Ludwig, Thomas, Piñuel, Luis, and Quintana-Ortí, Enrique S.
Published: 2015
Full Text: View/download PDF

32. Non-negative Matrix Factorization on Low-Power Architectures and Accelerators: A Comparative Study

Author: Igual, Francisco D., García, Carlos, Botella, Guillermo, Piñuel, Luis, Prieto-Matías, Manuel, and Tirado, Francisco
Published: 2015
Full Text: View/download PDF

33. Micro-kernels for portable and efficient matrix multiplication in deep learning

Author: Alaejos, Guillermo, Castelló, Adrián, Martínez, Héctor, Alonso-Jordá, Pedro, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Published: 2022
Full Text: View/download PDF

34. Solving Weighted Least Squares (WLS) problems on ARM-based architectures

Author: Belloch, Jose A., Bank, Balázs, Igual, Francisco D., Quintana-Ortí, Enrique S., and Vidal, Antonio M.
Published: 2017
Full Text: View/download PDF

35. Time and energy modeling of a high-performance multi-threaded Cholesky factorization

Author: Catalán, Sandra, Igual, Francisco D., Mayo, Rafael, Rodríguez-Sánchez, Rafael, and Quintana-Ortí, Enrique S.
Published: 2017
Full Text: View/download PDF

36. Speeding up the log-polar transform with inexpensive parallel hardware: graphics units and multi-core architectures

Author: Antonelli, Marco, Igual, Francisco D., Ramos, Francisco, and Traver, V. Javier
Published: 2015
Full Text: View/download PDF

37. Robust motion estimation on a low-power multi-core DSP

Author: Igual, Francisco D., Botella, Guillermo, García, Carlos, Prieto, Manuel, and Tirado, Francisco
Published: 2013
Full Text: View/download PDF

38. Extending OpenMP to Survive the Heterogeneous Multi-Core Era

Author: Ayguadé, Eduard, Badia, Rosa M., Bellens, Pieter, Cabrera, Daniel, Duran, Alejandro, Ferrer, Roger, Gonzàlez, Marc, Igual, Francisco, Jiménez-González, Daniel, Labarta, Jesús, Martinell, Luis, Martorell, Xavier, Mayo, Rafael, Pérez, Josep M., Planas, Judit, and Quintana-Ortí, Enrique S.
Published: 2010
Full Text: View/download PDF

39. Análisis cuantitativo del buffy coat en medicina veterinaria canina

Author: Lavín González, Santiago, Mora Igual, Francisco Javier, and Viñas Borrell, Luis
Subjects: Buffy coat, Dog, Haematology
Abstract: This study evaluates the clinical utility of quantitative buffy coat analysis by means of the QBC-V system for use in canine veterinary medicine. The QBC-V system is compared with the manual reference method on 100 canine blood samples. The results show a very high correlation for the haematocrit value and for the total leucocyte and granulocyte (neutrophil, eosinophil and basophil) counts, a high correlation for the agranulocyte (lymphocyte and monocyte) count, and a low correlation for the platelet count.
Published: 2021

40. Towards a Malleable Tensorflow Implementation

Author: Libutti, Leandro Ariel, Igual, Francisco, Piñuel, Luis, De Giusti, Laura Cristina, Naiouf, Marcelo, Rucci, Enzo, and Chichizola, Franco
Subjects: Flexibility (engineering), Distributed computing, Co-scheduling, Computer science, TensorFlow, Resource management, Inference, Malleability, Engineering and technology, Environmental sciences, Natural sciences, Containers, Elasticity (cloud computing), Electrical engineering, electronic engineering, information engineering, Parallelism (grammar), Hardware acceleration, Leverage (statistics), Earth and related environmental sciences
Abstract: The TensorFlow framework was designed from its inception to provide multi-thread capabilities, extended with hardware accelerator support to leverage the potential of modern architectures. The amount of parallelism in current versions of the framework can be selected at multiple levels (intra- and inter-parallelism) on demand. However, this selection is fixed, and cannot vary during the execution of training/inference sessions. This heavily restricts the flexibility and elasticity of the framework, especially in scenarios in which multiple TensorFlow instances co-exist in a parallel architecture. In this work, we propose the necessary modifications within TensorFlow to support dynamic selection of threads, in order to provide transparent malleability to the infrastructure. Experimental results show that this approach is effective in the variation of parallelism, and paves the road towards future co-scheduling techniques for multi-TensorFlow scenarios. Instituto de Investigación en Informática
Published: 2020

41. Implementation and Performance Evaluation of a Semantic Image Segmentation System on a Mobile Device

Author: Piñuel Moreno, Luis, Igual, Francisco D., and Carreño Alocén, Esther
Abstract: Interest in semantic segmentation has grown in recent times, and its use in tasks such as autonomous driving, medical diagnosis or video surveillance is now crucial. The training and inference processes in Deep Neural Networks (DNNs) are performed in data centres, which causes unbearable latency. Edge Computing is a response to this limitation. Nevertheless, it is restricted by the computing power and energy consumption of the devices. This project proposes the implementation of algorithms for semantic segmentation of images using DeepLab and TensorFlow as a basis, along with their adaptation and a performance evaluation, in terms of response time on a mobile device, across different semantic segmentation models. A Raspberry Pi was used along with a Coral USB Accelerator by Google, which provides an Edge TPU to accelerate inference in Machine Learning by quantizing the models. The final goal is to show that an efficient implementation on this low-energy architecture is possible.
Published: 2021

42. Resource Management for Power-Constrained HEVC Transcoding Using Reinforcement Learning

Author: Costero, Luis, primary, Iranfar, Arman, additional, Zapater, Marina, additional, Igual, Francisco D., additional, Olcoz, Katzalin, additional, and Atienza, David, additional
Published: 2020
Full Text: View/download PDF

43. Programming parallel dense matrix factorizations with look-ahead and OpenMP

Author: Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors, European Commission, Generalitat Valenciana, European Regional Development Fund, Ministerio de Economía y Competitividad, Ministerio de Educación, Cultura y Deporte, Comisión Interministerial de Ciencia y Tecnología, Catalán, Sandra, Castelló, Adrián, Igual, Francisco D., Rodríguez-Sánchez, Rafael, and Quintana Ortí, Enrique Salvador
Abstract: We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded version of basic linear algebra subroutines (BLAS). The proposed approach is also different from the more sophisticated runtime-based implementations, which decompose the operation into tasks and identify dependencies via directives and runtime support. Instead, our strategy attains high performance by explicitly embedding a static look-ahead technique into the DMF code, in order to overcome the performance bottleneck of the panel factorization, and realizing the trailing update via a cache-aware multi-threaded implementation of the BLAS. Although the parallel algorithms are specified with a high level of abstraction, the actual implementation can be easily derived from them, paving the road to deriving a high performance implementation of a considerable fraction of linear algebra package (LAPACK) functionality on any multicore platform with an OpenMP-like runtime.
Published: 2020

44. Integration and exploitation of intra-routine malleability in BLIS

Author: Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors, Comunidad de Madrid, Agencia Estatal de Investigación, European Regional Development Fund, Ministerio de Economía y Competitividad, Rodríguez-Sánchez, Rafael, Igual, Francisco D., and Quintana-Ortí, Enrique S.
Abstract: Malleability is a property of certain applications (or tasks) that, given an external request or autonomously, can accommodate a dynamic modification of the degree of parallelism being exploited at runtime. Malleability improves resource usage (core occupation) on modern multicore architectures for applications that exhibit irregular and divergent execution paths and heavily depend on the underlying library performance to attain high performance. The integration of malleability within high-performance instances of the Basic Linear Algebra Subprograms (BLAS) is nonexistent, and, in addition, it is difficult to attain given the rigidity of current application programming interfaces (APIs). In this paper, we overcome these issues by presenting the integration of a malleability mechanism within BLIS, a high-performance and portable framework to implement BLAS-like operations. For this purpose, we leverage low-level (yet simple) APIs to integrate on-demand malleability across all Level-3 BLAS routines, and we demonstrate the performance benefits of this approach by means of a higher-level dense matrix operation: the LU factorization with partial pivoting and look-ahead.
Published: 2020

45. Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding

Author: Costero, Luis, primary, Igual, Francisco D., additional, Olcoz, Katzalin, additional, and Tirado, Francisco, additional
Published: 2020
Full Text: View/download PDF

46. Integration and exploitation of intra-routine malleability in BLIS

Author: Rodríguez-Sánchez, Rafael, primary, Igual, Francisco D., additional, and Quintana-Ortí, Enrique S., additional
Published: 2019
Full Text: View/download PDF

47. Portability Study of an OpenCL Algorithm for Automatic Target Detection in Hyperspectral Images

Author: Bernabe, Sergio, primary, Garcia, Carlos, additional, Igual, Francisco D., additional, Botella, Guillermo, additional, Prieto-Matias, Manuel, additional, and Plaza, Antonio, additional
Published: 2019
Full Text: View/download PDF

48. STEEL-RT: combining single task–single executor model and expanded scheduling to ease heterogeneity exploitation

Author: Rey, Antón, primary, Igual, Francisco D., additional, and Prieto-Matías, Manuel, additional
Published: 2019
Full Text: View/download PDF

49. Bibliotecas De Álgebra Lineal Densa Conscientes De La Asimetría Del Procesador [Processor-asymmetry-aware dense linear algebra libraries]

Author: Alonso, Pedro, Catalán, Sandra, Costero, Luis, Herrero, José R., Igual, Francisco D., Quintana-Ortí, Enrique S., Olcoz, Katzalin, and Rodríguez-Sánchez, Rafael
Subjects: Dense linear algebra, BLAS, LAPACK, Asymmetric multicore processors, Multithreading, High-performance computing
Abstract: This article presents an implementation of BLAS, based on the BLIS library, for asymmetric multicore processors (AMPs). This asymmetry-aware version is evaluated through three common operations from the LAPACK library: the LU factorization, the Cholesky factorization, and the reduction to tridiagonal form. Initial tests that use the AMP implementation directly show the improvements obtained by adapting the software when it serves as the basis for the LAPACK operations, reaching up to 90% of the expected peak performance. These improvements grow further when the AMP-oriented version is combined with a runtime, in which case performance is up to 30% higher than with direct use of the AMP version.
Published: 2017
Full Text: View/download PDF

50. Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL

Author: Badía, Jose M., primary, Belloch, Jose A., additional, Cobos, Maximo, additional, Igual, Francisco D., additional, and Quintana-Ortí, Enrique S., additional
Published: 2018
Full Text: View/download PDF
