9 results for "Numerical linear algebra"
Search Results
2. MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures.
- Author
- Abdelfattah, Ahmad, Beams, Natalie, Carson, Robert, Ghysels, Pieter, Kolev, Tzanio, Stitt, Thomas, Vargas, Arturo, Tomov, Stanimire, and Dongarra, Jack
- Subjects
- NUMERICAL solutions for linear algebra, MATRICES (Mathematics), LINEAR algebra, LANDSCAPE architecture, MAGMAS
- Abstract
MAGMA (Matrix Algebra for GPU and Multicore Architectures) is a pivotal open-source library in the landscape of GPU-enabled dense and sparse linear algebra computations. With a repertoire of approximately 750 numerical routines across four precisions, MAGMA is deeply ingrained in the DOE software stack, playing a crucial role in high-performance computing. Notable projects such as ExaConstit, HiOP, MARBL, and STRUMPACK, among others, directly harness the capabilities of MAGMA. In addition, the MAGMA development team has been acknowledged multiple times for contributing to the vendors' numerical software stacks. Looking back over the time of the Exascale Computing Project (ECP), we highlight how MAGMA has adapted to recent changes in modern HPC systems, especially the growing gap between CPU and GPU compute capabilities, as well as the introduction of low precision arithmetic in modern GPUs. We also describe MAGMA's direct impact on several ECP projects. Maintaining portable performance across NVIDIA and AMD GPUs, and with current efforts toward supporting Intel GPUs, MAGMA ensures its adaptability and relevance in the ever-evolving landscape of GPU architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. GROWTH FACTORS OF ORTHOGONAL MATRICES AND LOCAL BEHAVIOR OF GAUSSIAN ELIMINATION WITH PARTIAL AND COMPLETE PIVOTING.
- Author
- PECA-MEDLIN, JOHN
- Subjects
- NUMERICAL solutions for linear algebra, GAUSSIAN elimination, GROWTH factors, LINEAR systems, FACTOR analysis
- Abstract
Gaussian elimination (GE) is the most used dense linear solver. Error analysis of GE with selected pivoting strategies on well-conditioned systems can focus on studying the behavior of growth factors. Although exponential growth is possible with GE with partial pivoting (GEPP), growth tends to stay much smaller in practice. Support for this behavior was provided recently by Huang and Tikhomirov's average-case analysis of GEPP, which showed GEPP growth factors for Gaussian matrices stay at most polynomial with very high probability. GE with complete pivoting (GECP) has also seen a lot of recent interest, with improvements to both lower and upper bounds on worst-case GECP growth provided by Bisain, Edelman, and Urschel in 2023. We are interested in studying how GEPP and GECP behave on the same linear systems as well as studying large growth on particular subclasses of matrices, including orthogonal matrices. Moreover, as a means to better address the question of why large growth is rarely encountered, we further study matrices with a large difference in growth between using GEPP and GECP, and we explore how the smaller growth strategy dominates behavior in a small neighborhood of the initial matrix. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
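The growth factor this abstract studies can be measured directly by running the elimination. Below is a minimal numpy sketch (not code from the paper); the matrix `W` is Wilkinson's classic worst case for partial pivoting, on which the growth is exactly 2^(n-1):

```python
import numpy as np

def gepp_growth(A):
    """Growth factor max_k |A^(k)|_max / |A|_max of Gaussian elimination
    with partial pivoting, computed by running the elimination."""
    A = A.astype(float).copy()
    n = A.shape[0]
    biggest = start = np.abs(A).max()
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # partial pivoting: largest entry in column k
        A[[k, p]] = A[[p, k]]                 # row swap
        A[k+1:, k] /= A[k, k]                 # multipliers stored in place
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
        biggest = max(biggest, np.abs(A[k+1:, k+1:]).max())
    return biggest / start

# Wilkinson's worst case: 1 on the diagonal, -1 below, 1 in the last column.
n = 8
W = np.eye(n) + np.tril(-np.ones((n, n)), -1)
W[:, -1] = 1.0
print(gepp_growth(W))  # 128.0, i.e. 2^(n-1)
```

In practice, as the abstract notes, random matrices exhibit far smaller growth than this worst case.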
4. On the Sample Complexity of Stabilizing Linear Dynamical Systems from Data.
- Author
- Werner, Steffen W. R. and Peherstorfer, Benjamin
- Subjects
- LINEAR dynamical systems, DYNAMICAL systems, SCIENCE education, NUMERICAL solutions for linear algebra
- Abstract
Learning controllers from data for stabilizing dynamical systems typically follows a two-step process of first identifying a model and then constructing a controller based on the identified model. However, learning models means identifying generic descriptions of the dynamics of systems, which can require large amounts of data and extracting information that are unnecessary for the specific task of stabilization. The contribution of this work is to show that if a linear dynamical system has dimension (McMillan degree) n , then there always exist n states from which a stabilizing feedback controller can be constructed, independent of the dimension of the representation of the observed states and the number of inputs. By building on previous work, this finding implies that any linear dynamical system can be stabilized from fewer observed states than the minimal number of states required for learning a model of the dynamics. The theoretical findings are demonstrated with numerical experiments that show the stabilization of the flow behind a cylinder from less data than necessary for learning a model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
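The kind of stabilizing feedback controller the abstract refers to can be illustrated on a toy system. This sketch is not from the paper: the matrices below are invented, and the feedback is a hand-derived deadbeat controller that places all closed-loop eigenvalues at zero:

```python
import numpy as np

# Unstable discrete-time system x_{k+1} = A x_k + B u_k (toy example).
A = np.array([[1.1, 1.0],
              [0.0, 1.1]])
B = np.array([[0.0],
              [1.0]])

# Deadbeat feedback u = -K x, with K chosen so that A - B K is nilpotent:
# matching the characteristic polynomial of A - B K to lambda^2 gives
# K = [1.21, 2.2] for this pair.
K = np.array([[1.21, 2.2]])

Acl = A - B @ K
radius = max(abs(np.linalg.eigvals(Acl)))
print(radius)  # close to 0: the closed loop is stable
```

The open-loop spectral radius is 1.1 (unstable), while the closed loop is driven to the origin in at most two steps, which is the usual behavior of a deadbeat design.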
5. FAST AND ACCURATE RANDOMIZED ALGORITHMS FOR LINEAR SYSTEMS AND EIGENVALUE PROBLEMS.
- Author
- NAKATSUKASA, YUJI and TROPP, JOEL A.
- Subjects
- LINEAR systems, NUMERICAL solutions for linear algebra, EIGENVALUES
- Abstract
This paper develops a class of algorithms for general linear systems and eigenvalue problems. These algorithms apply fast randomized dimension reduction ("sketching") to accelerate standard subspace projection methods, such as GMRES and Rayleigh-Ritz. This modification makes it possible to incorporate nontraditional bases for the approximation subspace that are easier to construct. When the basis is numerically full rank, the new algorithms have accuracy similar to classic methods but run faster and may use less storage. For model problems, numerical experiments show large advantages over the optimized MATLAB routines, including a 70x speedup over gmres and a 10x speedup over eigs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
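The paper applies sketching inside subspace projection methods such as GMRES; as a simpler, self-contained illustration of the same dimension-reduction idea, here is sketch-and-solve for least squares (the sizes and the 8x oversampling are illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 2000, 20
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

# Exact least-squares residual, for reference.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
r_exact = np.linalg.norm(A @ x_exact - b)

# Sketch-and-solve: compress the m-row problem with a random Gaussian
# sketch S of only s = 8*d rows, then solve the small sketched problem.
s = 8 * d
S = rng.standard_normal((s, m)) / np.sqrt(s)
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
r_sk = np.linalg.norm(A @ x_sk - b)

print(r_sk / r_exact)  # modestly above 1: near-optimal at a fraction of the size
```

With high probability the Gaussian sketch is a subspace embedding, so the sketched solution's true residual is within a small factor of the optimum while the solve itself touches an s-by-d rather than m-by-d system.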
6. ADAPTIVE PRECISION SPARSE MATRIX-VECTOR PRODUCT AND ITS APPLICATION TO KRYLOV SOLVERS.
- Author
- GRAILLAT, STEF, JÉZÉQUEL, FABIENNE, MARY, THEO, and MOLINA, ROMÉO
- Subjects
- SPARSE matrices, NUMERICAL solutions for linear algebra, COMPUTER algorithms, FLOATING-point arithmetic, LINEAR systems, ADAPTIVE control systems
- Abstract
We introduce a mixed precision algorithm for computing sparse matrix-vector products and use it to accelerate the solution of sparse linear systems by iterative methods. Our approach is based on the idea of adapting the precision of each matrix element to its magnitude: we split the elements into buckets and use progressively lower precisions for the buckets of progressively smaller elements. We carry out a rounding error analysis of this algorithm that provides us with an explicit rule to decide which element goes into which bucket and allows us to rigorously control the accuracy of the algorithm. We implement the algorithm on a multicore computer and obtain significant speedups (up to a factor 7×) with respect to uniform precision algorithms, without loss of accuracy, on a range of sparse matrices from real-life applications. We showcase the effectiveness of our algorithm by plugging it into various Krylov solvers for sparse linear systems and observe that the convergence of the solution is essentially unaffected by the use of adaptive precision. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
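The bucketing idea can be imitated in a few lines. This sketch is not the authors' implementation: it uses only two buckets, a dense matrix for brevity (the paper targets sparse matrices), and float32 as the stand-in lower precision, with an illustrative splitting threshold:

```python
import numpy as np

def adaptive_matvec(A, x, split=1e-4):
    """Two-bucket precision-adaptive A @ x: entries of A that are small
    relative to the largest entry are stored and multiplied in float32,
    the rest in float64."""
    cutoff = split * np.abs(A).max()
    small = np.abs(A) < cutoff
    A_hi = np.where(small, 0.0, A)                     # float64 bucket
    A_lo = np.where(small, A, 0.0).astype(np.float32)  # float32 bucket
    return A_hi @ x + (A_lo @ x.astype(np.float32)).astype(np.float64)

rng = np.random.default_rng(1)
n = 500
# Matrix whose entries span many orders of magnitude.
A = rng.standard_normal((n, n)) * 10.0 ** rng.uniform(-8, 0, (n, n))
x = rng.standard_normal(n)

err = np.linalg.norm(adaptive_matvec(A, x) - A @ x) / np.linalg.norm(A @ x)
print(err)  # tiny: the float32 bucket only ever holds entries below the cutoff
```

The point mirrors the paper's rule: rounding a small entry in a coarse precision contributes an absolute error proportional to that entry's magnitude, so the overall accuracy stays at the working-precision level.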
7. COMMUNICATION AVOIDING BLOCK LOW-RANK PARALLEL MULTIFRONTAL TRIANGULAR SOLVE WITH MANY RIGHT-HAND SIDES.
- Author
- AMESTOY, PATRICK, BOITEAU, OLIVIER, BUTTARI, ALFREDO, GEREST, MATTHIEU, JÉZÉQUEL, FABIENNE, L'EXCELLENT, JEAN-YVES, and MARY, THEO
- Subjects
- NUMERICAL solutions for linear algebra, PATTERNS (Mathematics), SPARSE matrices, LOW-rank matrices
- Abstract
Block low-rank (BLR) compression can significantly reduce the memory and time costs of parallel sparse direct solvers. In this paper, we investigate the performance of the BLR triangular solve phase, which we observe to be underwhelming when dealing with many right-hand sides (RHS). We explain that this is because the bottleneck of the triangular solve is not in accessing the BLR LU factors, but rather in accessing the RHS, which are uncompressed. Motivated by this finding, we propose several new hybrid variants, which combine the right-looking and left-looking communication patterns to minimize the number of accesses to the RHS. We confirm via a theoretical analysis that these new variants can significantly reduce the total communication volume. We assess the impact of this reduction on the time performance on a range of real-life applications using the MUMPS solver, obtaining up to 20% time reduction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
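The paper's hybrid variants combine the two classical access patterns for the triangular solve. As a minimal dense illustration of what "right-looking" and "left-looking" mean (the paper's setting is a distributed BLR multifrontal solver, far beyond this sketch):

```python
import numpy as np

def trsv_right_looking(L, b):
    """Solve L x = b, eagerly updating the remaining RHS entries as soon
    as each x[j] is known (right-looking: writes ahead into the RHS)."""
    x = b.astype(float).copy()
    n = len(x)
    for j in range(n):
        x[j] /= L[j, j]
        x[j+1:] -= L[j+1:, j] * x[j]   # immediate update of the RHS tail
    return x

def trsv_left_looking(L, b):
    """Solve L x = b, deferring all updates: each x[i] gathers the
    contributions of previously computed entries (left-looking: reads back)."""
    x = np.empty(len(b))
    for i in range(len(b)):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

rng = np.random.default_rng(2)
n = 50
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)  # well conditioned
b = rng.standard_normal(n)
print(np.allclose(trsv_right_looking(L, b), trsv_left_looking(L, b)))  # True
```

Both variants do the same arithmetic in a different order; what changes, and what the paper exploits, is when the right-hand side is read and written, hence the communication volume when the RHS lives across processes.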
8. Pseudospectral Divide-and-Conquer for the Generalized Eigenvalue Problem
- Author
- Schneider, Ryan
- Subjects
- Applied mathematics, Eigenvalue problems, Numerical linear algebra, Random matrices
- Abstract
Over the last two decades, randomization has emerged as a leading tool for pursuing efficiency in numerical linear algebra. Its benefits can be explained in part by smoothed analysis, where an algorithm that fails spectacularly in certain cases may be unlikely to do so on random -- or randomly perturbed -- inputs. This observation implies a simple framework for developing accurate and efficient randomized algorithms: apply a random perturbation and run an existing method, whose worst-case error (or run-time) can be avoided with high probability.
Recent work of Banks, Garza-Vargas, Kulkarni, and Srivastava applied this framework to the standard eigenvalue problem, developing a randomized algorithm that (with high probability) diagonalizes a matrix in nearly matrix multiplication time [Foundations of Computational Math 2022]. Central to their work is the phenomenon of pseudospectral shattering, in which a small Gaussian perturbation regularizes the pseudospectrum of a matrix, with high probability breaking it into disjoint components and allowing classical, divide-and-conquer eigensolvers to run successfully. Prior to their work, no way of accessing the benefits of divide-and-conquer's natural parallelization was known in general.
In this thesis, we extend the work of Banks et al. to the generalized eigenvalue problem -- e.g., Av = λBv for matrices A, B ∈ ℂ^(n×n). Our main contributions can be summarized as follows.
1. First, we show that pseudospectral shattering generalizes directly: randomly perturbing A and B has a similar regularizing effect on the pseudospectra of the corresponding matrix pencil (A, B) with high probability.
2. Building on pseudospectral shattering, we construct and analyze a fast, randomized, divide-and-conquer algorithm for diagonalizing (A, B), which begins by randomly perturbing the inputs.
3. Finally, we demonstrate that both pseudospectral shattering and the corresponding diagonalization algorithm can be adapted to definite pencils, further pursuing efficiency by preserving and exploiting symmetry.
The resulting algorithm, which we call pseudospectral divide-and-conquer, is the first general, sub-O(n³) solver for the generalized eigenvalue problem. It is not only highly parallel and capable of accommodating structure, but also promotes stability by avoiding matrix inversion. In essence, this thesis is a handbook for understanding, adapting, and implementing the method.
- Published
- 2024
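The regularizing effect of a Gaussian perturbation described in this abstract can be seen on a defective matrix. In this sketch (an illustration, not the thesis's algorithm; the perturbation size gamma and the scaling are illustrative), a Jordan block with n coincident eigenvalues splits into n well-separated ones after perturbation, so diagonalization becomes numerically meaningful:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
J = np.diag(np.ones(n - 1), 1)  # Jordan block: n coincident eigenvalues, defective

# Complex Gaussian perturbation of (illustrative) size gamma.
gamma = 0.1
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
M = J + gamma * G

w, V = np.linalg.eig(M)
gaps = [abs(a - b) for i, a in enumerate(w) for b in w[i + 1:]]
print(min(gaps))  # the perturbed eigenvalues are well separated

# With separated eigenvalues, the computed V diagonalizes M accurately.
err = np.linalg.norm(V @ np.diag(w) @ np.linalg.inv(V) - M)
print(err)
```

For the unperturbed block the eigenvalues coincide and the eigenvector matrix is singular; after perturbation the spectrum spreads out (for a Jordan block, roughly onto a circle of radius about gamma^(1/n)), which is the one-matrix analogue of the shattering phenomenon the thesis extends to pencils.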
9. High Performance Computing for the Optimization of Radiation Therapy Treatment Plans
- Author
- Liu, Felix
- Abstract
Radiation therapy is a clinical field in which computer simulations play a crucial role. Before patients undergo radiation therapy, an individual treatment plan for each patient needs to be created based on the specifics of their case (a process often referred to as treatment planning). The main aspect of this is setting the control parameters for the treatment machine so that the radioactive dose delivered to the patient is as concentrated to the tumor volume as possible. The inverse problem of determining such appropriate control parameters is typically formulated as an optimization problem, which, considering the complexity of modern treatment machines, requires computerized algorithms to solve accurately. Solving this optimization problem can be a key computational bottleneck in the treatment planning workflow. In many cases, finding a suitable treatment plan is a trial-and-error process, requiring multiple solutions of the optimization problems with different weights and parameters. This thesis proposes different methods to enable the use of high performance computing (HPC) hardware to accelerate the optimization process in radiation therapy. We deal with two main computational aspects of the optimization workflow, the calculation of dose, gradients and objective functions; as well as the optimization solver itself. For dose calculation during optimization, we propose a CUDA kernel for sparse matrix-vector products tuned to dose deposition matrices from proton therapy. For the evaluation of the objective function -- which is often constructed as a weighted sum -- we develop a method to distribute the calculation of the objective function and its gradient across computational nodes using message passing. For the optimization solver itself we first propose a task-based parallel implementation for Cholesky factorization of banded matrices. 
This can be an important kernel in interior point methods (IPM) for optimization, depending on the structure of the optimization problem.
- Published
- 2024
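Among the kernels this abstract mentions, the banded Cholesky factorization is easy to sketch. Below is a minimal sequential version (the thesis develops a task-parallel one); the Laplacian test matrix is an illustrative choice, not data from the thesis:

```python
import numpy as np

def banded_cholesky(A, p):
    """Cholesky factor L (A = L L^T) of a symmetric positive definite
    matrix with bandwidth p. Each column only touches the p previous
    columns, which is what makes the banded version cheap."""
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        lo = max(0, j - p)                     # first column inside the band
        L[j, j] = np.sqrt(A[j, j] - L[j, lo:j] @ L[j, lo:j])
        for i in range(j + 1, min(n, j + p + 1)):
            L[i, j] = (A[i, j] - L[i, lo:j] @ L[j, lo:j]) / L[j, j]
    return L

# 1-D Laplacian: tridiagonal (bandwidth 1) and positive definite.
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L = banded_cholesky(A, 1)
print(np.allclose(L @ L.T, A))  # True
```

Because updates to column j depend only on columns j-p through j-1, the column loop exposes exactly the kind of dependency structure that a task-based runtime, as used in the thesis, can schedule in parallel.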
Discovery Service for Jio Institute Digital Library