274 results for "Felix Wolf"
Search Results
2. Discovery of a Cryptic Nitro Intermediate in the Biosynthesis of the 3-(trans-2′-Aminocyclopropyl)alanine Moiety of Belactosin A
- Author
-
Alicia Engelbrecht, Felix Wolf, Annika Esch, Andreas Kulik, Sergei I. Kozhushkov, Armin de Meijere, Chambers C. Hughes, and Leonard Kaysser
- Subjects
Organic Chemistry ,Physical and Theoretical Chemistry ,Biochemistry - Published
- 2022
- Full Text
- View/download PDF
3. Keeping up with technology: Teaching parallel, distributed, and high-performance computing
- Author
-
Sushil Prasad, Sheikh Ghafoor, Martina Barnas, Felix Wolf, Erik Saule, Noemi Rodriguez, and Rizos Sakellariou
- Subjects
Artificial Intelligence ,Computer Networks and Communications ,Hardware and Architecture ,Software ,Theoretical Computer Science - Published
- 2022
- Full Text
- View/download PDF
4. ElastiSim: A Batch-System Simulator for Malleable Workloads
- Author
-
Taylan Özden, Tim Beringer, Arya Mazaheri, Hamid Mohammadi Fard, and Felix Wolf
- Published
- 2022
- Full Text
- View/download PDF
5. Solving Maxwell's eigenvalue problem via isogeometric boundary elements and a contour integral method
- Author
-
Sebastian Schöps, Felix Wolf, Gerhard Unger, and Stefan Kurz
- Subjects
FOS: Computer and information sciences ,Discretization ,General Mathematics ,General Engineering ,Boundary (topology) ,Numerical Analysis (math.NA) ,010103 numerical & computational mathematics ,Contour integral method ,Physics::Classical Physics ,01 natural sciences ,Mathematics::Numerical Analysis ,Computational Engineering, Finance, and Science (cs.CE) ,010101 applied mathematics ,FOS: Mathematics ,34L16, 35P30, 65N38, 65D07 ,Applied mathematics ,Mathematics - Numerical Analysis ,0101 mathematics ,Computer Science - Computational Engineering, Finance, and Science ,Eigenvalues and eigenvectors ,Mathematics - Abstract
We solve Maxwell's eigenvalue problem via isogeometric boundary elements and a contour integral method. We discuss the analytic properties of the discretisation, outline the implementation, and showcase numerical examples.
- Published
- 2021
- Full Text
- View/download PDF
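The entry above names a contour integral method without spelling out which one; a widely used variant for nonlinear eigenvalue problems of this kind is Beyn's method, sketched below as general background rather than as the paper's exact formulation. Here T(z) stands for the holomorphic boundary element system matrix, \hat{V} for a random probe matrix, and Γ for a contour enclosing the sought resonances (all notation assumed for the sketch).

```latex
% Beyn-type contour integral method (background sketch, not necessarily the
% formulation used in the paper above).
\[
  A_0 = \frac{1}{2\pi i}\oint_{\Gamma} T(z)^{-1}\hat{V}\,\mathrm{d}z,
  \qquad
  A_1 = \frac{1}{2\pi i}\oint_{\Gamma} z\,T(z)^{-1}\hat{V}\,\mathrm{d}z .
\]
% With the singular value decomposition A_0 = V_0 \Sigma_0 W_0^{H}, the small matrix
\[
  B = V_0^{H} A_1 W_0 \Sigma_0^{-1}
\]
% recovers, under the usual rank assumptions, the eigenvalues of T(.) enclosed by
% \Gamma; the contour integrals are typically approximated by the trapezoidal rule
% on a circular or elliptic contour.
```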
6. A Numerical Comparison of an Isogeometric and a Parametric Higher Order Raviart–Thomas Approach to the Electric Field Integral Equation
- Author
-
Felix Wolf, Sebastian Schöps, Jürgen Dölz, and Stefan Kurz
- Subjects
Curvilinear coordinates ,Discretization ,Degrees of freedom (statistics) ,020206 networking & telecommunications ,02 engineering and technology ,Electric-field integral equation ,Computer Science::Numerical Analysis ,Integral equation ,Mathematics::Numerical Analysis ,0202 electrical engineering, electronic engineering, information engineering ,Order (group theory) ,Applied mathematics ,Electrical and Electronic Engineering ,Focus (optics) ,Parametric statistics ,Mathematics - Abstract
We discuss numerical experiments to compare an isogeometric discretization of the electric field integral equation and a parametric Raviart–Thomas approach. Therein, we focus on accuracy with respect to degrees of freedom, briefly commenting on the conditioning of the system as well. Due to the utilization of parametric mappings even in the Raviart–Thomas approach, our investigation disregards any errors induced by meshing, which commonly favors the isogeometric approach when compared to curvilinear higher order Raviart–Thomas elements.
- Published
- 2020
- Full Text
- View/download PDF
7. Discovery of a Cryptic Nitro Intermediate in the Biosynthesis of the 3-(trans-2′-Aminocyclopropyl)alanine Moiety of Belactosin A
- Author
-
Alicia Engelbrecht, Felix Wolf, Annika Esch, Andreas Kulik, Sergei I. Kozhushkov, Armin de Meijere, Chambers C. Hughes, and Leonard Kaysser
- Subjects
Alanine - Abstract
Belactosin A, a β-lactone proteasome inhibitor, contains a unique 3-(trans-2′-aminocyclopropyl)alanine moiety.
- Published
- 2022
8. Simulating Structural Plasticity of the Brain more Scalable than Expected
- Author
-
Fabian Czappa, Alexander Geiß, and Felix Wolf
- Subjects
Performance (cs.PF) ,FOS: Computer and information sciences ,Computer Science - Performance ,Computer Science - Distributed, Parallel, and Cluster Computing ,Artificial Intelligence ,Computer Networks and Communications ,Hardware and Architecture ,Computer Science - Neural and Evolutionary Computing ,Distributed, Parallel, and Cluster Computing (cs.DC) ,Neural and Evolutionary Computing (cs.NE) ,Software ,Theoretical Computer Science - Abstract
Structural plasticity of the brain describes the creation of new and the deletion of old synapses over time. Rinke et al. (JPDC 2018) introduced a scalable algorithm that simulates structural plasticity for up to one billion neurons on current hardware using a variant of the Barnes-Hut algorithm. They demonstrate good scalability and prove a runtime complexity of $O(n \log^2 n)$. In this comment paper, we show that with careful consideration of the algorithm and a rigorous proof, the theoretical runtime can even be classified as $O(n \log n)$.
- Published
- 2022
- Full Text
- View/download PDF
9. Accelerating Brain Simulations with the Fast Multipole Method
- Author
-
Fabian Czappa, Hannah Nöttgen, and Felix Wolf
- Published
- 2022
- Full Text
- View/download PDF
10. Resistance to Sustainability Policies and the Role of Public Participation: Key Lessons, Challenges, and Research Needs
- Author
-
S. Slingerland, Maxime Köse, and Felix Wolf
- Subjects
History ,Polymers and Plastics ,Business and International Management ,Industrial and Manufacturing Engineering - Published
- 2022
- Full Text
- View/download PDF
11. Tool-Supported Mini-App Extraction to Facilitate Program Analysis and Parallelization
- Author
-
Florian Dewald, Heiko Mantel, Christian Bischof, Felix Wolf, Mohammad Norouzi, and Jan-Patrick Lehr
- Subjects
Reduction (complexity) ,Identification (information) ,Source lines of code ,Program analysis ,Computer science ,Automatic identification and data capture ,Code (cryptography) ,Key (cryptography) ,Cyclomatic complexity ,Parallel computing - Abstract
The size and complexity of high-performance computing applications present a serious challenge to manual reasoning about program behavior. The vastness and diversity of code bases often break automatic analysis tools, which could otherwise be used. As a consequence, developers resort to mini-apps, i.e., trimmed-down proxies of the original programs that retain key performance characteristics. Unfortunately, their construction is difficult and time-consuming and prevents their mass production. In this paper, we propose a systematic and tool-supported approach to extract mini-apps from large-scale applications that reduces the manual effort needed to create them. Our approach covers the stages of kernel identification, data capture, code extraction, and representativeness validation. We demonstrate it using an astrophysics simulation with ≈ 8.5 million lines of code and extract a mini-app with only ≈ 1,100 lines of code. For the mini-app, we evaluate the reduction of code complexity and execution similarity, and show how it enables the tool-supported discovery of unexploited parallelization opportunities, reducing the simulation’s runtime significantly.
- Published
- 2021
- Full Text
- View/download PDF
12. Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations
- Author
-
Yannick Berens, Torsten Hoefler, Alexandru Calotoiu, Felix Wolf, Alexandre Strube, and Sergei Shudler
- Subjects
020203 distributed computing ,Computer science ,02 engineering and technology ,Benchmarking ,Supercomputer ,Field (computer science) ,Computational Theory and Mathematics ,Hardware and Architecture ,Signal Processing ,Regression testing ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,Algorithm - Abstract
Many libraries in the HPC field use sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm and performance engineers have already been advocating the systematic combination of analytical performance models with practical measurements for a very long time, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer from the burden of having to specify and maintain very fine-grained and potentially non-portable expectations. In this way, scalability validation can be continuously applied throughout the whole development cycle with very little effort. Using MPI and parallel sorting algorithms as examples, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.
- Published
- 2019
- Full Text
- View/download PDF
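The key idea in the entry above is to check measured performance against an expected asymptotic scaling trend rather than an exact analytical expression. The following minimal sketch illustrates that idea only; the function names, tolerance, and data are invented and this is not the interface of the tool described in the paper.

```python
# Minimal sketch of validating an asymptotic scaling expectation from measurements
# (illustration only; not the authors' tool or API).
import math

def fitted_exponent(sizes, times):
    """Least-squares slope of log(time) over log(size), i.e. k in time ~ size^k."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def assert_scaling(sizes, times, expected_exponent, tolerance=0.2):
    """Fail if the measured growth exceeds the expected asymptotic trend."""
    k = fitted_exponent(sizes, times)
    assert k <= expected_exponent + tolerance, \
        f"measured exponent {k:.2f} exceeds expected {expected_exponent}"

# Example: a routine expected to behave like O(n log n), roughly n^1.1 over this range.
assert_scaling([1e5, 2e5, 4e5, 8e5], [0.012, 0.026, 0.055, 0.118], 1.2)
```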
13. Mechanical Analysis of the Collaring Process of the 11 T Dipole Magnet
- Author
-
Felix Wolf, Philippe Grosclaude, Jose Ferradas Troitino, Frederic Savary, Giorgio Vallone, Salvador Ferradas Troitino, E. Nilsson, Luca Bottura, Arnaud Devred, C. Loffler, Michael Daly, Susana Izquierdo Bermudez, Paolo Ferracin, Nicolas Bourcey, Michael Guinchard, Juan Carlos Perez, and Jose Luis Rudeiros Fernandez
- Subjects
Large Hadron Collider ,Materials science ,Nuclear engineering ,Condensed Matter Physics ,01 natural sciences ,Electronic, Optical and Magnetic Materials ,Stress (mechanics) ,Dipole ,Dipole magnet ,Mockup ,Electromagnetic coil ,Magnet ,0103 physical sciences ,Electrical and Electronic Engineering ,010306 general physics ,Strain gauge - Abstract
As part of the Large Hadron Collider (LHC) accelerator upgrades foreseen by the High-Luminosity LHC project, the CERN 11 T program is aimed at replacing standard LHC Nb-Ti main dipole magnets, operating with a bore field of 8.3 T, with pairs of shorter Nb3Sn dipole magnets with a bore field of 11 T and the same total integrated field, thus providing space for additional collimators in the dispersion suppressor region. At the time of the submission of this paper, six single-aperture and two double-aperture short models have been fabricated and tested. As a result of a degraded quench performance observed in some of the short models, attributed to excessive stress on the Nb3Sn coil mid-planes, a thorough investigation of the room-temperature loading procedure, and in particular of the collaring process, has been launched. A 150-mm-long collared coil mockup, instrumented with strain gauges and pressure-sensitive films, has been used to study the peak stresses experienced by the brittle and strain-sensitive Nb3Sn cables in the different phases of the collaring and as a function of coil size and collaring force. In this paper, the results of the test campaign are described.
- Published
- 2019
- Full Text
- View/download PDF
14. Isogeometric Boundary Elements in Electromagnetism: Rigorous Analysis, Fast Methods, and Examples
- Author
-
Sebastian Schöps, Felix Wolf, Stefan Kurz, and Jürgen Dölz
- Subjects
Scattering ,Applied Mathematics ,Mathematical analysis ,Boundary (topology) ,Numerical Analysis (math.NA) ,010103 numerical & computational mathematics ,Physics::Classical Physics ,Computer Science::Numerical Analysis ,01 natural sciences ,Mathematics::Numerical Analysis ,65D07, 65N38, 65Y20 ,Computational Mathematics ,Electromagnetism ,FOS: Mathematics ,Computer Science::Mathematical Software ,Mathematics - Numerical Analysis ,0101 mathematics ,Fast methods ,Boundary element method ,Mathematics - Abstract
We present a new approach to three-dimensional electromagnetic scattering problems via fast isogeometric boundary element methods. Starting with an investigation of the theoretical setting around the electric field integral equation within the isogeometric framework, we show existence, uniqueness, and quasi-optimality of the isogeometric approach. For a fast and efficient computation, we then introduce and analyze an interpolation-based fast multipole method tailored to the isogeometric setting, which admits competitive algorithmic and complexity properties. This is followed by a series of numerical examples of industrial scope, together with a detailed presentation and interpretation of the results.
- Published
- 2019
- Full Text
- View/download PDF
15. Noise-Resilient Empirical Performance Modeling with Deep Neural Networks
- Author
-
Alexandru Calotoiu, Alexander Geiß, Johannes Wehrstein, Felix Wolf, Torsten Hoefler, Marcus Ritter, and Thorsten Reimann
- Subjects
Noise ,Artificial neural network ,Computer science ,Linear regression ,Predictive power ,Data mining ,Set (psychology) ,computer.software_genre ,computer ,Scaling ,Regression ,Data modeling - Abstract
Empirical performance modeling is a proven instrument to analyze the scaling behavior of HPC applications. Using a set of smaller-scale experiments, it can provide important insights into application behavior at larger scales. Extra-P is an empirical modeling tool that applies linear regression to automatically generate human-readable performance models. Similar to other regression-based modeling techniques, the accuracy of the models created by Extra-P decreases as the amount of noise in the underlying data increases. This is why the performance variability observed in many contemporary systems can become a serious challenge. In this paper, we introduce a novel adaptive modeling approach that makes Extra-P more noise resilient, exploiting the ability of deep neural networks to discover the effects of numerical parameters, such as the number of processes or the problem size, on performance when dealing with noisy measurements. Using synthetic analysis and data from three different case studies, we demonstrate that our solution improves the model accuracy at high noise levels by up to 25% while increasing their predictive power by about 15%.
- Published
- 2021
- Full Text
- View/download PDF
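For context on the entry above: the human-readable models produced by Extra-P are typically drawn from the performance model normal form described in the Extra-P literature, reproduced below; the noise problem this paper addresses arises when fitting such terms to variable measurements.

```latex
% Performance model normal form commonly used by Extra-P (shown for context;
% exact exponent sets vary between tool versions).
\[
  f(p) \;=\; \sum_{k=1}^{n} c_k \cdot p^{\,i_k} \cdot \log_2^{\,j_k}(p),
\]
% where p is the resource or problem parameter, the exponents i_k, j_k are chosen
% from a small predefined set, and the coefficients c_k are fitted to measurements.
```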
16. Learning to Make Compiler Optimizations More Effective
- Author
-
Michael Pradel, Felix Wolf, Marija Selakovic, and Rahim Mammadli
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Source code ,Speedup ,Computer Science - Programming Languages ,Computer science ,media_common.quotation_subject ,Optimizing compiler ,020207 software engineering ,02 engineering and technology ,Parallel computing ,computer.software_genre ,020202 computer hardware & architecture ,Machine Learning (cs.LG) ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Compiler ,Heuristics ,computer ,Compiled language ,Invariant (computer science) ,media_common ,Programming Languages (cs.PL) - Abstract
Because loops execute their body many times, compiler developers place much emphasis on their optimization. Nevertheless, in view of highly diverse source code and hardware, compilers still struggle to produce optimal target code. The sheer number of possible loop optimizations, including their combinations, exacerbates the problem further. Today's compilers use hard-coded heuristics to decide when, whether, and which of a limited set of optimizations to apply. Often, this leads to highly unstable behavior, making the success of compiler optimizations dependent on the precise way a loop has been written. This paper presents LoopLearner, which addresses the problem of compiler instability by predicting which way of writing a loop will lead to efficient compiled code. To this end, we train a neural network to find semantically invariant source-level transformations for loops that help the compiler generate more efficient code. Our model learns to extract useful features from the raw source code and predicts the speedup that a given transformation is likely to yield. We evaluate LoopLearner with 1,895 loops from various performance-relevant benchmarks. Applying the transformations that our model deems most favorable prior to compilation yields an average speedup of 1.14x. When trying the top-3 suggested transformations, the average speedup even increases to 1.29x. Comparing the approach with an exhaustive search through all available code transformations shows that LoopLearner helps to identify the most beneficial transformations in several orders of magnitude less time.
- Published
- 2021
- Full Text
- View/download PDF
17. Analysis and Implementation of Isogeometric Boundary Elements for Electromagnetism
- Author
-
Felix Wolf
- Published
- 2021
- Full Text
- View/download PDF
18. Isogeometric Boundary Elements
- Author
-
Felix Wolf
- Subjects
Mathematical analysis ,Computer Science::Mathematical Software ,Boundary (topology) ,Computer Science::Numerical Analysis ,Boundary element method ,Mathematics::Numerical Analysis ,Mathematics - Abstract
This chapter introduces isogeometric discretisations and discusses analytical properties of the corresponding discrete spaces with boundary element methods in mind.
- Published
- 2020
- Full Text
- View/download PDF
19. Numerical Examples: Electromagnetic Scattering
- Author
-
Felix Wolf
- Subjects
Presentation ,Corollary ,Scattering ,Computer science ,media_common.quotation_subject ,Benchmark (computing) ,Applied mathematics ,media_common - Abstract
This chapter is devoted to the presentation of multiple numerical examples. These are designed to verify the behaviour predicted in Theorem 3.48 and Corollary 3.49, and to benchmark the method.
- Published
- 2020
- Full Text
- View/download PDF
20. Final Remarks
- Author
-
Felix Wolf
- Published
- 2020
- Full Text
- View/download PDF
21. Algorithmic Considerations for Matrix Assembly
- Author
-
Felix Wolf
- Subjects
Matrix (mathematics) ,Discretization ,Computer science ,Linear system ,Applied mathematics - Abstract
In this chapter, we will review the algorithmic approach to the matrix assembly of the linear system induced by the isogeometric discretisation of the EFIE.
- Published
- 2020
- Full Text
- View/download PDF
22. The Discrete Eigenvalue Problem
- Author
-
Felix Wolf
- Subjects
Physics ,Computation ,Mathematical analysis ,Eigenvalues and eigenvectors - Abstract
This chapter is devoted to the discussion of the solution of Problem 2.32, i.e., the computation of resonant frequencies within perfectly conducting structures.
- Published
- 2020
- Full Text
- View/download PDF
23. Dynamic Multi-objective Scheduling of Microservices in the Cloud
- Author
-
Hamid Mohammadi Fard, Felix Wolf, and Radu Prodan
- Subjects
Job shop scheduling ,Emerging technologies ,Computer science ,business.industry ,Distributed computing ,05 social sciences ,050801 communication & media studies ,Cloud computing ,02 engineering and technology ,Microservices ,Service provider ,Scheduling (computing) ,0508 media and communications ,Knapsack problem ,020204 information systems ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,business - Abstract
For many applications, a microservices architecture promises better performance and flexibility compared to a conventional monolithic architecture. In spite of the advantages of a microservices architecture, deploying microservices poses various challenges for service developers and providers alike. One of these challenges is the efficient placement of microservices on the cluster nodes. Improper allocation of microservices can quickly waste resource capacities and cause low system throughput. In the last few years, new technologies in orchestration frameworks, such as the possibility of multiple schedulers for pods in Kubernetes, have improved scheduling solutions of microservices but using these technologies needs to involve both the service developer and the service provider in the behavior analysis of workloads. Using memory and CPU requests specified in the service manifest, we propose a general microservices scheduling mechanism that can operate efficiently in private clusters or enterprise clouds. We model the scheduling problem as a complex variant of the knapsack problem and solve it using a multi-objective optimization approach. Our experiments show that the proposed mechanism is highly scalable and simultaneously increases utilization of both memory and CPU, which in turn leads to better throughput when compared to the state-of-the-art.
- Published
- 2020
- Full Text
- View/download PDF
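The entry above frames microservice placement as a multi-objective problem over the memory and CPU requests in the service manifest. The sketch below is only an illustration of that kind of trade-off (a hypothetical place() helper with invented node data), not the scheduling mechanism proposed in the paper.

```python
# Illustrative sketch of a multi-objective placement decision based on CPU and
# memory requests (hypothetical helper; not the paper's scheduling mechanism).
def place(service, nodes):
    """Pick the node that keeps CPU and memory utilization low and balanced."""
    best_node, best_score = None, float("inf")
    for node in nodes:
        cpu_free = node["cpu_total"] - node["cpu_used"]
        mem_free = node["mem_total"] - node["mem_used"]
        if service["cpu"] > cpu_free or service["mem"] > mem_free:
            continue  # node cannot host the service at all
        # Utilization of both resources after placement; penalize both the peak
        # utilization and the imbalance between the two resources.
        cpu_util = (node["cpu_used"] + service["cpu"]) / node["cpu_total"]
        mem_util = (node["mem_used"] + service["mem"]) / node["mem_total"]
        score = max(cpu_util, mem_util) + abs(cpu_util - mem_util)
        if score < best_score:
            best_node, best_score = node, score
    return best_node

nodes = [
    {"name": "n1", "cpu_total": 8, "cpu_used": 6, "mem_total": 32, "mem_used": 8},
    {"name": "n2", "cpu_total": 8, "cpu_used": 3, "mem_total": 32, "mem_used": 12},
]
print(place({"cpu": 1, "mem": 4}, nodes)["name"])   # -> n2 (more balanced afterwards)
```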
24. Foundations
- Author
-
Felix Wolf
- Published
- 2020
- Full Text
- View/download PDF
25. Static Neural Compiler Optimization via Deep Reinforcement Learning
- Author
-
Felix Wolf, Rahim Mammadli, and Ali Jannesari
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Source code ,Speedup ,Artificial neural network ,Computer science ,media_common.quotation_subject ,Optimizing compiler ,Machine Learning (stat.ML) ,computer.software_genre ,Machine Learning (cs.LG) ,Set (abstract data type) ,Computer engineering ,Statistics - Machine Learning ,Reinforcement learning ,Compiler ,computer ,media_common ,Complement (set theory) - Abstract
The phase-ordering problem of modern compilers has received a lot of attention from the research community over the years, yet remains largely unsolved. Various optimization sequences exposed to the user are manually designed by compiler developers. In designing such a sequence, developers have to choose the set of optimization passes, their parameters, and their ordering within a sequence. Resulting sequences usually fall short of achieving optimal runtime for a given source code and may sometimes even degrade the performance when compared to the unoptimized version. In this paper, we employ a deep reinforcement learning approach to the phase-ordering problem. Provided with sub-sequences constituting LLVM's O3 sequence, our agent learns to outperform the O3 sequence on the set of source codes used for training and achieves competitive performance on the validation set, gaining up to 1.32x speedup on previously-unseen programs. Notably, our approach differs from autotuning methods by not depending on one or more test runs of the program for making successful optimization decisions. It has no dependence on any dynamic feature, but only on the statically-attainable intermediate representation of the source code. We believe that the models trained using our approach can be integrated into modern compilers as neural optimization agents, at first to complement, and eventually replace, the hand-crafted optimization sequences.
- Published
- 2020
- Full Text
- View/download PDF
26. Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling
- Author
-
Alexandru Calotoiu, Felix Wolf, Thorsten Reimann, Torsten Hoefler, Sebastian Rinke, and Marcus Ritter
- Subjects
Computational complexity theory ,business.industry ,Heuristic ,Computer science ,Design of experiments ,Sampling (statistics) ,Machine learning ,computer.software_genre ,Data modeling ,Set (abstract data type) ,Scalability ,Reinforcement learning ,Artificial intelligence ,business ,computer - Abstract
Identifying scalability bottlenecks in parallel applications is a vital but also laborious and expensive task. Empirical performance models have proven to be helpful to find such limitations, though they require a set of experiments in order to gain valuable insights. Therefore, the experiment design determines the quality and cost of the models. Extra-P is an empirical modeling tool that uses small-scale experiments to assess the scalability of applications. Its current version requires an exponential number of experiments per model parameter. This makes the creation of empirical performance models very expensive, and in some situations even impractical. In this paper, we propose a novel parameter-value selection heuristic, which functions as a guideline for the experiment design, leveraging sparse performance-modeling, a technique that only needs a polynomial number of experiments per model parameter. Using synthetic analysis and data from three different case studies, we show that our solution reduces the average modeling costs by about 85% while retaining 92% of the model accuracy.
- Published
- 2020
- Full Text
- View/download PDF
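The gap the entry above targets, an exponential versus a polynomial number of experiments per model parameter, can be illustrated with a toy experiment design: the sparse plan below simply varies one parameter at a time, whereas the paper's parameter-value selection heuristic is more elaborate. Parameter names and values are invented.

```python
# Sketch contrasting a full-factorial experiment design with a sparse one that
# varies one parameter at a time (illustrates the cost gap only; not the paper's
# selection heuristic).
from itertools import product

values = {                 # five candidate values per model parameter
    "processes": [2, 4, 8, 16, 32],
    "problem_size": [10, 20, 40, 80, 160],
    "iterations": [1, 2, 4, 8, 16],
}

full_factorial = list(product(*values.values()))   # 5^3 = 125 runs

base = {k: v[0] for k, v in values.items()}        # smallest configuration
sparse = []
for param, vals in values.items():
    for v in vals:                                  # vary one parameter at a time
        cfg = dict(base, **{param: v})
        if cfg not in sparse:
            sparse.append(cfg)

print(len(full_factorial), len(sparse))             # 125 vs. 13
```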
27. Accelerating winograd convolutions using symbolic computation and meta-programming
- Author
-
Matthew W. Moskewicz, Ali Jannesari, Felix Wolf, Tim Beringer, and Arya Mazaheri
- Subjects
Computational complexity theory ,business.industry ,Computer science ,Deep learning ,020206 networking & telecommunications ,02 engineering and technology ,Parallel computing ,Symbolic computation ,Metaprogramming ,Convolution ,CUDA ,Software portability ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Code generation ,Artificial intelligence ,business - Abstract
Convolution operations are essential constituents of convolutional neural networks. Their efficient and performance-portable implementation demands tremendous programming effort and fine-tuning. Winograd's minimal filtering algorithm is a well-known method to reduce the computational complexity of convolution operations. Unfortunately, existing implementations of this algorithm are either vendor-specific or hard-coded to support a small subset of convolutions, thus limiting their versatility and performance portability. In this paper, we propose a novel method to optimize Winograd convolutions based on symbolic computation. Taking advantage of meta-programming and auto-tuning, we further introduce a system to automate the generation of efficient and portable Winograd convolution code for various GPUs. We show that our optimization technique can effectively exploit repetitive patterns, enabling us to reduce the number of arithmetic operations by up to 62% without compromising numerical stability. Moreover, we demonstrate in experiments that we can generate efficient kernels with runtimes close to deep-learning libraries, requiring only a minimum of programming effort, which confirms the performance portability of our approach.
- Published
- 2020
- Full Text
- View/download PDF
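As background for the entry above: Winograd's minimal filtering algorithm trades multiplications for additions using fixed transform matrices. The classic F(2,3) instance below uses the textbook transforms (general background, not code from the paper) to compute two outputs of a 3-tap correlation with four instead of six multiplications and checks the result against direct evaluation.

```python
# Winograd minimal filtering F(2,3): two outputs of a 3-tap correlation with
# 4 multiplications instead of 6 (textbook transforms; not code from the paper).
import numpy as np

BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile (4 samples)
g = np.array([0.5, 1.0, -1.0])       # 3-tap filter

m = (G @ g) * (BT @ d)               # 4 element-wise multiplications
y = AT @ m                           # Winograd result

direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(y, direct)        # matches the direct correlation
print(y)
```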
28. Safer Parallelization
- Author
-
Reiner Hähnle, Asmae Heydari Tabar, Arya Mazaheri, Mohammad Norouzi, Dominic Steinhöfel, and Felix Wolf
- Published
- 2020
- Full Text
- View/download PDF
29. A Container-Driven Approach for Resource Provisioning in Edge-Fog Cloud
- Author
-
Radu Prodan, Felix Wolf, and Hamid Mohammadi Fard
- Subjects
Ubiquitous computing ,Computer science ,business.industry ,Distributed computing ,Swarm behaviour ,020206 networking & telecommunications ,020207 software engineering ,Provisioning ,Cloud computing ,02 engineering and technology ,Microservices ,Virtualization ,computer.software_genre ,Scheduling (computing) ,0202 electrical engineering, electronic engineering, information engineering ,Orchestration (computing) ,business ,computer ,Edge computing - Abstract
With the emerging Internet of Things (IoT), distributed systems enter a new era. While pervasive and ubiquitous computing already became reality with the use of the cloud, IoT networks present new challenges because the ever growing number of IoT devices increases the latency of transferring data to central cloud data centers. Edge and fog computing represent practical solutions to counter the huge communication needs between IoT devices and the cloud. Considering the complexity and heterogeneity of edge and fog computing, however, resource provisioning remains the Achilles heel of efficiency for IoT applications. According to the importance of operating-system virtualization (so-called containerization), we propose an application-aware container scheduler that helps to orchestrate dynamic heterogeneous resources of edge and fog architectures. By considering available computational capacity, the proximity of computational resources to data producers and consumers, and the dynamic system status, our proposed scheduling mechanism selects the most adequate host to achieve the minimum response time for a given IoT service. We show how a hybrid use of containers and serverless microservices improves the performance of running IoT applications in fog-edge clouds and lowers usage fees. Moreover, our approach outperforms the scheduling mechanisms of Docker Swarm.
- Published
- 2020
- Full Text
- View/download PDF
30. Skipping Non-essential Instructions Makes Data-Dependence Profiling Faster
- Author
-
Nicolas Morew, Mohammad Norouzi, Ali Jannesari, and Felix Wolf
- Subjects
010302 applied physics ,Profiling (computer programming) ,Computer science ,Data dependence ,020207 software engineering ,02 engineering and technology ,Parallel computing ,Static analysis ,computer.software_genre ,01 natural sciences ,Pointer (computer programming) ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Compiler ,computer ,Compile time - Abstract
Data-dependence profiling is a dynamic program-analysis technique to discover potential parallelism in sequential programs. Unlike purely static analysis, which may overestimate the number of dependences because it does not know many pointer values and array indices at compile time, profiling has the advantage of recording data dependences that actually occur at runtime. But it has the disadvantage of significantly slowing down program execution, often by a factor of 100. In our earlier work, we lowered the overhead of data-dependence profiling by excluding polyhedral loops, which can be handled statically using certain compilers. However, neither does every program contain polyhedral loops, nor are statically identifiable dependences restricted to such loops. In this paper, we introduce an orthogonal approach, focusing on data dependences between accesses to scalar variables - across the entire program, inside and outside loops. We first analyze the program statically and identify memory-access instructions that create data dependences that would appear in any execution of these instructions. Then, we exclude these instructions from instrumentation, allowing the profiler to skip them at runtime and avoid the associated overhead. We evaluate our approach with 49 benchmarks from three benchmark suites. We improved the profiling time of all programs by at least 38%, with a median reduction of 61% across all the benchmarks.
- Published
- 2020
- Full Text
- View/download PDF
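A toy rendering of the idea in the entry above: accesses whose dependences the static analysis has already proven are dropped from the instrumentation list, and the statically found dependences are merged with the profiled ones afterwards. All data structures and the profile() stub are hypothetical; this is not the authors' profiler.

```python
# Toy illustration: skip instrumentation for accesses whose data dependences are
# already known statically (hypothetical structures; not the authors' profiler).

def profile(instrumented):
    """Stand-in for the runtime profiler: would record addresses and detect
    dependences among the instrumented accesses only."""
    return {("i3", "i2")}  # pretend this dependence was observed at runtime

accesses = [
    {"id": "i1", "expr": "sum",  "static": True},   # scalar in straight-line code
    {"id": "i2", "expr": "a[i]", "static": False},  # array index unknown at compile time
    {"id": "i3", "expr": "p->x", "static": False},  # pointer target unknown
    {"id": "i4", "expr": "cnt",  "static": True},
]
static_deps = {("i4", "i1")}          # dependences proven by the static analysis

to_instrument = [a for a in accesses if not a["static"]]
print(f"instrumenting {len(to_instrument)} of {len(accesses)} accesses")

all_deps = static_deps | profile(to_instrument)   # merge static and dynamic results
print(all_deps)
```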
31. Efficient Ephemeris Models for Spacecraft Trajectory Simulations on GPUs
- Author
-
Felix Wolf, Arya Mazaheri, Fabian Schrammel, and Florian Renk
- Subjects
020301 aerospace & aeronautics ,Planetary protection ,Spacecraft ,business.industry ,Computer science ,Locality ,02 engineering and technology ,Thread (computing) ,Ephemeris ,01 natural sciences ,Computational science ,High memory ,0203 mechanical engineering ,0103 physical sciences ,Initial value problem ,business ,010303 astronomy & astrophysics ,Space debris - Abstract
When a spacecraft is released into space, its initial condition and future trajectory in terms of position and speed cannot be precisely predicted. To ensure that the object does not violate space debris mitigation or planetary protection standards, which could cause damage or contamination of celestial bodies, spacecraft-mission designers conduct a multitude of simulations to verify the validity of the set of all probable trajectories. Such simulations are usually independent of each other, making them a perfect match for parallelization. The European Space Agency (ESA) developed a GPU-based simulator for this purpose and achieved reasonable speedups in comparison with the established multi-threaded CPU version. However, we noticed that the performance starts to degrade as the spacecraft trajectories diverge in time. Our empirical analysis using GPU profilers showed that the application suffers from poor data locality and high memory traffic. In this paper, we propose an alternative data layout, which increases data locality within thread blocks. Furthermore, we introduce alternative model configurations that lower both algorithmic effort and the number of memory requests without violating accuracy requirements. Our experiments show that our method is able to accelerate the computations by up to a factor of 2.6.
- Published
- 2020
- Full Text
- View/download PDF
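The entry above mentions an alternative data layout that improves locality within thread blocks. A common instance of such a layout change is moving from an array-of-structures to a structure-of-arrays organization, sketched below in NumPy as a generic illustration; the simulator's actual layout and field names are not known from the abstract.

```python
# Array-of-structures vs. structure-of-arrays, a common layout change to improve
# memory locality (generic illustration; not the ESA simulator's actual layout).
import numpy as np

n = 1_000_000

# AoS: each record stores position and velocity together, so a kernel that only
# needs x-positions strides over interleaved, unused fields.
aos = np.zeros(n, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8"),
                         ("vx", "f8"), ("vy", "f8"), ("vz", "f8")])

# SoA: each field is stored contiguously, so a kernel touching only positions
# reads dense, cache- and coalescing-friendly memory.
soa = {f: np.zeros(n) for f in ("x", "y", "z", "vx", "vy", "vz")}

aos["x"] += 1.0   # strided access (8 of every 48 bytes per record are useful)
soa["x"] += 1.0   # contiguous access
```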
32. The Art of Getting Deep Neural Networks in Shape
- Author
-
Ali Jannesari, Rahim Mammadli, and Felix Wolf
- Subjects
Hyperparameter ,Artificial neural network ,Matching (graph theory) ,Computer science ,business.industry ,Process (computing) ,Topology (electrical circuits) ,02 engineering and technology ,Energy consumption ,010501 environmental sciences ,Network topology ,Machine learning ,computer.software_genre ,01 natural sciences ,Set (abstract data type) ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Software ,0105 earth and related environmental sciences ,Information Systems - Abstract
Training a deep neural network (DNN) involves selecting a set of hyperparameters that define the network topology and influence the accuracy of the resulting network. Often, the goal is to maximize prediction accuracy on a given dataset. However, non-functional requirements of the trained network -- such as inference speed, size, and energy consumption -- can be very important as well. In this article, we aim to automate the process of selecting an appropriate DNN topology that fulfills both functional and non-functional requirements of the application. Specifically, we focus on tuning two important hyperparameters, depth and width, which together define the shape of the resulting network and directly affect its accuracy, speed, size, and energy consumption. To reduce the time needed to search the design space, we train a fraction of DNNs and build a model to predict the performances of the remaining ones. We are able to produce tuned ResNets, which are up to 4.22 times faster than original depth-scaled ResNets on a batch of 128 images while matching their accuracy.
- Published
- 2018
- Full Text
- View/download PDF
33. Idealized Coil Cross Sections With Minimized Conductor Area for High Field Dipoles
- Author
-
Glyn Kirby, Felix Wolf, Herman H.J. ten Kate, Jaakko Samuel Murtomaki, Gijs de Rijk, Lucio Rossi, and Jeroen van Nugteren
- Subjects
010302 applied physics ,Physics ,Aperture ,Acoustics ,Superconducting magnet ,Condensed Matter Physics ,01 natural sciences ,Electronic, Optical and Magnetic Materials ,Conductor ,Magnetic field ,Dipole ,Electromagnetic coil ,Magnet ,0103 physical sciences ,Electrical and Electronic Engineering ,010306 general physics ,Current density - Abstract
In the design of superconducting accelerator magnets, the shape of the coil cross section is mainly driven by the minimization of the conductor volume, constrained by requirements on the central magnetic field and its homogeneity. Such optimizations commonly assume either a Block or Cosine Theta coil type, which is then filled with (predetermined) rectangular or key-stoned Rutherford cables. By optimizing the positions, angles, and the number of turns, the field quality requirements and cost minimization are achieved. However, this leaves open the question of what the optimal coil geometry looks like when such practical constraints are not present. Although such a coil cross section has always been presented as the intersection between two ellipses, this method results in a noncircular aperture and is thus not fully representative of a realistic coil. This paper introduces a method in which organically shaped (nongraded) dipole coil layouts are optimized without any assumptions on the conductor. The resulting layouts are presented as a function of overall current density, aperture size, and required magnetic field (inside the aperture). The layouts presented should be viewed as an ultimate limit of what can be achieved, for comparison with real coil layouts, and as an initial guide for finding an optimal cross section for a realistic magnet.
- Published
- 2018
- Full Text
- View/download PDF
34. 10 kA Joints for HTS Roebel Cables
- Author
-
Felix Wolf, Lucio Rossi, Gijs de Rijk, Janne Ruuskanen, Jeroen van Nugteren, J. Fleiter, Jaakko Samuel Murtomaki, Oscar Sacristan-de-Frutos, Francois-Olivier Pincot, Glyn Kirby, Pierre-Antoine Contat, and Antti Stenvall
- Subjects
High-temperature superconductivity ,Large Hadron Collider ,Materials science ,Mechanical engineering ,02 engineering and technology ,Superconducting magnet ,021001 nanoscience & nanotechnology ,Condensed Matter Physics ,01 natural sciences ,Temperature measurement ,Electronic, Optical and Magnetic Materials ,law.invention ,Conductor ,law ,Condensed Matter::Superconductivity ,Magnet ,Soldering ,0103 physical sciences ,Electrical and Electronic Engineering ,010306 general physics ,0210 nano-technology ,Joint (geology) - Abstract
Future high-temperature superconductor (HTS) high-field magnets using multitape HTS cables need 10-kA low-resistance connections. The connections are needed between the poles of the magnets and at the terminals over a wide operating temperature range, from 1.9 K to 85 K. The EuCARD-WP10 Future Magnets collaboration aims at testing HTS-based Roebel cables in an accelerator magnet. Usually, low-temperature superconductor (LTS) cables are jointed inside a relatively short soldered block. Powering tests at CERN have highlighted excess heating of a joint following the classical LTS joint design. The HTS Roebel cables are assembled from REBCO-coated conductor tapes in a transposed configuration. As a result, the tapes emerge at the cable surface at an angle to the cable axis. A low-resistance joint requires a sufficiently large interface area for each tape. Within one twist pitch length, each tape is located at the surface of the cable over a relatively small, non-constant area. This geometry prevents making a well-controlled joint over a compact length along the cable. This paper presents a compact joint configuration for the Roebel cable overcoming these practical challenges. A new joint, called fin-block, is designed. The joint resistance is estimated computationally. Finally, the test results as a function of current and temperature are presented.
- Published
- 2018
- Full Text
- View/download PDF
35. Design-time performance modeling of compositional parallel programs
- Author
-
Fabian Czappa, Alexandru Calotoiu, Heiko Mantel, Felix Wolf, Thomas Hohl, and Toni Nguyen
- Subjects
Class (computer programming) ,Computer Networks and Communications ,Computer science ,Parallel design ,Alternative development ,Computer Graphics and Computer-Aided Design ,Industrial engineering ,Pipeline (software) ,Theoretical Computer Science ,Task (project management) ,Artificial Intelligence ,Hardware and Architecture ,Software design pattern ,Systems design ,Performance model ,Software - Abstract
Performance models are powerful instruments for understanding the performance of parallel systems and uncovering their bottlenecks. Already during system design, performance models can help ponder alternative development options. However, creating a performance model – whether theoretically or empirically – for an entire application that does not exist yet is challenging. In this paper, we propose to generate performance models of full programs from performance models of their components using formal composition operators derived from parallel design patterns. As long as the design of the overall system follows such a pattern, its performance model can be predicted with reasonable accuracy without an actual implementation. We demonstrate our approach with design patterns of varying complexity, including pipeline, task pool, and eventually MapReduce, which is representative of a broad class of data-analytics applications.
- Published
- 2021
- Full Text
- View/download PDF
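To make the composition idea of the entry above concrete, the block below shows one textbook performance model for the pipeline pattern, with t_i(p) the model of stage i on p processes, s the number of stages, and n the number of streamed items. It illustrates the principle only; the paper derives its own formal composition operators.

```latex
% Textbook throughput model for a linear pipeline (illustration of the composition
% principle; not the paper's exact operator).
\[
  T_{\mathrm{pipeline}}(n, p) \;\approx\; \sum_{i=1}^{s} t_i(p)
  \;+\; (n - 1)\,\max_{1 \le i \le s} t_i(p),
\]
% i.e., the first item pays the full fill time, after which the slowest stage
% dictates the steady-state rate.
```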
36. Biosynthesis of the β-Lactone Proteasome Inhibitors Belactosin and Cystargolide
- Author
-
Andreas Kulik, Leonard Kaysser, Judith S. Bauer, Jörn Kalinowski, Theresa M. Bendel, Harald Gross, and Felix Wolf
- Subjects
Magnetic Resonance Spectroscopy ,natural products ,Stereochemistry ,Streptomycetaceae ,010402 general chemistry ,01 natural sciences ,Catalysis ,Actinobacteria ,Ligases ,Lactones ,chemistry.chemical_compound ,Biosynthesis ,Tandem Mass Spectrometry ,inhibitors ,Amino Acids ,Gene ,chemistry.chemical_classification ,Natural product ,ATP synthase ,biology ,010405 organic chemistry ,Dipeptides ,General Chemistry ,biology.organism_classification ,0104 chemical sciences ,Amino acid ,chemistry ,Proteasome ,Biochemistry ,Multigene Family ,biology.protein ,Intercellular Signaling Peptides and Proteins ,biosynthesis ,Peptides ,metabolism ,Proteasome Inhibitors ,Genome, Bacterial ,Lactone - Abstract
Belactosins and cystargolides are natural product proteasome inhibitors from Actinobacteria. Both feature dipeptidic backbones and a unique β-lactone building block. Herein, we present a detailed investigation of their biosynthesis. Identification and analysis of the corresponding gene clusters indicated that both compounds are assembled by rare single-enzyme amino acid ligases. Feeding experiments with isotope-labeled precursors and in vitro biochemistry showed that the formation of the β-lactone warhead is unprecedented and reminiscent of leucine biosynthesis, and that it involves the action of isopropylmalate synthase homologues.
- Published
- 2017
- Full Text
- View/download PDF
37. Die Biosynthese der β-Lacton-haltigen Proteasominhibitoren Belactosin und Cystargolid
- Author
-
Theresa M. Bendel, Andreas Kulik, Judith S. Bauer, Felix Wolf, Harald Gross, Leonard Kaysser, and Jörn Kalinowski
- Subjects
0301 basic medicine ,03 medical and health sciences ,030104 developmental biology ,General Medicine - Abstract
Belactosins and cystargolides are natural products with proteasome-inhibitory properties produced by Actinobacteria. Both feature a peptide backbone of two amino acids as well as a unique β-lactone building block. Here, a detailed investigation of the biosynthesis of both natural products is described. The identification and analysis of the corresponding gene clusters indicates that both compounds are formed by rare amino acid ligases. Feeding experiments with isotope-labeled precursors and in vitro biochemistry reveal a previously undescribed mechanism of β-lactone biosynthesis, which derives from leucine formation and involves an isopropylmalate synthase-like enzyme.
- Published
- 2017
- Full Text
- View/download PDF
38. Synthesis of Phenylpropanoids via Matsuda–Heck Coupling of Arene Diazonium Salts
- Author
-
Felix Wolf and Bernd Schmidt
- Subjects
Phenylpropanoid ,010405 organic chemistry ,Chemistry ,Heck reaction ,Organic Chemistry ,Organic chemistry ,010402 general chemistry ,01 natural sciences ,0104 chemical sciences - Abstract
The Pd-catalyzed Heck-type coupling (Matsuda-Heck reaction) of electron rich arene diazonium salts with electron deficient olefins has been exploited for the synthesis of phenylpropanoid natural products. Examples described herein are the naturally occurring benzofurans methyl wutaifuranate, wutaifuranol, wutaifuranal, their 7-methoxy derivatives, and the O-prenylated natural products boropinols A and C.
- Published
- 2017
- Full Text
- View/download PDF
39. Genombasierte Suche nach Protease-Inhibitoren aus bakteriellen Quellen
- Author
-
Leonard Kaysser and Felix Wolf
- Subjects
chemistry.chemical_classification ,Proteases ,Protease ,010405 organic chemistry ,medicine.medical_treatment ,Bacterial genome size ,Biology ,010402 general chemistry ,biology.organism_classification ,01 natural sciences ,0104 chemical sciences ,chemistry.chemical_compound ,Enzyme ,Biosynthesis ,chemistry ,Biochemistry ,medicine ,Mode of action ,Molecular Biology ,Gene ,Bacteria ,Biotechnology - Abstract
Bacteria are a rich source of small-molecule inhibitors of proteases and protease-like enzymes with various therapeutic applications. These molecules often comprise distinct structural moieties, so-called warheads, which mediate specific interactions with the target enzymes. Knowledge of the biosynthesis of these warheads can be used to search for orphan pathways with similar capacities in bacterial genomes. The identified biosynthetic gene clusters likely encode the production of new protease inhibitors with the same mode of action as the template compound.
- Published
- 2017
- Full Text
- View/download PDF
40. Isoefficiency in Practice
- Author
-
Torsten Hoefler, Sergei Shudler, Alexandru Calotoiu, and Felix Wolf
- Subjects
020203 distributed computing ,Computer science ,Distributed computing ,Computation ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,02 engineering and technology ,Upper and lower bounds ,Computer Graphics and Computer-Aided Design ,Software ,Scheduling (computing) - Abstract
Task-based programming offers an elegant way to express units of computation and the dependencies among them, making it easier to distribute the computational load evenly across multiple cores. However, this separation of problem decomposition and parallelism requires a sufficiently large input problem to achieve satisfactory efficiency on a given number of cores. Unfortunately, finding a good match between input size and core count usually requires significant experimentation, which is expensive and sometimes even impractical. In this paper, we propose an automated empirical method for finding the isoefficiency function of a task-based program, binding efficiency, core count, and the input size in one analytical expression. This allows the latter two to be adjusted according to given (realistic) efficiency objectives. Moreover, we not only find (i) the actual isoefficiency function but also (ii) the function one would yield if the program execution was free of resource contention and (iii) an upper bound that could only be reached if the program was able to maintain its average parallelism throughout its execution. The difference between the three helps to explain low efficiency, and in particular, it helps to differentiate between resource contention and structural conflicts related to task dependencies or scheduling. The insights gained can be used to co-design programs and shared system resources.
- Published
- 2017
- Full Text
- View/download PDF
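For readers new to the term in the entry above, the standard textbook definitions behind isoefficiency are given below for context; the paper's contribution is determining this function empirically for task-based programs, not these definitions themselves.

```latex
% Parallel efficiency and the isoefficiency relation (standard textbook definitions).
\[
  E(n, p) \;=\; \frac{S(n, p)}{p}
          \;=\; \frac{T_{\mathrm{seq}}(n)}{p \cdot T_{\mathrm{par}}(n, p)} .
\]
% The isoefficiency function gives the problem size n = f_E(p) required to hold
% E(n, p) at a fixed target efficiency E as the core count p grows.
```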
41. Automatic Instrumentation Refinement for Empirical Performance Modeling
- Author
-
Jan-Patrick Lehr, Felix Wolf, Alexandru Calotoiu, and Christian Bischof
- Subjects
Computer engineering ,Parallel processing (DSP implementation) ,Filter (video) ,Computer science ,Code (cryptography) ,Overhead (computing) ,Spec# ,Instrumentation (computer programming) ,Supercomputer ,Focus (optics) ,computer ,computer.programming_language - Abstract
The analysis of runtime performance is important during the development and throughout the life cycle of HPC applications. One important objective in performance analysis is to identify regions in the code that show significant runtime increase with larger problem sizes or more processes. One approach to identify such regions is to use empirical performance modeling, i.e., building performance models based on measurements. While the modeling itself has already been streamlined and automated, the generation of the required measurements is time-consuming and tedious. In this paper, we propose an approach to automatically adjust the instrumentation to reduce overhead and focus the measurements on relevant regions, i.e., regions that show increasing runtime with larger input parameters or an increasing number of MPI ranks. Our approach employs Extra-P to generate performance models, which it then uses to extrapolate runtime and, finally, decide which functions should be kept for measurement. The analysis also expands the instrumentation by heuristically adding functions based on static source-code features. We evaluate our approach using benchmarks from SPEC CPU 2006, SU2, and parallel MILC. The evaluation shows that our approach can filter functions of little interest and generate profiles that contain mostly relevant regions. For example, the overhead for SU2 can be improved automatically from 200% to 11% compared to filtered Score-P measurements.
- Published
- 2019
- Full Text
- View/download PDF
42. Designing Efficient Parallel Software via Compositional Performance Modeling
- Author
-
Alexandru Calotoiu, Toni Nguyen, Felix Wolf, Heiko Mantel, and Thomas Hohl
- Subjects
Parallel software ,Computer science ,Distributed computing ,Parallel design ,Systems design ,Pipeline (software) ,Performance model ,Task (project management) - Abstract
Performance models are powerful instruments for understanding the performance of parallel systems and uncovering their bottlenecks. Already during system design, performance models can help ponder alternatives. However, creating a performance model - whether theoretically or empirically - for an entire application that does not exist yet is challenging unless the interactions between all system components are well understood, which is often not the case during design. In this paper, we propose to generate performance models of full programs from performance models of their components using formal composition operators derived from parallel design patterns such as pipeline or task pool. As long as the design of the overall system follows such a pattern, its performance model can be predicted with reasonable accuracy without an actual implementation.
- Published
- 2019
- Full Text
- View/download PDF
43. PRIMA-X - Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing
- Author
-
Daniel Lorenz and Felix Wolf
- Subjects
Computer science ,business.industry ,Embedded system ,Retargeting ,Instrumentation (computer programming) ,business ,Exascale computing - Published
- 2019
- Full Text
- View/download PDF
44. Automatic construct selection and variable classification in OpenMP
- Author
-
Felix Wolf, Mohammad Norouzi, and Ali Jannesari
- Subjects
020203 distributed computing ,Correctness ,Speedup ,Semantics (computer science) ,Computer science ,02 engineering and technology ,Construct (python library) ,Parallel computing ,020202 computer hardware & architecture ,Variable (computer science) ,Task (computing) ,Software design pattern ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) - Abstract
A major task of parallelization with OpenMP is to decide where in a program to insert which OpenMP construct such that speedup is maximized and correctness is preserved. Another challenge is the classification of variables that appear in a construct according to their data-sharing semantics. Manual classification is tedious and error prone. Moreover, the choice of the data-sharing attribute can significantly affect performance. Grounded on the notion of parallel design patterns, we propose a method that identifies code regions to parallelize and selects appropriate OpenMP constructs for them. Also, we classify variables in the chosen constructs by analyzing data dependences that have been dynamically extracted from the program. Using our approach, we created OpenMP versions of 49 sequential benchmarks and compared them with the code produced by three state-of-the-art parallelization tools: Our codes are faster in most cases with average speedups relative to any of the three ranging from 1.8 to 2.7. Additionally, we automatically reclassified variables of OpenMP programs parallelized manually or with the help of these tools, improving their execution time by up to 29%.
- Published
- 2019
- Full Text
- View/download PDF
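The entry above mentions classifying variables by their data-sharing semantics. The sketch below shows a simplified, hypothetical rule set for mapping per-variable access facts to OpenMP attributes; the paper's classification is derived from dynamically extracted data dependences and is more involved.

```python
# Simplified illustration of mapping per-variable access facts to OpenMP
# data-sharing attributes (hypothetical rule set; not the paper's algorithm).
def classify(var):
    if var["reduction_pattern"]:                  # e.g. sum += ... across iterations
        return "reduction"
    if not var["written_in_loop"]:
        return "shared"                           # read-only data can stay shared
    if var["read_before_write_in_iteration"]:
        return "firstprivate"                     # needs the value from before the loop
    if var["read_after_loop"]:
        return "lastprivate"                      # last iteration's value escapes the loop
    return "private"                              # purely iteration-local temporary

print(classify({"reduction_pattern": False, "written_in_loop": True,
                "read_before_write_in_iteration": False, "read_after_loop": False}))
# -> private
```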
45. How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications
- Author
-
Akihiro Nomura, Aamer Shah, Satoshi Matsuoka, Chih-Song Kuo, and Felix Wolf
- Subjects
File system ,Computer Networks and Communications ,Computer science ,Distributed computing ,Small files ,computer.software_genre ,Computer Science Applications ,Scheduling (computing) ,Computational Theory and Mathematics ,Hardware and Architecture ,Cluster (physics) ,Versa ,computer ,Software ,Information Systems - Abstract
On large-scale clusters, tens to hundreds of applications can simultaneously access a parallel file system, leading to contention and, in its wake, to degraded application performance. In this article, we analyze the influence of file-access patterns on the degree of interference. As it is by experience most intrusive, we focus our attention on write-write contention. We observe considerable differences among the interference potentials of several typical write patterns. In particular, we found that if one parallel program writes large output files while another one writes small checkpointing files, then the latter is slowed down when the checkpointing files are small enough, and the former otherwise. Moreover, applications with only a few processes writing large output files can already significantly hinder applications with many processes from checkpointing small files. Such effects can seriously impact the runtime of real applications—up to a factor of five in one instance. Our insights and measurement techniques offer an opportunity to automatically classify the interference potential between applications and to adjust scheduling decisions accordingly.
- Published
- 2019
- Full Text
- View/download PDF
46. Isogeometric Discretizations of the Electric Field Integral Equation
- Author
-
Felix Wolf, Jürgen Dölz, Sebastian Schöps, and Stefan Kurz
- Subjects
Boundary (topology) ,Triangulation (social science) ,020206 networking & telecommunications ,CAD ,02 engineering and technology ,Electric-field integral equation ,01 natural sciences ,Range (mathematics) ,0103 physical sciences ,Convergence (routing) ,0202 electrical engineering, electronic engineering, information engineering ,Applied mathematics ,010306 general physics ,Boundary element method ,Mathematics - Abstract
This contribution proposes a fast high-order isogeometric boundary element method for the solution of the electric field integral equation. The approach bridges the gap between computer-aided design (CAD) and computer-aided engineering since CAD geometries given by boundary representations can be utilized without loss of information. Furthermore, the method promises better accuracy per degree of freedom than classical triangulation-based approaches. We present our findings which show that the method is mathematically sound, yields optimal convergence rates and possesses competitive complexity properties. We support these results by a range of numerical experiments.
- Published
- 2019
- Full Text
- View/download PDF
47. Biosynthetic reconstitution of deoxysugar phosphoramidate metalloprotease inhibitors using an N-P-bond-forming kinase
- Author
-
Andreas Kulik, Jörn Kalinowski, Leonard Kaysser, Daniel Wibberg, Felix Wolf, Alexandra Baulig, Harald Gross, Irina Helmle, Arwa Al-Dilaimi, and Marius Bader
- Subjects
Dipeptide ,biology ,010405 organic chemistry ,Stereochemistry ,Phosphoramidon ,Phosphoramidate ,General Chemistry ,respiratory system ,010402 general chemistry ,01 natural sciences ,0104 chemical sciences ,chemistry.chemical_compound ,chemistry ,Biosynthesis ,Phosphodiester bond ,Glycosyltransferase ,biology.protein ,Moiety ,Metalloprotease inhibitor - Abstract
Phosphoramidon is a potent metalloprotease inhibitor and a widespread tool in cell biology research. It contains a dipeptide backbone that is uniquely linked to a 6-deoxysugar via a phosphoramidate bridge. Herein, we report the identification of a gene cluster for the formation of phosphoramidon and its detailed characterization. In vitro reconstitution of the biosynthesis established TalE as a phosphoramidate-forming kinase and TalC as the glycosyltransferase which installs the L-rhamnose moiety by phosphoester linkage.
- Published
- 2019
48. Accelerating Data-Dependence Profiling with Static Hints
- Author
-
Mohammad Norouzi, Felix Wolf, Ali Jannesari, and Qamar Ilias
- Subjects
010302 applied physics ,Profiling (computer programming) ,Computer science ,Data dependence ,020207 software engineering ,02 engineering and technology ,Parallel computing ,Static analysis ,Dependence analysis ,01 natural sciences ,Pointer (computer programming) ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Persistent data structure ,Merge (version control) - Abstract
Data-dependence profiling is a program-analysis technique to discover potential parallelism in sequential programs. Contrary to purely static dependence analysis, profiling has the advantage that it captures only those dependences that actually occur during execution. Lacking critical runtime information such as the values of pointers and array indices, purely static analysis may overestimate the number of dependences. On the downside, dependence profiling significantly slows down the program, often prolonging execution by a factor of 100. In this paper, we propose a hybrid approach that substantially reduces this overhead. First, we statically identify persistent data dependences that will appear in any execution. We then exclude the affected source-code locations from instrumentation, allowing the profiler to skip them at runtime and avoiding the associated overhead. At the end, we merge static and dynamic dependences. We evaluated our approach with 38 benchmarks from two benchmark suites and obtained a median reduction of the profiling time by 62% across all the benchmarks.
- Published
- 2019
- Full Text
- View/download PDF
49. Understanding the Scalability of Molecular Simulation Using Empirical Performance Modeling
- Author
-
Jadran Vrabec, Felix Wolf, and Sergei Shudler
- Subjects
Range (mathematics) ,Molecular dynamics ,Workflow ,Scale (chemistry) ,Distributed computing ,Scalability ,ddc:000 ,Code (cryptography) ,ddc:621 ,Field (computer science) ,Molecular engineering - Abstract
The final authenticated publication is available online at https://doi.org/10.1007/978-3-030-17872-7_8.
Molecular dynamics (MD) simulation allows for the study of static and dynamic properties of molecular ensembles at various molecular scales, from monatomics to macromolecules such as proteins and nucleic acids. It has applications in biology, materials science, biochemistry, and biophysics. Recent developments in simulation techniques spurred the emergence of the computational molecular engineering (CME) field, which focuses specifically on the needs of industrial users in engineering. Within CME, the simulation code ms2 allows users to calculate thermodynamic properties of bulk fluids. It is a parallel code that aims to scale the temporal range of the simulation while keeping the execution time minimal. In this paper, we use empirical performance modeling to study the impact of simulation parameters on the execution time. Our approach is a systematic workflow that can be used as a blue-print in other fields that aim to scale their simulation codes. We show that the generated models can help users better understand how to scale the simulation with minimal increase in execution time.
- Published
- 2019
- Full Text
- View/download PDF
50. Enhancing the Programmability and Performance Portability of GPU Tensor Operations
- Author
-
Johannes Schulte, Arya Mazaheri, Felix Wolf, Matthew W. Moskewicz, and Ali Jannesari
- Subjects
CUDA ,Software portability ,Kernel (image processing) ,Computer architecture ,Computer science ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,General-purpose computing on graphics processing units - Abstract
Deep-learning models with convolutional networks are widely used for many artificial-intelligence tasks, thanks to the increasing adoption of high-throughput GPUs, even in mobile phones. CUDA and OpenCL are the two most widely used programming interfaces for accessing the computing power of GPUs. However, attaining code portability has always been a challenge, until the introduction of the Vulkan API. Still, performance portability is not necessarily provided. In this paper, we investigate the unique characteristics of CUDA, OpenCL, and Vulkan kernels and propose a method for abstracting away syntactic differences. Such abstraction creates a single-source kernel, which we use for generating code for each GPU programming interface. In addition, we expose auto-tuning parameters to further enhance performance portability. We implemented a selection of convolution operations, covering the core operations needed for deploying three common image-processing neural networks, and tuned them for NVIDIA, AMD, and ARM Mali GPUs. Our experiments show that we can generate deep-learning kernels with minimal effort for new platforms and achieve reasonable performance. Specifically, our Vulkan backend is able to provide competitive performance compared to vendor deep-learning libraries.
- Published
- 2019
- Full Text
- View/download PDF
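The entry above describes abstracting away syntactic differences between kernel dialects into a single source. The toy template below covers only the CUDA/OpenCL surface syntax for a saxpy kernel (the qualifiers and index expressions shown are standard for those APIs); the paper's system additionally targets Vulkan and handles far more than such substitutions.

```python
# Toy illustration of a "single-source" kernel: one template, per-API substitutions
# (surface syntax only; not the code-generation system described in the paper).
TEMPLATE = """
{kernel_qualifier} void saxpy({global_ptr} float* y, {global_ptr} const float* x,
                              float a, int n) {{
    int i = {global_index};
    if (i < n) y[i] = a * x[i] + y[i];
}}
"""

DIALECTS = {
    "cuda": {
        "kernel_qualifier": "__global__",
        "global_ptr": "",                 # CUDA global pointers need no qualifier
        "global_index": "blockIdx.x * blockDim.x + threadIdx.x",
    },
    "opencl": {
        "kernel_qualifier": "__kernel",
        "global_ptr": "__global",
        "global_index": "get_global_id(0)",
    },
}

for api, subs in DIALECTS.items():
    print(f"// ---- {api} ----")
    print(TEMPLATE.format(**subs))
```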