126,174 results on '"Parallel computing"'
Search Results
2. On Distributed Computing: A View, Physical Versus Logical Objects, and a Look at Fully Anonymous Systems
- Author
-
Raynal, Michel, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Masuzawa, Toshimitsu, editor, Katayama, Yoshiaki, editor, Kakugawa, Hirotsugu, editor, Nakamura, Junya, editor, and Kim, Yonghwan, editor
- Published
- 2025
3. Automated and Automatic Systems of Management of a Programs Package for Making Optimal Decisions
- Author
-
Quliyev, Samir, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Mammadova, Gulchohra, editor, Aliev, Telman, editor, and Aida-zade, Kamil, editor
- Published
- 2025
4. Noise-tolerant NMF-based parallel algorithm for respiratory rate estimation.
- Author
-
Revuelta-Sanz, Pablo, Muñoz-Montoro, Antonio J., Torre-Cruz, Juan, Canadas-Quesada, Francisco J., and Ranilla, José
- Subjects
-
MATRIX decomposition, NONNEGATIVE matrices, PARALLEL algorithms, PARALLEL programming, RESPIRATORY organs
- Abstract
The accurate estimation of respiratory rate (RR) is crucial for assessing the respiratory system's health in humans, particularly during auscultation processes. Despite the numerous automated RR estimation approaches proposed in the literature, challenges persist in accurately estimating RR in noisy environments, typical of real-life situations. This becomes especially critical when periodic noise patterns interfere with the target signal. In this study, we present a parallel driver designed to address the challenges of RR estimation in real-world environments, combining multi-core architectures with parallel and high-performance techniques. The proposed system employs a nonnegative matrix factorization (NMF) approach to mitigate the impact of noise interference in the input signal. This NMF approach is guided by pre-trained bases of respiratory sounds and incorporates an orthogonal constraint to enhance accuracy. The proposed solution is tailored for real-time processing on low-power hardware. Experimental results across various scenarios demonstrate promising outcomes in terms of accuracy and computational efficiency. [ABSTRACT FROM AUTHOR]
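The NMF core of such a driver can be sketched with plain multiplicative updates. This is a generic, unguided NMF — the paper's pre-trained respiratory bases and orthogonality constraint are not reproduced here, and all names are illustrative:

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, seed=0):
    """Plain NMF via multiplicative updates: V ~= W @ H.
    Simplified stand-in for the paper's guided, orthogonality-
    constrained NMF (pre-trained bases are not modeled here)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy nonnegative "spectrogram" standing in for a respiratory signal
V = np.abs(np.random.default_rng(1).normal(size=(32, 40)))
W, H = nmf_multiplicative(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(round(err, 3))
```

Because each update is a handful of dense matrix products, this inner loop is also the natural target for the multi-core parallelization the abstract describes.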
- Published
- 2024
5. Enhancing security and scalability by AI/ML workload optimization in the cloud.
- Author
-
Priyadarshini, Sabina, Sawant, Tukaram Namdev, Bhimrao Yadav, Gitanjali, Premalatha, J., and Pawar, Sanjay R.
- Subjects
-
ARTIFICIAL intelligence, PARALLEL programming, MACHINE learning, COMMUNICATION infrastructure, RESOURCE allocation
- Abstract
The pervasive adoption of Artificial Intelligence (AI) and Machine Learning (ML) applications has exponentially increased the demand for efficient resource allocation, workload scheduling, and parallel computing capabilities in cloud environments. This research addresses the critical need for enhancing both the scalability and security of AI/ML workloads in cloud computing settings. The study emphasizes the optimization of resource allocation strategies to accommodate the diverse requirements of AI/ML workloads. Efficient resource allocation ensures that computational resources are utilized judiciously, avoiding bottlenecks and latency issues that could hinder the performance of AI/ML applications. The research explores advanced parallel computing techniques to harness the full potential of cloud infrastructure, enhancing the speed and efficiency of AI/ML computations. The integration of robust security measures is crucial to safeguard sensitive data and models processed in the cloud. The research delves into secure multi-party computation and encryption techniques, such as the Hybrid Heft Pso Ga algorithm, the Heuristic Function for the Adaptive Batch Stream Scheduling Module (ABSS), resource allocation for parallel computing, and the Kuhn–Munkres algorithm, tailored for AI/ML workloads, ensuring confidentiality and integrity throughout the computation lifecycle. To validate the proposed methodologies, the research employs extensive simulations and real-world experiments. The proposed ABSS_SSMM method achieves the highest accuracy and throughput values of 98% and 94%, respectively. The contributions of this research extend to the broader cloud computing and AI/ML communities. By providing scalable and secure solutions, the study aims to empower cloud service providers, enterprises, and researchers to leverage AI/ML technologies with confidence.
The findings are anticipated to inform the design and implementation of next-generation cloud platforms that seamlessly support the evolving landscape of AI/ML applications, fostering innovation and driving the adoption of intelligent technologies in diverse domains. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Parallel lightweight Block Cipher algorithm for Multicore CPUs.
- Author
-
Hamiza, Hawraa J. and Fanfakh, Ahmed
- Abstract
Copyright of Baghdad Science Journal is the property of Republic of Iraq Ministry of Higher Education & Scientific Research (MOHESR) and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
7. Exact likelihood for inverse gamma stochastic volatility models.
- Author
-
Leon‐Gonzalez, Roberto and Majoni, Blessings
- Abstract
We obtain a novel analytic expression of the likelihood for a stationary inverse gamma stochastic volatility (SV) model. This allows us to obtain the maximum likelihood estimator for this nonlinear non‐Gaussian state space model. Further, we obtain both the filtering and smoothing distributions for the inverse volatilities as mixtures of gammas, and therefore we can provide the smoothed estimates of the volatility. We show that by integrating out the volatilities, the resulting model resembles a GARCH model in the sense that the formulas are similar, which simplifies computations significantly. The model allows for fat tails in the observed data. We provide empirical applications using exchange rate data for seven currencies and quarterly inflation data for four countries. We find that the empirical fit of our proposed model is overall better than that of alternative models for the currency data of four countries and the inflation data of two countries. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Meerkat: A Framework for Dynamic Graph Algorithms on GPUs.
- Author
-
Concessao, Kevin Jude, Cheramangalath, Unnikrishnan, Dev, Ricky, and Nasre, Rupesh
- Subjects
-
PARALLEL programming, REPRESENTATIONS of graphs, PARALLEL processing, STRUCTURAL frames, MEERKAT
- Abstract
Graph algorithms are challenging to implement due to their varying topology and irregular access patterns. Real-world graphs are dynamic in nature and routinely undergo edge and vertex additions, as well as deletions. Typical examples of dynamic graphs are social networks, collaboration networks, and road networks. Applying static algorithms repeatedly on dynamic graphs is inefficient. Further, due to the rapid growth of unstructured and semi-structured data, graph algorithms demand efficient parallel processing. Unfortunately, we know only a little about how to efficiently process dynamic graphs on massively parallel architectures such as GPUs. Existing approaches to represent and process dynamic graphs are either not general or are inefficient. In this work, we propose a graph library for dynamic graph algorithms that works over a GPU-tailored graph representation and exploits the warp-cooperative work-sharing execution model. The library, named Meerkat, builds upon a recently proposed dynamic graph representation on GPUs. This representation exploits a hashtable-based mechanism to store a vertex's neighborhood. Meerkat also enables fast iteration through a group of vertices, a pattern common and crucial for achieving performance in graph applications. Our framework supports dynamic edge additions and edge deletions, along with their batched versions. Based on the efficient iterative patterns encoded in Meerkat, we implement dynamic versions of popular graph algorithms such as breadth-first search, single-source shortest paths, triangle counting, PageRank, and weakly connected components. We evaluated our implementations against the ones in other publicly available dynamic graph data structures and frameworks: GPMA, Hornet, and faimGraph. Using a variety of real-world graphs, we observe that Meerkat significantly improves the efficiency of the underlying dynamic graph algorithm, outperforming these frameworks. [ABSTRACT FROM AUTHOR]
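The representation the abstract describes — a hash table per vertex, with batched edge updates and algorithms layered on top — can be illustrated with a small CPU analogue. This is not Meerkat's GPU code; the class and method names are illustrative:

```python
from collections import deque

class DynGraph:
    """CPU sketch of a hashtable-per-vertex dynamic graph (the idea
    Meerkat implements on GPU); all names here are illustrative."""
    def __init__(self, n):
        self.adj = [set() for _ in range(n)]  # one hash set per vertex
    def add_edges(self, batch):               # batched edge additions
        for u, v in batch:
            self.adj[u].add(v)
    def del_edges(self, batch):               # batched edge deletions
        for u, v in batch:
            self.adj[u].discard(v)
    def bfs(self, src):                       # level-synchronous BFS
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in self.adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist

g = DynGraph(5)
g.add_edges([(0, 1), (1, 2), (2, 3)])
g.del_edges([(2, 3)])          # vertex 3 becomes unreachable
g.add_edges([(1, 4)])
print(g.bfs(0))
```

On a GPU, each per-vertex hash table would be probed cooperatively by a warp, which is where the warp-cooperative work-sharing model comes in.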
- Published
- 2024
9. Extending parallel programming patterns with adaptability features.
- Author
-
Galante, Guilherme, da Rosa Righi, Rodrigo, and de Andrade, Cristiane
- Subjects
-
ELASTICITY, COMPUTERS, COMPUTER software, PARALLEL programming, HARDWARE
- Abstract
Today, all computers have some degree of usable parallelism. Modern computers are explicitly equipped with hardware support for parallelism, such as multiple nodes, multicores, multiple CPUs, and accelerators. At the same time, the Cloud Continuum has become a viable platform for running parallel applications. Building software for these parallel and distributed platforms can be challenging due to the numerous considerations programmers must make during the development process. With this in mind, the high-performance computing literature proposed the concept of parallel patterns to hide some complexities. However, there are no patterns that address the design and creation of adaptive applications. With the compute continuum era in mind, we present how adaptability features can be explored within each parallel programming pattern, providing technical details on managing dynamic resources and handling changes in application behavior. In addition to this contribution, we also address practical implications by presenting some frameworks that can be used to implement adaptive applications and examples of using them with the proposed patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. An intelligent non-uniform mesh to improve errors of a stable numerical method for time-tempered fractional advection–diffusion equation with weakly singular solution.
- Author
-
Ahmadinia, Mahdi, Abbasi, Mokhtar, and Hadi, Parisa
- Subjects
-
FINITE volume method, FINITE element method, PARALLEL programming, ADVECTION-DIFFUSION equations, EQUATIONS
- Abstract
This paper introduces a finite volume element method for solving the time-tempered fractional advection–diffusion equation with a weakly singular solution at the initial time t = 0. An innovative approach is proposed to construct an intelligent non-uniform temporal mesh, which significantly reduces errors as compared to using a uniform temporal mesh. The error reduction is quantified in terms of the percentage improvement of errors. Due to the presence of a large number of integral calculations involving complicated functions, we use parallel computing techniques to accelerate the computation process. The stability of the method is rigorously proven, and numerical examples are provided to demonstrate the effectiveness of the method and validate the theoretical results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. Text encryption using secure and expeditious multiprocessing SerpentCTR using logistic map.
- Author
-
Elshoush, Huwaida T., Ahmed, Duaa M., Ishag, Abdalmajid A., Elsadig, Muawia A., and Altigani, Abdelrahman
- Subjects
-
CENTRAL processing units, CRITICAL success factor, ENCRYPTION protocols, PARALLEL programming, STATISTICAL correlation
- Abstract
Unarguably, performance is a critical factor in the success of any cipher. Although Serpent is more secure than the Advanced Encryption Standard (AES), it faces limitations such as speed and memory requirements. Hence, this paper proffers a text encryption method that ameliorates performance by running Serpent in parallel using the counter (CTR) encryption mode and further enhances security by generating sub-keys for each block using a logistic map. The intricate logistic-map-generated keys add robustness to the proposed algorithm. Comprehensive experiments using Python 3.9 on commonly used metrics verify the efficacy of the proposed method in terms of execution time, central processing unit (CPU) usage, and security analysis, including key space, strict avalanche effect, and randomness. The encryption/decryption reduction rate reached up to 80.81%. It is worthy of note that it is effectually resistant to brute-force attacks, having a large key space in addition to its dependency on the number of blocks and the randomly generated keys. The enhanced Serpent was examined using the statistical test suite (STS) recommended by the National Institute of Standards and Technology (NIST) and verified its randomness by passing all tests. Furthermore, it efficaciously resisted statistical analysis, particularly histogram and correlation coefficient analysis. Moreover, it prevails over current methods when juxtaposed with them in terms of performance, key space, key sensitivity, avalanche effect, histogram analysis and correlation coefficient, ergo affirming its efficiency. [ABSTRACT FROM AUTHOR]
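Why CTR mode parallelizes so well, and how a logistic map can drive per-block sub-keys, can be sketched in a few lines. Serpent itself is replaced below by a keyed-hash XOR stream purely for illustration (the real scheme encrypts each counter block with Serpent); parameter values are illustrative:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def logistic_keys(x0, r, n_blocks, klen=16):
    """Per-block sub-keys from the logistic map x -> r*x*(1-x)
    (stand-in for the paper's key schedule; x0, r illustrative)."""
    keys, x = [], x0
    for _ in range(n_blocks):
        x = r * x * (1 - x)
        keys.append(hashlib.sha256(repr(x).encode()).digest()[:klen])
    return keys

def ctr_block(args):
    """Encrypt one block: XOR with a keyed pad of its counter value."""
    i, block, key = args
    pad = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
    return bytes(b ^ p for b, p in zip(block, pad))

def encrypt(data, x0=0.61, r=3.99, bs=16):
    blocks = [data[i:i + bs] for i in range(0, len(data), bs)]
    keys = logistic_keys(x0, r, len(blocks))
    with ThreadPoolExecutor() as ex:  # CTR: blocks are independent
        out = ex.map(ctr_block,
                     [(i, b, k) for i, (b, k) in enumerate(zip(blocks, keys))])
    return b"".join(out)

msg = b"parallel CTR with logistic-map sub-keys"
ct = encrypt(msg)
pt = encrypt(ct)  # XOR stream: applying it twice decrypts
print(pt == msg)
```

The key point is that no block depends on another's ciphertext, so every block can be handed to a different worker — exactly the property the paper exploits for multiprocessing.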
- Published
- 2024
12. Parallel numerical simulation of the 2D acoustic wave equation.
- Author
-
Altybay, Arshyn, Darkenbayev, Dauren, and Mekebayev, Nurbapa
- Subjects
-
PARALLEL programming, SOUND waves, GRAPHICS processing units, WAVE equation, PARALLEL algorithms
- Abstract
Mathematical simulation has significantly broadened with the advancement of parallel computing, particularly in its capacity to comprehend physical phenomena across extensive temporal and spatial dimensions. High-performance parallel computing finds extensive application across diverse domains of technology and science, including the realm of acoustics. This research investigates the numerical modeling and parallel processing of the two-dimensional acoustic wave equation in both uniform and non-uniform media. Our approach employs implicit difference schemes, with the cyclic reduction algorithm used to obtain an approximate solution. We then adapt the sequential algorithm for parallel execution on a graphics processing unit (GPU). Ultimately, our findings demonstrate the effectiveness of the parallel approach in yielding favorable results. [ABSTRACT FROM AUTHOR]
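An implicit difference scheme for the wave equation reduces, at each time step, to tridiagonal linear solves. The serial workhorse for such systems is sketched below (the Thomas algorithm); the paper instead uses cyclic reduction, which performs the same job but exposes parallelism for the GPU. Coefficient values are illustrative:

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b,
    super-diagonal c, right-hand side d). Serial counterpart of the
    cyclic-reduction solver the paper parallelizes on GPU."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                 # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Diagonally dominant system typical of an implicit time step
n = 50
a = np.full(n, -1.0); a[0] = 0.0
c = np.full(n, -1.0); c[-1] = 0.0
b = np.full(n, 2.5)
d = np.random.default_rng(0).random(n)
x = thomas(a, b, c, d)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print(np.allclose(A @ x, d))
```

Cyclic reduction replaces the sequential forward/backward sweeps with log₂ n rounds of independent even/odd eliminations, which is what maps well onto GPU threads.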
- Published
- 2024
13. Nonconvex Dantzig selector and its parallel computing algorithm.
- Author
-
Wen, Jiawei, Yang, Songshan, and Zhao, Delin
- Abstract
The Dantzig selector is a popular ℓ1-type variable selection method widely used across various research fields. However, ℓ1-type methods may not perform well for variable selection without complex irrepresentable conditions. In this article, we introduce a nonconvex Dantzig selector for ultrahigh-dimensional linear models. We begin by demonstrating that the oracle estimator serves as a local optimum for the nonconvex Dantzig selector. In addition, we propose a one-step local linear approximation estimator, called the Dantzig-LLA estimator, for the nonconvex Dantzig selector, and establish its strong oracle property. The proposed regularization method avoids the restrictive conditions imposed by ℓ1 regularization methods to guarantee model selection consistency. Furthermore, we propose an efficient and parallelizable computing algorithm based on feature-splitting to address the computational challenges associated with the nonconvex Dantzig selector in high-dimensional settings. A comprehensive numerical study is conducted to evaluate the performance of the nonconvex Dantzig selector and the computing efficiency of the feature-splitting algorithm. The results demonstrate that the Dantzig selector with a nonconvex penalty outperforms the ℓ1 penalty-based selector, and the feature-splitting algorithm performs well in high-dimensional settings where a linear programming solver may fail. Finally, we generalize the concept of the nonconvex Dantzig selector to deal with more general loss functions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Implementation and analysis of GPU algorithms for Vecchia Approximation.
- Author
-
James, Zachary and Guinness, Joseph
- Abstract
Gaussian Processes have become an indispensable part of the spatial statistician’s toolbox but are unsuitable for analyzing large datasets because of the significant time and memory needed to fit the associated model exactly. Vecchia Approximation is widely used to reduce the computational complexity and can be calculated with embarrassingly parallel algorithms. While multi-core software has been developed for Vecchia Approximation, software designed to run on graphics processing units (GPUs) is lacking, despite the tremendous success GPUs have had in statistics and machine learning. We compare three different ways to implement Vecchia Approximation on a GPU: two of which are similar to methods used for other Gaussian Process approximations and one that is new. Our new method exploits the properties of Vecchia Approximation to nearly eliminate thread synchronization and reduce memory access times. We show that our new method outperforms the other two and then compare it to existing multi-core and GPU-accelerated software by fitting Gaussian Process models on various datasets, including a large spatial-temporal dataset of n > 10^6 points collected from an Earth-observing satellite. Our method works on larger datasets and provides higher predictive accuracy than existing GPU methods, and it runs up to 20 times faster than a single-core CPU implementation of Vecchia Approximation. [ABSTRACT FROM AUTHOR]
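Vecchia's construction is compact enough to sketch: each observation conditions only on a few earlier neighbors, so the log-likelihood becomes a sum of small independent Gaussian terms — hence "embarrassingly parallel". The 1-D toy below (exponential covariance, an illustrative choice, not the paper's setup) compares it against the exact log-likelihood:

```python
import numpy as np

def expcov(x, y, range_=0.3):
    """Exponential covariance kernel (illustrative choice)."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / range_)

def vecchia_loglik(xs, z, m=5):
    # Each term conditions on the m nearest previously-ordered
    # neighbors; terms are independent, so they parallelize trivially.
    ll = 0.0
    for i in range(len(xs)):
        nb = np.argsort(np.abs(xs[:i] - xs[i]))[:m]
        if len(nb) == 0:
            mean, var = 0.0, 1.0
        else:
            K = expcov(xs[nb], xs[nb])
            k = expcov(xs[nb], xs[i:i + 1])[:, 0]
            w = np.linalg.solve(K, k)
            mean, var = w @ z[nb], 1.0 - w @ k
        ll += -0.5 * (np.log(2 * np.pi * var) + (z[i] - mean) ** 2 / var)
    return ll

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 40)
K = expcov(xs, xs)
z = np.linalg.cholesky(K + 1e-10 * np.eye(40)) @ rng.normal(size=40)
exact = -0.5 * (40 * np.log(2 * np.pi) + np.linalg.slogdet(K)[1]
                + z @ np.linalg.solve(K, z))
gap = vecchia_loglik(xs, z) - exact
print(gap)
```

In this 1-D Markovian example the approximation is essentially exact; in 2-D spatial problems the gap is small but nonzero, and the per-observation independence is what the GPU implementations exploit.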
- Published
- 2024
15. A distributed memory parallel randomized Kaczmarz for sparse system of equations.
- Author
-
Bölükbaşı, Ercan Selçuk, Torun, Fahreddin Şükrü, and Manguoğlu, Murat
- Subjects
-
PARALLEL programming, LINEAR equations, LINEAR systems, ALGORITHMS, EQUATIONS
- Abstract
The Kaczmarz algorithm is an iterative projection method for solving systems of linear equations that arise in science and engineering problems in various application domains. In addition to the classical Kaczmarz algorithm, there are randomized and parallel variants. The main challenge of the parallel implementation is the dependency of each Kaczmarz iteration on its predecessor. Because of this dependency, frequent communication is required, which results in a substantial overhead. In this study, a new distributed parallel method that reduces the communication overhead is proposed. The proposed method partitions the problem so that the Kaczmarz iterations on different blocks are less dependent. A frequency parameter is introduced to study the effect of communication frequency on performance. The communication overhead is also decreased by allowing communication between processes only if they have shared non‐zero columns. The experiments are performed using problems from various domains to compare the effects of different partitioning methods on the communication overhead and performance. Finally, parallel speedups of the proposed method on larger problems are presented. [ABSTRACT FROM AUTHOR]
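The sequential dependency the abstract refers to is visible in the basic randomized Kaczmarz iteration: each step projects the current iterate onto one row's hyperplane, so step k needs the result of step k-1. A serial sketch (the paper's contribution is the low-communication distributed block variant, not shown here):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=3000, seed=0):
    """Randomized Kaczmarz: project the iterate onto one row's
    hyperplane per step, sampling rows with probability
    proportional to ||a_i||^2 (Strohmer-Vershynin scheme)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = np.sum(A * A, axis=1)
    p = p / p.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a  # projection onto row i
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))      # consistent overdetermined system
x_true = rng.normal(size=10)
x = randomized_kaczmarz(A, A @ x_true)
print(np.linalg.norm(x - x_true) < 1e-5)
```

A distributed variant assigns row blocks to processes and lets each run several local projections between synchronizations, which is exactly the communication/accuracy trade-off the frequency parameter controls.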
- Published
- 2024
16. Parallel differential evolution paradigm for multilayer electromechanical device optimization.
- Author
-
Zameer, Aneela, Naz, Sidra, and Raja, Muhammad Asif Zahoor
- Subjects
-
ELECTROMECHANICAL devices, PIEZOELECTRIC transducers, PIEZOELECTRIC materials, PARALLEL programming, DIFFERENTIAL evolution, FAULT diagnosis, COMPUTING platforms, SMART structures
- Abstract
Design optimization of multilayer piezoelectric transducers is intended for efficient and practical usage of wideband transducers for fault diagnosis, biomedical, and underwater applications through adjusting layer thicknesses and volume fraction of piezoelectric material in each layer. In this context, we propose a parallel differential evolution (PDE) algorithm to mitigate the complexities of multivariate optimization as well as the computation time to achieve an optimized wideband transducer for the particular application. For lead magnesium niobate-lead titanate (PMN PT)- and PZT5h-based piezoelectric materials, the fitness function is formulated based on uniformity of mechanical pressure at the first three harmonics to achieve wide bandwidth in the required functional frequency range. It is carried out using a one-dimensional model (ODM), while input layer thicknesses and volume fractions of active material are evaluated using PDE. The simulation is performed on a parallel computing platform utilizing three different host machines to reduce computational time. Results of the proposed methodology for PDE are statistically represented in the form of minimum, maximum, mean, and standard deviation of fitness value, while graphically represented in terms of speedup and time. It can be observed that the execution time for parallel DE decreases with the increasing number of cores. [ABSTRACT FROM AUTHOR]
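Differential evolution parallelizes naturally because the population's fitness evaluations are independent. The sketch below runs a standard DE/rand/1/bin loop with evaluations fanned out to a thread pool (the paper distributes them across host machines, and its fitness comes from the one-dimensional transducer model; the sphere function here is only a stand-in):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def differential_evolution(f, bounds, pop_size=20, gens=100,
                           F=0.7, CR=0.9, seed=0):
    """DE/rand/1/bin with population fitness evaluated in parallel.
    Threads stand in for the paper's multi-host parallelism."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    d = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, d))
    with ThreadPoolExecutor() as ex:
        fit = np.array(list(ex.map(f, pop)))
        for _ in range(gens):
            trials = []
            for i in range(pop_size):
                idx = [j for j in range(pop_size) if j != i]
                a, b, c = pop[rng.choice(idx, 3, replace=False)]
                mut = np.clip(a + F * (b - c), lo, hi)   # mutation
                mask = rng.random(d) < CR                # crossover
                mask[rng.integers(d)] = True
                trials.append(np.where(mask, mut, pop[i]))
            tfit = np.array(list(ex.map(f, trials)))     # parallel eval
            better = tfit < fit
            pop[better] = np.array(trials)[better]       # selection
            fit[better] = tfit[better]
    return pop[fit.argmin()], fit.min()

sphere = lambda x: float(np.sum(x ** 2))   # stand-in fitness
x_best, f_best = differential_evolution(
    sphere, (np.full(4, -5.0), np.full(4, 5.0)))
print(f_best)
```

Since each fitness call in the paper requires a full ODM simulation, the speedup from distributing these evaluations grows with the cost of the model, matching the reported scaling with core count.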
- Published
- 2024
17. Multi-task photonic reservoir computing: wavelength division multiplexing for parallel computing with a silicon microring resonator.
- Author
-
Giron Castro, Bernard J., Peucheret, Christophe, Zibar, Darko, and Da Ros, Francesco
- Subjects
-
WAVELENGTH division multiplexing, SIGNAL classification, PARALLEL programming, WIRELESS channels, TASK performance, FOOTPRINTS
- Abstract
Nowadays, as the ever-increasing demand for more powerful computing resources continues, alternative advanced computing paradigms are under extensive investigation. Significant effort has been made to deviate from conventional Von Neumann architectures. In-memory computing has emerged in the field of electronics as a possible solution to the infamous bottleneck between memory and computing processors, which reduces the effective throughput of data. In photonics, novel schemes attempt to collocate the computing processor and memory in a single device. Photonics offers the flexibility of multiplexing streams of data not only spatially and in time, but also in frequency or, equivalently, in wavelength, which makes it highly suitable for parallel computing. Here, we numerically show the use of time and wavelength division multiplexing (WDM) to solve four independent tasks at the same time in a single photonic chip, serving as a proof of concept for our proposal. The system is a time-delay reservoir computing (TDRC) based on a microring resonator (MRR). The addressed tasks cover different applications: Time-series prediction, waveform signal classification, wireless channel equalization, and radar signal prediction. The system is also tested for simultaneous computing of up to 10 instances of the same task, exhibiting excellent performance. The footprint of the system is reduced by using time-division multiplexing of the nodes that act as the neurons of the studied neural network scheme. WDM is used for the parallelization of wavelength channels, each addressing a single task. By adjusting the input power and frequency of each optical channel, we can achieve levels of performance for each of the tasks that are comparable to those quoted in state-of-the-art reports focusing on single-task operation. We also quantify the memory capacity and nonlinearity of each parallelized RC and relate these properties to the performance of each task. 
Finally, we provide insight into the impact of the feedback mechanism on the performance of the system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Elegante: A Machine Learning-Based Threads Configuration Tool for SpMV Computations on Shared Memory Architecture.
- Author
-
Ahmad, Muhammad, Sardar, Usman, Batyrshin, Ildar, Hasnain, Muhammad, Sajid, Khan, and Sidorov, Grigori
- Abstract
The sparse matrix–vector product (SpMV) is a fundamental computational kernel utilized in a diverse range of scientific and engineering applications. It is commonly used to solve linear and partial differential equations. The parallel computation of the SpMV product is a challenging task. Existing solutions often employ a fixed assignment of threads to rows based on empirical formulas, leading to sub-optimal configurations and significant performance losses. Elegante, our proposed machine learning-powered tool, utilizes a data-driven approach to identify the optimal thread configuration for SpMV computations within a shared memory architecture. It accomplishes this by predicting the best thread configuration based on the unique sparsity pattern of each sparse matrix. Our approach involves training and testing using various base and ensemble machine learning algorithms such as decision tree, random forest, gradient boosting, logistic regression, and support vector machine. We rigorously experimented with a dataset of more than 1000 real-world matrices. These matrices originated from 46 distinct application domains, spanning fields like robotics, power networks, 2D/3D meshing, and computational fluid dynamics. Our proposed methodology achieved 62% of the highest achievable performance and is 7.33 times faster, demonstrating a significant disparity from the default OpenMP configuration policy and traditional practice methods of manually or randomly selecting the number of threads. This work is the first attempt in which the structure of the matrix is used to predict the optimal thread configuration for the optimization of parallel SpMV computation in a shared memory environment. [ABSTRACT FROM AUTHOR]
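The overall pipeline — extract features from a matrix's sparsity pattern, then predict a thread count from examples — can be sketched minimally. The features and the tiny 1-nearest-neighbor "model" below are illustrative stand-ins, not Elegante's actual feature set or trained classifiers:

```python
import numpy as np

def csr_features(indptr, n_rows):
    """Sparsity-pattern features of the kind a thread-configuration
    predictor could use (illustrative, not Elegante's actual set)."""
    nnz_per_row = np.diff(indptr)
    return np.array([nnz_per_row.mean(), nnz_per_row.std(),
                     nnz_per_row.max(), float(n_rows)])

# Tiny synthetic "training set": features -> best thread count.
# A 1-nearest-neighbor lookup stands in for the ML models.
train_X = np.array([[2.0, 0.5, 4, 1e3],
                    [50.0, 40.0, 500, 1e5],
                    [8.0, 2.0, 16, 1e4]])
train_y = np.array([2, 16, 8])

def predict_threads(feat):
    scale = train_X.max(axis=0)            # crude feature scaling
    d = np.linalg.norm((train_X - feat) / scale, axis=1)
    return int(train_y[d.argmin()])

indptr = np.array([0, 2, 4, 5, 9])         # 4-row CSR index pointer
f = csr_features(indptr, 4)
print(predict_threads(f))
```

The real tool trains decision-tree, ensemble, and kernel models offline on 1000+ matrices; the point of the sketch is only that the prediction input is the sparsity structure, not the matrix values.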
- Published
- 2024
19. Chaotic Video Encryption Based on DNA Coding, Confusion, and Diffusion.
- Author
-
Zhi, Li-Xun, Du, Yuan, Zhao, Xi-Jue, Chen, Tao, Cao, Ke-Yin, and Jiang, Dong
- Abstract
Drawing inspiration from the paradigm of engineering projects, this paper presents a novel real-time chaotic video encryption scheme based on multithreaded parallel computing and multiround confusion–diffusion architecture. It utilizes a contractor thread to retrieve and assign frames to worker threads, which operate concurrently to perform DNA coding, confusion, and diffusion operations on their designated frames for encryption, with the resulting frames processed by a dealer thread. To evaluate the performance of the proposed algorithm, it is implemented on a workstation equipped with an Intel Xeon Gold 6226 @ 2.9 GHz CPU and 64 GB of memory. The statistical and security analyses demonstrate that the proposed strategy exhibits exceptional statistical properties and provides robust resistance against various attacks. The encryption speed evaluations show that the deployed cryptosystem achieves delay-free 512 × 512, 24 FPS video encryption, with an average encryption time of 34.69 ms, despite the execution of four rounds of DNA coding, five rounds of confusion, and three rounds of diffusion operations on each frame. In comparison to existing real-time chaotic video encryption schemes based on parallel computing, our method attains superior versatility and heightened security by processing each frame with an independent worker thread and incorporating DNA coding technology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. A parallel strategy to accelerate neighborhood operation for raster data coordinating CPU and GPU.
- Author
-
Yu, Zhixin, Zhou, Chen, and Li, Manchun
- Subjects
-
CENTRAL processing units, PARALLEL programming, NEIGHBORHOODS, ALGORITHMS, SCHEDULING
- Abstract
This study presents an asynchronous parallel strategy coordinating the central processing unit (CPU) and graphics processing unit (GPU) to accelerate neighborhood operation (NO). Specifically, we propose a data partitioning method called multi-anchor task queuing and a task scheduling method called bi-direction task scheduling, which can support CPU and GPU to find the responsible data blocks rapidly and concurrently handle their tasks via a bi-direction merge. Moreover, we optimize the organization of threads distributed among the CPU and GPU. Experimental results show that when a 1.7 GB raster dataset is processed, the speedup ratio achieved by the proposed parallel algorithm reaches 29.63, which is 19% and 18% higher than those of the GPU and standard asynchronous parallel algorithm, respectively. Additionally, the load balance index is below 0.085, which is significantly better than the value achieved by a conventional algorithm. Thus, the strategy achieves a higher speedup ratio and more adaptable load balance, thereby accelerating the NO more efficiently. Further, the impacts of the data volume, computational intensity, organization mode of the GPU threads, and granularity of the GPU stream on the parallel efficiency are evaluated and discussed. We also test the efficiency of four other common NOs with our strategy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. Scalable O(log₂ n) Dynamics Control for Soft Exoskeletons.
- Author
-
Colorado, Julian D., Mendez, Diego, Gomez-Bautista, Andres, Bermeo, John E., Alvarado-Rojas, Catalina, and Cuellar, Fredy
- Abstract
Robotic exoskeletons are being actively applied to support the activities of daily living (ADL) for patients with hand motion impairments. In terms of actuation, soft materials and sensors have opened new alternatives to conventional rigid body structures. In this arena, biomimetic soft systems play an important role in modeling and controlling human hand kinematics without the restrictions of rigid mechanical joints while having an entirely deformable body with limitless points of actuation. In this paper, we address the computational limitations of modeling large-scale articulated systems for soft robotic exoskeletons by integrating a parallel algorithm to compute the exoskeleton's dynamics equations of motion (EoM), achieving a computation with O(log₂ n) complexity for the highly articulated n degrees of freedom (DoF) running on p processing cores. The proposed parallel algorithm achieves an exponential speedup for n = p = 64 DoF while achieving a 0.96 degree of parallelism for n = p = 256, which demonstrates the required scalability for controlling highly articulated soft exoskeletons in real time. However, scalability will be bounded by the n = p fraction. [ABSTRACT FROM AUTHOR]
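The logarithmic-depth, data-parallel pattern that such scalable dynamics algorithms build on is easiest to see in a prefix sum: each round below is one simultaneous step across all elements, so with p = n cores the whole scan takes ⌈log₂ n⌉ rounds. This is a generic illustration of the pattern, not the paper's EoM recursion:

```python
def hillis_steele_scan(xs):
    """Inclusive prefix sum in O(log2 n) parallel rounds: each pass of
    the inner comprehension reads only the previous array, so all of
    its updates could execute simultaneously on separate cores."""
    xs = list(xs)
    n, step = len(xs), 1
    while step < n:
        xs = [xs[i] + xs[i - step] if i >= step else xs[i]
              for i in range(n)]
        step *= 2                 # double the reach each round
    return xs

print(hillis_steele_scan([1, 2, 3, 4, 5]))  # inclusive prefix sums
```

Parallel dynamics algorithms apply the same doubling idea to chains of joint transforms instead of sums, which is why their depth also grows logarithmically in the number of DoF when enough cores are available.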
- Published
- 2024
22. M‐DFCPP: A runtime library for multi‐machine dataflow computing.
- Author
-
Luo, Qiuming, Liu, Senhong, Huang, Jinke, and Li, Jinrong
- Subjects
-
DIRECTED acyclic graphs, DATA flow computing, RANDOM graphs, TREE graphs, PARALLEL processing
- Abstract
Summary: This article designs and implements a runtime library for general dataflow programming, DFCPP (Luo Q, Huang J, Li J, Du Z. Proceedings of the 52nd International Conference on Parallel Processing Workshops. ACM; 2023:145‐152.), and builds upon it to design and implement a multi‐machine C++ dataflow library, M‐DFCPP. In comparison to existing dataflow programming environments, DFCPP features a user‐friendly interface and richer expressive capabilities (Luo Q, Huang J, Li J, Du Z. Proceedings of the 52nd International Conference on Parallel Processing Workshops. ACM; 2023:145‐152.), enabling the representation of various types of dataflow actor tasks (static, dynamic, and conditional). Besides that, DFCPP addresses memory management and task scheduling for non‐uniform memory access architectures, issues that other dataflow libraries neglect. M‐DFCPP extends the capability of current dataflow runtime libraries (DFCPP, taskflow, openstream, etc.) to multi‐machine computing while keeping its API compatible with DFCPP. M‐DFCPP adopts the concepts of master and follower (Dean J, Ghemawat S. Commun ACM. 2008;51(1):107‐113; Ghemawat S, Gobioff H, Leung ST. ACM SIGOPS Operating Systems Review. ACM; 2003:29‐43.), which form a work‐sharing framework, as in many multi‐machine systems. To shift to the M‐DFCPP framework, a filtering layer is inserted into the original DFCPP, transforming it into followers that can cooperate with each other. The master consists of modules for scheduling, data processing, graph partitioning, state management, and so forth. In benchmark tests with workloads with directed acyclic graph topologies of binary trees and random graphs, DFCPP demonstrated performance improvements of 20% and 8%, respectively, compared to the second fastest library.
M‐DFCPP consistently exhibits outstanding performance across varying levels of concurrency and task workloads, achieving a maximum speedup of more than 20 over DFCPP, when the task parallelism exceeds 5000 on 32 nodes. Moreover, M‐DFCPP, as a runtime library supporting multi‐node dataflow computation, is compared with MPI, a runtime library supporting multi‐node control flow computation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
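The dataflow execution model this abstract describes, in which a task fires as soon as all of its input values are available, can be sketched in plain Python. This is an illustrative toy, not DFCPP's C++ API; the graph and node names are hypothetical:

```python
from collections import deque

def run_dataflow(graph, compute, sources):
    """Execute a dataflow graph: a node fires once all its inputs have
    produced values. `graph` maps node -> list of input nodes, `compute`
    maps node -> function of the input values, `sources` maps source
    node -> initial value."""
    consumers = {n: [] for n in graph}
    for node, inputs in graph.items():
        for i in inputs:
            consumers[i].append(node)
    values = dict(sources)
    pending = {n: len(inputs) for n, inputs in graph.items()}
    ready = deque()
    # source values are already available, so notify their consumers
    for src in sources:
        for c in consumers[src]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)
    while ready:
        node = ready.popleft()
        values[node] = compute[node](*(values[i] for i in graph[node]))
        for c in consumers[node]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)
    return values

# a, b are sources; c = a + b fires first, then d = 2 * c
graph = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
compute = {"c": lambda a, b: a + b, "d": lambda c: 2 * c}
result = run_dataflow(graph, compute, {"a": 3, "b": 4})
```

A real runtime such as DFCPP would additionally schedule independent ready nodes onto worker threads; here the ready queue is drained serially for clarity.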
23. Energy Efficient Heuristics to Schedule Task Graphs on Heterogeneous Voltage-Frequency Islands.
- Author
-
Sanchit, Jain, Anupam, Singh, Jagpreet, and Singh, Navjot
- Subjects
- *
PARALLEL programming , *MULTIPROCESSORS , *VOLTAGE , *ISLANDS , *SCHEDULING - Abstract
For energy- and cost-efficient computing, many desktop and embedded computing devices these days use voltage frequency islands (VFIs) based processors. An island in VFIs consists of multiple homogeneous cores; however, multiple islands are generally heterogeneous in nature. In contrast to (non-VFI) architectures with per-core dynamic voltage frequency scaling (DVFS), in a VFI all the cores in an island run at the same voltage/frequency at the same time. The energy-aware scheduling of task graphs on VFI architectures is therefore challenging and different from non-VFI architectures, in which task-based DVFS is possible. Instead, in VFI scheduling, the time slot during which an island runs at a particular voltage must be decided. In this paper, we analyse 20 different scheduling heuristics for VFI architectures by varying the size of this time slot based on the workload properties and also by varying the voltage/frequency of the time slots. We also propose a heuristic, OptSlotVFI, to utilize all the slots optimally. The results show that the VFI scheduling heuristics are able to generate schedules that reduce energy consumption by up to 25% with an equivalent or shorter schedule length than the state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Exponential family measurement error models for single-cell CRISPR screens.
- Author
-
Barry, Timothy, Roeder, Kathryn, and Katsevich, Eugene
- Subjects
- *
GENOME editing , *ERRORS-in-variables models , *RNA sequencing , *MICROSOFT Azure (Computing platform) , *GENE expression - Abstract
CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present considerable statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens—"thresholded regression"—exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV ("GLM-based errors-in-variables"), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g. Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several new insights. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. PreNTT: A Parallel Acceleration Method for Number-Theoretic Transform Computation in zk-SNARKs.
- Author
-
丁冬, 李正权, and 柴志雷
- Subjects
- *
PARALLEL programming , *ALGORITHMS , *MEMORY , *SCHEDULING - Abstract
Zero-knowledge succinct non-interactive proofs (zk-SNARK) have found extensive applications in various fields, including cryptocurrencies, due to their swift and efficient proof verification process. However, the computational intensity of the proof generation process poses a significant challenge, particularly at the number theoretic transform (NTT) stage. This paper proposed a GPU-based acceleration method for NTT computations, named PreNTT, to address this bottleneck. The method employed precomputation and optimization of twiddle factor powers to reduce the parallel computation overhead in NTT. It also introduced dynamic precomputation to enhance the efficiency of these computations. The algorithm made use of dynamic adaptive kernel scheduling, which allocated GPU resources on-chip according to the NTT input size, thereby boosting the computational efficiency for large-scale tasks. Additionally, the approach combined external global data shuffling with internal local data shuffling to avoid memory access conflicts. The use of CUDA multi-stream technology allowed for effective concealment of precomputation times during data transfer and computation processes. Experimental results indicate that the zk-SNARK system utilizing PreNTT achieves a speed-up ratio ranging from 1.7x to 9x in NTT module running times compared to Bellperson, the industry-leading system. PreNTT effectively increases the parallelism of the NTT algorithm and reduces the computational overhead in zk-SNARK operations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
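The twiddle-factor precomputation that PreNTT exploits can be illustrated with a minimal radix-2 iterative NTT in Python. The modulus 998244353 and primitive root 3 are standard NTT-friendly choices assumed here for illustration, not parameters taken from the paper:

```python
P = 998244353      # NTT-friendly prime: P - 1 = 119 * 2**23
G = 3              # a primitive root modulo P

def precompute_twiddles(n):
    """Powers of the principal n-th root of unity, computed once and reused."""
    w = pow(G, (P - 1) // n, P)
    tw = [1] * (n // 2)
    for i in range(1, n // 2):
        tw[i] = tw[i - 1] * w % P
    return tw

def ntt(a, tw, invert=False):
    """Iterative radix-2 NTT over Z/P, reading twiddles from the table `tw`."""
    n = len(a)
    a = a[:]
    j = 0
    for i in range(1, n):                  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        step = n // length                 # stride into the twiddle table
        half = length // 2
        for start in range(0, n, length):
            for k in range(half):
                w = tw[k * step]
                if invert:
                    w = pow(w, P - 2, P)   # modular inverse twiddle
                u = a[start + k]
                v = a[start + k + half] * w % P
                a[start + k] = (u + v) % P
                a[start + k + half] = (u - v) % P
        length *= 2
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a

tw = precompute_twiddles(8)                # built once, reused by every call
freq = ntt([1, 2, 3, 4, 0, 0, 0, 0], tw)
back = ntt(freq, tw, invert=True)
```

Because the table depends only on the transform size, it can be built once and shared by all butterflies, which is the serial analogue of the precomputation PreNTT moves off the critical path on the GPU.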
26. Theoretical Basis of Mathematical Apparatus for Parallel Computing Implementation in Computer-Aided Design Systems.
- Author
-
Konopatskiy, E.
- Subjects
- *
PARALLEL programming , *GEOMETRICAL constructions , *GEOMETRIC modeling , *PROJECTORS , *INFORMATION modeling - Abstract
The purpose of this work is to develop a mathematical apparatus and computational algorithms for implementation of parallel computing in geometric modeling and computer-aided design (CAD) systems. The analysis of existing approaches to parallel computing implementation in CAD systems is carried out. As a result, it is found that most information modeling and CAD systems do not support parallel computing at the level of the geometric kernel. A concept for the development of a CAD geometric kernel based on the invariants of parallel projection of geometric objects onto the axes of the global coordinate system is proposed. It combines the potential of constructive methods for geometric modeling, capable of parallelizing geometric constructions by tasks (message passing), and the mathematical apparatus of point calculus, capable of parallelization by data through coordinate-by-coordinate calculation (data parallel). The use of the coordinate-by-coordinate calculation for point equations not only makes it possible to parallelize computations along coordinate axes, but also ensures the consistency of computational operations with respect to threads, which significantly reduces the idle time and optimizes the CPU operation to achieve the maximum effect from the use of parallel computing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Enhancing self-adaptation for efficient decision-making at run-time in streaming applications on multicores.
- Author
-
Vogel, Adriano, Danelutto, Marco, Torquati, Massimo, Griebler, Dalvan, and Fernandes, Luiz Gustavo
- Subjects
- *
PARALLEL programming , *DECISION making , *EXECUTIONS & executioners - Abstract
Parallel computing is essential for accelerating the performance of computing applications. Moreover, parallel applications are expected to execute in increasingly dynamic environments and react to changing conditions. In this context, applying self-adaptation is a potential solution for achieving a higher level of autonomic abstraction and runtime responsiveness. In our research, we aim to explore and assess the abstractions attainable through the transparent management of parallel executions by self-adaptation. Our primary objectives are to expand the adaptation space to better reflect real-world applications and to assess the potential for self-adaptation to enhance efficiency. We provide the following scientific contributions: (I) a conceptual framework to improve the design of self-adaptation; (II) a new decision-making strategy for applications with multiple parallel stages; (III) a comprehensive evaluation of the proposed decision-making strategy against the state-of-the-art. The results demonstrate that the proposed conceptual framework can help design and implement self-adaptive strategies that are more modular and reusable. The proposed decision-making strategy provides significant gains in accuracy compared to the state-of-the-art, increasing the parallel applications' performance and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. An FPGA-based Object Detection Accelerator Architecture with Multi-channel Parallel Computation.
- Author
-
Tianyong Ao, Suiao Yang, Lin Wang, Le Fu, and Yi Zhou
- Subjects
OBJECT recognition (Computer vision) ,FEATURE extraction ,DRONE aircraft ,ARCHITECTURAL design ,PARALLEL programming - Abstract
Object detection algorithms are widely used but involve significant amounts of computation, posing challenges for adaptation to resource-constrained application scenarios, such as Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs). To address these challenges, this paper proposes an object detection accelerator architecture with multi-channel parallel computation. In this architecture, the computationally intensive modules, including the convolution layer, pooling layer, and upsampling layer, are accelerated in hardware, while the other modules are handled by the CPU embedded in the FPGA. Pipeline design, loop unrolling, data reordering, and other techniques are fully exploited in designing the hardware acceleration modules. A data transmission architecture is designed, incorporating multi-channel transmission along with ping-pong buffering and employing a blocking strategy for off-chip data access. Furthermore, the architecture incorporates multiple acceleration IP cores to minimize data transmission delays. Based on this architecture, the tiny-YOLOv4 model is optimized and implemented on FPGA as a hardware accelerator for object detection. The network model is enhanced by integrating the convolutional layer with normalization, and different attention mechanisms are applied to improve feature extraction, thereby reducing computational load and enhancing accuracy. The performance of the tiny-YOLOv4 FPGA-based accelerator is validated using the SIMD dataset. Experimental results demonstrate that the hardware accelerator performs exceptionally, consuming only 2.4 W and surpassing existing alternatives. Such adaptability facilitates its integration into complex environments such as intelligent transportation systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Accelerated point set registration method.
- Author
-
Raettig, Ryan M, Anderson, James D, Nykl, Scott L, and Merkle, Laurence D
- Abstract
In computer vision and robotics, point set registration is a fundamental issue used to estimate the relative position and orientation (pose) of an object in an environment. In a rapidly changing scene, this method must be executed frequently and in a timely manner, or the pose estimation becomes outdated. The point registration method is a computational bottleneck of a vision-processing pipeline. For this reason, this paper focuses on speeding up a widely used point registration method, the iterative closest point (ICP) algorithm. In addition, the ICP algorithm is transformed into a massively parallel algorithm and mapped onto a vector processor to realize a speedup of approximately an order of magnitude. Finally, we provide algorithmic and run-time analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
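A toy 2D version of the ICP loop this abstract refers to, using brute-force nearest neighbours and a closed-form rigid transform (the paper's vectorized implementation is not shown, and 2D lets us avoid an SVD), might look like:

```python
import math

def icp_2d(src, dst, iters=20):
    """Toy 2D ICP: match each source point to its nearest destination
    point, apply the closed-form best rigid transform, and repeat."""
    pts = [list(p) for p in src]
    for _ in range(iters):
        # brute-force nearest-neighbour correspondences
        pairs = [(p, min(dst, key=lambda d: (d[0] - p[0]) ** 2 + (d[1] - p[1]) ** 2))
                 for p in pts]
        cx = sum(p[0] for p, _ in pairs) / len(pairs)
        cy = sum(p[1] for p, _ in pairs) / len(pairs)
        dx = sum(q[0] for _, q in pairs) / len(pairs)
        dy = sum(q[1] for _, q in pairs) / len(pairs)
        # optimal rotation angle about the centroids (2D Kabsch, no SVD needed)
        s_sin = sum((p[0] - cx) * (q[1] - dy) - (p[1] - cy) * (q[0] - dx) for p, q in pairs)
        s_cos = sum((p[0] - cx) * (q[0] - dx) + (p[1] - cy) * (q[1] - dy) for p, q in pairs)
        th = math.atan2(s_sin, s_cos)
        c, s = math.cos(th), math.sin(th)
        tx, ty = dx - (c * cx - s * cy), dy - (s * cx + c * cy)
        pts = [[c * x - s * y + tx, s * x + c * y + ty] for x, y in pts]
    return pts

# a unit square, rotated by 0.3 rad and shifted, should snap back onto itself
dst = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
th0 = 0.3
src = [(math.cos(th0) * x - math.sin(th0) * y + 0.2,
        math.sin(th0) * x + math.cos(th0) * y - 0.1) for x, y in dst]
aligned = icp_2d(src, dst)
```

The nearest-neighbour search is the part the paper maps onto a vector processor; each query point's search is independent, which is what makes the massive parallelization possible.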
30. A Survey on Parallel Nature Inspired Algorithms.
- Author
-
Kumar, Lalit, Pandey, Manish, and Ahirwal, Mitul Kumar
- Subjects
PARALLEL programming ,RESEARCH personnel ,PROBLEM solving ,ALGORITHMS ,SOCIAL problems - Abstract
Many computational tasks in science and engineering involving large amounts of data pose complex optimization problems for researchers. Nature-Inspired Algorithms (NIAs) are best known for solving complex real-world problems, but a major drawback is their computational cost, which grows with problem complexity and dimensionality. Hence, several efforts have been made to reduce this cost through parallelization, and Parallel Nature-Inspired Algorithms (P-NIAs) have emerged to overcome the cost of such optimization problems; some NIAs are inherently parallel and known for fast processing. This survey presents a systematic review of the existing literature on parallelization in NIAs. We performed a systematic search of well-reputed publishers, including IEEE, Elsevier, Springer, ACM, Hindawi, Inderscience, and Wiley. Out of 100 studies, 77 reputable studies were selected to identify current research trends, challenges, and opportunities for parallelization in NIAs. Problems related to algorithm computation and performance are also discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. On a Boundary Model in Problems of the Gas Flow around Solids.
- Author
-
Polyakov, S. V. and Podryga, V. O.
- Abstract
This paper studies the development of a multiscale approach to calculate the gas flows near solid surfaces taking into account microscopic effects. In this line of research, the problem of setting boundary conditions on the surface of a solid body is considered, taking into account the effects preliminarily calculated at the atomic-molecular level. The main aim of this paper is to formulate macroscopic boundary equations that take into account the processes on the surface of a solid body around which there is a flow of gas. The macroscopic model is based on a system of quasi-gasdynamic (QGD) equations in the volume and the thermal conductivity equation in the near-surface layer of the streamlined body. The system is supplemented with real gas state equations and dependencies of the kinetic coefficients of the QGD equations on temperature and pressure, obtained on the basis of molecular dynamics calculations. To test the proposed boundary equations, the problem of the gas flow around a blunt body is considered. Dry air is selected as the gas. Nickel is chosen as the body coating. Calculations are carried out for two values of the inlet velocity. They confirm the qualitative correctness of the developed boundary model and the entire modeling technology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Beyond-local neural information processing in neuronal networks
- Author
-
Johannes Balkenhol, Barbara Händel, Sounak Biswas, Johannes Grohmann, Jóakim v. Kistowski, Juan Prada, Conrado A. Bosman, Hannelore Ehrenreich, Sonja M. Wojcik, Samuel Kounev, Robert Blum, and Thomas Dandekar
- Subjects
Neuronal field model ,Information integration ,Neural network ,Columnar architecture ,Parallel computing ,Neuronal oscillations ,Biotechnology ,TP248.13-248.65 - Abstract
While there is much knowledge about local neuronal circuitry, considerably less is known about how neuronal input is integrated and combined across neuronal networks to encode higher-order brain functions. One challenge lies in the large number of complex neural interactions. Neural networks use oscillating activity for information exchange between distributed nodes. To better understand building principles underlying the observation of synchronized oscillatory activity in a large-scale network, we developed a reductionistic neuronal network model. Fundamental building principles are laterally and temporally interconnected virtual nodes (microcircuits), wherein each node was modeled as a local oscillator. By this building principle, the neuronal network model can integrate information in time and space. The simulation gives rise to a wave interference pattern that spreads over all simulated columns in the form of a travelling wave. The model design stabilizes states of efficient information processing across all participating neuronal equivalents. Model-specific oscillatory patterns, generated by complex input stimuli, were similar to electrophysiological high-frequency signals that we could confirm in the primate visual cortex during a visual perception task. Important precursors of our oscillatory model, as well as the limitations and strengths of our reductionistic approach, are discussed. Our simple scalable model shows unique integration properties and successfully reproduces a variety of biological phenomena such as harmonics, coherence patterns, frequency-speed relationships, and oscillatory activities. We suggest that our scalable model simulates aspects of a basic building principle underlying oscillatory, large-scale integration of information in small and large brains.
- Published
- 2024
- Full Text
- View/download PDF
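The building principle of laterally coupled local oscillators can be illustrated with a generic Kuramoto-style ring model. This is a standard textbook abstraction assumed here for illustration, not the authors' network model:

```python
import math

def simulate_ring(phases, k=1.5, steps=2000, dt=0.01, omega=2 * math.pi):
    """Ring of phase oscillators, each coupled to its two lateral neighbours
    (Kuramoto-style); forward-Euler integration of dphi/dt."""
    phases = list(phases)
    n = len(phases)
    for _ in range(steps):
        new = []
        for i in range(n):
            left, right = phases[(i - 1) % n], phases[(i + 1) % n]
            coupling = math.sin(left - phases[i]) + math.sin(right - phases[i])
            new.append(phases[i] + dt * (omega + k * coupling))
        phases = new
    return phases

def coherence(phases):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full synchrony."""
    re = sum(math.cos(p) for p in phases) / len(phases)
    im = sum(math.sin(p) for p in phases) / len(phases)
    return math.hypot(re, im)

n = 16
start = [0.5 * math.sin(2 * math.pi * i / n) for i in range(n)]  # perturbed sync
final = simulate_ring(start)
```

With identical natural frequencies and attractive local coupling, the perturbation diffuses away and the ring relaxes toward a synchronized state; richer node dynamics and stimulus drive are what the paper's full model adds on top of this skeleton.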
33. Automation of conceptual design and modification of aircraft type unmanned aerial vehicles using multidisciplinary optimization and evolutionary algorithms. Part 1: Methods and models
- Author
-
V. A. Komarov, O. E. Lukyanov, V. H. Hoang, E. I. Kurkin, and J. G. QuijadaPioquinto
- Subjects
unmanned aerial vehicle ,appearance ,design ,takeoff weight ,optimization ,evolutionary algorithm ,aerodynamics ,balancing ,penalty function ,parallel computing ,Motor vehicles. Aeronautics. Astronautics ,TL1-4050 - Abstract
This paper proposes a method for selecting rational parameters for large-size aircraft-type unmanned aerial vehicles at the initial design stages using an optimization algorithm of differential evolution and numerical mathematical modeling of aerodynamic problems. The method enforces weight and aerodynamic balance in the main flight modes; it can consider aircraft-type unmanned aerial vehicles with one or two lifting surfaces, applies parallel calculations, and automatically generates a three-dimensional geometric model of the aircraft's appearance from the optimization results. A method that accelerates the solution of the takeoff-weight optimization problem by more than three times, by introducing the target function into the set of design variables, is proposed and demonstrated. The results of assessing the reliability of the mathematical models used for aerodynamics and the correct calculation of the target function are presented, taking into account various constraints. The operability and effectiveness of the method were comprehensively checked by solving demonstration problems that optimize more than ten main design parameters of the appearance of two existing heavy-class unmanned aerial vehicles with characteristics known from open sources. Examples of using the optimization results to modify prototypes are provided.
- Published
- 2024
- Full Text
- View/download PDF
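The differential-evolution optimizer with penalty functions mentioned in this abstract can be sketched generically. The objective below is a stand-in sphere function with a hypothetical constraint x0 + x1 >= 1, not the paper's takeoff-weight model:

```python
import random

def differential_evolution(f, bounds, pop_size=30, cr=0.9, fw=0.8, gens=200, seed=1):
    """Minimize f over a box using the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)          # guarantee one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == jrand:
                    v = pop[a][j] + fw * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)     # clamp to the box
                else:
                    v = pop[i][j]
                trial.append(v)
            fc = f(trial)
            if fc <= cost[i]:                   # greedy selection
                pop[i], cost[i] = trial, fc
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]

def objective(x):
    # sphere function plus a quadratic penalty for violating the
    # hypothetical constraint x0 + x1 >= 1
    penalty = 100.0 * max(0.0, 1.0 - (x[0] + x[1])) ** 2
    return sum(v * v for v in x) + penalty

x_best, f_best = differential_evolution(objective, [(-5.0, 5.0)] * 3)
```

The trial evaluations within a generation are independent, which is where the paper's parallel calculations come in; here they run serially for clarity.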
34. Cooperative, collaborative, coevolutionary multi-objective optimization on CPU-GPU multi-core.
- Author
-
Sun, Zhuoran, Liu, Ying Ying, and Thulasiraman, Parimala
- Abstract
Coevolutionary multi-objective heuristics solve multi-objective optimization problems by evolving two different heuristics simultaneously while exchanging information, producing diverse solutions and faster convergence. However, evolving two algorithms concurrently is computationally intensive and slow. In this research article, we study the parallelization of the Cooperative, Concurrent, Coevolutionary for Multi-objective Optimization (COMO3) algorithm, designed for dynamic problems. The two evolutionary algorithms, Non-dominated Sorting Genetic Algorithm II (NSGA-II) and Multi-objective Evolutionary Algorithm based on Decomposition (MOEA/D), are parallelized on GPU and CPU architectures, respectively. The populations in MOEA/D are further partitioned, forming an island topology to preserve diversity. Using the bi-objective traveling salesperson benchmark dataset, we analyze the performance of the individual algorithms and the coevolutionary algorithm with respect to time and accuracy of the results. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
35. A parallel computing approach to CNN-based QbE-STD using kernel-based matching.
- Author
-
Naik Gaonkar, Manisha, Thenkanidiyoor, Veena, and Dileep, Aroor Dinesh
- Abstract
In query-by-example spoken term detection (QbE-STD), reference utterances are matched with an audio query. A matching matrix-based approach to QbE-STD needs to compute a matching matrix between a query and reference utterance using an appropriate similarity metric. Recent approaches use kernel-based matching to compute this matching matrix. The matching matrices are converted to grayscale images and given to a CNN-based classifier. In this work, we propose to speed up QbE-STD by computing the matching matrix in parallel using a coarse-grained data-parallelism approach. We explore two approaches to coarse-grained data parallelism: in the first, we compute parts of a matching matrix in parallel and then combine them to form the full matrix, while in the second, we compute multiple matching matrices in parallel. We also propose converting the matching matrices into two-colored images using a threshold and using these images for QbE-STD. The efficacy of the proposed parallel computation approach is evaluated on the TIMIT dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
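Coarse-grained data parallelism over a single matching matrix, the first approach described in this abstract, can be sketched as follows. Plain cosine similarity stands in for the paper's kernel-based matching, and the frame vectors are invented:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def matching_matrix(query, reference, workers=4):
    """M[i][j] = similarity(query frame i, reference frame j); rows are
    computed in independent chunks, one chunk per worker (coarse-grained)."""
    def rows(chunk):
        return [[cosine(q, r) for r in reference] for q in chunk]
    size = max(1, len(query) // workers)
    chunks = [query[i:i + size] for i in range(0, len(query), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(rows, chunks))   # order of chunks is preserved
    return [row for part in parts for row in part]

query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reference = [[1.0, 0.0], [2.0, 0.0]]
M = matching_matrix(query, reference, workers=2)
```

Since each chunk of query frames touches only read-only data, the chunks need no synchronization; the second approach in the abstract would instead run one such whole-matrix computation per reference utterance concurrently.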
36. A new parallelized hierarchical value iteration algorithm for discounted Markov decision processes.
- Author
-
Nachaoui, Mourad, Chafik, Sanae, and Daoui, Cherki
- Abstract
Markov Decision Process (MDP) is a popular mathematical framework for modeling stochastic sequential problems under uncertainty. These models appear in many applications in computer science, engineering, telecommunications, and finance, among others. One of the most challenging goals is complexity reduction in the case of large MDPs. In this paper, we propose an optimal strategy for dealing with large MDPs under discounted reward. The proposed approach is based on an intelligent combination of a decomposition technique and an efficient parallel strategy. The global MDP is split into several 'sub-MDPs'; subsequently, these sub-MDPs are classified by level following the strongly-connected-components principle. A master-slave strategy based on the Message Passing Interface (MPI) is proposed to solve the resulting problem. The performance of the proposed approach is shown in terms of scalability, cost, and execution speed. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
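The serial discounted value iteration that such decompositions build on can be sketched for a toy MDP; the paper's MPI master-slave layer and sub-MDP classification are not shown:

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (next_state, prob); R[s][a] is the immediate
    reward. Returns the optimal value function of the discounted MDP."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                for a in actions[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best                 # in-place (Gauss-Seidel style) update
        if delta < tol:
            return V

# two-state toy MDP: "stay" earns 1 in s0; "move" switches state for 0 reward
states = ["s0", "s1"]
actions = {"s0": ["stay", "move"], "s1": ["move"]}
P = {"s0": {"stay": [("s0", 1.0)], "move": [("s1", 1.0)]},
     "s1": {"move": [("s0", 1.0)]}}
R = {"s0": {"stay": 1.0, "move": 0.0}, "s1": {"move": 0.0}}
V = value_iteration(states, actions, P, R)
```

Staying in s0 forever yields 1/(1-gamma) = 10, and s1 is worth one discounted step less; a decomposition along strongly connected components would run this same sweep independently on each sub-MDP, level by level.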
37. A distributed software system for integrating data-intensive imaging methods in a hard X-ray nanoprobe beamline at the SSRF
- Author
-
Peicheng Zhang, Zhisen Jiang, Yan He, and Aiguo Li
- Subjects
distributed software system ,synchrotron radiation big data ,ptychography ,parallel computing ,Nuclear and particle physics. Atomic energy. Radioactivity ,QC770-798 ,Crystallography ,QD901-999 - Abstract
The development of hard X-ray nanoprobe techniques has given rise to a number of experimental methods, such as nano-XAS, nano-XRD, nano-XRF, ptychography and tomography. Each method has its own unique data processing algorithms. With the increase in data acquisition rate, the large volume of generated data now poses a major challenge to these algorithms. In this work, an intuitive, user-friendly software system is introduced to integrate and manage these algorithms; by taking advantage of the system's loosely coupled, component-based design, the data processing speed of the imaging algorithm is enhanced through optimization of parallelism efficiency. This study provides meaningful solutions to the complexity challenges faced in synchrotron data processing.
- Published
- 2024
- Full Text
- View/download PDF
38. A Parallel Approach to Enhance the Performance of Supervised Machine Learning Realized in a Multicore Environment
- Author
-
Ashutosh Ghimire and Fathi Amsaad
- Subjects
machine learning ,parallel computing ,accuracy ,performance ,ensemble model ,multicore processing ,Computer engineering. Computer hardware ,TK7885-7895 - Abstract
Machine learning models play a critical role in applications such as image recognition, natural language processing, and medical diagnosis, where accuracy and efficiency are paramount. As datasets grow in complexity, so too do the computational demands of classification techniques. Previous research has achieved high accuracy but required significant computational time. This paper proposes a parallel architecture for Ensemble Machine Learning Models, harnessing multicore CPUs to expedite performance. The primary objective is to enhance machine learning efficiency without compromising accuracy through parallel computing. This study focuses on benchmark ensemble models including Random Forest, XGBoost, AdaBoost, and K-Nearest Neighbors. These models are applied to tasks such as wine quality classification and fraud detection in credit card transactions. The results demonstrate that, compared to single-core processing, machine learning tasks run 1.7 times and 3.8 times faster for small and large datasets on quad-core CPUs, respectively.
- Published
- 2024
- Full Text
- View/download PDF
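The structure of training independent ensemble members concurrently can be sketched with a thread pool and a trivial bootstrap 1-NN stand-in for the paper's models. Note that CPython threads illustrate the structure only; real speedups need process pools or libraries that release the GIL:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fit_1nn(sample):
    """'Training' a 1-NN member just stores its bootstrap sample."""
    return sample

def predict_1nn(model, x):
    nearest = min(model, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def bagging_ensemble(data, n_members=8, workers=4, seed=0):
    rng = random.Random(seed)
    samples = [[rng.choice(data) for _ in data] for _ in range(n_members)]
    # members are independent, so they can train concurrently; with CPython
    # threads this only demonstrates the structure of the parallelism
    with ThreadPoolExecutor(max_workers=workers) as pool:
        models = list(pool.map(fit_1nn, samples))

    def predict(x):
        votes = [predict_1nn(m, x) for m in models]
        return max(set(votes), key=votes.count)   # majority vote
    return predict

data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.0), "a"),
        ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
predict = bagging_ensemble(data)
```

The same fan-out/majority-vote shape applies whether the members are these toy models or the Random Forest and boosting learners the study benchmarks.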
39. Sustainable Material Selection in New Constructions: A Brute‐Force Optimization Framework Using Parallel Computing, Cost Benefits, and Thermal Performance Analysis.
- Author
-
Arab Anvari, Ehsan, Sadi, Sajad, Gholami, Javad, Fayaz, Rima, and Fan, Dingqiang
- Subjects
SUSTAINABILITY ,CONSUMPTION (Economics) ,EXTERIOR walls ,CONSTRUCTION materials ,PARALLEL programming - Abstract
Sustainable construction practices rely on carefully selecting building materials and balancing environmental and economic considerations. This study examines the complex link between local climate, market dynamics, and building material selection. Market data analysis, parametric modeling, and brute‐force optimization are used to provide insights into construction decision‐making. Across 5540 simulations, a thorough assessment of the financial and energy performance of various materials for walls, roofs, windows, and floors is conducted. Incorporating Pareto ranking, parallel simulation, and sensitivity analysis, the comprehensive evaluation reveals the intricate tradeoffs between cost, thermal properties, and energy savings. The findings highlight the potential for optimal external wall solutions to reduce U‐values by up to 30% and achieve source energy savings of up to 25% across diverse climates. By emphasizing the importance of local context in material selection, this study highlights how energy consumption patterns and transmission losses influence financial and energy performance, thus advancing sustainable construction practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
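Brute-force enumeration of material combinations with Pareto filtering on cost versus U-value can be sketched as below. The material catalogue and all numbers are invented for illustration, not data from the study:

```python
from itertools import product

# hypothetical catalogue: name -> (cost, U-value contribution)
walls = {"brick": (40, 0.9), "aac_block": (55, 0.5), "timber": (70, 0.4)}
windows = {"single": (80, 5.0), "double": (150, 2.8), "triple": (260, 1.1)}
roofs = {"thin_insul": (30, 0.6), "thick_insul": (60, 0.3)}

def enumerate_designs():
    """Brute force: score every combination of wall, window, and roof."""
    designs = []
    for (wn, w), (gn, g), (rn, r) in product(walls.items(), windows.items(), roofs.items()):
        cost = w[0] + g[0] + r[0]
        u = w[1] + g[1] + r[1]          # crude additive U-value proxy
        designs.append(((wn, gn, rn), cost, u))
    return designs

def pareto_front(designs):
    """Keep designs not dominated in (cost, U-value); lower is better in both."""
    return [d for d in designs
            if not any(o[1] <= d[1] and o[2] <= d[2] and (o[1] < d[1] or o[2] < d[2])
                       for o in designs)]

front = pareto_front(enumerate_designs())
```

Because each combination is scored independently, the enumeration loop is trivially parallelizable, which is what makes the study's thousands of simulations tractable on multiple cores.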
40. Histogram algorithm and its circuit design based on parallel computing for quantum video.
- Author
-
Zhang, Qianqian, Lu, Dayong, Hu, Yingying, and Xu, Meiyu
- Subjects
QUANTUM computing ,QUANTUM superposition ,QUANTUM statistics ,PARALLEL programming ,SEARCH algorithms - Abstract
The quantum image histogram, as a preprocessing result in quantum image processing, contains the gray-level information of the image and plays an important role in subsequent image processing. As far as we know, there are only a few results on quantum image histograms, and studies on quantum video histograms have not been conducted. A novel histogram statistics algorithm for quantum video, based on the idea of parallel computing, is therefore proposed in this paper. To this end, a quantum version of the carry-lookahead full adder is first devised; based on this novel full adder, an entirely new hierarchical quantum adder for superposition states is also devised, which not only improves on the delays generated by the mutual carries of a classical adder but also reduces the complexity from O(2^m × n) to O(m^2). Subsequently, in order to realize parallel statistics for quantum video, the algorithm and circuit implementation of image stitching are also given. Finally, combining the results of image stitching with Grover's search algorithm, the quantum video histogram statistics are ultimately realized in parallel. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
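The classical binary carry-lookahead idea underlying such adders can be sketched as follows: every carry is a function of the generate/propagate signals and the initial carry only, so all carries can in principle be evaluated in parallel. This is the textbook binary scheme, not the paper's quantum JW-MSD adder:

```python
def carry_lookahead_add(a, b, width=8):
    """Add two `width`-bit integers via generate/propagate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # carry generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # carry propagate
    carries = [0] * (width + 1)                            # carries[0] = carry-in
    for i in range(width):
        # lookahead expansion: c[i+1] = OR over j of ( g[j] AND p[j+1..i] ),
        # a formula in g, p, and c[0] alone -- no ripple through c[1..i]
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        carries[i + 1] = c
    bits = [p[i] ^ carries[i] for i in range(width)]       # sum bit = p XOR c
    total = sum(bit << i for i, bit in enumerate(bits))
    return total, carries[width]

total, carry_out = carry_lookahead_add(200, 100)
```

In hardware the OR/AND trees for all carries are evaluated concurrently, which is the delay advantage the abstract's hierarchical quantum adder generalizes to superposition states.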
41. Probabilistic Cellular Automata Monte Carlo for the Maximum Clique Problem.
- Author
-
Troiani, Alessio
- Subjects
- *
MARKOV chain Monte Carlo , *PROBABILISTIC automata , *DISTRIBUTION (Probability theory) , *CELLULAR automata , *POLYNOMIAL time algorithms - Abstract
We consider the problem of finding the largest clique of a graph. This is an NP-hard problem, and no algorithm is known that solves it exactly in polynomial time. Several heuristic approaches have been proposed to find approximate solutions; Markov Chain Monte Carlo is one of these. In the context of Markov Chain Monte Carlo, we present a class of "parallel dynamics", known as Probabilistic Cellular Automata, which can be used in place of the more standard choice of sequential "single spin flip" to sample from a probability distribution concentrated on the largest cliques of the graph. We perform a numerical comparison between the two classes of chains, both in terms of the quality of the solution and in terms of computational time. We show that the parallel dynamics are considerably faster than the sequential ones while providing solutions of comparable quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
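The sequential "single spin flip" baseline that the parallel dynamics is compared against can be sketched as Metropolis sampling of a penalized clique Hamiltonian. The graph, inverse temperature, and penalty weight below are illustrative choices, not the paper's setup:

```python
import math
import random

def metropolis_clique(n, edges, beta=2.0, lam=5.0, steps=20000, seed=7):
    """Single-spin-flip Metropolis on subsets S with energy
    E(S) = -|S| + lam * (number of non-adjacent pairs inside S),
    so low-energy states are large cliques. Returns the best clique seen."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    rng = random.Random(seed)
    inside = [False] * n

    def delta_energy(v):
        bad = sum(1 for u in range(n) if inside[u] and u != v and u not in adj[v])
        return (-1 + lam * bad) if not inside[v] else (1 - lam * bad)

    best = set()
    for _ in range(steps):
        v = rng.randrange(n)                      # flip one vertex at a time
        dE = delta_energy(v)
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            inside[v] = not inside[v]
        cur = [u for u in range(n) if inside[u]]
        if (len(cur) > len(best)
                and all(w in adj[u] for u in cur for w in cur if u != w)):
            best = set(cur)
    return best

# K4 on {0,1,2,3} plus a path 3-4-5; the unique maximum clique is {0,1,2,3}
best = metropolis_clique(6, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                             (3, 4), (4, 5)])
```

The PCA dynamics of the paper replaces this one-vertex-at-a-time update with a simultaneous probabilistic update of all vertices, which is what makes it amenable to parallel hardware.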
42. Iterative algorithms for partitioned neural network approximation to partial differential equations.
- Author
-
Yang, Hee Jun and Kim, Hyea Hyun
- Subjects
- *
ARTIFICIAL neural networks , *PARTIAL differential equations , *PARALLEL programming , *PARALLEL algorithms , *ALGORITHMS - Abstract
To enhance solution accuracy and training efficiency in neural network approximation to partial differential equations, partitioned neural networks can be used as a solution surrogate instead of a single large and deep neural network defined on the whole problem domain. In such a partitioned neural network approach, suitable interface conditions or subdomain boundary conditions are combined to obtain a convergent approximate solution. However, there has been no rigorous study of the convergence and parallel-computing enhancement of the partitioned neural network approach. In this paper, iterative algorithms are proposed to enhance parallel computation performance in the partitioned neural network approximation. Our iterative algorithms are based on classical additive Schwarz domain decomposition methods. For the proposed iterative algorithms, their convergence is analyzed under an error assumption on the local and coarse neural network solutions. Numerical results are also included to show the performance of the proposed iterative algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
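The additive Schwarz iteration underlying the proposed algorithms can be illustrated on a 1D Poisson problem, with exact tridiagonal subdomain solves standing in for the paper's local neural network solutions; the problem and overlap sizes are illustrative:

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main-, c: super-diagonal)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def schwarz_poisson(f, n=40, overlap=6, sweeps=60):
    """-u'' = f on (0,1) with u(0)=u(1)=0, solved by restricted additive
    Schwarz on two overlapping subdomains with exact local solves."""
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)                 # interior nodes 1..n plus boundaries
    mid = n // 2
    doms = [(1, mid + overlap), (mid - overlap, n)]   # inclusive index ranges
    for _ in range(sweeps):
        new = u[:]
        for lo, hi in doms:             # both solves use the OLD iterate
            m = hi - lo + 1
            d = [h * h * f((lo + i) * h) for i in range(m)]
            d[0] += u[lo - 1]           # Dirichlet data from the old iterate
            d[-1] += u[hi + 1]
            sol = thomas([-1.0] * m, [2.0] * m, [-1.0] * m, d)
            for i in range(m):
                idx = lo + i
                # each subdomain writes back only the region it "owns"
                if (lo == 1 and idx <= mid) or (lo != 1 and idx > mid):
                    new[idx] = sol[i]
        u = new
    return u

u = schwarz_poisson(lambda x: 2.0)      # exact solution: u(x) = x * (1 - x)
```

Because both subdomain problems read only the previous iterate, they can be solved concurrently in each sweep, which is the parallelism the paper obtains by training the partitioned networks independently between interface-data exchanges.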
43. Study on Large-Scale Urban Water Distribution Network Computation Method Based on a GPU Framework.
- Author
-
Zhang, Rongbin, Hou, Jingming, Li, Jingsi, Wang, Tian, and Imran, Muhammad
- Subjects
WATER management ,WATER distribution ,MUNICIPAL water supply ,MATRIX inversion ,WATER supply ,PARALLEL algorithms ,GRAPHICS processing units - Abstract
Large-scale urban water distribution network simulation plays a critical role in the construction, monitoring, and maintenance of urban water distribution systems. However, during the simulation process, matrix inversion calculations generate a large amount of computational data and consume significant amounts of time, posing challenges for practical applications. To address this issue, this paper proposes a parallel gradient calculation algorithm based on GPU hardware and the CUDA Toolkit library and compares it with the EPANET model and a model based on CPU hardware and the Armadillo library. The results show that the GPU-based model not only achieves a precision level very close to the EPANET model, reaching 99% accuracy, but also significantly outperforms the CPU-based model. Furthermore, during the simulation, the GPU architecture is able to efficiently handle large-scale data and achieve faster convergence, significantly reducing the overall simulation time. Particularly in handling larger-scale water distribution networks, the GPU architecture can improve computational efficiency by up to 13 times. Further analysis reveals that different GPU models exhibit significant differences in computational efficiency, with memory capacity being a key factor affecting performance. GPU devices with larger memory capacity demonstrate higher computational efficiency when processing large-scale water distribution networks. This study demonstrates the advantages of GPU acceleration technology in the simulation of large-scale urban water distribution networks and provides important theoretical and technical support for practical applications in this field. By carefully selecting and configuring GPU devices, the computational efficiency of large-scale water distribution networks can be significantly improved, providing more efficient solutions for future urban water resource management and planning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
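The gradient iteration described in the abstract above repeatedly solves a linearized system of head equations; the reported speedup comes from offloading that linear solve to the GPU rather than from a new hydraulic algorithm. A minimal serial sketch of the kernel being accelerated (illustrative only: the paper uses CUDA, and the 3x3 system here is a toy stand-in, not a real water network):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    A minimal stand-in for the linear solve at the heart of each
    gradient iteration; the paper offloads this step to the GPU via
    the CUDA Toolkit, while EPANET performs it on the CPU.
    """
    n = len(A)
    # Build an augmented matrix, copied so the caller's data is untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest pivot.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

# Toy symmetric system standing in for one linearized solver step.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = solve(A, b)
```

On a GPU the whole elimination is replaced by a batched parallel factorization, which is where the reported up-to-13x gain comes from.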
44. Research on Multiplication Routine Based on Reconfigurable Four-Valued Logic Processor.
- Author
-
Liao, Shanchuan, Li, Shuang, Li, Luqun, Li, Xiaofeng, Gu, Xingquan, and Zhang, Sulan
- Subjects
PROCESS capability ,COMPUTERS ,COMPUTER systems ,PARALLEL programming ,MODERN society - Abstract
Despite the indispensable role of traditional electronic computers in modern society, their limitations in parallel processing capability and processor bit-width are becoming increasingly apparent, especially when handling large-scale datasets and complex computational tasks. Although hardware technology and algorithm optimization continue to advance, the arithmetic units of traditional computers, the adders, remain constrained by carry delay and bit-width limits. This bottleneck is particularly pronounced in multiplication, where adders are used for partial product accumulation. Since 2018, however, the emergence of the Reconfigurable Four-Valued Logic Electronic Processor (RFLEP) has offered a potential way past these limitations: its large processor bit-width, flexible bit grouping, and dynamic hardware function reconfiguration bring substantial new capabilities to the field of computing. In this context, this paper proposes and implements a Reconfigurable Four-Valued Logic Multiplication Routine (RFLMR) tailored explicitly to the RFLEP. The RFLMR uses the Modified Signed-Digit (MSD) representation from multi-valued logic, combined with the M transformation of four-valued logic, to generate partial products; these are then summed in parallel by the JW-MSD parallel adder, achieving rapid execution of multiplication. Experimental results demonstrate that the routine performs multiplication accurately and meets theoretical expectations for implementation efficiency and performance. This research provides new ideas for developing next-generation high-performance computing systems and paves the way for more efficient and powerful computing models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
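A short sketch of the partial-product step may help: in MSD arithmetic every digit is -1, 0, or 1, so each partial product is simply the multiplicand negated, dropped, or kept, then shifted. The Python below illustrates the idea only; the paper's routine runs on the RFLEP and sums the rows with the JW-MSD carry-free adder, for which plain integer addition stands in here.

```python
def msd_to_int(digits):
    """Interpret a list of MSD digits (each -1, 0, or 1), most
    significant first, as a binary-weighted integer."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v

def msd_partial_products(multiplicand, multiplier):
    """Generate the shifted partial products of an MSD multiplication.

    Each multiplier digit d in {-1, 0, 1} selects +A, 0, or -A, so no
    digit-by-digit multiplication is needed. The routine in the paper
    then sums these rows in parallel with a carry-free MSD adder;
    here we just add integers.
    """
    n = len(multiplier)
    rows = []
    for i, d in enumerate(multiplier):
        shift = n - 1 - i  # positional weight of this multiplier digit
        rows.append(d * msd_to_int(multiplicand) * (1 << shift))
    return rows

# 5 = (1 0 1) and 3 = (1 0 -1), i.e. 4 - 1, in MSD form.
A = [1, 0, 1]
B = [1, 0, -1]
product = sum(msd_partial_products(A, B))
```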
45. Parallel BFS through pennant data structure with reducer hyper‐object based data hiding for 3D mesh images.
- Author
-
Bandyopadhyay, Sakhi, Mukherjee, Subhadip, Mukhopadhyay, Somnath, and Sarkar, Sunita
- Subjects
- *
DATA structures , *TIME complexity , *THREE-dimensional imaging , *INTELLECTUAL property , *DATA security - Abstract
Data hiding refers to techniques for concealing sensitive data within various forms of multimedia, such as video, audio, text, 2D images, and 3D images, while still being able to extract that information from the multimedia file. Although numerous data hiding techniques exist for 2D images, research on data hiding in 3D images is still in its early stages, and current 3D image steganography methods suffer from limited embedding capacity and high time complexity. To overcome these difficulties, we propose a novel 3D image steganographic technique using Parallel Breadth First Search (PBFS) with a reducer hyper‐object. Our approach uses the PBFS search strategy, together with layer synchronization, to embed private information within the vertices of 3D mesh images. To minimize the time, cost, and complexity of execution, we parallelize the BFS with the "bag" data structure, implemented on top of the pennant data structure. The methodology achieves an embedding capacity (EC) of 9.00 bits per vertex (bpv) while maintaining superior visual quality and time complexity. Consequently, the proposed methodology holds great potential for widespread use across numerous sectors, including private and government organizations such as intelligence agencies, intellectual property rights management, cloud data security, defense, covert communication, and medical imagery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
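The "bag" built on pennants that the abstract refers to follows the standard pennant/bag construction for parallel BFS: a pennant is a complete binary tree of 2^k elements, and a bag stores at most one pennant per power of two, so insertion behaves like a binary increment with constant-time pennant union as the carry. A minimal sketch (class names are illustrative, and the parallel splitting and traversal are omitted):

```python
class Pennant:
    """A pennant: the root of a complete binary tree of 2^k elements."""
    def __init__(self, value):
        self.value = value
        self.left = None   # root of a 2^(k-1)-element subtree
        self.right = None

def pennant_union(x, y):
    """Merge two pennants of equal size 2^k into one of size 2^(k+1)
    in O(1) time, the operation that makes bag union cheap."""
    y.right = x.left
    x.left = y
    return x

class Bag:
    """An array of pennants, one slot per power of two, so inserting
    works like incrementing a binary counter with pennant_union as
    the carry operation."""
    def __init__(self, capacity=16):
        self.slots = [None] * capacity
        self.size = 0

    def insert(self, value):
        p = Pennant(value)
        k = 0
        while self.slots[k] is not None:
            # "Carry": merge equal-sized pennants and move up one slot.
            p = pennant_union(self.slots[k], p)
            self.slots[k] = None
            k += 1
        self.slots[k] = p
        self.size += 1

bag = Bag()
for v in range(10):
    bag.insert(v)
# 10 = 0b1010, so slots 1 and 3 hold pennants and slots 0 and 2 are empty.
```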
46. A distributed software system for integrating data‐intensive imaging methods in a hard X‐ray nanoprobe beamline at the SSRF.
- Author
-
Zhang, Peicheng, Jiang, Zhisen, He, Yan, and Li, Aiguo
- Subjects
- *
SYNCHROTRON radiation , *ELECTRONIC data processing , *INTEGRATED software , *SYSTEMS software , *BIG data - Abstract
The development of hard X‐ray nanoprobe techniques has given rise to a number of experimental methods, such as nano‐XAS, nano‐XRD, nano‐XRF, ptychography, and tomography, each with its own data processing algorithms. With increasing data acquisition rates, the large volume of generated data now poses a major challenge to these algorithms. In this work, an intuitive, user‐friendly software system is introduced to integrate and manage these algorithms; by taking advantage of the system's loosely coupled, component‐based design, the data processing speed of the imaging algorithms is enhanced through optimization of parallel efficiency. This study provides meaningful solutions to the complexity challenges faced in synchrotron data processing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. A scalable parallel computing method for autonomous platoons.
- Author
-
Wu, Qing, Ge, Xiaohua, Han, Qing-Long, Cole, Colin, and Spiryagin, Maksym
- Subjects
- *
PARALLEL programming , *MESSAGE passing (Computer science) , *ADAPTIVE control systems , *PERSONAL computers , *MOTOR vehicle driving - Abstract
This paper developed a scalable parallel computing method for platoon simulations and controller validations. A scalable adaptive platooning control law was first designed, accommodating a variety of vehicle-to-vehicle communication topologies. A road vehicle dynamics model incorporating the Magic Formula tyre model and suspension dynamics was then derived and validated. The parallel computing method adopts the Message Passing Interface technique to allow fast and scalable simulations; platoon length changes require no controller or algorithm changes. An 11-vehicle platoon on a real-world 10 km road section was simulated, considering different localisation sensor errors, communication delays, heterogeneous vehicle masses, and driving modes. Results show that localisation errors have a negligible influence on spacing errors. Aggressive driving and heterogeneous vehicle masses increase spacing errors only slightly (by less than 0.23 m). Communication delays are the dominant influence: delays of 15, 45, and 75 ms increased spacing errors by 0.43, 1.41, and 2.41 m, respectively. It is further shown that parallel computing improves computing speed by three times on personal computers and seven to 12 times on workstations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
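The scalability claim in the abstract, that platoon length changes need no controller changes, rests on every follower running the identical predecessor-following law; in the paper each vehicle maps to an MPI rank exchanging state messages. A serial Python sketch of one such synchronous update step (the gains, target gap, and vehicle data are illustrative, not the paper's values):

```python
def platoon_step(states, dt=0.1, gap=20.0, kp=0.5, kv=0.8):
    """One synchronous update for a platoon of arbitrary length.

    states: list of (position, speed) pairs, leader first. Each
    follower computes its acceleration from its predecessor's state
    only, mirroring a predecessor-following topology. Because the rule
    is identical for every follower, adding vehicles (or MPI ranks, as
    in the paper) requires no controller changes. Gains kp/kv and the
    target gap are hypothetical tuning values.
    """
    x0, v0 = states[0]
    new = [(x0 + v0 * dt, v0)]  # leader cruises at constant speed
    for i in range(1, len(states)):
        xp, vp = states[i - 1]
        xi, vi = states[i]
        # PD law on spacing error and relative speed.
        a = kp * ((xp - xi) - gap) + kv * (vp - vi)
        new.append((xi + vi * dt, vi + a * dt))
    return new

# Four vehicles starting 25 m apart, all at 20 m/s; target gap 20 m.
platoon = [(i * -25.0, 20.0) for i in range(4)]
for _ in range(500):  # simulate 50 s
    platoon = platoon_step(platoon)
gaps = [platoon[i - 1][0] - platoon[i][0] for i in range(1, len(platoon))]
```

In an MPI implementation the inner loop body runs on each rank concurrently, with the predecessor state arriving as a message instead of a list lookup.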
48. Mathematical modeling of the chemotaxis-biodenitrification process.
- Author
-
Abaali, M. and Ouchtout, S.
- Subjects
- *
FINITE element method , *PARALLEL programming , *NONLINEAR systems , *POROUS materials , *MATHEMATICAL models - Abstract
In this paper, we develop and study a new mathematical model describing the functioning of the biodenitrification process, combining the biological and mechanical aspects and taking the chemotaxis phenomenon into account. The model is governed by a nonlinear reaction-diffusion-advection system for the bacterial activity coupled with Darcy flow in the porous medium. We prove the existence and uniqueness of the solution and carry out the numerical approximation of the model within a variational framework. We propose a fully discrete scheme based on a finite element method and perform numerical simulations in both 2D and 3D. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
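The abstract does not reproduce the governing equations. Under the stated ingredients only (advection by the Darcy velocity, chemotactic drift toward the substrate, reaction terms for bacterial growth), a generic chemotaxis reaction-diffusion-advection system of the kind described might take the following form; every symbol here is illustrative, and the paper's exact model may differ:

```latex
% b: bacterial density, s: substrate (e.g. nitrate), u: Darcy velocity,
% D_b, D_s: diffusivities, chi: chemotactic sensitivity, mu: growth rate.
\begin{aligned}
\partial_t b + \nabla\!\cdot(b\,\mathbf{u})
  &= \nabla\!\cdot\bigl(D_b \nabla b - \chi\, b\, \nabla s\bigr) + \mu(s)\, b, \\
\partial_t s + \mathbf{u}\cdot\nabla s
  &= D_s\, \Delta s - k\, \mu(s)\, b, \\
\mathbf{u} = -\frac{K}{\mu_f}\,\nabla p, \qquad
  &\nabla\!\cdot\mathbf{u} = 0 .
\end{aligned}
```

The first equation carries the chemotaxis term $-\chi\, b\, \nabla s$ inside the diffusive flux; the last line is the Darcy flow the abstract couples to the biology.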
49. Parallel pairwise operations on data stored in DNA: sorting, XOR, shifting, and searching.
- Author
-
Solanki, Arnav, Chen, Tonglin, and Riedel, Marc
- Subjects
- *
PARALLEL programming , *SIMD (Computer architecture) , *COMPUTER science , *DNA - Abstract
Prior research has introduced the Single-Instruction-Multiple-Data paradigm for DNA computing (SIMD DNA). It offers the potential for storing information and performing in-memory computations on DNA, with massive parallelism. This paper introduces three new SIMD DNA operations: sorting, shifting, and searching. Each is a fundamental operation in computer science. Our implementations demonstrate the effectiveness of parallel pairwise operations with this new paradigm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
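The SIMD idea, one instruction applied simultaneously to every stored register, can be sketched abstractly. The sketch below ignores the DNA strand-displacement encoding entirely and treats registers as bit-strings; the function names and data are illustrative, not the paper's operations:

```python
def simd_apply(registers, op):
    """Apply one instruction to every stored register at once: the
    essence of the SIMD DNA paradigm, where a single pool operation
    transforms all strands in parallel. Here the 'pool' is a list of
    bit-strings and the instruction is an ordinary function."""
    return [op(r) for r in registers]

def shift_left(bits):
    """Logical left shift by one, filling with '0'."""
    return bits[1:] + "0"

def pairwise_xor(bits):
    """XOR adjacent bit pairs, halving the register:
    (b0^b1, b2^b3, ...)."""
    return "".join(str(int(bits[i]) ^ int(bits[i + 1]))
                   for i in range(0, len(bits), 2))

pool = ["1100", "1010", "0111"]
shifted = simd_apply(pool, shift_left)
xored = simd_apply(pool, pairwise_xor)
```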
50. Energy and Scientific Workflows: Smart Scheduling and Execution.
- Author
-
WARADE, MEHUL, LEE, KEVIN, RANAWEERA, CHATHURIKA, and SCHNEIDER, JEAN-GUY
- Subjects
HIGH performance computing ,PARALLEL programming ,COMPUTER workstation clusters ,ENERGY consumption ,SCIENTIFIC computing ,WORKFLOW management systems - Abstract
Energy-efficient computation is an increasingly important target in modern-day computing. Scientific computation is conducted using scientific workflows executed on highly scalable compute clusters. The execution of these workflows is generally geared towards optimizing run-time performance, with the energy footprint of the execution being ignored. Evidently, minimizing execution time and minimizing energy consumption do not have to be mutually exclusive. The aim of the research presented in this paper is to highlight the benefits of energy-aware scientific workflow execution. A set of requirements for an energy-aware scheduler is outlined and a conceptual architecture for the scheduler is presented. The conceptual architecture was evaluated by developing a proof-of-concept scheduler, which achieved a reduction of around 49.97% in the energy consumption of the computation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
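The core per-task decision of an energy-aware scheduler like the one the abstract describes can be sketched as choosing, among profiled execution configurations, the lowest-energy one that still meets a deadline. The configuration data and function below are hypothetical, not taken from the paper:

```python
def pick_config(configs, deadline):
    """Choose the execution configuration with the lowest predicted
    energy whose predicted runtime meets the deadline.

    configs: iterable of (cores, runtime_s, power_w) tuples, e.g. from
    profiling runs. Energy is modelled simply as runtime x average
    power; a real scheduler would use measured counters.
    """
    feasible = [(c, r, r * p) for (c, r, p) in configs if r <= deadline]
    if not feasible:
        raise ValueError("no configuration meets the deadline")
    return min(feasible, key=lambda t: t[2])  # minimize energy

# Hypothetical profiles for one workflow task: (cores, runtime s, power W).
profiles = [(1, 400.0, 40.0), (4, 120.0, 90.0), (8, 70.0, 160.0)]
cores, runtime, energy = pick_config(profiles, deadline=150.0)
# The 8-core run is fastest but costs more energy than the 4-core run,
# so an energy-aware scheduler prefers 4 cores here.
```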