35 results on '"Single-core"'
Search Results
2. Modelling multi-criticality vehicular software systems: evolution of an industrial component model
- Author
-
Bucaioni, Alessio, Mubeen, Saad, Ciccozzi, Federico, Cicchetti, Antonio, and Sjödin, Mikael
- Published
- 2020
3. On Parallel Scalable Uniform SAT Witness Generation
- Author
-
Kuldeep S. Meel, Moshe Y. Vardi, Sanjit A. Seshia, Daniel J. Fremont, and Supratik Chakraborty
- Subjects
Improved performance ,Multi-core processor ,Parallelizable manifold ,Speedup ,Computer science ,Hash function ,Scalability ,Single-core ,Parallel computing ,Conjunctive normal form ,Algorithm - Abstract
Constrained-random verification (CRV) is widely used in industry for validating hardware designs. The effectiveness of CRV depends on the uniformity of test stimuli generated from a given set of constraints. Most existing techniques sacrifice either uniformity or scalability when generating stimuli. While recent work based on random hash functions has shown that it is possible to generate almost uniform stimuli from constraints with 100,000+ variables, the performance still falls short of today's industrial requirements. In this paper, we focus on pushing the performance frontier of uniform stimulus generation further. We present a random hashing-based, easily parallelizable algorithm, UniGen2, for sampling solutions of propositional constraints. UniGen2 provides strong and relevant theoretical guarantees in the context of CRV, while also offering significantly improved performance compared to existing almost-uniform generators. Experiments on a diverse set of benchmarks show that UniGen2 achieves an average speedup of about 20× over a state-of-the-art sampling algorithm, even when running on a single core. Moreover, experiments with multiple cores show that UniGen2 achieves a near-linear speedup in the number of cores, thereby boosting performance even further. (An illustrative sketch of hash-based sampling follows this entry.)
- Published
- 2015
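The hash-based idea behind UniGen2 can be shown in miniature: random XOR (parity) constraints split the solution space into roughly equal cells, and sampling inside one randomly chosen small cell approximates uniform sampling overall. A minimal Python sketch, brute-forcing a toy CNF instead of calling a SAT solver (the formula, function names, and single-constraint setting are all illustrative, not UniGen2's code):

```python
import itertools, random

def satisfies(assign, cnf):
    # A clause is a list of signed, 1-based variable indices.
    return all(any(assign[abs(l) - 1] == (l > 0) for l in clause) for clause in cnf)

def solutions(cnf, n):
    return [a for a in itertools.product([False, True], repeat=n)
            if satisfies(a, cnf)]

def xor_constraint(assign, bits, parity):
    # Random parity constraint: XOR over a subset of variables equals `parity`.
    return sum(assign[i] for i in bits) % 2 == parity

def hashed_sample(cnf, n, m, rng):
    """Pick a random XOR-defined cell (m constraints), then sample inside it."""
    sols = solutions(cnf, n)
    while True:
        hashes = [(rng.sample(range(n), rng.randint(1, n)), rng.randint(0, 1))
                  for _ in range(m)]
        cell = [s for s in sols if all(xor_constraint(s, b, p) for b, p in hashes)]
        if cell:                      # empty cell: redraw the hash functions
            return rng.choice(cell)

cnf = [[1, 2], [-1, 3], [2, -3]]      # toy formula over 3 variables
print(hashed_sample(cnf, 3, m=1, rng=random.Random(0)))
```

The real algorithm never enumerates all solutions; it bounds cell sizes with a SAT solver to obtain its almost-uniformity guarantees.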
4. Image Classification Optimization of High Resolution Tissue Images
- Author
-
Levente Kovács, Gergely Windisch, K. Hegedűs, Miklos Kozlovszky, and G. Pintér
- Subjects
Theoretical computer science ,Contextual image classification ,business.industry ,Computer science ,Image processing ,Pattern recognition ,computer.software_genre ,Porting ,Software framework ,Software ,Scalability ,Single-core ,Artificial intelligence ,business ,computer ,Classifier (UML) - Abstract
Generic image classification methods do not perform well on tissue images. Such software solutions produce a high number of false negative and false positive results, which prevents their clinical usage. We have created the MorphCheck high resolution tissue image processing framework, which enables us to collect morphological and morphometrical parameter values of the examined tissues. The size of such tissue images can easily reach the order of 100 MB–1 GB; therefore, image processing speed and effectiveness are important factors. Our main goal is to accurately evaluate high resolution H-E (hematoxylin-eosin) stained colon tissue sample images and, based on the parameters, classify the images into differentiated sets according to the structure and the surface manifestation of the tissues. We have interfaced our MorphCheck tissue image measurement software framework with the WND-CHARM general purpose image classifier and tried to classify high resolution tissue images with this combined software solution. The classification is by default initiated with a large training set and three main classes (healthy, adenoma, carcinoma); however, the new image classification process's wall-clock time was intolerably high on a single-core PC. The processing time depends on the size/resolution of the image and the size of the training set. Owing to the tissue-specific image parameters, the classification effectiveness was promising, so we started a development effort to decrease the processing time and further increase the accuracy of the classification. We have developed a workflow-based parallel version of the MorphCheck and WND-CHARM classifier software. In collaboration with the MTA SZTAKI Application Porting Centre, WND-CHARM has been ported to a distributed computing infrastructure (DCI). The paper introduces the steps that were taken to make WND-CHARM applications run faster using DCIs, and reports performance results of the tissue image classification process.
- Published
- 2014
5. Infrastructure-Free Logging and Replay of Concurrent Execution on Multiple Cores
- Author
-
Xiangyu Zhang, Dohyeong Kim, and Kyu Hyung Lee
- Subjects
business.industry ,Computer science ,media_common.quotation_subject ,Logging ,Shared variables ,Workload ,Thread (computing) ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Software quality ,Software ,Debugging ,Operating system ,Single-core ,business ,computer ,media_common - Abstract
We develop a logging and replay technique for real concurrent execution on multiple cores. Our technique works directly on binaries and does not require any hardware or complex software infrastructure support. We focus on minimizing logging overhead: the technique logs only a subset of system calls and thread spawns. Replay is performed on a single core. During replay, our technique first tries to follow only the event order in the log. However, due to schedule differences, replay may fail. An exploration process is then triggered to search for a schedule that allows the replay to make progress. Exploration is performed within a window preceding the point of replay failure. During exploration, our technique first tries to reorder synchronized blocks. If that does not lead to progress, it further reorders shared variable accesses. The exploration is facilitated by a sophisticated caching mechanism. Our experiments on real-world programs and real workloads show that the proposed technique has very low logging overhead (2.6% on average) and fast schedule reconstruction. (A toy sketch of order-enforced replay follows this entry.)
- Published
- 2014
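The replay idea in this abstract — re-executing threads while enforcing only the logged event order — can be sketched with a turn-based gate. This is a minimal single-process illustration, not the paper's binary-level system; the event log and worker threads are invented for the example:

```python
import threading

class OrderedReplay:
    """Admit events only in the order they were recorded."""
    def __init__(self, log):
        self.log = log                 # recorded order, e.g. [("t1", "open"), ...]
        self.pos = 0
        self.cv = threading.Condition()

    def event(self, tid, name):
        with self.cv:
            # Block until the recorded schedule says it is this event's turn.
            self.cv.wait_for(lambda: self.log[self.pos] == (tid, name))
            print(f"replayed {tid}:{name}")
            self.pos += 1
            self.cv.notify_all()

rep = OrderedReplay([("t1", "open"), ("t2", "read"), ("t1", "close")])

def t1():
    rep.event("t1", "open"); rep.event("t1", "close")

def t2():
    rep.event("t2", "read")

threads = [threading.Thread(target=t1), threading.Thread(target=t2)]
for t in threads: t.start()
for t in threads: t.join()
```

The paper's harder problem — recovering a feasible schedule when the log alone is not enough — is the exploration step this sketch omits.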
6. Assessing the Performance of OpenMP Programs on the Intel Xeon Phi
- Author
-
Tim Cramer, Matthias S. Müller, Christian Terboven, Sandra Wienke, and Dirk Schmidl
- Subjects
Xeon ,Computer science ,Scalability ,Programming paradigm ,x86 ,Single-core ,Hyper-threading ,Memory bandwidth ,Parallel computing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Software_PROGRAMMINGTECHNIQUES ,Xeon Phi - Abstract
The Intel Xeon Phi has been introduced as a new type of compute accelerator that is capable of executing native x86 applications. It supports programming models that are well-established in the HPC community, namely MPI and OpenMP, thus removing the necessity to refactor codes for accelerator-specific programming paradigms. Because of its native x86 support, the Xeon Phi may also be used stand-alone, meaning codes can be executed directly on the device without interaction with a host. In this sense, the Xeon Phi resembles a big SMP on a chip if its 240 logical cores are compared to a common Xeon-based compute node offering up to 32 logical cores. In this work, we compare a Xeon-based two-socket compute node with the stand-alone Xeon Phi in scalability and performance using OpenMP codes. Considered as individual SMP systems, they come at a very similar price and power envelope, but our results show significant differences in absolute application performance and scalability. We also show to what extent common programming idioms for the Xeon multi-core architecture are applicable to the Xeon Phi many-core architecture, and which challenges the changing ratio of core count to single-core performance poses for the application programmer. (An Amdahl's-law illustration of this trade-off follows this entry.)
- Published
- 2013
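The trade-off the authors examine — many slow cores versus few fast ones — is often first approximated with Amdahl's law: throughput is proportional to core_speed / ((1 − f) + f/n) for parallel fraction f on n cores. The speed factors below are illustrative assumptions, not measurements from the paper:

```python
def amdahl_throughput(f, cores, core_speed):
    """Relative throughput: the serial part runs on one core, the parallel part on all."""
    return core_speed / ((1 - f) + f / cores)

for f in (0.90, 0.99, 0.999):
    host = amdahl_throughput(f, cores=32, core_speed=1.0)    # fast Xeon cores
    phi  = amdahl_throughput(f, cores=240, core_speed=0.25)  # assume 4x slower cores
    print(f"parallel fraction {f}: host {host:5.1f}x  coprocessor {phi:5.1f}x")
```

Only at very high parallel fractions does the many-core device pull ahead, which is exactly the kind of crossover such a comparison probes.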
7. Iterative Deblurring of Large 3D Datasets from Cryomicrotome Imaging Using an Array of GPUs
- Author
-
Jos A. E. Spaan, Jeroen P. H. M. van den Wijngaard, Pepijn van Horssen, Maria Siebes, and T. Geenen
- Subjects
Point spread function ,Deblurring ,Kernel (image processing) ,Pixel ,Shared memory ,Computer science ,Fast Fourier transform ,Kernel adaptive filter ,Single-core ,Computational science - Abstract
The aim was to enhance vessel-like features of large 3D datasets (4000 × 4000 × 4000 pixels) resulting from cryomicrotome images using a system-specific point spread function (PSF). An iterative (Gauss-Seidel) spatial convolution strategy for GPU arrays was developed to enhance the vessels. The PSF is small and spatially invariant and resides in fast constant memory of the GPU, while the unfiltered data reside in slower global memory but are prefetched by blocks of threads into shared GPU memory. Filtering is achieved by a series of unrolled loops in shared memory. Between iterations the filtered data are stored to disk using asynchronous MPI-IO, effectively hiding the IO overhead behind the kernel execution time. Our implementation reduces computation time by a factor of up to 350 on four GPUs in parallel compared to a single-core CPU implementation, and outperforms FFT-based filtering strategies on GPUs. Although developed for filtering the complete arterial system of the heart, the method is generally applicable. (An illustrative iterative-deblurring sketch follows this entry.)
- Published
- 2013
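The paper's Gauss-Seidel convolution scheme is not reproduced here, but the general shape of iterative PSF-based deblurring can be shown with a simple Van Cittert iteration (repeatedly adding back the residual between the data and the re-blurred estimate). This is a single-core NumPy/SciPy sketch under that substitution; the PSF and test volume are made up:

```python
import numpy as np
from scipy.ndimage import convolve

def van_cittert(blurred, psf, iterations=20, beta=0.5):
    """Iterative deblurring: estimate += beta * (data - psf * estimate)."""
    estimate = blurred.copy()
    for _ in range(iterations):
        residual = blurred - convolve(estimate, psf, mode="nearest")
        estimate += beta * residual
    return estimate

psf = np.ones((3, 3, 3)) / 27.0                   # small, spatially invariant PSF
volume = np.zeros((16, 16, 16)); volume[8, 8, 8] = 1.0
blurred = convolve(volume, psf, mode="nearest")
restored = van_cittert(blurred, psf)
print(blurred.max(), restored.max())              # the restored peak is sharper
```

On GPUs the win comes from keeping the small PSF in constant memory and staging data through shared memory, as the abstract describes.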
8. Solving a DLP with Auxiliary Input with the ρ-Algorithm
- Author
-
Tetsuya Izu, Masahiko Takenaka, Yumi Sakemi, and Masaya Yasuda
- Subjects
Elliptic curve ,Finite field ,Discrete logarithm ,Computer science ,Order (group theory) ,Single-core ,Cyclic group ,Algorithm ,Prime (order theory) ,Integer (computer science) - Abstract
The discrete logarithm problem with auxiliary input (DLPwAI) is the problem of finding a positive integer α from the elements G, αG, and α^d·G in an additive cyclic group generated by G of prime order r, where d is a positive integer dividing r − 1. In 2011, Sakemi et al. implemented Cheon's algorithm for solving DLPwAI, and solved a DLPwAI instance in a group with 128-bit order r in about 131 hours with a single core, on an elliptic curve defined over a prime finite field which is used in the TinyTate library for embedded cryptographic devices. However, since their implementation was based on Shanks' baby-step giant-step (BSGS) algorithm as a sub-algorithm, it required a large amount of memory (246 GByte), so it was concluded that attacking DLPwAI instances with larger parameters is infeasible. In this paper, we implemented Cheon's algorithm based on Pollard's ρ-algorithm in order to reduce the required memory. As a result, we succeeded in solving the same DLPwAI instance in about 136 hours on a single core with far less memory (0.5 MByte). (A toy sketch of the BSGS-versus-ρ memory trade-off follows this entry.)
- Published
- 2012
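The memory trade-off driving this paper — Shanks' BSGS stores on the order of √r group elements, while Pollard's ρ walks with constant memory — is visible even in a toy multiplicative group. The sketch below is generic discrete-logarithm code with made-up parameters, not Cheon's algorithm:

```python
from math import isqrt

def bsgs(g, h, r, p):
    """Baby-step giant-step in the order-r subgroup of Z_p*: O(sqrt(r)) memory."""
    m = isqrt(r) + 1
    table = {pow(g, j, p): j for j in range(m)}   # the memory-hungry part
    giant = pow(g, r - m, p)                      # g^(-m), since g has order r
    y = h
    for i in range(m):
        if y in table:
            return (i * m + table[y]) % r
        y = y * giant % p

def rho(g, h, r, p, seed=1):
    """Pollard's rho with Floyd cycle-finding: O(1) memory."""
    def step(x, a, b):
        if x % 3 == 0:   return x * g % p, (a + 1) % r, b
        elif x % 3 == 1: return x * x % p, 2 * a % r, 2 * b % r
        else:            return x * h % p, a, (b + 1) % r
    x = (pow(g, seed, p), seed, 0)     # state: value g^a * h^b with exponents (a, b)
    y = step(*x)                       # hare advances twice per tortoise step
    while x[0] != y[0]:
        x = step(*x)
        y = step(*step(*y))
    if (y[2] - x[2]) % r == 0:         # degenerate collision: restart the walk
        return rho(g, h, r, p, seed + 1)
    return (x[1] - y[1]) * pow(y[2] - x[2], -1, r) % r

p, r = 1019, 509                       # 1019 = 2*509 + 1, so r is prime
g = pow(2, 2, p)                       # a quadratic residue, hence of order r
h = pow(g, 321, p)
print(bsgs(g, h, r, p), rho(g, h, r, p))   # both recover 321
```

At 128-bit group orders the BSGS table is what exhausts memory, which is precisely why the authors switched the sub-algorithm to ρ.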
9. An Approach for Performance Estimation of Hybrid Systems with FPGAs and GPUs as Coprocessors
- Author
-
Thilo Pionteck, Volker Hampel, and Erik Maehle
- Subjects
Coprocessor ,Cycles per instruction ,Computer science ,Computation ,Hybrid system ,Clock rate ,Single-core ,Parallel computing ,Central processing unit ,Field-programmable gate array - Abstract
This paper presents an approach for modeling the achievable speed-ups of FPGAs (Field Programmable Gate Arrays) or GPUs (Graphics Processing Units) used as coprocessors in hybrid computing systems. The underlying computation model assumes that the coprocessors are separate devices and that their input and output data are transferred from and into the system's memory. The model considers all overheads involved when (sub-)tasks are performed on a coprocessor instead of the CPU. By means of a sample application, the validity of the model is checked against measured values. In addition, the theoretical maximum speed-ups of two hybrid systems compared to an optimal single-core CPU implementation are approximated. Using the penalty factor P_SEQ as a measure of the degree to which a program cannot be fully parallelized due to data dependencies, a system with an Nvidia GTX 285 GPU achieves a speed-up of 2.7 times P_SEQ, while for a single node of a Cray XD1 with a Xilinx Virtex4 LX160 the speed-up is about 1 times P_SEQ. (A worked example of the transfer-overhead model follows this entry.)
- Published
- 2012
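The modelling idea — charging host-to-device and device-to-host transfers against the raw kernel speedup — reduces to a short calculation. All numbers below are placeholders, not the paper's measurements:

```python
def coprocessor_speedup(t_cpu, t_kernel, t_in, t_out):
    """Effective speedup once data transfers are charged to the coprocessor."""
    return t_cpu / (t_in + t_kernel + t_out)

t_cpu = 100e-3                # task time on one CPU core (illustrative)
raw = t_cpu / 2e-3            # the kernel alone looks 50x faster
real = coprocessor_speedup(t_cpu, t_kernel=2e-3, t_in=10e-3, t_out=8e-3)
print(f"raw kernel speedup {raw:.0f}x, with transfers only {real:.0f}x")
```

A 50x kernel shrinks to 5x once 18 ms of transfers surround 2 ms of computation, which is why such models must count every overhead.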
10. Research on Key Factors of Core Competence of Chinese Group-Buying Websites
- Author
-
Yinghan Tang
- Subjects
Group buying ,Core competency ,Single-core ,Business ,Marketing ,China ,Oligarchy ,Competence (human resources) ,Competitive advantage ,Profit (economics) - Abstract
China has witnessed vigorous development of Group-buying Websites (GBWs) in recent years. Along with the rapidly growing number of GBWs comes increasing competition among them. How does a GBW become a final winner among thousands of competitors? The answer lies in core competence: each GBW must have its own. Core competence is the competence that enables a GBW to maintain long and sustainable competitive advantages so as to gain a stable and superior profit. After systematically analyzing the competitive factors of successful GBWs in China, the author found distinct differences in the sources of core competence between traditional firms and GBWs: the core competence of GBWs can also be derived from their merchants and customers. Heterogeneous GBWs have different core competence factors, while homogeneous GBWs have the same or similar core competence. In order to gain lasting core competence, GBWs in China cannot afford to rely on a single core competence; on the contrary, they must establish multi-faceted core competence before an oligarchy-monopolized market forms.
- Published
- 2012
11. Evaluation of Different Magnetic Particle Systems with Respect to Its MPI Performance
- Author
-
Dietmar Eberbeck, Lutz Trahms, and Harald Kratz
- Subjects
chemistry.chemical_compound ,Magnetic anisotropy ,Materials science ,chemistry ,Anisotropy energy ,Magnetic nanoparticles ,Single-core ,Magnetic particle inspection ,Spectroscopy ,Molecular physics ,Order of magnitude ,Magnetite - Abstract
The Magnetic Particle Spectroscopy (MPS) amplitudes were measured on 7 suspensions of magnetite-based magnetic nanoparticles (MNPs) differing in core size and magnetic anisotropy. The distributions of the effective domain sizes, estimated by means of quasistatic M(H) measurements and magnetorelaxometry (MRX), match well the core size distributions of the single-core MNP systems estimated by electron microscopy. Two systems, namely Resovist and M4E, clearly exhibit a bimodal domain size distribution. It was shown that the MPS amplitudes strongly increase with increasing domain size up to 21 nm, the mean value of the larger fraction of Resovist. For M4E, with a mean size of the larger fraction of 33 nm, the measured MPS amplitudes were much smaller than those of Resovist, in particular for the higher harmonics. That behaviour was attributed to the mean anisotropy energy of these MNPs, estimated by MRX, exceeding that of Resovist by one order of magnitude. The effect of the MNPs' magnetic anisotropy is also supported by comparison of the measured MPS amplitudes with those calculated on the basis of M(H) data.
- Published
- 2012
12. Power Consumption in Multi-core Processors
- Author
-
M. Balakrishnan
- Subjects
Multi-core processor ,Power consumption ,Computer science ,business.industry ,Embedded system ,Clock rate ,Key (cryptography) ,Single-core ,business ,Power (physics) ,Microarchitecture ,TRACE (psycholinguistics) - Abstract
Power consumption in processors has become a major concern, and it has clearly been one key factor behind the shift to multi-core processors, which achieve performance through parallelism rather than through a single core with increased clock frequency. In this talk we start by describing processor power consumption issues as well as the motivation for low-power multi-core processors. We also briefly trace the impact on power consumption of a processor architecture evolution that focused mainly on increasing performance. We finally describe our recent research efforts on multi-core power estimation. (A worked dynamic-power example follows this entry.)
- Published
- 2012
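The argument for multiple cores over higher clock frequency follows from the classic dynamic-power relation P ≈ C·V²·f, together with the fact that lower frequency permits lower supply voltage. A worked toy example (the voltage halving is an idealized assumption):

```python
def dynamic_power(c_eff, voltage, freq):
    """Classic CMOS switching-power model: P = C * V^2 * f."""
    return c_eff * voltage**2 * freq

C, V, f = 1.0, 1.0, 1.0
single = dynamic_power(C, V, f)                 # one core at full frequency
dual = 2 * dynamic_power(C, V / 2, f / 2)       # two cores at half voltage/frequency
print(single, dual)   # 1.0 vs 0.25: same nominal throughput, a quarter of the power
```

Real voltage scaling is far less generous than the idealized halving used here, but the direction of the trade-off is the one the talk describes.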
13. Large Displacement Optical Flow for Volumetric Image Sequences
- Author
-
Benjamin Ummenhofer
- Subjects
business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,Displacement (vector) ,Symmetry (physics) ,Synthetic data ,Constraint (information theory) ,Flow (mathematics) ,Volumetric image ,Single-core ,Computer vision ,Artificial intelligence ,business ,Algorithm ,Mathematics - Abstract
In this paper we present a variational optical flow algorithm for volumetric image sequences (3D + time). The algorithm uses descriptor correspondences that allow us to capture large motions. Further, we describe a symmetry constraint that considers the forward and the backward flow of an image sequence to improve the accuracy of the flow field. We have tested our algorithm on real and synthetic data. Our experiments include a quantitative evaluation that shows the impact of the algorithm's components. We compare a single-core implementation to two parallel implementations, one on a multi-core CPU and one on the GPU.
- Published
- 2011
14. Dependence of Functional Characteristics of Miniature Two Axis Fluxgate Sensors Made in PCB Technology on Chemical Composition of Amorphous Core
- Author
-
Krzysztof Trzcinka, Jacek Salach, Roman Szewczyk, and Piotr Frydrych
- Subjects
Core (optical fiber) ,Materials science ,Amorphous metal ,business.industry ,Optoelectronics ,Single-core ,Nanotechnology ,business ,Chemical composition ,Fluxgate compass ,Low frequency magnetic field ,Magnetic field ,Amorphous solid - Abstract
Fluxgate sensors are commonly used in weak and low-frequency magnetic field measurements. Recently, rapid development of miniature fluxgate sensors made in PCB technology has been observed. Until now, single-core-layer sensors were unable to measure the magnetic field in two directions; moreover, no modelling results on the influence of the amorphous alloy's chemical composition on sensor properties were available.
- Published
- 2011
15. Enhanced Adaptive Insertion Policy for Shared Caches
- Author
-
Chongmin Li, Xi Zhang, Dongsheng Wang, Haixia Wang, and Yibo Xue
- Subjects
Hardware_MEMORYSTRUCTURES ,Speedup ,Computer science ,CPU cache ,Adaptive replacement cache ,Working set ,Single-core ,Thread (computing) ,Parallel computing ,Cache ,Cache algorithms - Abstract
The LRU replacement policy is commonly used in the last-level caches of multiprocessors. However, LRU does not work well for memory-intensive workloads whose working sets are greater than the available cache size. When a newly arrived cache block is inserted at the MRU position, it may never be reused before being evicted from the cache, yet it occupies cache space for a long time during its movement from the MRU to the LRU position. This results in inefficient use of cache space. If we instead insert a new cache block directly at the LRU position, cache performance can be improved because some fraction of the working set is retained in the cache. In this work, we propose Enhanced Dynamic Insertion Policy (EDIP) and Thread-Aware Enhanced Dynamic Insertion Policy (TAEDIP), which adjust the probability of insertion at MRU by set dueling. Runtime information from the previous and the next BIP level is gathered and compared with the current level to choose an appropriate BIP level. At the same time, access frequency is used to choose a victim. In this way, our design achieves a lower miss rate than LRU for workloads with large working sets, while for workloads with small working sets its miss rate is close to that of LRU. Simulation results in a single-core configuration with a 1MB 16-way LLC show that EDIP reduces CPI over LRU and DIP by an average of 11.4% and 1.8%, respectively. On a quad-core configuration with a 4MB 16-way LLC, TAEDIP improves performance on the weighted speedup metric by 11.2% over LRU and 3.7% over TADIP on average. On the fairness metric, TAEDIP improves performance by 11.2% over LRU and 2.6% over TADIP on average. (A toy simulation of LRU-versus-BIP insertion follows this entry.)
- Published
- 2011
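The failure mode EDIP targets — LRU thrashing when the working set exceeds the cache — and the benefit of LRU-side insertion can be reproduced in a few lines. The policy below is plain bimodal insertion (BIP), the building block the abstract refers to; set dueling and the thread-aware parts are omitted, and all parameters are illustrative:

```python
import random

def miss_rate(policy, ways=16, working_set=20, passes=50, epsilon=1/32, seed=0):
    """Fully associative toy cache under a cyclic pattern larger than the cache."""
    rng = random.Random(seed)
    cache, misses = [], 0                 # cache[0] is the LRU end, cache[-1] MRU
    for _ in range(passes):
        for block in range(working_set):
            if block in cache:
                cache.remove(block); cache.append(block)   # promote on hit
            else:
                misses += 1
                if len(cache) == ways:
                    cache.pop(0)                           # evict LRU
                if policy == "lru" or rng.random() < epsilon:
                    cache.append(block)                    # insert at MRU
                else:
                    cache.insert(0, block)                 # BIP: insert at LRU
    return misses / (passes * working_set)

print("LRU miss rate:", miss_rate("lru"))   # ~1.0: the cyclic pattern thrashes
print("BIP miss rate:", miss_rate("bip"))   # lower: part of the working set sticks
```

EDIP's contribution is adjusting the insertion probability between such levels at runtime via set dueling rather than fixing it.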
16. A Highly-Parallel TSP Solver for a GPU Computing Platform
- Author
-
Shigeyoshi Tsutsui and Noriyuki Fujimoto
- Subjects
CUDA ,Computer science ,business.industry ,Crossover ,Graphics processing unit ,Single-core ,Local search (optimization) ,Parallel computing ,General-purpose computing on graphics processing units ,business ,Metaheuristic ,Parallel metaheuristic ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
The traveling salesman problem (TSP) is probably the most widely studied combinatorial optimization problem and has become a standard testbed for new algorithmic ideas. Recently, the use of a GPU (Graphics Processing Unit) to accelerate non-graphics computations has attracted much attention due to its high performance and low cost. This paper presents a novel method to solve the TSP on a GPU based on the CUDA architecture. The proposed method highly parallelizes a serial metaheuristic algorithm: a genetic algorithm with the OX (order crossover) operator and 2-opt local search. Experiments with an NVIDIA GeForce GTX285 GPU and a single core of a 3.0 GHz Intel Core2 Duo E6850 CPU show that our GPU implementation is up to about 24.2 times faster than the corresponding CPU implementation. (A sequential sketch of OX and 2-opt follows this entry.)
- Published
- 2011
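The two serial ingredients the authors parallelize — order crossover (OX) and 2-opt local search — are easy to state sequentially. The sketch below runs both on random city coordinates; it is a plain CPU illustration of the operators, not the paper's CUDA kernels:

```python
import random, math

def order_crossover(p1, p2, rng):
    """OX: keep a slice of p1, fill the remaining cities in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    kept = p1[i:j]
    rest = [c for c in p2 if c not in kept]
    return rest[:i] + kept + rest[i:]

def tour_length(tour, xy):
    return sum(math.dist(xy[tour[k]], xy[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))

def two_opt_pass(tour, xy):
    """One sweep: reverse any segment that shortens the tour."""
    best = tour[:]
    for i in range(1, len(tour) - 1):
        for j in range(i + 1, len(tour)):
            cand = best[:i] + best[i:j][::-1] + best[j:]
            if tour_length(cand, xy) < tour_length(best, xy):
                best = cand
    return best

rng = random.Random(1)
xy = [(rng.random(), rng.random()) for _ in range(12)]
parent1, parent2 = list(range(12)), rng.sample(range(12), 12)
child = two_opt_pass(order_crossover(parent1, parent2, rng), xy)
print(round(tour_length(child, xy), 3))
```

The GPU version's contribution is mapping many such crossover and 2-opt evaluations onto thousands of threads at once.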
17. Parallelizing the Weil and Tate Pairings
- Author
-
Edward Knapp, Diego F. Aranha, Alfred Menezes, and Francisco Rodríguez-Henríquez
- Subjects
Set (abstract data type) ,Speedup ,Pairing ,Embedding ,Single-core ,Parallel computing ,Residue number system ,Weil pairing ,Algorithm ,Supersingular elliptic curve ,Mathematics - Abstract
In the past year, the speed record for pairing implementations on desktop-class machines has been broken several times. The speed records for asymmetric pairings were set on a single processor. In this paper, we describe our parallel implementation of the optimal ate pairing over Barreto-Naehrig (BN) curves that is about 1.23 times faster using two cores of an Intel Core i5 or Core i7 machine, and 1.45 times faster using 4 cores of the Core i7 than the state-of-the-art implementation on a single core. We instantiate Hess's general Weil pairing construction and introduce a new optimal Weil pairing tailored for parallel execution. Our experimental results suggest that the new Weil pairing is 1.25 times faster than the optimal ate pairing on 8-core extensions of the aforementioned machines. Finally, we combine previous techniques for parallelizing the eta pairing on a supersingular elliptic curve with embedding degree 4, and achieve an estimated 1.24-fold speedup on an 8-core extension of an Intel Core i7 over the previous best technique.
- Published
- 2011
18. Power/Performance Exploration of Single-core and Multi-core Processor Approaches for Biomedical Signal Processing
- Author
-
Ahmed Yasir Dogan, Luca Benini, Igor Loi, David Atienza, Andreas Burg, A. Dogan, D. Atienza, A. Burg, I. Loi, and L. Benini
- Subjects
Multi-core processor ,Interconnection ,ECG ,business.industry ,Computer science ,biomedical signal processing ,Power/performance exploration ,multi-core processor ,system-level design ,wireless body sensor networks ,Microarchitecture ,embedded system ,Computer architecture ,Parallel processing (DSP implementation) ,WBSN ,Embedded system ,embedded systems ,Single-core ,Biosignal ,Crossbar switch ,business ,optimization ,Signal conditioning - Abstract
This study presents a single-core and a multi-core processor architecture for health monitoring systems where slow biosignal events and highly parallel computations coexist. The single-core architecture is composed of a processing core (PC), an instruction memory (IM) and a data memory (DM), while the multi-core architecture consists of several PCs, individual IMs for each core, a shared DM and an interconnection crossbar between the cores and the DM. These architectures are compared with respect to power-versus-performance trade-offs for a multi-lead electrocardiogram signal conditioning application exploiting near-threshold computing. The results show that the multi-core solution consumes 66% less power for high computation requirements (50.1 MOps/s), but 10.4% more power for low computation needs (681 kOps/s).
- Published
- 2011
19. Fast Evaluation of GP Trees on GPGPU by Optimizing Hardware Scheduling
- Author
-
Pierre Collet, Ogier Maitre, and Nicolas Lachiche
- Subjects
Multi-core processor ,Speedup ,business.industry ,Computer science ,Genetic programming ,Parallel computing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Genetic program ,Scheduling (computing) ,CUDA ,Single-core ,General-purpose computing on graphics processing units ,business ,Computer hardware ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
This paper shows that it is possible to use General Purpose Graphics Processing Unit cards for fast evaluation of different Genetic Programming trees on as few as 32 fitness cases by using the hardware scheduling of NVIDIA cards. Depending on the function set, observed speedup ranges between ×50 and ×250 on one half of an NVidia GTX295 GPGPU card, versus a single core of an Intel Quad core Q8200. (A vectorized tree-evaluation sketch follows this entry.)
- Published
- 2010
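Evaluating one GP tree over many fitness cases is a data-parallel problem: every node's operation applies elementwise across all cases at once, which is what the GPU exploits. The NumPy sketch below mirrors that structure on a toy tree; it illustrates the idea, not the paper's scheduling trick:

```python
import numpy as np

OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply}

def eval_tree(node, cases):
    """node = ('op', left, right) | ('x',) | ('const', value); cases: 1-D array."""
    kind = node[0]
    if kind == "x":
        return cases
    if kind == "const":
        return np.full_like(cases, node[1])
    return OPS[kind](eval_tree(node[1], cases), eval_tree(node[2], cases))

cases = np.linspace(-1.0, 1.0, 32)     # all 32 fitness cases evaluated together
tree = ("add", ("mul", ("x",), ("x",)), ("const", 1.0))   # x*x + 1
print(eval_tree(tree, cases)[:4])
```

With only 32 cases per tree, the paper's point is that hardware scheduling across many different trees is what keeps the GPU busy.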
20. waLBerla: Optimization for Itanium-based Systems with Thousands of Processors
- Author
-
Ulrich Rüde, S. Donath, Jan Götz, Klaus Iglberger, and Christian Feichtinger
- Subjects
Multi-core processor ,Computer science ,Computation ,Vectorization (mathematics) ,Volume (computing) ,Lattice Boltzmann methods ,Itanium ,Single-core ,Parallel computing ,Compiler ,computer.software_genre ,computer - Abstract
Performance optimization is an issue at different levels, in particular for computing- and communication-intensive codes like free-surface lattice Boltzmann, a method used to simulate liquid-gas flow phenomena such as bubbly flows and foams. Due to a special treatment of the gas phase, an aggregation of bubble volume data is necessary in every time step. In order to accomplish efficient parallel scaling, the all-to-all communication schemes used up to now had to be replaced with more sophisticated patterns that work in a local vicinity. With this approach, scaling could be improved such that simulation runs on up to 9 152 processor cores are possible with more than 90% efficiency. Because of the computation of surface tension effects, the method is also computationally intensive; therefore, optimization of single-core performance plays a tremendous role as well. The characteristics of the Itanium processor require programming techniques that assist the compiler in efficient code vectorization, especially for complex C++ codes like the waLBerla framework. An approach using variable-length arrays shows promising results.
- Published
- 2010
21. High-Speed Software Implementation of the Optimal Ate Pairing over Barreto–Naehrig Curves
- Author
-
Eiji Okamoto, Jean-Luc Beuchat, Shigeo Mitsunari, Tadanori Teruya, Jorge Enrique González-Díaz, and Francisco Rodríguez-Henríquez
- Subjects
Discrete mathematics ,Polynomial ,Exponentiation ,Pairing ,Tate pairing ,Single-core ,Multiplier (economics) ,Finite field arithmetic ,Prime (order theory) ,Mathematics - Abstract
This paper describes the design of a fast software library for the computation of the optimal ate pairing on a Barreto-Naehrig elliptic curve. Our library is able to compute the optimal ate pairing over a 254-bit prime field F_p in just 2.33 million clock cycles on a single core of an Intel Core i7 2.8GHz processor, which implies that the pairing computation takes 0.832 msec. We are able to achieve this performance by a careful implementation of the base field arithmetic through the usage of the customary Montgomery multiplier for prime fields. The prime field is constructed via the Barreto-Naehrig polynomial parametrization of the prime p given as p = 36t^4 + 36t^3 + 24t^2 + 6t + 1, with t = 2^62 − 2^54 + 2^44. This selection of t allows us to obtain important savings in both the Miller loop and the final exponentiation steps of the optimal ate pairing. (A quick check of this parametrization follows this entry.)
- Published
- 2010
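The parametrization quoted in the abstract can be checked directly. The sketch below also derives the group order from the standard BN formula r = 36t^4 + 36t^3 + 18t^2 + 6t + 1 and the BN trace 6t^2 + 1 — standard curve facts, not quotes from this abstract:

```python
t = 2**62 - 2**54 + 2**44

p = 36*t**4 + 36*t**3 + 24*t**2 + 6*t + 1   # field characteristic (from the abstract)
r = 36*t**4 + 36*t**3 + 18*t**2 + 6*t + 1   # BN group order (standard formula)

assert p.bit_length() == 254 and r.bit_length() == 254
assert p + 1 - r == 6*t**2 + 1              # Frobenius trace of a BN curve
print(hex(p))
```

The sparse choice of t (three powers of two) is what makes the Miller loop and final exponentiation cheap, as the abstract notes.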
22. OpenMP Parallelization of a Mickens Time-Integration Scheme for a Mixed-Culture Biofilm Model and Its Performance on Multi-core and Multi-processor Computers
- Author
-
Hermann J. Eberl and Nasim Muhammad
- Subjects
Scheme (programming language) ,Multi-core processor ,Discretization ,Workstation ,Xeon ,Computer science ,Finite difference method ,Parallel computing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,law.invention ,law ,Itanium ,Single-core ,computer ,Computer Science::Distributed, Parallel, and Cluster Computing ,computer.programming_language - Abstract
We document and compare the performance of an OpenMP-parallelized simulation code for a mixed-culture biofilm model on a desktop workstation with two quad-core Xeon processors, and on SGI Altix systems with single-core and dual-core Itanium processors. The underlying model is a parabolic system of highly non-linear partial differential equations, which is discretized in time using a non-local Mickens scheme, and in space using a standard finite difference method. (An illustrative Mickens-type discretization follows this entry.)
- Published
- 2010
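The paper's PDE system is not reproduced here, but the flavour of a Mickens-type nonstandard discretization shows up already on the logistic equation u' = u(1 − u): treating the nonlinear term nonlocally (u² ≈ u_n·u_{n+1}) yields an update that stays positive and bounded for any step size, unlike forward Euler. This textbook example is an editorial stand-in, not taken from the paper:

```python
def mickens_logistic(u0, dt, steps):
    """Nonstandard scheme for u' = u(1-u): (u1 - u0)/dt = u0 - u0*u1."""
    u, out = u0, [u0]
    for _ in range(steps):
        u = u * (1 + dt) / (1 + dt * u)   # solved for u1; positivity-preserving
        out.append(round(u, 4))
    return out

def euler_logistic(u0, dt, steps):
    """Forward Euler for comparison; blows up for large dt."""
    u, out = u0, [u0]
    for _ in range(steps):
        u = u + dt * u * (1 - u)
        out.append(round(u, 4))
    return out

print(mickens_logistic(0.1, dt=5.0, steps=5))   # stays in (0, 1), approaches 1
print(euler_logistic(0.1, dt=5.0, steps=5))     # overshoots and diverges
```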
23. Highly Scalable Multiprocessing Algorithms for Preference-Based Database Retrieval
- Author
-
Wolf-Tilo Balke, Joachim Selke, and Christoph Lofi
- Subjects
Skyline ,Analysis of parallel algorithms ,Shared memory ,Computer science ,Distributed computing ,Scalability ,Parallel algorithm ,Single-core ,Multiprocessing ,Algorithm ,Domain (software engineering) - Abstract
Until recently, algorithms continuously gained free performance improvements from ever-increasing processor speeds. Unfortunately, this development has reached its limit: new generations of CPUs focus on increasing the number of processing cores instead of the performance of a single core, so sequential algorithms will be excluded from future technological advances. Instead, highly scalable parallel algorithms are needed to fully tap new hardware potential. In this paper we establish a design space for parallel algorithms in the domain of personalized database retrieval, taking skyline algorithms as a representative example. We investigate the spectrum of base operations of different retrieval algorithms and various parallelization techniques to develop a set of highly scalable and high-performing skyline algorithms for different retrieval scenarios. Finally, we extensively evaluate these algorithms to showcase their superior characteristics. (A sequential skyline sketch follows this entry.)
- Published
- 2010
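A skyline query returns the points not dominated by any other point (no other is at least as good in every dimension and strictly better in one). The base operation, and the divide-and-merge structure that parallel variants typically exploit, look like this; the code is illustrative, not one of the paper's algorithms:

```python
def dominates(a, b):
    """a dominates b: <= everywhere and < somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

def skyline_partitioned(points, parts=4):
    """Local skylines per partition (parallelizable), then one merge pass."""
    chunks = [points[i::parts] for i in range(parts)]
    candidates = [p for chunk in chunks for p in skyline(chunk)]
    return skyline(candidates)

hotels = [(50, 8), (60, 2), (80, 1), (40, 9), (55, 8), (45, 7)]  # (price, distance)
print(skyline(hotels))
print(sorted(skyline(hotels)) == sorted(skyline_partitioned(hotels)))  # True
```

Partitioning is safe because, by transitivity of dominance, every dominated point is eliminated either inside its own partition or by a surviving dominator during the merge; that is what makes the per-partition step embarrassingly parallel.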
24. Research on Benefits Distribution Model for Maintenance Partnerships of the Single-Core MPN
- Author
-
Taofen Li, Yao Yao, and Shuili Yang
- Subjects
Operations research ,Computer science ,business.industry ,Time consistency ,Mass customization ,General partnership ,Key (cryptography) ,Production (economics) ,Single-core ,Modular design ,Cooperative game theory ,business ,Simulation - Abstract
The key to stable growth of a Mass Customization modular production network (MPN) lies in good cooperation among partners within the network. In this paper, the dynamic cooperative game problem in the single-core MPN is described using dynamic cooperative game theory, and a partnership benefits distribution model and an interest distribution model that ensure time consistency of the cooperation relationship are proposed. The analysis concludes that maintenance of the cooperative partnership can depend on the expected reward that a module manufacturer obtains only after joining the network.
- Published
- 2010
25. Implementation and Evaluation of Fast Parallel Packet Filters on a Cell Processor
- Author
-
Masato Tsuru and Yoshiyuki Yamashita
- Subjects
Software pipelining ,Software ,Computer science ,business.industry ,Filter (video) ,Network packet ,Embedded system ,Network processor ,Single-core ,Program optimization ,business ,Virtual network - Abstract
Packet filters are essential in most areas of modern information network technologies. While high-end, expensive routers and firewalls are implemented in hardware, flexible and cost-effective devices usually rely on software solutions running on general-purpose CPUs, at lower performance. The authors have previously studied methods of applying code optimization techniques to packet filters executing on a single-core processor. In this paper, by utilizing the multi-core Cell Broadband Engine processor with software pipelining, we construct a parallelized and SIMDed packet filter 40 times faster than the naive C program filter executed on a single core. (A toy vectorized filter follows this entry.)
- Published
- 2010
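The SIMD idea — evaluating one filter predicate across a whole batch of packets at once — can be illustrated with NumPy standing in for the vector unit. The field layout and the rule are invented; a real filter would parse actual packet headers:

```python
import numpy as np

# Columnar batch of packet headers (struct-of-arrays layout suits SIMD).
rng = np.random.default_rng(0)
proto = rng.choice([6, 17], size=1024)              # 6 = TCP, 17 = UDP
dport = rng.integers(0, 1024, size=1024)
src = rng.integers(0, 2**32, size=1024, dtype=np.uint32)

# Branch-free predicate over all 1024 packets in a few vector operations:
# "TCP to port 80, except from the 10.0.0.0/8 block".
from_ten_net = (src >> 24) == 10
accept = (proto == 6) & (dport == 80) & ~from_ten_net
print(int(accept.sum()), "of", accept.size, "packets pass")
```

Avoiding per-packet branches is the same property that lets SIMD units and software pipelining keep their pipelines full.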
26. A Garbage Collection Technique for Embedded Multithreaded Multicore Processors
- Author
-
Sascha Uhrig and Theo Ungerer
- Subjects
Multi-core processor ,Java ,Computer science ,business.industry ,Parallel computing ,Thread (computing) ,Software_PROGRAMMINGTECHNIQUES ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Software ,Threading (manufacturing) ,Single-core ,business ,computer ,Drawback ,Garbage collection ,computer.programming_language - Abstract
Multicore processors are becoming more and more popular, even in embedded systems. Thanks to its deeply integrated threading concept, Java is a natural choice for expressing the thread-level parallelism needed to exploit the performance potential of a multicore; software developers are familiar with the threading concept, so even applications written for single-core processors map well onto a multicore processor and can utilize its advantages. Nevertheless, one drawback of Java has to be mentioned: the required garbage collection. Especially in multicore environments, the commonly used stop-the-world collectors reach their limits, because all cores have to be suspended whenever a single thread requires a garbage collection cycle, which harms the performance of the other cores tremendously. In this paper we present a garbage collection technique that runs in parallel to the application threads within a multithreaded multicore, without any stop-the-world behavior.
- Published
- 2009
27. A Fast Scheme to Investigate Thermal-Aware Scheduling Policy for Multicore Processors
- Author
-
Cha Narisu and Liqiang He
- Subjects
Multi-core processor ,Thermal aware ,Computer science ,Distributed computing ,Leverage (statistics) ,Single-core ,Workload ,Scheduling (computing) - Abstract
With more cores integrated into a single chip, the overall power consumption of multiple concurrently running programs increases dramatically in a CMP, which makes the thermal problem much more severe than in a traditional superscalar processor. To alleviate the thermal problem of a multicore processor, two orthogonal kinds of technique can be exploited: the commonly used Dynamic Thermal Management techniques, and thermal-aware thread scheduling policies. For the latter, some general ideas have been proposed by academic and industrial researchers. The difficulty in investigating the effectiveness of a thread scheduling policy is the huge search space arising from the many possible mapping combinations for a given multi-program workload. In this paper, we extend a simple thermal model originally used for a single-core processor to a multicore environment and propose a fast scheme to search for, and compare, the thermal effectiveness of different scheduling policies using the new model. The experimental results show that the proposed scheme can predict the thermal characteristics of different scheduling policies with reasonable accuracy and helps researchers to quickly investigate the performance of the policies without detailed, time-consuming simulations. (A lumped-RC illustration follows this entry.)
- Published
- 2009
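A common way to extend a simple single-core thermal model to a multicore, in the spirit of the abstract, is a lumped RC abstraction per core: temperature relaxes toward ambient plus P·R. The sketch below scores two hypothetical mappings of the same total power; all constants are invented, and inter-core heat coupling is ignored:

```python
def core_temps(power, steps=2000, dt=0.01, t_amb=45.0, r=0.8, c=2.0):
    """Per-core lumped RC model: C * dT/dt = P - (T - T_amb) / R."""
    temps = [t_amb] * len(power)
    for _ in range(steps):
        temps = [t + dt / c * (p - (t - t_amb) / r) for t, p in zip(temps, power)]
    return [round(t, 1) for t in temps]

print(core_temps([20, 20, 20, 20]))   # balanced mapping: ~61 C on every core
print(core_temps([40, 40, 0, 0]))     # packed mapping: ~77 C hotspots
```

Ranking candidate thread-to-core mappings with such a cheap model, instead of full simulation, is the kind of shortcut a fast scheme like the abstract's pursues.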
28. The Impact of Resource Sharing Control on the Design of Multicore Processors
- Author
-
Chen Liu and Jean-Luc Gaudiot
- Subjects
Multi-core processor ,Computer science ,Multithreading ,Distributed computing ,Single-core ,Thread (computing) ,Simultaneous multithreading ,Temporal multithreading ,Microarchitecture ,Shared resource - Abstract
One major obstacle faced by designers entering the multicore era is how to harness the massive computing power these cores provide. Since Instruction-Level Parallelism (ILP) is inherently limited, a single thread is not capable of efficiently utilizing the resources of a single core. Hence, Simultaneous MultiThreading (SMT) microarchitecture can be introduced in an effort to achieve improved system resource utilization and correspondingly higher instruction throughput through the exploitation of Thread-Level Parallelism (TLP) as well as ILP. However, when multiple threads execute concurrently on a single core, they automatically compete for system resources. Our research shows that, without control over the number of entries each thread can occupy in system resources such as the instruction fetch queue and/or reorder buffer, a scenario called "mutual-hindrance" execution takes place. Conversely, introducing active resource sharing control mechanisms causes the opposite situation ("mutual-benefit" execution), with possibly significant performance improvement and lower cache miss frequency. This demonstrates that active resource sharing control is essential for future multicore multithreading microprocessor design.
- Published
- 2009
29. A Modified Singular Point Detection Algorithm
- Author
-
Rabia Anwar, M. Usman Akram, Rabia Arshad, and Muhammad Munir
- Subjects
Matching (graph theory) ,Data_MISCELLANEOUS ,Singular point of a curve ,Image (mathematics) ,Identification (information) ,Fingerprint ,Feature (computer vision) ,Computer Science::Computer Vision and Pattern Recognition ,Computer Science::Multimedia ,Point (geometry) ,Single-core ,Algorithm ,Computer Science::Cryptography and Security ,Mathematics - Abstract
Automatic Fingerprint Identification Systems (AFIS) are widely used for personal identification due to the uniqueness of fingerprints. Fingerprint reference points are useful for fingerprint classification and even for fingerprint matching algorithms. In this paper, we present a modified algorithm for singular point detection (cores and deltas) with high accuracy. Optimally located cores and deltas are necessary for classification and matching of fingerprint images. Previous techniques detect only a single core point, which is insufficient to classify an image. The basic feature of our technique is that it computes all the cores along with all the deltas present in a fingerprint image. The proposed algorithm is applied to FVC2002, and experimental results are compared with previous techniques, verifying the accuracy of our algorithm. (A Poincaré-index sketch follows this entry.)
- Published
- 2008
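Cores and deltas are conventionally detected with the Poincaré index: summing the orientation changes along a closed path around a pixel gives about +1/2 at a core and −1/2 at a delta. The paper's exact detector is not given in the abstract, so the sketch below demonstrates the standard test on synthetic orientation fields:

```python
import math

def poincare_index(angles):
    """angles: ridge orientations (defined modulo pi) sampled around a closed path."""
    total = 0.0
    for k in range(len(angles)):
        d = angles[(k + 1) % len(angles)] - angles[k]
        while d > math.pi / 2:   d -= math.pi    # wrap into (-pi/2, pi/2]
        while d <= -math.pi / 2: d += math.pi
        total += d
    return total / (2 * math.pi)   # +0.5 core, -0.5 delta, 0 elsewhere

def ring(field, cx, cy, radius=1.0, samples=8):
    return [field(cx + radius * math.cos(2 * math.pi * k / samples),
                  cy + radius * math.sin(2 * math.pi * k / samples))
            for k in range(samples)]

core_field  = lambda x, y: 0.5 * math.atan2(y, x)    # synthetic core-like field
delta_field = lambda x, y: -0.5 * math.atan2(y, x)   # synthetic delta-like field
print(round(poincare_index(ring(core_field, 0, 0)), 2))    # 0.5
print(round(poincare_index(ring(delta_field, 0, 0)), 2))   # -0.5
```

Scanning this index over a smoothed orientation image yields every core and delta, which matches the paper's goal of finding all singular points rather than one.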
30. Using the Corridor Map Method for Path Planning for a Large Number of Characters
- Author
-
Mark H. Overmars, Roland Geraerts, Arno Kamphuis, and Ioannis Karamouzas
- Subjects
Current (mathematics) ,Crowds ,Theoretical computer science ,Computer science ,Path (graph theory) ,Single-core ,Motion planning ,Planner ,computer ,Computer animation ,Simulation ,Power (physics) ,computer.programming_language - Abstract
A central problem in games is planning high-quality paths for characters avoiding obstacles in the environment. Current games require a path planner that is fast (to ensure real-time interaction) and flexible (to avoid local hazards). In addition, a path needs to be natural, meaning that the path is smooth, short, keeps some clearance to obstacles, avoids other characters, and so on. Game worlds are normally populated with a large number of characters. In this paper we show how the recently introduced Corridor Map Method can be extended and used to efficiently compute smooth motions for these characters. We consider crowds in which the characters wander around, characters have goals, and characters behave as a coherent group. The approach is very fast: even in environments with 5000 characters it uses only 40% of the processing power of a single core of a CPU. Also, the resulting paths are indeed natural.
- Published
- 2008
31. The Effect of Core Number and Core Diversity on Power and Performance in Multicore Processors
- Author
-
A. Zolfaghari Jooya and Mohsen Soryani
- Subjects
Set (abstract data type) ,Core (game theory) ,Multi-core processor ,Computer science ,Power consumption ,Face (geometry) ,Bounded function ,Single-core ,Parallel computing ,Power (physics) - Abstract
Today, multi-core processors dominate the server, desktop, and notebook computer markets. Such processors have decreased the power consumption and thermal challenges that designers faced with single-core processors. In order to improve a multi-core processor's performance, designers should choose the best set of cores based on power consumption and execution delay. In this paper, we study several architectures composed of a configurable number of cores. We use three cores with different levels of performance and power consumption, and implement different configurations of a multi-core processor. In each configuration, which has a different set of cores, we run benchmarks with various numbers of simultaneous threads, from 1 up to 32, and measure the power consumption and execution delay of each configuration. It is shown that the best configuration is a heterogeneous multi-core processor composed of 16 cores within our bounded area. We then examine the various ways threads can be assigned to the different cores of the best configuration. It is shown that for serial workloads the best choice is to use high-performance cores, but for parallel workloads consisting of multiple threads, a mixture of cores with different performance levels gives the best performance.
- Published
- 2008
32. Manycores in the Future
- Author
-
Robert Schreiber
- Subjects
Multi-core processor ,Software ,Computer architecture ,Computer science ,Data parallelism ,business.industry ,Parallelism (grammar) ,Single-core ,Parallel computing ,business - Abstract
The change from single core to multicore processors is expected to continue, taking us to manycore chips (64 processors) and beyond. Cores are more numerous, but not faster. They also may be less reliable. Chip-level parallelism raises important questions about architecture, software, algorithms, and applications. I'll consider the directions in which the architecture may be headed, and look at the impact on parallel programming and scientific computing.
- Published
- 2007
33. Energy Harvesting in Synthetic Dendrimer Materials
- Author
-
Gemma D. D'Ambruoso and Dominic V. McGrath
- Subjects
Förster resonance energy transfer ,Chemistry ,Energy transfer ,Dendrimer ,OLED ,Single-core ,Nanotechnology ,Antenna effect ,Energy harvesting - Abstract
In the past two and a half decades, dendrimers have emerged as a distinct branch of macromolecular chemistry. Tailoring of dendrimer structure yields precise placement of chromophores that can serve as energy harvesters, mimicking photosynthesis. The unique architecture afforded by dendrimers allows for multiple energy harvesters that can transfer their energy to a single core, which is important for optoelectronic applications such as organic light emitting diodes (OLEDs). This review emphasizes the energy transfer characteristics that these dendrimers provide rather than their synthesis.
- Published
- 2007
34. Dynamic Repartitioning of Real-Time Schedule on a Multicore Processor for Energy Efficiency
- Author
-
Yongbon Koo, Joonwon Lee, and Euiseong Seo
- Subjects
Multi-core processor ,Schedule ,Computer science ,business.industry ,Processor scheduling ,Multiprocessing ,Parallel computing ,Energy consumption ,Scheduling (computing) ,Dynamic voltage scaling ,Embedded system ,Single-core ,business ,Real-time operating system ,Efficient energy use - Abstract
Multicore processors promise higher throughput at lower power consumption than single-core processors, so in the near future they will be widely used in hard real-time systems whose performance requirements keep increasing. Though DVS (dynamic voltage scaling) may reduce power consumption for hard real-time applications on single-core processors, it has a new implication for multicore systems, on which all the cores in a chip must run at the same performance level. Blind adoption of existing DVS algorithms may waste energy, since a core that requires low performance has to run at the same high frequency as the other cores. Based on existing partitioning algorithms for multiprocessor hard real-time scheduling, this article presents a dynamic task repartitioning algorithm that balances task loads among cores during execution to avoid this phenomenon. Simulation results show that in general our scheme saves more than 10% additional energy compared to the same system without it, even when the schedules are generated by the WFD partitioning algorithm, which is known as the most energy-efficient partitioning algorithm. (A WFD sketch follows this entry.)
- Published
- 2006
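Worst-Fit Decreasing (WFD), the baseline partitioning the abstract mentions, is energy-friendly precisely because it spreads utilization evenly: on a chip where all cores must share one frequency, the most-loaded core dictates that frequency. A minimal sketch with an invented task set:

```python
def wfd_partition(utilizations, cores):
    """Worst-Fit Decreasing: largest tasks first, each to the least-loaded core."""
    loads = [0.0] * cores
    bins = [[] for _ in range(cores)]
    for u in sorted(utilizations, reverse=True):
        i = loads.index(min(loads))      # "worst fit" = most remaining capacity
        loads[i] += u
        bins[i].append(u)
    return bins, loads

tasks = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1]
bins, loads = wfd_partition(tasks, cores=4)
print(bins)
print("shared frequency must cover max load:", max(loads))
```

The paper's repartitioning goes further by rebalancing at runtime when actual task demands change, keeping that maximum load — and hence the shared voltage and frequency — low.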
35. An Efficient Dynamic Switching Mechanism (DSM) for Hybrid Processor Architecture
- Author
-
Akanda Md. Musfiquzzaman, Masahiro Sowa, Ben A. Abderazek, and Sotaro Kawata
- Subjects
Circular buffer ,Multi-core processor ,Shared memory ,Computer science ,Virtual memory ,Single-core ,Parallel computing ,Operand ,Queue ,Execution model ,Microarchitecture - Abstract
Increasing the usability of processor resources and boosting processor compatibility and capability to support multiple execution models in a single core are highly desirable given recent developments in electronics technology. This work introduces the concept of a dynamic switching mechanism (DSM), which supports multiple instruction-set execution models in a single, simple processor core. This is achieved dynamically by an execution-mode switching scheme and a sources-results location computing unit, for a novel queue execution model and the well-known stack-based execution model. The queue execution model is based on queue computation: it uses queue registers, a circular queue data structure, for operand and result manipulation, and assigns queue words according to a single-assignment rule. We present the DSM mechanism and describe its hardware complexity and preliminary evaluation results. We also describe the DSM target architecture. (A toy queue-machine interpreter follows this entry.)
- Published
- 2005
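A queue execution model differs from a stack model in where instructions find their operands: at the front of a FIFO queue, with results enqueued at the back, under a single-assignment discipline. The toy interpreter below evaluates (a + b) * (c − d) that way; it illustrates queue computation only, not the paper's DSM hardware:

```python
from collections import deque

def run_queue_program(program, env):
    """Each operator dequeues its operands from the front, enqueues its result."""
    q = deque()
    for op, arg in program:
        if op == "ld":
            q.append(env[arg])
        else:
            a, b = q.popleft(), q.popleft()
            q.append({"add": a + b, "sub": a - b, "mul": a * b}[op])
    return q.popleft()

# (a + b) * (c - d): loads enqueue operands level by level; operators then
# consume them strictly in FIFO order.
program = [("ld", "a"), ("ld", "b"), ("ld", "c"), ("ld", "d"),
           ("add", None), ("sub", None), ("mul", None)]
print(run_queue_program(program, {"a": 2, "b": 3, "c": 7, "d": 4}))   # prints 15
```

Because operand locations are implicit in queue order, queue code carries no explicit register operands; that single-assignment, position-free property is what the queue model is built on.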