474 results on '"Labarta, Jesús"'
Search Results
202. On the usefulness of object tracking techniques in performance analysis
- Author
-
Llort, Germán, primary, Servat, Harald, additional, González, Juan, additional, Giménez, Judit, additional, and Labarta, Jesús, additional
- Published
- 2013
203. Implementing OmpSs support for regions of data in architectures with multiple address spaces
- Author
-
Bueno, Javier, primary, Martorell, Xavier, additional, Badia, Rosa M., additional, Ayguadé, Eduard, additional, and Labarta, Jesús, additional
- Published
- 2013
204. Detailed and simultaneous power and performance analysis.
- Author
-
Servat, Harald, Llort, Germán, Giménez, Judit, and Labarta, Jesús
- Subjects
MICROPROCESSOR performance, ENERGY consumption, HIGH performance computing, SOURCE code, PARALLEL computers
- Abstract
On the road to Exascale computing, both performance and power areas are meant to be tackled at different levels, from system to processor level. The processor itself is mainly responsible for the serial node performance and also for most of the energy consumed by the system. Thus, it is important to have tools to simultaneously analyze both performance and energy efficiency at processor level. Performance tools have allowed analysts to understand, and even improve, the performance of an application that runs in a system. With the advent of recent processor capabilities to measure its own power consumption, performance tools can increase their collection of metrics by adding those related to energy consumption and provide a correlation between the source code, its performance and its energy efficiency. In this paper, we present a performance tool that has been extended to gather such energy metrics. The results of this tool are passed to a mechanism called folding that produces detailed metrics and source code references by using coarse grain sampling. We have used the tool with multiple serial benchmarks as well as parallel applications to demonstrate its usefulness by locating hot spots in terms of performance and power drained. Copyright © 2013 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2016
205. Scalability of tracing and visualization tools
- Author
-
Labarta, Jesús, Gimenez, Judit, Martinez, Eloy, Gonzalez, Pedro, Servat, Harald, Llort, Germán, and Aguilar, Xavier
- Abstract
Extending the capability of performance tools to deal with the larger and larger machines being deployed is necessary in order to understand their actual behavior and identify how to achieve performance expectations in the frequent case these are not met at a first try. Trace based tools such as Paraver provide extremely powerful and flexible analysis capabilities to identify performance problems not detectable by profile based tools. Scaling up the usability of trace based tools requires new techniques in both the acquisition and visualization phases. The CEPBA-tools approach distributes the functionalities required to tackle large systems in three different levels. Different acquisition techniques are used in the instrumentation package to control the data captured and maximize the ratio of information to file size. An intermediate level set of tools are used to summarize the generated Paraver traces into smaller traces, with the same format, but where some of the information has been summarized. Examples of filter functionalities at this level include summarization of certain events in periodic software counters and selection of specific time intervals or events. At the final level, different rendering techniques have been introduced in Paraver to visualize traces of many processes while still being able to convey to the analyst the information relevant to identify problems at very coarse level as well as the capabilities to dig down to very detailed levels. The paper describes in detail the techniques being used along those lines in the CEPBA-tools environment in order to support the analysis of applications run on large systems.
- Published
- 2005
206. Extracting the optimal sampling frequency of applications using spectral analysis
- Author
-
Casas, Marc, primary, Servat, Harald, additional, Badia, Rosa M., additional, and Labarta, Jesús, additional
- Published
- 2011
207. OmpSs: A PROPOSAL FOR PROGRAMMING HETEROGENEOUS MULTI-CORE ARCHITECTURES
- Author
-
DURAN, ALEJANDRO, primary, AYGUADÉ, EDUARD, additional, BADIA, ROSA M., additional, LABARTA, JESÚS, additional, MARTINELL, LUIS, additional, MARTORELL, XAVIER, additional, and PLANAS, JUDIT, additional
- Published
- 2011
208. Poster
- Author
-
Bueno, Javier, primary, Duran, Alejandro, additional, Martorell, Xavier, additional, Ayguadé, Eduard, additional, Badia, Rosa M., additional, and Labarta, Jesús, additional
- Published
- 2011
209. Overlapping communication and computation by using a hybrid MPI/SMPSs approach
- Author
-
Marjanović, Vladimir, primary, Labarta, Jesús, additional, Ayguadé, Eduard, additional, and Valero, Mateo, additional
- Published
- 2010
210. Automatic Phase Detection and Structure Extraction of MPI Applications
- Author
-
Casas, Marc, primary, Badia, Rosa M., additional, and Labarta, Jesús, additional
- Published
- 2010
211. Effective communication and computation overlap with hybrid MPI/SMPSs
- Author
-
Marjanovic, Vladimir, primary, Labarta, Jesús, additional, Ayguadé, Eduard, additional, and Valero, Mateo, additional
- Published
- 2010
212. Programmability Issues
- Author
-
Chapman, Barbara, primary, Labarta, Jesús, additional, Sarkar, Vivek, additional, and Sato, Mitsuhisa, additional
- Published
- 2009
213. BSC Vision Towards Exascale
- Author
-
Labarta, Jesús, primary, Ayguadé, Eduard, additional, and Valero, Mateo, additional
- Published
- 2009
214. Parallelizing dense and banded linear algebra libraries using SMPSs
- Author
-
Badia, Rosa M., primary, Herrero, José R., additional, Labarta, Jesús, additional, Pérez, Josep M., additional, Quintana‐Ortí, Enrique S., additional, and Quintana‐Ortí, Gregorio, additional
- Published
- 2009
215. Graph-Based Task Replication for Workflow Applications
- Author
-
Sirvent, Raül, primary, Badia, Rosa M., additional, and Labarta, Jesús, additional
- Published
- 2009
216. Automatic analysis of speedup of MPI applications
- Author
-
Casas, Marc, primary, Badia, Rosa, additional, and Labarta, Jesús, additional
- Published
- 2008
217. Automatic Grid workflow based on imperative programming languages
- Author
-
Sirvent, Raül, primary, Pérez, Josep M., additional, Badia, Rosa M., additional, and Labarta, Jesús, additional
- Published
- 2006
218. Another approach to backfilled jobs
- Author
-
Utrera, Gladys, primary, Corbalán, Julita, additional, and Labarta, Jesús, additional
- Published
- 2005
219. Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules
- Author
-
Nikolopoulos, Dimitrios S., primary, Artiaga, Ernest, additional, Ayguadé, Eduard, additional, and Labarta, Jesús, additional
- Published
- 2003
220. Program Phase Detection Based Dynamic Control Mechanisms for Pipeline Stage Unification Adoption.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Yao, Jun, and Shimada, Hajime
- Abstract
To reduce the power consumption in mobile processors, a method called Pipeline Stage Unification (PSU) was previously proposed to work as an alternative for Dynamic Voltage Scaling (DVS). Based on PSU, we proposed two mechanisms which dynamically predict a suitable unification degree according to the knowledge of the program behaviors. Our results show that the mechanisms can achieve an average Energy Delay Product (EDP) decrease of 15.1% and 19.2%, respectively, for SPECint2000 benchmarks, compared to the processor without PSU. [ABSTRACT FROM AUTHOR]
- Published
- 2008
221. Performance Evaluation of Compiler Controlled Power Saving Scheme.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Shirako, Jun, and Yoshida, Munehiro
- Abstract
Multicore processors, or chip multiprocessors, which allow us to realize low power consumption, high effective performance, good cost performance and short hardware/software development period, are attracting much attention. In order to achieve full potential of multicore processors, cooperation with a parallelizing compiler is very important. The latest compiler extracts multilevel parallelism, such as coarse grain task parallelism, loop parallelism and near fine grain parallelism, to keep parallel execution efficiency high. It also controls voltage and clock frequency of processors carefully to reduce energy consumption during execution of an application program. This paper evaluates performance of compiler controlled power saving scheme which has been implemented in OSCAR multigrain parallelizing compiler. The developed power saving scheme realizes voltage/frequency control and power shutdown of each processor core during coarse grain task parallel processing. In performance evaluation, when static power is assumed as one-tenth of dynamic power, OSCAR compiler with the power saving scheme achieved 61.2 percent energy reduction for SPEC CFP95 applu without performance degradation on 4 processors and 87.4 percent energy reduction for mpeg2encode, 88.1 percent energy reduction for SPEC CFP95 tomcatv and 84.6 percent energy reduction for applu with real-time deadline constraint on 4 processors. [ABSTRACT FROM AUTHOR]
- Published
- 2008
222. Reducing Energy in Instruction Caches by Using Multiple Line Buffers with Prediction.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Ali, Kashif, and Aboelaze, Mokhtar
- Abstract
Energy consumption plays a crucial role in the design of embedded processors, especially for portable devices. Since memory access consumes a significant portion of the energy of a processor, the design of fast low-energy caches has become a very important aspect of modern processor design. In this paper, we present a novel cache architecture for reduced energy instruction caches. Our proposed cache architecture consists of the L1 cache, multiple line buffers, and a prediction mechanism to predict which line buffer, or L1 cache to access next. We used simulation to evaluate our proposed architecture and compare it with the HotSpot cache, Filter cache, Predictive line buffer cache and Way-Halting cache. Simulation results show that our approach can reduce instruction cache energy consumption, on average, by 75% without sacrificing performance. [ABSTRACT FROM AUTHOR]
- Published
- 2008
223. Empirical Study for Optimization of Power-Performance with On-Chip Memory.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Takahashi, Chikafumi, and Sato, Mitsuhisa
- Abstract
Power-performance (performance per uniform power consumption) recently has become a more important factor in modern high-performance microprocessors. In processor design, it is well known that off-chip memory access has a large impact on both performance and power consumption. On-chip memory is one solution for this problem, so that many processors such as the Renesas SH-4 and some ARM architecture type processors adopt on-chip memory, which resides on the same layer as the cache memory. In this study, the effectiveness of the on-chip memory in an SH-4 processor was quantitatively examined by directly measuring the real power of the processor. For these experiments, we proposed a method that made use of the on-chip memory for power reduction. The experimental results show that the optimization of data transfer using on-chip memory reduces EDP (energy delay product) by up to 15.2%. As an extension of on-chip memory, we have proposed an on-chip RAM architecture called SCIMA (software controllable integrated memory architecture) which enables DMA (direct memory access) transfer to the on-chip memory. According to the empirical data from the SH-4 processor, it was found that the additional DMA transfer using SCIMA reduces EDP by up to 26.3%. [ABSTRACT FROM AUTHOR]
- Published
- 2008
224. Computations of Global Seismic Wave Propagation in Three Dimensional Earth Model.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Tsuboi, Seiji, and Komatitsch, Dimitri
- Abstract
We use a Spectral-Element Method implemented on the Earth Simulator in Japan to simulate broadband seismic waves generated by various earthquakes. The spectral-element method is based on a weak formulation of the equations of motion and has both the flexibility of a finite-element method and the accuracy of a pseudospectral method. The method has been developed on a large PC cluster and optimized on the Earth Simulator. We perform numerical simulation of seismic wave propagation for a three-dimensional Earth model, which incorporates 3D variations in compressional wave velocity, shear-wave velocity and density, attenuation, anisotropy, ellipticity, topography and bathymetry, and crustal thickness. The simulations are performed on 4056 processors, which require 507 out of 640 nodes of the Earth Simulator. We use a mesh with 206 million spectral-elements, for a total of 13.8 billion global integration grid points (i.e., almost 37 billion degrees of freedom). We show examples of simulations for several large earthquakes and discuss future applications in seismological studies. [ABSTRACT FROM AUTHOR]
- Published
- 2008
225. Realization of a Computer Simulation Environment Based on ITBL and a Large Scale GW Calculation Performed on This Platform.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Kawazoe, Yoshiyuki, and Sluiter, Marcel
- Abstract
An extraordinarily large GRID environment has been established over Japan by using SuperSINET based on ITBL connecting 4 supercomputer facilities. This new supercomputing environment has been used for large-scale numerical simulations using the original ab initio code TOMBO, and several remarkable results have already been obtained to prove that this newly built computer environment is actually useful to accelerate the speed of designing and developing advanced functional materials expected to be used in nanotechnology. [ABSTRACT FROM AUTHOR]
- Published
- 2008
226. Energy-Efficient Embedded System Design at 90nm and Below - A System-Level Perspective -.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, and Ishihara, Tohru
- Abstract
Energy consumption is a fundamental barrier in taking full advantage of today's and future semiconductor manufacturing technologies. This paper presents our recent research activities and results on estimating and reducing energy consumption in nanometer technology system LSIs. This includes techniques and tools for (i) estimating instantaneous energy consumption of embedded processors during an application execution, and (ii) reducing leakage energy in instruction cache memories by taking advantage of value-dependence of SRAM leakage due to within-die Vth variation. [ABSTRACT FROM AUTHOR]
- Published
- 2008
227. Lattice QCD Simulations as an HPC Challenge.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, and Nakamura, Atsushi
- Abstract
We overview the present status of lattice QCD (Quantum Chromodynamics) simulations. Although it is still far from the final goal, the lattice QCD is reaching a level to have a predictive power as a first principle study for the strongly interacting elementary particles, hadron. This is due to many improvements of techniques and rapid development of computational power. We then look into the hot spot of the calculation explicitly. Finally we discuss what kind of achievement can be expected by using Peta-flop computers. [ABSTRACT FROM AUTHOR]
- Published
- 2008
228. Numerical Simulation of Combustion Dynamics at ISTA/JAXA.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Shinjo, Junji, and Matsuyama, Shingo
- Abstract
This paper briefly reviews recent numerical combustion simulation results at ISTA/JAXA obtained by DNS and LES approaches, and shows some topics towards future combustion research. We have successfully simulated detailed structures of a hydrogen jet lifted flame by DNS and an unsteady combustor flow field in a gas turbine combustor by LES. In these simulations, numerical simulation has been proved very effective, but its applicability is still limited due to long computational time. It is expected that future progress in computer performance will make this kind of simulation more realistic and useful in combustion research and development. [ABSTRACT FROM AUTHOR]
- Published
- 2008
229. Spacecraft Plasma Environment Analysis Via Large Scale 3D Plasma Particle Simulation.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Okada, Masaki, and Usui, Hideyuki
- Abstract
Geospace environment simulator (GES) has started as one of the advanced computing research projects at the Earth Simulator Center in Japan Marine Science and Technology Center since 2002 [1]. By using this computing resource, a large scale simulation which reproduces a realistic physical model can be utilized not only for studying the geospace environment but also for various human activities in space. GES project aims to reproduce fully kinetic environment around a spacecraft by using the 3-dimensional full-particle electromagnetic simulation code which could include spacecraft model inside (NuSPACE). NuSPACE can model interaction between space plasma and a spacecraft by the unstructured-grid 3D plasma particle simulation code embedded in the NuSPACE. We will report current status of the project and our concept of achieving the spacecraft environment in conjunction with the space weather. [ABSTRACT FROM AUTHOR]
- Published
- 2008
230. PetaFLOPS Computing and Computational Nanotechnology on Industrial Issues.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Ohnishi, Shuhei, and Itoh, Satoshi
- Abstract
A prospect of new development by PetaFLOPS computing is discussed for the industrial research and application in the field of materials sciences based on the TeraFLOPS computing in the Earth Simulator. Two examples of simulations for nano-scale materials are presented for metal clusters by the first principles calculation and water droplets by the classical molecular dynamics. A new possibility by PetaFLOPS computing is proposed in terms of a real-time simulator. [ABSTRACT FROM AUTHOR]
- Published
- 2008
231. 16.14 TFLOPS Eigenvalue Solver on the Earth Simulator: Exact Diagonalization for Ultra Largescale Hamiltonian Matrix.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Yamada, Susumu, and Imamura, Toshiyuki
- Abstract
The Lanczos method has been conventionally utilized as an eigenvalue solver for huge size matrices encountered in strongly correlated fermion systems. However, since one cannot obtain the residual during the Lanczos iteration, the iteration count in the Lanczos method is not controllable. Thus, we adopt a new eigenvalue solver based on the conjugate gradient (CG) method in which the residual can be evaluated at every iteration step. We confirm that the CG method with a preconditioner shows much better performance than the Lanczos method. We achieve 16.14 TFLOPS on 512 nodes (4096 processors) of the Earth Simulator by the use of the CG method. [ABSTRACT FROM AUTHOR]
- Published
- 2008
232. Sampling of Protein Conformations with Computers to Predict the Native Structure.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, and Higo, Junichi
- Abstract
Native-structure prediction of proteins only from the amino-acid sequential information, without using information from a sequence-structure database of proteins, has not yet succeeded. Computer simulation is now popular in protein conformational sampling for the prediction. The sampling is, however, hopelessly difficult when a conventional simulation technique (canonical molecular dynamics simulation) is used, because the conformation is frequently trapped in energy minima in the conformational space. This trapping makes the sampling efficiency considerably poor. I explain an efficient conformational sampling algorithm, multicanonical molecular dynamics simulation, recently developed. Results on the sampling of polypeptide chains showed that the conformation easily overcomes the energy barriers between the energy minima using this method. [ABSTRACT FROM AUTHOR]
- Published
- 2008
233. Development of Electromagnetic Particle Simulation Code in an Open System.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Ohtani, Hiroaki, and Ishiguro, Seiji
- Abstract
In an electromagnetic particle simulation for magnetic reconnection in an open system, which has a free boundary condition, particles go out and come into the system through the boundary and the number of particles depends on time. Besides, particles are locally attracted due to physical condition. Accordingly, it is hard to realize an adequate load balance with domain decomposition. Furthermore, a vector performance does not become efficient without a large memory size due to a recurrence of array access. In this paper, we parallelise the code with High Performance Fortran. For data layout, all field data are duplicated on each parallel process, but particle data are distributed among them. We invent an algorithm for the open boundary of particles, in which an operation for outgoing and incoming particles is performed in each processor, and the only reduction operation for the number of particles is executed in data transfer. This adequate treatment makes the amount and frequency of data transfer small, and the load balance among processes relevant. Furthermore, a compiler-directive listvec in the gather process dramatically decreases the memory size and improves the vector performance. Vector operation ratio becomes about 99.5% and vector length turns 240 and over. It becomes possible to perform the simulation with 800 million particles in 512×128×64 meshes. We succeed in opening a path for a large-scale simulation. [ABSTRACT FROM AUTHOR]
- Published
- 2008
234. Development of Three-Dimensional Neoclassical Transport Simulation Code with High Performance Fortran on a Vector-Parallel Computer.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Satake, Shinsuke, and Okamoto, Masao
- Abstract
A neoclassical transport simulation code (FORTEC-3D) applicable to three-dimensional configurations has been developed using High Performance Fortran (HPF). Adoption of computing techniques for parallelization and a hybrid simulation model to the δf Monte-Carlo method transport simulation, including non-local transport effects in three-dimensional configurations, makes it possible to simulate the dynamism of global, non-local transport phenomena with a self-consistent radial electric field within a reasonable computation time. In this paper, development of the transport code using HPF is reported. Optimization techniques in order to achieve both high vectorization and parallelization efficiency, adoption of a parallel random number generator, and also benchmark results, are shown. [ABSTRACT FROM AUTHOR]
- Published
- 2008
235. Pipelined Parallelization in HPF Programs on the Earth Simulator.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Murai, Hitoshi, and Okabe, Yasuo
- Abstract
There is no explicit way for parallelization of DOACROSS loops in the HPF specifications. Although recent advanced HPF compilers such as HPF/ES have been as powerful as MPI in many situations of parallel programming, many of them do not have the capability of pipelining DOACROSS loops. We propose a new extension for pipelined parallelization, the PIPELINE clause, and have developed a preprocessor, named HPFX, that translates an HPF source program annotated by the PIPELINE clause into a normal HPF one, to evaluate the effectiveness of the clause. Evaluation on the Earth Simulator shows that pipelined parallelization in implementations of the NPB LU benchmark with HPFX and HPF/ES outperforms the hyperplane parallelization in the conventional HPF implementations of the benchmark. [ABSTRACT FROM AUTHOR]
- Published
- 2008
236. Distributed Parallelization of Exact Charge Conservative Particle Simulation Code by High Performance Fortran.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Hasegawa, Hiroki, and Ishiguro, Seiji
- Abstract
A three-dimensional, relativistic, electromagnetic particle simulation code is parallelized in distributed memories by High Performance Fortran (HPF). In this code, the "Exact Charge Conservation Scheme" is used as a method for calculating current densities. In this paper, some techniques to optimize this code for a vector-parallel supercomputer are presented. In particular, methods for parallelization and vectorization are discussed. Examination of the code is also made on multi-node jobs. The results of test runs show high efficiency of the code. [ABSTRACT FROM AUTHOR]
- Published
- 2008
237. Mapping Normalization Technique on the HPF Compiler fhpf.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Iwashita, Hidetoshi, and Aoki, Masaki
- Abstract
We propose a technique of mapping normalization which reduces the variety of data and computational mapping representation of HPF into a certain standard form. The base of the reduction is a set of equivalent transformations of an HPF program, using composition of alignment and affine transformation of data and loop indices. The mapping normalization technique was implemented in the HPF compiler fhpf, and made the succeeding processes, such as local access detection and SPMD conversion, much slimmer. The measurement result shows that performance of the MPI code generated by the fhpf compiler is fairly comparable to the one written by a skillful MPI programmer. [ABSTRACT FROM AUTHOR]
- Published
- 2008
238. A Similarity Evaluation Method for Volume Data Sets by Using Critical Point Graph.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Minami, Tomoki, and Sakai, Koji
- Abstract
The ever-increasing use of computer simulation has proportionately increased the demand for an efficient method for classifying a large amount of computational results or for searching for an arbitrary data set in a given database. In order to classify or search for a computational simulation result, it is necessary to evaluate the similarity between a given data set and the reference data in a database. A similarity estimation method which employs the "Critical Point Graph (CPG)" as an index has proven effective; however, this method does not support transformation operations such as rotation or scaling. In this paper, we propose a CPG-based similarity estimation method supporting both rotation and scaling transformations for two- and three-dimensional scalar data sets (volume data sets). We confirmed its effectiveness and also showed that it is superior to the traditional Contour Tree (CT) based matching technique, which uses affine-invariant metrics. Some discussion about the proper use of these matching techniques is also presented to clarify their advantages and disadvantages. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
239. Development of an Interactive Visual Data Mining System for Atmospheric Science.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Sato, Toshinori, Watanabe, Chiemi, Touma, Eriko, and Yamauchi, Kazuko
- Abstract
In atmospheric science, 3D visualization techniques have mainly been used in recent decades to create impressive presentations. However, from the viewpoint of visual data mining, 3D visualization methodology has had difficulty becoming widespread, because the most conventional and established practice is to make 2D diagrams consisting of two dimensions of a time-varying 3D grid. Based on these observations, we have been developing a quick-look tool for 3D visual data mining of atmospheric science data. We expect that scientists can use this tool to find 2D diagrams in the data by using various 2D or 3D visualization methods, and to become accustomed to 3D visualization methods. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
240. Hybrid Parallelization and Flat Parallelization in HPF (High Performance Fortran).
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Hayashi, Yasuharu, and Suehiro, Kenji
- Abstract
We have developed the HPF (High Performance Fortran) compiler HPF/SX V2 as an interface for distributed memory parallel programming. HPF is a de facto standard language for parallel programs. In HPF, parallel programs can be written just by inserting comment directives into existing serial Fortran programs. This paper treats two parallelization methods in HPF/SX V2 on an SMP (Symmetric Multiprocessor) cluster system, each node of which is built by connecting multiple vector PEs (Processor Elements) with a shared memory. One is hybrid parallelization, which consists of vectorization on a PE, multi-thread parallelization within a node, and distributed memory parallelization across nodes. The other is flat parallelization, which consists of vectorization and distributed memory parallelization. We compare hybrid parallelization with flat parallelization by evaluating several typical codes. The results show that hybrid parallelization is particularly beneficial when reduction of memory is expected. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
241. A Calculus Effectively Performing Event Formation with Visualization.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Yamasaki, Susumu, and Sasakura, Mariko
- Abstract
As a programming technique, we formulate a calculus of illustrating event formation which is effectively performed. An event is visualized as a sequence of abstract charts denoting processes. The calculus contains a set of charts related to basic processes, a set of situations, a semantic function assigning a situation transition to each chart, a logic program with negation-as-failure, and the integrity constraint on the set of situations. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
242. Netfiles: An Enhanced Stream-Based Communication Mechanism.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Chan, Philip, and Abramson, David
- Abstract
Netfiles is an alternative API for message passing on distributed memory machines. Based on the communication stream model, Netfiles provides enhanced functionality such as broadcast and gather operations. Netfiles overloads conventional file I/O primitives, enabling parallel programs to be developed and tested on a file system before execution on a parallel machine. Netfiles is part of a parallel programming system called FAbrIC. This paper also presents the design and implementation of the FAbrIC architecture and demonstrates the effectiveness of this approach by means of two parallel applications: a parallel shallow water model application and a parallel Jacobi method. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
243. Photo-Realistic Visualization for the Blast Wave of TNT Explosion by Grid-Based Rendering.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Kato, Kaori, and Aoki, Takayuki
- Abstract
After the detonation of a solid high explosive, the material is at extremely high pressure while retaining its solid density, and it expands rapidly, driving a strong shock wave. In order to investigate the blast wave propagation driven by the 32-kg TNT explosion in the underground magazine, a three-dimensional simulation is performed with a stable and accurate numerical scheme, without special modeling of the expansion process of the detonation product gas. The compressible fluid equations are solved by a fractional step procedure which consists of an advection phase and a non-advection phase. The former employs the Rational function CIP scheme in order to preserve monotone signals, and the latter is solved by the IDO (Interpolated Differential Operator) scheme to achieve accurate calculation. For the simulation results, photo-realistic visualization is achieved by combining volume rendering with isosurface rendering on a grid computer. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
244. Reconfigurable Middleware for Grid Environment.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Sungju Kwon, and Jaeyoung Choi
- Abstract
A component in an application is a functional unit with well-defined interfaces. It encapsulates its internal states and provides services to other components or applications. By modularizing required functions into components, a component-based system can easily reuse those components and provide a flexible application structure with dynamic reconfiguration. In this paper, we propose a component-based middleware, called MAGE, which uses a service-oriented interface to provide transparency of platform, implementation language, and location. MAGE can dynamically reconfigure its architecture to adapt to Grid environments. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
245. Performance of Coupled Parallel Finite Element Analysis in Grid Computing Environment.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Niho, Tomoya, and Horie, Tomoyoshi
- Abstract
Because coupled problems must be solved for multiple phenomena, large-scale coupled analysis requires large computational resources. In this paper, we propose a coupled parallel finite element analysis method using wide-area distributed computational resources on the Internet. In order for PC clusters located in different places to carry out a coupled parallel finite element analysis, a PC cluster receives the data needed for the coupled analysis from the other through the Internet. To perform the computation efficiently, processes for coupled parallel analysis are allocated based on an estimate of the coupled parallel analysis time, taking into account the available computer resources and network performance. A parallel finite element analysis of a coupled electromagnetic and structural problem was carried out using two PC clusters to discuss the validity of this analysis method and computing environment. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
246. Performance-Based Loop Scheduling on Grid Environments.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Wen-Chung Shih, and Chao-Tung Yang
- Abstract
Loop scheduling and load balancing on parallel and distributed systems are critical problems, but they are difficult to cope with, especially in emerging grid environments. Previous researchers proposed some useful self-scheduling schemes, which were applicable to PC-based cluster and grid computing environments. In this paper, we generalize this concept and propose a general approach, named PLS (Performance-Based Loop Scheduling). To verify our approach, a grid platform was built, and two application programs, matrix multiplication and Mandelbrot, were implemented with MPI to be executed on this testbed. Experimental results showed that our approach was efficient and robust in terms of the range of α values. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
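The self-scheduling idea mentioned in the abstract of record 246 can be illustrated with a minimal sketch. The abstract does not define PLS or the role of α, so everything here is an assumption made for illustration: the function name `two_phase_chunks`, the integer `weights` standing in for measured node performance, and the reading of `alpha` as the fraction of iterations assigned statically. The second phase uses classic guided self-scheduling (each request receives the ceiling of remaining iterations divided by the number of workers), which is a well-known scheme and not necessarily the one the paper uses.

```python
import math


def two_phase_chunks(total_iters, weights, alpha):
    """Hypothetical two-phase loop partitioning (not the paper's PLS).

    Phase 1: a fraction `alpha` of the iterations is split statically in
             proportion to integer performance `weights`, one chunk per node.
    Phase 2: the rest is handed out on demand with guided self-scheduling,
             where each chunk is ceil(remaining / number_of_nodes).
    Returns (static_chunks, dynamic_chunks).
    """
    nodes = len(weights)
    static_total = int(total_iters * alpha)
    weight_sum = sum(weights)

    # Proportional split with integer division; rounding leftovers go to
    # the node with the largest weight so no iteration is lost.
    static = [static_total * w // weight_sum for w in weights]
    static[weights.index(max(weights))] += static_total - sum(static)

    # Guided self-scheduling over the remaining iterations.
    remaining = total_iters - static_total
    dynamic = []
    while remaining > 0:
        chunk = math.ceil(remaining / nodes)
        dynamic.append(chunk)
        remaining -= chunk
    return static, dynamic


if __name__ == "__main__":
    static, dynamic = two_phase_chunks(100, [1, 1, 2], 0.5)
    print("static chunks per node:", static)
    print("dynamic chunk sequence:", dynamic)
```

With `alpha` set to 0 the sketch reduces to pure guided self-scheduling, and with `alpha` set to 1 it becomes a purely static, performance-weighted split.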
247. Computationally Efficient Parallel Matrix-Matrix Multiplication on the Torus.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Zekri, Ahmed S., and Sedukhin, Stanislav G.
- Abstract
In this paper, we represent the computation space of the (n×n)-matrix multiplication problem C=C+A·B as a 3D torus. All possible time-minimal scheduling vectors needed to activate the computations inside the corresponding 3D index points at each step of computing are determined. Using the projection method to allocate the scheduled computations to the processing elements, the resulting array processor that minimizes the computing time is a 2D torus with n×n processing elements. For each optimal time scheduling function, three optimal array allocations are obtained from projection. All the resulting allocations of all the optimal scheduling vectors can be classified into three groups. In one group, matrix C remains and both matrices A and B are shifted between neighbor processors. The well-known Cannon's algorithm belongs to this group. In another group, matrix A remains and both matrices B and C are shifted. In the third group, matrix B remains while both matrices A and C are shifted. The obtained array processor allocations need n compute-shift steps to multiply n×n dense matrices. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
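The abstract of record 247 names Cannon's algorithm as the member of the group in which matrix C stays in place while A and B are shifted between neighboring processors. The following is a minimal sequential simulation of that well-known algorithm on an n×n torus, with one matrix element per processing element. It is a sketch for illustration only: the function name `cannon_matmul` and the list-of-lists representation are assumptions, and it does not reproduce the scheduling vectors or projections derived in the paper.

```python
def cannon_matmul(A, B, C=None):
    """Simulate Cannon's algorithm for C = C + A*B on an n x n torus.

    Each position (i, j) stands for one processing element that keeps
    its C element fixed while A elements move left and B elements move up.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)] if C is None else [row[:] for row in C]

    # Initial skew: row i of A is rotated left by i positions and
    # column j of B is rotated up by j positions.
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]

    # n compute-shift steps.
    for _ in range(n):
        # Compute: every processing element multiplies its local pair.
        for i in range(n):
            for j in range(n):
                C[i][j] += a[i][j] * b[i][j]
        # Shift: A moves one step left, B one step up, with wraparound.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(cannon_matmul(A, B))
```

After the initial skew, the simulation performs exactly n compute-shift steps, which matches the step count stated in the abstract for multiplying n×n dense matrices.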
248. Implementation and Evaluation of the Mechanisms for Low Latency Communication on DIMMnet-2.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Miyabe, Yasuo, and Kitamura, Akira
- Abstract
DIMMnet-2 is a network interface for PC clusters that plugs into a DIMM slot. Connecting the network interface to the commonly used memory bus reduces the cost of building a PC cluster compared with using expensive machines with a recent high-performance I/O bus like PCIX. Moreover, low latency communication from the host CPU can be achieved. In this paper, we show an implementation of the mechanisms for low latency communication on the DIMMnet-2 prototype board that makes the best use of the memory slot. Its latency for a 4-byte data transfer is only 1.4 μs, which is lower than that of InfiniBand and QsNET II when the host processors are Intel Xeon. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
249. A New Dynamic Load Balancing Technique for Parallel Modified PrefixSpan with Distributed Worker Paradigm and Its Performance Evaluation.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Takaki, Makoto, and Tamura, Keiichi
- Abstract
In order to extract, at high speed, the frequent patterns that can become motifs from amino acid sequences, we are developing the parallel Modified PrefixSpan with the distributed worker paradigm. This paper presents a new dynamic load balancing technique for the parallel Modified PrefixSpan with the distributed worker paradigm and its performance evaluation. The characteristics of the dynamic load balancing are the small-grain task and the Cache-based Random Steal schema. This paper explains these characteristics and presents performance evaluations on a PC cluster of 100 nodes. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
250. The Bandwidth Expansion Effectiveness of Cache Levels Block Prefetch.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Labarta, Jesús, Joe, Kazuki, Sato, Toshinori, Youngkwan Ju, and Bongyong Uh
- Abstract
Most cache architectures exploit only a second-level cache prefetch. In this paper, we propose a hierarchical prefetch cache architecture which allows prefetching between all levels of cache. We discovered that this architecture has the virtual effect of expanding memory bus bandwidth. According to an experimental analysis using 10 benchmark programs, the proposed architecture, which employs prefetchers at all cache levels, achieved up to an 11% performance increase compared with both an architecture with expanded bus bandwidth and an architecture employing only a level 2 cache prefetcher. This shows that our proposed architecture is effective as a means of memory-bus bandwidth expansion. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF