13,616 results on '"Shared memory"'
Search Results
102. Shared Memory Abstraction
- Author
-
Yan, Da, Tian, Yuanyuan, Cheng, James, Zdonik, Stan, Series editor, Shekhar, Shashi, Series editor, Wu, Xindong, Series editor, Jain, Lakhmi C., Series editor, Padua, David, Series editor, Shen, Xuemin Sherman, Series editor, Furht, Borko, Series editor, Subrahmanian, V.S., Series editor, Hebert, Martial, Series editor, Ikeuchi, Katsushi, Series editor, Siciliano, Bruno, Series editor, Jajodia, Sushil, Series editor, Lee, Newton, Series editor, Yan, Da, Tian, Yuanyuan, and Cheng, James
- Published
- 2017
- Full Text
- View/download PDF
103. A Distributed Version of Syrup
- Author
-
Audemard, Gilles, Lagniez, Jean-Marie, Szczepanski, Nicolas, Tabary, Sébastien, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gaspers, Serge, editor, and Walsh, Toby, editor
- Published
- 2017
- Full Text
- View/download PDF
104. An Observational Approach to Defining Linearizability on Weak Memory Models
- Author
-
Derrick, John, Smith, Graeme, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Bouajjani, Ahmed, editor, and Silva, Alexandra, editor
- Published
- 2017
- Full Text
- View/download PDF
105. Connected Morphological Attribute Filters on Distributed Memory Parallel Machines
- Author
-
Kazemier, Jan J., Ouzounis, Georgios K., Wilkinson, Michael H. F., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Angulo, Jesús, editor, Velasco-Forero, Santiago, editor, and Meyer, Fernand, editor
- Published
- 2017
- Full Text
- View/download PDF
106. SwingDB: An Embedded In-memory DBMS Enabling Instant Snapshot Sharing
- Author
-
Meng, Qingzhong, Zhou, Xuan, Chen, Shiping, Wang, Shan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Blanas, Spyros, editor, Bordawekar, Rajesh, editor, Lahiri, Tirthankar, editor, Levandoski, Justin, editor, and Pavlo, Andrew, editor
- Published
- 2017
- Full Text
- View/download PDF
107. Reduced Complexity Many-Core: Timing Predictability Due to Message-Passing
- Author
-
Mische, Jörg, Frieb, Martin, Stegmeier, Alexander, Ungerer, Theo, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Knoop, Jens, editor, Karl, Wolfgang, editor, Schulz, Martin, editor, Inoue, Koji, editor, and Pionteck, Thilo, editor
- Published
- 2017
- Full Text
- View/download PDF
108. Energy Avoiding Matrix Multiply
- Author
-
Livingston, Kelly, Landwehr, Aaron, Monsalve, José, Zuckerman, Stéphane, Meister, Benoît, Gao, Guang R., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Ding, Chen, editor, Criswell, John, editor, and Wu, Peng, editor
- Published
- 2017
- Full Text
- View/download PDF
109. Embedded Real-Time Operating Systems
- Author
-
Wang, K. C.
- Published
- 2017
- Full Text
- View/download PDF
110. A TIGHT SPACE BOUND FOR CONSENSUS.
- Author
-
LEQI ZHU
- Subjects
- *
DISTRIBUTED computing - Abstract
In the consensus problem, there are n processes, each of which has a private input value. Each nonfaulty process must output a single value such that no two processes output different values and the output is the input value of some process. There are many consensus protocols for systems where the processes may only communicate by reading and writing to shared registers. Of particular interest are protocols that have progress guarantees such as randomized wait-freedom or obstruction-freedom. In 1992, it was proved that such protocols must use Ω(√n) registers. In 2015, this was improved to Ω(n) registers in the anonymous setting, where processes do not have identifiers. We prove that every randomized wait-free or obstruction-free protocol for solving consensus among n processes must use at least n - 1 registers. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
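The consensus interface described in the abstract above can be pictured with a small sketch. The following is a minimal, hypothetical C example that solves consensus with a single compare-and-swap word rather than read/write registers; it is not taken from the article, whose n - 1 lower bound applies specifically to protocols restricted to reading and writing shared registers. Inputs are assumed to be non-negative so that -1 can serve as the "undecided" sentinel.

    /* Minimal sketch (not from the article): the propose/decide interface of a
     * consensus object, implemented with one compare-and-swap word.  CAS solves
     * consensus for any number of processes; the article's n - 1 lower bound is
     * about protocols that may only read and write registers. */
    #include <stdatomic.h>

    #define UNDECIDED (-1L)            /* assumes all inputs are non-negative */

    static atomic_long decision = UNDECIDED;

    /* Every process calls propose() with its private input; all callers return
     * the same value (agreement), and that value is some caller's input (validity). */
    long propose(long my_input) {
        long expected = UNDECIDED;
        atomic_compare_exchange_strong(&decision, &expected, my_input);
        return atomic_load(&decision); /* the first successful CAS wins */
    }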
111. Minimizing energy on homogeneous processors with shared memory.
- Author
-
Chau, Vincent, Fong, Chi Kit Ken, Liu, Shengxin, Wang, Elaine Yinling, and Zhang, Yong
- Subjects
- *
COMPUTER engineering , *APPROXIMATION algorithms , *COMPUTER systems , *MEMORY , *SERVER farms (Computer network management) - Abstract
• The scheduling problem on m identical speed-scalable processors with shared memory is considered. • A constant approximation algorithm is given. • An optimal algorithm is given when the assignment of jobs is fixed. Energy efficiency is a crucial desideratum in the design of computer systems, from small-sized mobile devices with limited battery to large-scale data centers. In such computing systems, processors and memory are considered the two major power consumers among all the system components. One recent trend to reduce power consumption is using shared memory in multi-core systems, an architecture that has become ubiquitous. However, applying energy-efficient methods to the multi-core processor and the shared memory separately is not trivial. In this work, we consider the energy-efficient task scheduling problem, which coordinates the power consumption of both the multi-core processor and the shared memory, focusing especially on the general situation in which the number of tasks exceeds the number of cores. We devise an approximation algorithm with guaranteed performance for the multi-core system. We tackle the problem by first presenting an optimal algorithm when the assignment of tasks to cores is given. Then we propose an approximate assignment for the general task scheduling problem. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
112. Tight Bounds for Asynchronous Renaming.
- Author
-
ALISTARH, DAN, ASPNES, JAMES, CENSOR-HILLEL, KEREN, GILBERT, SETH, and GUERRAOUI, RACHID
- Subjects
DISTRIBUTED computing ,COMPUTATIONAL complexity ,NAMESPACE (Computer science) ,DETERMINISTIC algorithms ,LOGARITHMIC functions - Abstract
This article presents the first tight bounds on the time complexity of shared-memory renaming, a fundamental problem in distributed computing in which a set of processes need to pick distinct identifiers from a small namespace. We first prove an individual lower bound of Ω(k) process steps for deterministic renaming into any namespace of size subexponential in k, where k is the number of participants. The bound is tight: it draws an exponential separation between deterministic and randomized solutions, and implies new tight bounds for deterministic concurrent fetch-and-increment counters, queues, and stacks. The proof is based on a new reduction from renaming to another fundamental problem in distributed computing: mutual exclusion. We complement this individual bound with a global lower bound of Ω(klog(k/c)) on the total step complexity of renaming into a namespace of size ck, for any c ≥ 1. This result applies to randomized algorithms against a strong adversary, and helps derive new global lower bounds for randomized approximate counter implementations, that are tight within logarithmic factors. On the algorithmic side, we give a protocol that transforms any sorting network into a randomized strong adaptive renaming algorithm, with expected cost equal to the depth of the sorting network. This gives a tight adaptive renaming algorithm with expected step complexity O(log k), where k is the contention in the current execution. This algorithm is the first to achieve sublinear time, and it is time-optimal as per our randomized lower bound. Finally, we use this renaming protocol to build monotone-consistent counters with logarithmic step complexity and linearizable fetch-and-increment registers with polylogarithmic cost. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
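For readers new to the renaming problem summarized in the abstract above, the sketch below is the simplest possible, deliberately naive C illustration of its interface: a shared fetch-and-increment counter hands out distinct names 1, 2, 3, ... to participants. It is not the article's sorting-network-based algorithm, and it does not provide the adaptive, contention-sensitive guarantees studied there.

    /* Naive illustration only: tight renaming via fetch-and-increment.
     * The article studies renaming from reads/writes (and its lower bounds),
     * plus an O(log k) expected-step randomized algorithm; this sketch merely
     * shows what a renaming object is supposed to return. */
    #include <stdatomic.h>

    static atomic_int next_name = 1;

    int get_name(void) {
        /* atomically reserve the next unused name; every caller gets a distinct one */
        return atomic_fetch_add(&next_name, 1);
    }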
113. The Space Complexity of Long-Lived and One-Shot Timestamp Implementations.
- Author
-
HELMI, MARYAM, HIGHAM, LISA, PACHECO, EDUARDO, and WOELFEL, PHILIPP
- Subjects
TIMESTAMPS ,DATA loggers ,UNIFORM Resource Identifiers ,TIMEKEEPING ,ALGORITHMS - Abstract
This article is concerned with the problem of implementing an unbounded timestamp object from multiwriter atomic registers, in an asynchronous distributed system of n processes with distinct identifiers where timestamps are taken from an arbitrary universe. Ellen et al. [2008] showed that √n/2 - O(1) registers are required for any obstruction-free implementation of long-lived timestamp systems from atomic registers (meaning processes can repeatedly get timestamps). We improve this existing lower bound in two ways. First we establish a lower bound of n/6 - 1 registers for the obstruction-free long-lived timestamp problem. Previous such linear lower bounds were only known for constrained versions of the timestamp problem. This bound is asymptotically tight; Ellen et al. [2008] constructed a wait-free algorithm that uses n - 1 registers. Second we show that √2n-log n-O(1) registers are required for any obstruction-free implementation of one-shot timestamp systems (meaning each process can get a timestamp at most once). We show that this bound is also asymptotically tight by providing a wait-free one-shot timestamp system that uses at most [2√n] registers, thus establishing a space complexity gap between one-shot and long-lived timestamp systems. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
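A concrete picture of the timestamp object discussed in the abstract above is the folklore "collect, take the maximum, add one" construction from single-writer registers, sketched below in C. This is not one of the algorithms from the article; it is a wait-free long-lived implementation that uses one register per process (N registers), in the same ballpark as the asymptotically tight linear bound mentioned in the abstract.

    /* Folklore construction (not the article's): unbounded long-lived timestamps
     * from N single-writer multi-reader registers. */
    #include <stdatomic.h>

    #define N 64                         /* number of processes (assumed) */
    static atomic_long R[N];             /* R[i] is written only by process i */

    long get_timestamp(int my_id) {
        long max = 0;
        for (int j = 0; j < N; j++) {    /* collect everyone's latest timestamp */
            long v = atomic_load(&R[j]);
            if (v > max) max = v;
        }
        long ts = max + 1;               /* strictly later than anything collected */
        atomic_store(&R[my_id], ts);     /* announce it in my own register */
        return ts;
    }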
114. Harvesting the Aggregate Computing Power of Commodity Computers for Supercomputing Applications
- Author
-
Dereje Regassa, Heonyoung Yeom, and Yongseok Son
- Subjects
HPC ,shared memory ,optimization ,commodity hardware ,big data ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Distributed supercomputing is becoming common in companies and academia. Much parallel computing research has focused on harnessing the power of commodity processors, and even internet computers, to aggregate their computational power to solve computationally complex problems. Using flexible commodity cluster computers for supercomputing workloads, instead of a dedicated supercomputer and expensive high-performance computing (HPC) infrastructure, is cost-effective. Its scalable nature makes it better suited to the available organizational resources, which can benefit researchers who aim to conduct numerous repetitive calculations on small to large volumes of data to obtain valid results in a reasonable time. In this paper, we design and implement an HPC-based supercomputing facility from commodity computers at an organizational level, providing two separate implementations for cluster-based supercomputing: Hadoop- and Spark-based HPC clusters, primarily for data-intensive jobs, and Torque-based clusters for Multiple Instruction Multiple Data (MIMD) workloads. The performance of these clusters is measured through extensive experimentation. With the implementation of the message passing interface, the performance of the Spark and Torque clusters is increased by 16.6% for repetitive applications and by 73.68% for computation-intensive applications, with speedups of 1.79 and 2.47 respectively on the HPDA cluster. We conclude that the specific application or job can be chosen to run on the implemented clusters based on its computation parameters.
- Published
- 2022
- Full Text
- View/download PDF
115. Scalable Post-Processing of Large-Scale Numerical Simulations of Turbulent Fluid Flows
- Author
-
Christian Lagares, Wilson Rivera, and Guillermo Araya
- Subjects
CFD post-processing ,Kokkos ,distributed memory ,shared memory ,scalability ,out-of-core processing ,Mathematics ,QA1-939 - Abstract
Military, space, and high-speed civilian applications will continue contributing to the renewed interest in compressible, high-speed turbulent boundary layers. To further complicate matters, these flows present complex computational challenges ranging from the pre-processing to the execution and subsequent post-processing of large-scale numerical simulations. Exploring more complex geometries at higher Reynolds numbers will demand scalable post-processing. Modern times have brought application developers and scientists the advent of increasingly more diversified and heterogeneous computing hardware, which significantly complicates the development of performance-portable applications. To address these challenges, we propose Aquila, a distributed, out-of-core, performance-portable post-processing library for large-scale simulations. It is designed to alleviate the burden of domain experts writing applications targeted at heterogeneous, high-performance computers with strong scaling performance. We provide two implementations, in C++ and Python; and demonstrate their strong scaling performance and ability to reach 60% of peak memory bandwidth and 98% of the peak filesystem bandwidth while operating out of core. We also present our approach to optimizing two-point correlations by exploiting symmetry in the Fourier space. A key distinction in the proposed design is the inclusion of an out-of-core data pre-fetcher to give the illusion of in-memory availability of files yielding up to 46% improvement in program runtime. Furthermore, we demonstrate a parallel efficiency greater than 70% for highly threaded workloads.
- Published
- 2022
- Full Text
- View/download PDF
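A brief note on the Fourier-space optimization mentioned in the abstract above (this is background, an assumption about the likely identity involved, not a detail taken from the paper): two-point correlations are typically computed through the Wiener-Khinchin identity, R_uu(τ) = F⁻¹[ û(k) û*(k) ], i.e. the inverse transform of the power spectrum, and for real-valued fields the spectrum satisfies the conjugate symmetry û(-k) = û*(k), so only half of the Fourier coefficients need to be stored and transformed. That symmetry is presumably the kind exploited by the authors.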
116. Memory-Optimized Wavefront Parallelism on GPUs.
- Author
-
Li, Yuanzhe and Schwiebert, Loren
- Subjects
- *
GRAPHICS processing units , *PARTIAL differential equations , *SEQUENCE alignment , *PARALLEL programming - Abstract
Wavefront parallelism is a well-known technique for exploiting the concurrency of applications that execute nested loops with uniform data dependencies. Recent research of such applications, which range from sequence alignment tools to partial differential equation solvers, has used GPUs to benefit from the massively parallel computing resources. To achieve optimal performance, tiling has been introduced as a popular solution to achieve a balanced workload. However, the use of hyperplane tiles increases the cost of synchronization and leads to poor data locality. In this paper, we present a highly optimized implementation of the wavefront parallelism technique that harnesses the GPU architecture. A balanced workload and maximum resource utilization are achieved with an extremely low synchronization overhead. We design the kernel configuration to significantly reduce the minimum number of synchronizations required and also introduce an inter-block lock to minimize the overhead of each synchronization. In addition, shared memory is used in place of the L1 cache. The well-tailored mapping of the operations to the shared memory improves both spatial and temporal locality. We evaluate the performance of our proposed technique for four different applications: sequence alignment, edit distance, summed-area table, and 2D-SOR. The performance results demonstrate that our method achieves speedups of up to six times compared to the previous best-known hyperplane tiling-based GPU implementation. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
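The dependence pattern behind wavefront parallelism, as described in the abstract above, is easy to see in a CPU sketch. The C/OpenMP fragment below is only an illustration of the anti-diagonal sweep with a made-up stencil; it is not the paper's GPU implementation, which additionally maps tiles to CUDA shared memory and manages inter-block synchronization.

    /* Illustrative only: cell (i,j) depends on (i-1,j) and (i,j-1), so all cells
     * on one anti-diagonal i + j = d are independent and may run in parallel. */
    void wavefront(double *a, long n) {            /* a is an n-by-n grid, row-major */
        for (long d = 2; d <= 2 * (n - 1); d++) {  /* sweep anti-diagonals in order */
            long lo = (d - (n - 1) > 1) ? d - (n - 1) : 1;
            long hi = (d - 1 < n - 1) ? d - 1 : n - 1;
            #pragma omp parallel for               /* parallelize within a diagonal */
            for (long i = lo; i <= hi; i++) {
                long j = d - i;
                a[i * n + j] = 0.5 * (a[(i - 1) * n + j] + a[i * n + (j - 1)]);
            }
        }
    }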
117. A Directed Test Generator for Shared-Memory Verification of Multicore Chip Designs.
- Author
-
Andrade, Gabriel A. G., Graf, Marleson, Pfeifer, Nicolas, and dos Santos, Luiz C. V.
- Subjects
- *
MULTICORE processors , *NEIGHBORHOODS , *MULTIPROCESSORS , *EXAMINATIONS , *DESIGN , *MEMORY - Abstract
The functional verification of multicore chips requires the generation of parallel test programs able to expose design errors and ensure high coverage in less time. Although the coherence hardware can scale gracefully as the number of cores grows, the state space of the coherence protocol increases exponentially. That is why this article describes a directed test generation approach that exploits random test generation (RTG) for avoiding explicit enumeration of the coherence state space while memory consistency is verified. The novel approach was designed for synergy between a data-driven engine that explores neighborhoods toward higher coverage and a model-based engine that exploits constraints while driving RTG toward faster coverage evolution. As compared to a state-of-the-art data-driven generator and to a model-based generator, the proposed approach led to superior coverage evolution with time, when targeting 32-core designs relying on different protocols. For MOESI 2-level, the novel approach was 4.8 to 18.7 times faster to reach the data-driven generator’s maximal coverage, and it was up to 2.7 times faster to reach the model-driven generator’s. For MESI 3-level, it found, in 10 to 15 min, a few errors whose detection took the data-driven generator 45 min to 7 h. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
118. Soft Memory Box: A Virtual Shared Memory Framework for Fast Deep Neural Network Training in Distributed High Performance Computing
- Author
-
Shinyoung Ahn, Joongheon Kim, Eunji Lim, and Sungwon Kang
- Subjects
High performance computing ,distributed computing ,soft memory box ,shared memory ,deep neural network ,distributed deep learning ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Deep learning is one of the most promising machine learning methodologies. Deep learning is widely used in various application domains, e.g., image recognition, voice recognition, and natural language processing. In order to improve learning accuracy, deep neural networks have evolved by: 1) increasing the number of layers and 2) increasing the number of parameters in massive models. This implies that distributed deep learning platforms need to evolve to: 1) deal with huge/complex deep neural networks and 2) process massive training data with high-performance computing resources. This paper proposes a new virtual shared memory framework, called Soft Memory Box (SMB), which enables sharing the memory of a remote node among distributed processes in the nodes so as to improve communication performance via parameter sharing. According to data-intensive performance evaluation results, the communication time of deep learning using the proposed SMB is 2.1 times shorter than that using the message passing interface (MPI). In addition, the communication time of the SMB-based asynchronous parameter update becomes 2-7 times shorter than that using MPI, depending on deep learning models and the number of deep learning workers.
- Published
- 2018
- Full Text
- View/download PDF
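The record above describes sharing a remote node's memory for parameter exchange. The sketch below is not the Soft Memory Box API (SMB is its own kernel-level framework); it only illustrates the general remote-memory idea with standard MPI one-sided operations, where workers accumulate a parameter update directly into a window exposed by rank 0 instead of exchanging messages. The buffer size is a made-up placeholder.

    /* Illustration with standard MPI RMA, not the SMB interface described above. */
    #include <mpi.h>

    #define NPARAM 1024                      /* number of shared parameters (assumed) */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *base = NULL;
        MPI_Win win;                         /* rank 0 exposes the parameter buffer */
        MPI_Win_allocate((rank == 0 ? NPARAM : 0) * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

        MPI_Win_fence(0, win);
        if (rank != 0) {
            double update[NPARAM];           /* stand-in for a locally computed gradient */
            for (int i = 0; i < NPARAM; i++) update[i] = (double)rank;
            /* add the update directly into rank 0's memory; concurrent sums are allowed */
            MPI_Accumulate(update, NPARAM, MPI_DOUBLE, 0, 0, NPARAM, MPI_DOUBLE,
                           MPI_SUM, win);
        }
        MPI_Win_fence(0, win);               /* rank 0 now holds the summed updates */

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }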
119. Convergence and covering on graphs for wait-free robots
- Author
-
Armando Castañeda, Sergio Rajsbaum, and Matthieu Roy
- Subjects
Robot gathering ,Agreement ,Symmetry breaking ,Shared memory ,Wait-freedom ,Combinatorial topology ,Computer engineering. Computer hardware ,TK7885-7895 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Abstract The class of robot convergence tasks has been shown to capture fundamental aspects of fault-tolerant computability. A set of asynchronous robots that may fail by crashing, start from unknown places in some given space, and have to move towards positions close to each other. In this article, we study the case where the space is uni-dimensional, modeled as a graph G. In graph convergence, robots have to end up on one or two vertices of the same edge. We consider also a variant of robot convergence on graphs, edge covering, where additionally, it is required that not all robots end up on the same vertex. Remarkably, these two similar problems have very different computability properties, related to orthogonal fundamental issues of distributed computations: agreement and symmetry breaking. We characterize the graphs on which each of these problems is solvable, and give optimal time algorithms for the solvable cases. Although the results can be derived from known general topology theorems, the presentation serves as a self-contained introduction to the algebraic topology approach to distributed computing, and yields concrete algorithms and impossibility results.
- Published
- 2018
- Full Text
- View/download PDF
120. COMPARISON OF MULTI-FRONTAL AND ALTERNATING DIRECTION PARALLEL HYBRID MEMORY IGRM DIRECT SOLVER FOR NON-STATIONARY SIMULATIONS.
- Author
-
WOŹNIAK, MACIEJ and BUKOWSKA, ANNA
- Subjects
ISOGEOMETRIC analysis ,SPARSE matrices ,DIRECT costing ,FINITE element method ,MEMORY ,FACTORIZATION - Abstract
Three-dimensional isogeometric analysis (IGA-FEM) is a modern method for simulation. The idea is to utilize B-splines or NURBS basis functions for both computational domain descriptions and engineering computations. Refined isogeometric analysis (rIGA) employs a mixture of patches of elements with B-spline basis functions and C0 separators between them. This enables a reduction in the computational cost of direct solvers. Both IGA and rIGA come with challenging sparse matrix structures that are expensive to generate. In this paper, we show a hybrid parallelization method using hybrid-memory parallel machines. The two-level parallelization includes the partitioning of the computational mesh into sub-domains on the first level (MPI) and loop parallelization on the second level (OpenMP). We show that the hybrid parallelization of the integration reduces the contribution of this phase significantly. We compare the multi-frontal solver and alternating direction solver, including the integration and the factorization phases. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
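The two-level parallelization described in the abstract above follows the familiar hybrid MPI + OpenMP pattern. The fragment below is a generic illustration of that pattern (MPI across sub-domains, OpenMP inside each rank), not the authors' IGA solver; the per-rank loop merely stands in for the element-integration phase.

    /* Generic hybrid MPI + OpenMP skeleton, not the paper's solver. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const long NELEM = 1000000;          /* elements in this rank's sub-domain (assumed) */
        double local = 0.0;

        /* level 2: OpenMP threads share this rank's sub-domain (e.g. the integration loop) */
        #pragma omp parallel for reduction(+ : local)
        for (long e = 0; e < NELEM; e++)
            local += 1.0 / (double)(e + 1 + rank);   /* placeholder for per-element work */

        double global = 0.0;                 /* level 1: combine partial results across ranks */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("reduced value: %f\n", global);

        MPI_Finalize();
        return 0;
    }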
121. Bounded disagreement.
- Author
-
Chan, David Yu Cheng, Hadzilacos, Vassos, and Toueg, Sam
- Subjects
- *
SIMILARITY (Geometry) , *GENERALIZATION - Abstract
A well-known generalization of the consensus problem, namely, set agreement (SA), limits the number of distinct decision values that processes decide. In some settings, it may be more important to limit the number of "disagreers". Thus, we introduce another natural generalization of the consensus problem, namely, bounded disagreement (BD), which limits the number of processes that decide differently from the plurality. More precisely, in a system with n processes, the (n, ℓ)-BD task has the following requirement: there is a value v such that at most ℓ processes (the disagreers) decide a value other than v. Despite their apparent similarities, the results described below show that bounded disagreement, consensus, and set agreement are in fact fundamentally different problems. We investigate the relationship between bounded disagreement, consensus, and set agreement. In particular, we determine the consensus number [15] for every instance of the BD task. We also determine values of n, ℓ, m, and k such that the (n, ℓ)-BD task can solve the (m, k)-SA task (where m processes can decide at most k distinct values). Using our results and a previously-known impossibility result for set agreement [7], we prove that for all n ≥ 2, there is a BD task (and a corresponding BD object) that has consensus number n but cannot be solved using n-consensus and registers. Prior to our paper, the only objects known to have this unusual characteristic for n ≥ 2 (which shows that the consensus number of an object is not sufficient to fully capture its power) were artificial objects crafted solely for the purpose of exhibiting this behavior [1,17]. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
122. PERFORMANCE SIMULATION MODEL FOR A SHARED MEMORY MULTI-CORE COMPUTER SYSTEM USING TIMED COLOURED PETRI NETS.
- Author
-
Amusan, Elizabeth A. and Gabriel, Undie A.
- Subjects
COMPUTER storage devices ,COLLECTIVE memory ,MULTICORE processors ,PETRI nets ,TIME management ,SIMULATION methods & models - Abstract
A shared memory multi-core computer system is a computing paradigm in which processors have more than one core to process requests and also have access to a common memory. Most existing works are limited to modeling of a shared memory single-core computer system and thus the models are not flexible enough to study the operations of multi-core computer systems. Hence, in this paper, a high-level Petri Nets formalism (Timed Coloured Petri Nets) was used to develop a simulation model for a shared memory multicore computer system. Intel HP core i5 was used as a case study in developing the TCPN model for a shared memory multi-core computer system. The developed TCPN model was simulated using Coloured Petri Net (CPN) tools. One hundred and fifty simulation runs were carried out in order to obtain average utilization rate of the shared memory and average waiting time of the processor's cores in accessing the shared memory. The developed TCPN model was validated based on both real and simulated average memory utilization of the shared memory multi-core computer system. The validation result of the developed TCPN model showed that there was no significant difference between the simulated and real average memory utilization of the shared memory multi-core computer system. [ABSTRACT FROM AUTHOR]
- Published
- 2020
123. ALLSCALE API.
- Author
-
GSCHWANDTNER, Philipp, JORDAN, Herbert, THOMAN, Peter, and FAHRINGER, Thomas
- Subjects
DISTRIBUTED algorithms ,DATA structures ,APPLICATION program interfaces ,PARALLEL programming - Abstract
Effectively implementing scientific algorithms in distributed memory parallel applications is a difficult task for domain scientists, as evidenced by the large number of domain-specific languages and libraries available today attempting to facilitate the process. However, they usually provide a closed set of parallel patterns and are not open for extension without vast modifications to the underlying system. In this work, we present the AllScale API, a programming interface for developing distributed memory parallel applications with the ease of shared memory programming models. The AllScale API is closed for modification but open for extension, allowing new user-defined parallel patterns and data structures to be implemented based on existing core primitives and therefore fully supported in the AllScale framework. Focusing on high-level functionality directly offered to application developers, we present the advantages of such an API design, detail some of its specifications, and evaluate it using three real-world use cases. Our results show that AllScale decreases the complexity of implementing scientific applications for distributed memory while attaining comparable or higher performance compared to MPI reference implementations. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
124. Life beyond set agreement.
- Author
-
Chan, David Yu Cheng, Hadzilacos, Vassos, and Toueg, Sam
- Subjects
- *
HIERARCHIES - Abstract
The set agreement power of a shared object O describes O's ability to solve set agreement problems: it is the sequence (n_1, n_2, ..., n_k, ...) such that, for every k ≥ 1, using O and registers one can solve the k-set agreement problem among at most n_k processes. It has been shown that the ability of an object O to implement other objects is not fully characterized by its consensus number (the first component of its set agreement power). This raises the following natural question: is the ability of an object O to implement other objects fully characterized by its set agreement power? We prove that the answer is no: every level n ≥ 2 of Herlihy's consensus hierarchy has two linearizable objects that have the same set agreement power but are not equivalent, i.e., at least one cannot implement the other. We also show that every level n ≥ 2 of the consensus hierarchy contains a deterministic linearizable object O_n with some set agreement power (n_1, n_2, ..., n_k, ...) such that being able to solve the k-set agreement problems among n_k processes, for all k ≥ 1, is not enough to implement O_n. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
125. The Implementation of Real-Time Storage for Parallel Plasma Equilibrium Reconstruction Data in EAST PCS.
- Author
-
Zhu, J. Q., Shen, B., Yuan, Qiping, Zhang, R. R., Huang, Y., Guo, H. R., Yan, L. L., and Zheng, Y. Y.
- Subjects
- *
PLASMA equilibrium , *PLASMA confinement , *DATA warehousing - Abstract
In order to obtain the accurate and fast equilibrium reconstruction for Experimental Advanced Superconducting Tokamak (EAST), parallel plasma equilibrium reconstruction (P-EFIT) based on parallel computation in the graphic processing unit (GPU) and the EFIT framework is developed. However, more data will be produced during shots with improved accuracy and rapidity. 76 KB of P-EFIT data are generated in each iteration of 500 μs with computation grid size of 129 × 129. EAST is aimed at 1000-s-long-pulse discharge, then 52 GB of P-EFIT data will be generated during a shot even at 1.5 ms saving interval. In the previous storage mode, these data will be saved in memory during shots and archived to MDSplus as a whole after the shot, which means a very large memory requirement and heavy network transmission load after shots. Therefore, the real-time storage is designed for P-EFIT data to replace the “after shot archiving” mode. In the new mode, the P-EFIT processes and real-time archiving process will coordinate and share P-EFIT data gathered from GPU via shared memory, then the P-EFIT data will be archived to MDSplus in real-time during shots based on “segmented record” of MDSplus, which provides the possibility to reuse the memory allocated for P-EFIT and save data in higher saving frequency. The real-time storage of P-EFIT data was tested in 10-s discharge and 1000-s-long-pulse simulation discharge, and the validity and the advantage of memory saving have been verified. The real-time storage has been integrated into a plasma control system (PCS) and will be applied to next experiments in EAST. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
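A quick arithmetic check of the data volumes quoted in the abstract above: a 1000 s discharge saved at a 1.5 ms interval yields about 1000 / 0.0015 ≈ 6.7 × 10^5 snapshots, and at 76 KB per snapshot that is roughly 6.7 × 10^5 × 76 KB ≈ 5.1 × 10^7 KB, i.e. about 50 GB (52 GB counting 1 KB = 1024 bytes), consistent with the figure stated in the abstract.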
126. Performance analysis and hardware implementation of a nearly optimal buffer management scheme for high‐performance shared‐memory switches.
- Author
-
Zheng, Ling, Pan, Weitao, Li, Yinghua, and Gao, Ya
- Subjects
- *
FIELD programmable gate arrays , *IP networks , *NETWORK routers , *TRAFFIC patterns , *HIGH performance computing - Abstract
Summary: Burst traffic is a common traffic pattern in modern IP networks, and it may lead to the unfairness problem and seriously degrade the performance of switches and routers. From the perspective of switching mechanism, the majority of commercial switches adopt the on‐chip shared‐memory switching architecture, and high‐speed packet buffer with efficient queue management is required to deal with the unfairness and congestion problem. In this paper, the performance of a shared‐private buffer management scheme is analyzed in detail. In the proposed scheme, the total memory space is split into shared area and private area. Each output port has a private memory area that cannot be used by other ports. The shared area is completely shared among all output ports. A theoretical queuing model of the proposed scheme is formulated, and closed‐form formulas for multiple performance parameters are derived. Through the numerical studies, we demonstrate that a nearly optimal buffer partition policy can be obtained by setting an equally small amount of private area for each queue. This work is validated by simulations as well as hardware experiments. Software simulations show that the proposed scheme performs better than existing methods, and packet dropping caused by burst traffic can be significantly reduced. Besides, a prototype of the buffer management module is implemented and evaluated in field programmable gate array platform. The evaluation shows that the proposed scheme can ensure the efficiency and fairness while keeping a high throughput in real workload. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
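The admission logic of a shared-private buffer partition like the one analyzed in the abstract above can be summarized in a few lines. The C sketch below is one plausible software rendering of such a policy, not the authors' hardware design or queuing model; the port counts and area sizes are placeholders, and a real switch would implement this decision in the enqueue datapath.

    /* One plausible shared-private admission policy (illustrative, not the paper's design). */
    #include <stdbool.h>

    #define NPORTS        16
    #define PRIVATE_CELLS 32        /* private area reserved per output port (assumed) */
    #define SHARED_CELLS  2048      /* area shared by all output ports (assumed) */

    static int private_used[NPORTS];
    static int shared_used;

    /* Returns true if a buffer cell is granted to a packet destined to `port`. */
    bool admit(int port) {
        if (private_used[port] < PRIVATE_CELLS) {
            private_used[port]++;   /* the port's private quota is used first */
            return true;
        }
        if (shared_used < SHARED_CELLS) {
            shared_used++;          /* overflow traffic competes for the shared area */
            return true;
        }
        return false;               /* both areas exhausted: drop the packet */
    }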
127. Reducing the Impact of Intensive Dynamic Memory Allocations in Parallel Multi-Threaded Programs.
- Author
-
Langr, Daniel and Kocicka, Martin
- Subjects
- *
PARALLEL programming , *MEMORY , *SOURCE code , *DATA structures - Abstract
Frequent dynamic memory allocations (DyMAs) can significantly hinder the scalability of parallel multi-threaded programs. As the number of threads grows, DyMAs can even become the main performance bottleneck. We introduce modern tools and methods for evaluating the impact of DyMAs and present techniques for reducing it, which include scalable heap implementations, small buffer optimization, and memory pooling. Additionally, we provide a survey of state-of-the-art implementations of these techniques and study them experimentally by using a benchmark program, server simulator software, and a real-world high-performance computing application. As a result, we show that relatively small modifications in a parallel program's source code, or in the way it is executed, may substantially reduce the runtime overhead associated with the use of dynamic data structures. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
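Of the techniques surveyed in the abstract above, memory pooling is the easiest to show in a few lines. The sketch below is a generic per-thread free list in C, not code from the article: hot paths recycle fixed-size nodes instead of calling malloc/free, which removes contention on a shared allocator.

    /* Generic per-thread pool (illustrative): recycle fixed-size nodes to avoid
     * frequent malloc/free calls on hot paths. */
    #include <stdlib.h>

    typedef struct node { struct node *next; char payload[56]; } node_t;

    static _Thread_local node_t *free_list; /* thread-local, so no locking is needed */

    node_t *node_alloc(void) {
        if (free_list) {                    /* reuse a previously freed node if possible */
            node_t *n = free_list;
            free_list = n->next;
            return n;
        }
        return malloc(sizeof(node_t));      /* otherwise fall back to the heap */
    }

    void node_free(node_t *n) {
        n->next = free_list;                /* push the node back onto this thread's pool */
        free_list = n;
    }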
128. A memory scheduling strategy for eliminating memory access interference in heterogeneous system.
- Author
-
Fang, Juan, Wang, Mengxuan, and Wei, Zelin
- Subjects
- *
COMPUTER scheduling , *MULTICORE processors , *MEMORY , *PROBLEM solving - Abstract
Multiple CPUs and GPUs are integrated on the same chip to share memory, and access requests from different cores interfere with each other. Memory requests from the GPU seriously interfere with the CPU memory access performance. Requests from multiple CPUs are intertwined when accessing memory, and their performance is greatly affected. The difference in access latency between GPU cores increases the average latency of memory accesses. In order to solve the problems encountered in the shared memory of heterogeneous multi-core systems, we propose a step-by-step memory scheduling strategy, which improves the system performance. The step-by-step memory scheduling strategy first creates a new memory request queue based on the request source and isolates the CPU requests from the GPU requests when the memory controller receives a memory request, thereby preventing the GPU requests from interfering with the CPU requests. Then, for the CPU request queue, a dynamic bank partitioning strategy is implemented, which dynamically maps it to different bank sets according to the different memory characteristics of the applications, and eliminates memory request interference between multiple CPU applications without affecting bank-level parallelism. Finally, for the GPU request queue, criticality is introduced to measure the difference in memory access latency between the cores. Based on the first-ready, first-come-first-served strategy, we implemented criticality-aware memory scheduling to balance the locality and criticality of application accesses. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
129. Chaining and Biasing: Test Generation Techniques for Shared-Memory Verification.
- Author
-
Andrade, Gabriel A. G., Graf, Marleson, and dos Santos, Luiz C. V.
- Subjects
- *
MAGNITUDE (Mathematics) , *GENERATIONS , *EVICTION - Abstract
Since nondeterministic behavior is key to exposing shared-memory errors, nonsynchronized parallel programs are often used for verification and test of multicore chips. In the verification phase, however, the slow execution in a simulator requires nonconventional constraints for enabling error exposure with shorter programs. This paper proposes two novel techniques that build upon conventional random test generation for efficient shared-memory verification. The first technique exploits canonical dependence chains for constraining the random generation of instruction sequences so that the races induced at runtime are likely to raise the coverage of state transitions due to memory events conflicting at the same shared location. The second one exploits address space constraints for biasing random address assignment so that the competition of distinct shared locations for the same cache set can be controlled for raising the coverage of state transitions due to eviction events. We built generators relying on each of the proposed techniques, as well as on their combination, and we compared them to a conventional constrained random test generator for 8-, 16-, and 32-core architectures. Each of the four generators synthesized 1200 distinct test programs for verifying ten faulty designs derived from each of the three architectures (144 000 verification runs in total). For 32-core designs, the combination of the proposed techniques made at least 50% of the generation space capable of exposing errors, improved the median functional coverage by 44% and 83% at the two highest hierarchical levels, and reduced the average verification effort by one order of magnitude in many cases. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
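The address-biasing idea from the abstract above amounts to constraining random addresses so that many shared locations collide in a handful of cache sets. The fragment below is a hypothetical illustration with made-up cache geometry, not the authors' generator; a real tool would derive the field widths from the design under verification.

    /* Hypothetical illustration of biased address assignment (not the paper's tool). */
    #include <stdint.h>
    #include <stdlib.h>

    #define LINE_BITS   6    /* 64-byte cache lines (assumed geometry) */
    #define SET_BITS    7    /* 128 sets per cache (assumed geometry) */
    #define HOT_SETS    4    /* bias all generated accesses into only 4 sets */
    #define TAG_CHOICES 64   /* enough distinct tags to force evictions */

    /* Compose an address whose set index is drawn from a small "hot" pool, so that
     * distinct shared locations compete for the same sets and trigger evictions. */
    uint64_t biased_address(uint64_t base) {        /* base: aligned region start */
        uint64_t set = (uint64_t)(rand() % HOT_SETS);
        uint64_t tag = (uint64_t)(rand() % TAG_CHOICES);
        return base | (tag << (LINE_BITS + SET_BITS)) | (set << LINE_BITS);
    }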
130. Time-Triggered Switch-Memory-Switch Architecture for Time-Sensitive Networking Switches.
- Author
-
Li, Zonghui, Wan, Hai, Deng, Yangdong, Zhao, Xibin, Gao, Yue, Song, Xiaoyu, and Gu, Ming
- Subjects
- *
SWITCHING systems (Telecommunication) , *FAULT-tolerant computing , *COMPUTER scheduling , *SOCIAL conflict , *MOTIVATION (Psychology) , *ETHERNET - Abstract
Time-sensitive networking (TSN) is a set of extended standards for the IEEE 802.3 Ethernet under development by the IEEE 802.1 TSN task group. TSN depends on two key components, scheduling and fault tolerance, to provide real-time and reliable transmission. There is a strong motivation to replace the widely used field-buses with TSNs in industrial networking applications. However, industrial network devices are typical application-specific embedded systems with limited memory resources. Time-sensitive (TS) transmission certainly prefers on-chip memory, which is even more scarce for embedded systems. As a result, it is critical for TSNs to develop memory-efficient switching techniques with scalable schedulability and elegant fault-tolerance support. This paper proposes a time-triggered switch-memory-switch (SMS) architecture for memory-efficient TSN switches. First, based on the SMS shared memory, our architecture makes it possible to statically schedule memory allocation with full utilization for TS traffic and the remaining memory for other traffic. Compared with per-port memory, the shared memory achieves a ratio of n^n/n! (≈ e^n/√(2πn) as n → ∞), where n is the port number, in the feasible solution space under memory constraints and thus significantly improves memory scheduling ability and flexibility. Moreover, we develop a fault-tolerance scheme for reliable transmission. It facilitates a memory-efficient implementation of the popular multiline redundancy in industrial networks. The scheme is validated by five classes of memory conflicts and a case study on two-line redundancy. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
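To make the ratio in the abstract above concrete (these numbers are a worked example, not figures from the article): for n = 8 ports, n^n/n! = 8^8/8! = 16,777,216 / 40,320 ≈ 416, while the Stirling-style approximation e^n/√(2πn) gives e^8/√(16π) ≈ 420, so sharing the memory already enlarges the feasible scheduling space by more than two orders of magnitude at eight ports.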
131. PARALLELIZATION OF ASSEMBLY OPERATION IN FINITE ELEMENT METHOD.
- Author
-
BOŠANSKÝ, MICHAL and PATZÁK, BOREK
- Subjects
FINITE element method ,PARALLEL programming - Published
- 2020
- Full Text
- View/download PDF
132. Recoverable mutual exclusion.
- Author
-
Golab, Wojciech and Ramaraju, Aditya
- Subjects
- *
DATA structures , *FAULT-tolerant computing , *SYNCHRONIZATION - Abstract
Mutex locks have traditionally been the most common mechanism for protecting shared data structures in concurrent programs. However, the robustness of such locks against process failures has not been studied thoroughly. The vast majority of mutex algorithms are designed around the assumption that processes are reliable, meaning that a process may not fail while executing the lock acquisition and release code, or while inside the critical section. If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their efficiency. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter guarantees in terms of efficiency, which we define in terms of remote memory references. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
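To see concretely the failure mode that motivates the abstract above, consider an ordinary test-and-set spin lock, sketched below in C. This is deliberately not one of the article's recoverable algorithms: if the process holding the lock crashes inside the critical section, the flag is never cleared and every other process spins forever, which is exactly the loss of liveness the recoverable mutual exclusion problem addresses.

    /* A plain (non-recoverable) test-and-set lock, shown only to illustrate the
     * problem: a crash between acquire() and release() blocks everyone forever. */
    #include <stdatomic.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void) {
        while (atomic_flag_test_and_set(&lock))
            ;                          /* spin; a crashed holder never releases */
    }

    void release(void) {
        atomic_flag_clear(&lock);      /* never executed if the holder crashed */
    }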
133. CLOMP: Accurately Characterizing OpenMP Application Overheads
- Author
-
Bronevetsky, Greg, Gyllenhaal, John, and Supinski, Bronis R.
- Subjects
Computer Science ,Software Engineering/Programming and Operating Systems ,Processor Architectures ,Theory of Computation ,OpenMP ,Benchmarking ,Performance ,Profiling ,Shared memory - Abstract
Despite its ease of use, OpenMP has failed to gain widespread use on large-scale systems, largely due to its failure to deliver sufficient performance. Our experience indicates that the cost of initiating OpenMP regions is simply too high for the desired OpenMP usage scenario of many applications. In this paper, we introduce CLOMP, a new benchmark to characterize this aspect of OpenMP implementations accurately. CLOMP complements the existing EPCC benchmark suite to provide simple, easy-to-understand measurements of OpenMP overheads in the context of application usage scenarios. Our results for several OpenMP implementations demonstrate that CLOMP identifies the amount of work required to compensate for the overheads observed with EPCC. We also show that CLOMP captures limitations for OpenMP parallelization on SMT and NUMA systems. Finally, CLOMPI, our MPI extension of CLOMP, demonstrates which aspects of OpenMP interact poorly with MPI when MPI helper threads cannot run on the NIC.
- Published
- 2009
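The kind of overhead the abstract above is concerned with can be estimated with a microbenchmark in the spirit of EPCC and CLOMP. The program below is a simplified sketch, not the actual CLOMP code: it times many empty parallel regions and reports the average cost of initiating one.

    /* Simplified sketch in the spirit of CLOMP/EPCC (not the actual benchmark):
     * measure the average fork/join cost of an empty OpenMP parallel region. */
    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        const int REPS = 100000;
        double t0 = omp_get_wtime();
        for (int r = 0; r < REPS; r++) {
            #pragma omp parallel
            {
                /* intentionally empty: all measured time is region start-up overhead */
            }
        }
        double per_region = (omp_get_wtime() - t0) / REPS;
        printf("average parallel-region overhead: %.3f microseconds\n", per_region * 1e6);
        return 0;
    }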
134. Graphics Controllers
- Author
-
Grimm, Andreas, Chen, Janglin, editor, Cranton, Wayne, editor, and Fihn, Mark, editor
- Published
- 2016
- Full Text
- View/download PDF
135. Layer-by-Layer Partitioning of Finite Element Meshes for Multicore Architectures
- Author
-
Novikov, Alexander, Piminova, Natalya, Kopysov, Sergey, Sagdeeva, Yulia, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Voevodin, Vladimir, editor, and Sobolev, Sergey, editor
- Published
- 2016
- Full Text
- View/download PDF
136. A Parallel Multithreading Algorithm for Self-gravity Calculation on Agglomerates
- Author
-
Nesmachnow, Sergio, Frascarelli, Daniel, Tancredi, Gonzalo, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Liu, Ting, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Gitler, Isidoro, editor, and Klapp, Jaime, editor
- Published
- 2016
- Full Text
- View/download PDF
137. Parallel Image De-fencing: Technique, Analysis and Performance Evaluation
- Author
-
Khalid, Madiha, Yousaf, Muhammad Murtaza, Sulaiman, Hamzah Asyrani, editor, Othman, Mohd Azlishah, editor, Othman, Mohd Fairuz Iskandar, editor, Rahim, Yahaya Abd, editor, and Pee, Naim Che, editor
- Published
- 2016
- Full Text
- View/download PDF
138. Synthesizing Code for GPGPUs from Abstract Formal Models
- Author
-
Blindell, Gabriel Hjort, Menne, Christian, Sander, Ingo, Oppenheimer, Frank, editor, and Medina Pasaje, Julio Luis, editor
- Published
- 2016
- Full Text
- View/download PDF
139. Utilization of Parallel Computing for Discrete Self-organizing Migration Algorithm
- Author
-
Běhálek, Marek, Gajdoš, Petr, Davendra, Donald, Kacprzyk, Janusz, Series editor, Davendra, Donald, editor, and Zelinka, Ivan, editor
- Published
- 2016
- Full Text
- View/download PDF
140. Towards Semi-automated Parallelization of Data Stream Processing
- Author
-
Kruliš, Martin, Bednárek, David, Falt, Zbyněk, Yaghob, Jakub, Zavoral, Filip, Kacprzyk, Janusz, Series editor, Novais, Paulo, editor, Camacho, David, editor, Analide, Cesar, editor, El Fallah Seghrouchni, Amal, editor, and Badica, Costin, editor
- Published
- 2016
- Full Text
- View/download PDF
141. Parallel Environments
- Author
-
Mishra, Bhabani Shankar Prasad, Sagnika, Santwana, Kacprzyk, Janusz, Series editor, Mishra, Bhabani Shankar Prasad, editor, Dehuri, Satchidananda, editor, Kim, Euiwhan, editor, and Wang, Gi-Name, editor
- Published
- 2016
- Full Text
- View/download PDF
142. k-Abortable Objects: Progress Under High Contention
- Author
-
Ben-David, Naama, Chan, David Yu Cheng, Hadzilacos, Vassos, Toueg, Sam, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gavoille, Cyril, editor, and Ilcinkas, David, editor
- Published
- 2016
- Full Text
- View/download PDF
143. On Composition and Implementation of Sequential Consistency
- Author
-
Perrin, Matthieu, Petrolia, Matoula, Mostéfaoui, Achour, Jard, Claude, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gavoille, Cyril, editor, and Ilcinkas, David, editor
- Published
- 2016
- Full Text
- View/download PDF
144. Anonymity-Preserving Failure Detectors
- Author
-
Bouzid, Zohir, Travers, Corentin, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gavoille, Cyril, editor, and Ilcinkas, David, editor
- Published
- 2016
- Full Text
- View/download PDF
145. On Atomicity in Presence of Non-atomic Writes
- Author
-
Enea, Constantin, Farzan, Azadeh, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Chechik, Marsha, editor, and Raskin, Jean-François, editor
- Published
- 2016
- Full Text
- View/download PDF
146. Asynchronous Consensus with Bounded Memory
- Author
-
Delporte-Gallet, Carole, Fauconnier, Hugues, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Abdulla, Parosh Aziz, editor, and Delporte-Gallet, Carole, editor
- Published
- 2016
- Full Text
- View/download PDF
147. A Maude Framework for Cache Coherent Multicore Architectures
- Author
-
Bijo, Shiji, Johnsen, Einar Broch, Pun, Ka I, Tapia Tarifa, Silvia Lizeth, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Lucanu, Dorel, editor
- Published
- 2016
- Full Text
- View/download PDF
148. Slurm-V: Extending Slurm for Building Efficient HPC Cloud with SR-IOV and IVShmem
- Author
-
Zhang, Jie, Lu, Xiaoyi, Chakraborty, Sourav, Panda, Dhabaleswar K. (DK), Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Dutot, Pierre-François, editor, and Trystram, Denis, editor
- Published
- 2016
- Full Text
- View/download PDF
149. Unity3D-MatLab Simulator in Real Time for Robotics Applications
- Author
-
Andaluz, Víctor Hugo, Chicaiza, Fernando A., Gallardo, Cristian, Quevedo, Washington X., Varela, José, Sánchez, Jorge S., Arteaga, Oscar, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, De Paolis, Lucio Tommaso, editor, and Mongelli, Antonio, editor
- Published
- 2016
- Full Text
- View/download PDF
150. A Virtual Machine Data Communication Mechanism on Openstack
- Author
-
Chen, Jie, Xu, Saihong, Zhang, Haiyang, Wang, Zunliang, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Josef, Kittler, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Zu, Qiaohong, editor, and Hu, Bo, editor
- Published
- 2016
- Full Text
- View/download PDF