712 results
Search Results
2. Event‐based high throughput computing: A series of case studies on a massively parallel softcore machine.
- Author
-
Vousden, Mark, Morris, Jordan, McLachlan Bragg, Graeme, Beaumont, Jonathan, Rafiev, Ashur, Luk, Wayne, Thomas, David, and Brown, Andrew
- Subjects
CONDENSED matter physics, ELECTRICITY pricing, COMPUTATIONAL chemistry, COMPUTER architecture, MESSAGE passing (Computer science)
- Abstract
This paper introduces an event‐based computing paradigm, where workers only perform computation in response to external stimuli (events). This approach is best employed on hardware with many thousands of smaller compute cores with a fast, low‐latency interconnect, as opposed to traditional computers with fewer and faster cores. Event‐based computing is timely because it provides an alternative to traditional big computing, which suffers from immense infrastructural and power costs. This paper presents four case study applications, where an event‐based computing approach finds solutions orders of magnitude more quickly than the equivalent traditional big compute approach, including problems in computational chemistry and condensed matter physics. [ABSTRACT FROM AUTHOR]
- Published
- 2023
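To make the event-based paradigm above concrete: a minimal Python sketch of an event-driven worker pool, where handlers run only when a message arrives and may emit further events. This illustrates the paradigm only; it is not the authors' softcore implementation, and all names are invented.

```python
from collections import deque

def run_event_loop(handlers, initial_events):
    """Workers compute only when an event addressed to them arrives."""
    queue = deque(initial_events)          # stands in for the interconnect
    while queue:
        worker, payload = queue.popleft()
        for event in handlers[worker](payload):
            queue.append(event)            # a handler may emit new events

# Toy chain of three workers; no worker runs except in response to an event.
handlers = {
    "w0": lambda v: [("w1", v + 1)],
    "w1": lambda v: [("w2", v * 2)],
    "w2": lambda v: print("result:", v) or [],   # sink
}
run_event_loop(handlers, [("w0", 10)])           # prints: result: 22
```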
3. Recent developments in high-performance computing and simulation: distributed systems, architectures, algorithms, and applications.
- Author
-
Smari, Waleed W., Fiore, Sandro, and Trinitis, Carsten
- Subjects
HIGH performance computing, SIMULATION methods & models, COMPUTER architecture, COMPUTER algorithms, APPLICATION software
- Published
- 2015
4. Sharing non‐cache‐coherent memory with bounded incoherence.
- Author
-
Ren, Yuxin, Parmer, Gabriel, and Milojicic, Dejan
- Subjects
CACHE memory, MEMORY, MODERN architecture, COMPUTER architecture, INFORMATION sharing, MANAGEMENT controls
- Abstract
Summary: Cache coherence in modern computer architectures enables easier programming by sharing data across multiple processors. Unfortunately, it can also limit scalability due to cache coherency traffic initiated by competing memory accesses. Rack‐scale systems introduce shared memory across a whole rack, but without inter‐node cache coherence. This poses memory management and concurrency control challenges for applications that must explicitly manage cache lines. To fully utilize rack‐scale systems for low‐latency and scalable computation, applications need to maintain cached memory accesses in spite of non‐coherence. This paper introduces Bounded Incoherence, a memory consistency model that enables cached access to shared data structures in non‐cache‐coherent memory. It ensures that updates to memory on one node are visible within at most a bounded amount of time on all other nodes. We evaluate this memory model on a modified PowerGraph graph-processing framework and boost its performance by 30% on eight sockets by enabling cached access to its data structures. [ABSTRACT FROM AUTHOR]
- Published
- 2022
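A minimal sketch of the bounded-staleness idea above: a cached copy may be served only while its age is below the bound; otherwise it is re-fetched from the shared store. The paper's mechanism operates on hardware cache lines across rack nodes; this single-process Python model and its names are illustrative assumptions.

```python
import time

class BoundedIncoherentCache:
    """Serve cached reads only while they are at most `bound_s` old."""
    def __init__(self, backing_store, bound_s=0.001):
        self.store = backing_store      # stands in for shared rack memory
        self.bound_s = bound_s
        self.cache = {}                 # key -> (value, fetch_time)

    def read(self, key):
        hit = self.cache.get(key)
        if hit is not None and time.monotonic() - hit[1] < self.bound_s:
            return hit[0]               # still within the staleness bound
        value = self.store[key]         # refresh from shared memory
        self.cache[key] = (value, time.monotonic())
        return value

shared = {"degree[42]": 7}
cache = BoundedIncoherentCache(shared)
print(cache.read("degree[42]"))         # cached reads are bounded-stale
```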
5. Computer architecture and high performance computing.
- Author
-
de Camargo, Raphael Y., Marozzo, Fabrizio, and Martins, Wellington
- Subjects
COMPUTER architecture, HIGH performance computing, ALGORITHMS, GRAPHICS processing units, SOFTWARE development tools, DATA structures
- Abstract
They evaluate these optimizations on different multicore and GPU architectures, investigating the impact of different APIs on the performance, energy efficiency, and portability of the code. In the fifth contribution, entitled "Energy efficiency and portability of oil and gas simulations on multicore and graphics processing unit architectures", Serpa et al. propose three optimizations for an oil and gas application, reverse time migration (RTM), which reduce the floating-point operations by changing the equation derivatives. Moreover, the Brute Force algorithm running on a CPU + GPU architecture has greater energy efficiency, reaching at least 1.79× more operations per unit of energy consumed than the other algorithms on the architectures explored in the work. [Extracted from the article]
- Published
- 2021
6. A survey of value prediction techniques for leveraging value locality.
- Author
-
Mittal, Sparsh
- Subjects
COMPUTER storage capacity, BANDWIDTH allocation, COMPUTER scheduling, COMPUTER architecture, INTEGRATED circuits
- Abstract
Value locality (VL) refers to the recurrence of values in a memory structure, and value prediction (VP) refers to predicting VL and leveraging it for diverse optimizations. VP holds the promise of exceeding the limits imposed by true data dependencies, providing performance and bandwidth advantages in both single- and multi-threaded applications. Fully exploiting the potential of VL, however, requires addressing several challenges, such as achieving high accuracy and coverage and reducing hardware and latency overheads. In this paper, we present a survey of techniques for leveraging value locality. We categorize the research works based on key parameters to provide insights and highlight similarities and differences. This paper is expected to be useful for researchers, processor architects, and chip designers. [ABSTRACT FROM AUTHOR]
- Published
- 2017
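One of the simplest schemes in the survey's scope, a last-value predictor, makes the accuracy/coverage trade-off above concrete: predict that a load (indexed by instruction address) returns the same value as last time. A Python sketch over a synthetic trace; illustrative only, not from the paper.

```python
def last_value_predict(trace):
    """trace: iterable of (pc, actual_value) pairs for dynamic loads."""
    table = {}                    # pc -> last observed value
    correct = predicted = 0
    for pc, actual in trace:
        if pc in table:           # predictor has coverage for this pc
            predicted += 1
            if table[pc] == actual:
                correct += 1      # the speculation would have succeeded
        table[pc] = actual        # train on the committed value
    return correct, predicted

trace = [(0x400, 3), (0x400, 3), (0x400, 4), (0x400, 4)]
print(last_value_predict(trace))  # (2, 3): 2 correct of 3 predictions
```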
7. Architecting cloud-enabled systems: a systematic survey of challenges and solutions.
- Author
-
Chauhan, Muhammad Aufeef, Babar, Muhammad Ali, and Benatallah, Boualem
- Subjects
COMPUTER architecture, COMPUTER software development, CLOUD computing, MIDDLEWARE, SCALABILITY, META-analysis
- Abstract
The literature on the challenges of and potential solutions to architecting cloud-based systems is rapidly growing but scattered. It is important to systematically analyze and synthesize the existing research on architecting cloud-based software systems in order to build a cohesive body of knowledge of the reported challenges and solutions. We have systematically identified and reviewed 133 papers that report architecture-related challenges and solutions for cloud-based software systems. This paper reports the methodological details, findings, and implications of a systematic review that has enabled us to identify 44 unique categories of challenges and associated solutions for architecting cloud-based software systems. We assert that the identified challenges and solutions, classified into these categories, form a body of knowledge that can be leveraged for designing or evaluating software architectures for cloud-based systems. Our key conclusions are that a large number of primary studies focus on middleware services aimed at achieving scalability, performance, response time, and efficient resource optimization. Architecting cloud-based systems presents unique challenges, as the systems to be designed range from pervasive embedded systems and enterprise applications to smart devices in the Internet of Things. We also conclude that there is huge potential for research on architecting cloud-based systems in areas related to green computing, energy-efficient systems, mobile cloud computing, and the Internet of Things. Copyright © 2016 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2017
8. Latest trends in computer architectures and parallel and distributed technologies.
- Author
-
Schulze, Bruno, Rebello, Vinod, and Moreira, Jose
- Subjects
COMPUTER architecture, PARALLEL computers, DISTRIBUTED computing, HIGH performance computing, PROFESSIONAL peer review, COMPUTER input-output equipment, SYSTEMS development
- Abstract
ABSTRACT This special issue focuses on new developments in high performance applications, as well as the latest trends in computer architecture and parallel and distributed technologies, and is based on extended, thoroughly revised papers from the 22nd International Symposium on Computer Architecture and High Performance Computing. The authors were invited to provide extended versions of their original papers, taking into account comments and suggestions raised during the peer review process and comments from the audience during the conference. Copyright © 2012 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2013
9. ReliaCloud‐NS: A scalable web‐based simulation platform for evaluating the reliability of cloud computing systems.
- Author
-
Snyder, Brett, Green II, Robert C., Devabhaktuni, Vijay, and Alam, Mansoor
- Subjects
COMPUTER architecture, CLOUD computing, INTERNET software, COMPUTER systems, RELIABILITY in engineering
- Abstract
Summary: This paper discusses the implementation, architecture, and use of a graphical web‐based application called ReliaCloud‐NS that allows users to (1) evaluate the reliability of a cloud computing system (CCS) and (2) design a CCS to a specified reliability level, for both public and private clouds. The software was designed with a RESTful application programming interface that performs nonsequential Monte Carlo simulations to evaluate the reliability of a CCS. Simulation results are stored and presented to the user in the form of interactive charts and graphs from within a web browser. The software contains multiple types of CCS components, simulations, and virtual machine allocation schemes. ReliaCloud‐NS also contains a novel feature that evaluates CCS reliability across a range of varying virtual machine allocations and establishes and graphs a CCS reliability curve. This paper discusses the software architecture, the interactive web‐based interface, and the different types of simulations available in ReliaCloud‐NS and presents an overview of the results generated from a simulation. [ABSTRACT FROM AUTHOR]
- Published
- 2018
10. KIND-DAMA: A modular middleware for Kinect-like device data management.
- Author
-
Milazzo, Fabrizio, Gentile, Vito, Gentile, Antonio, and Sorce, Salvatore
- Subjects
MIDDLEWARE, KINECT (Motion sensor), DATABASE management, USER interfaces, COMPUTER architecture
- Abstract
In recent decades, we have witnessed a growing interest in touchless gestural user interfaces. Among other reasons, this is due to the wide availability of different low-cost gesture acquisition hardware (the so-called 'Kinect-like devices'). As a consequence, there is a growing need for solutions that make it easy to integrate such devices within actual systems. In this paper, we present KIND-DAMA, an open and modular middleware that helps in the development of interactive applications based on gestural input. We first review the existing middlewares for gestural data management. Then, we describe the proposed architecture and compare its features against the existing similar solutions we found in the literature. Finally, we present a set of studies and use cases that show the effectiveness of our proposal in some possible real-world scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2018
11. CNTFET‐based ternary address decoder design.
- Author
-
Mohammed, Rawan, Fouda, Mohammed E., Alouani, Ihsen, Said, Lobna A., and Radwan, Ahmed G.
- Subjects
MOORE'S law, COMPUTER architecture, LOGIC circuits, COMPUTER systems
- Abstract
Summary: With the end of Moore's law, new paradigms are being investigated for more scalable computing systems. One of the promising directions is to examine data representations offering higher data density per hardware element. Multiple-valued logic (MVL) has emerged as a promising system due to its advantages over binary data representation. MVL offers higher information processing within the same number of digits when compared with binary systems. Accessing memory is considered one of the most power- and time-consuming instructions within a microprocessor. In the quest for building an entire ternary computer architecture, we investigate the potential of ternary address decoders. This paper presents three different designs for a CNTFET-based ternary address decoder. The first design is based on a cascade of ternary-to-binary (T2B) blocks and a binary decoder. The second design is built using a hierarchical structure with enable signals. The third is designed utilising a pre-decoder and ternary logic gates. A comparison of the proposed designs and a binary address decoder in terms of power and delay under different supply voltage and temperature values is presented. Simulation results show that the second design has the lowest power and delay of the proposed ternary designs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
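Functionally, any of the three decoders above maps an n-trit address to one of 3^n word lines. The behavioural Python sketch below shows that mapping only; the paper's contribution is the CNTFET circuit realizations of it.

```python
def ternary_decode(trits):
    """Map an n-trit address (each trit in {0,1,2}) to a one-hot word line.

    Behavioural model only; the paper realises this in CNTFET logic.
    """
    index = 0
    for t in trits:
        assert t in (0, 1, 2)
        index = index * 3 + t          # positional base-3 weighting
    lines = [0] * (3 ** len(trits))
    lines[index] = 1                   # exactly one word line asserted
    return lines

print(ternary_decode([2, 1]))  # trit address 21 (base 3) = 7 -> line 7 of 9
```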
12. Compilation of MATLAB computations to CPU/GPU via C/OpenCL generation.
- Author
-
Reis, Luís, Bispo, João, and Cardoso, João M. P.
- Subjects
COMPUTER architecture, COMPUTER programming, PROGRAMMING languages, COMPUTING platforms, COMPUTER performance, GRAPHICS processing units, COMPILERS (Computer programs)
- Abstract
Summary: In order to take advantage of the processing power of current computing platforms, programmers typically need to develop software versions for different target devices. This task is time‐consuming and requires significant programming and computer architecture expertise. A possible and more convenient alternative is to start with a single high‐level description of a program with minimum implementation details, and generate custom implementations according to the target platform. In this paper, we use MATLAB as a high‐level programming language and propose a compiler that targets CPU/GPU computing platforms by generating customized implementations in C and OpenCL. We propose a number of compiler techniques to automatically generate efficient C and OpenCL code from MATLAB programs. One such compiler technique relies on heuristics to decide when and how to use Shared Virtual Memory (SVM). The experimental results show that our approach is able to generate code that provides significant speedups (e.g., a geometric mean speedup of 11× for a set of simple benchmarks) using a discrete GPU over equivalent sequential C code executing on a CPU. With more complex benchmarks, for which only some code regions can be parallelized and thus offloaded, the generated code achieved speedups of up to 2.2×. We also show the impact of using SVM, specifically fine‐grained buffers, and the results show that the compiler is able to achieve significant speedups, both over the versions without SVM and with naïve aggressive SVM use, across three CPU/GPU platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2020
13. Computer architecture and high performance computing.
- Author
-
Goldman, Alfredo, Arantes, Luciana, and Moreno, Edward
- Subjects
COMPUTER architecture, HIGH performance computing, DISTRIBUTED computing, COMPILERS (Computer programs), PRODUCTION scheduling
- Published
- 2017
14. Towards effective scheduling policies for many-task applications: Practice and experience based on HTCaaS.
- Author
-
Kim, Jik‐Soo, Quang, Bui, Rho, Seungwoo, Kim, Seoyoung, Kim, Sangwan, Breton, Vincent, and Hwang, Soonwook
- Subjects
SOFTWARE as a service, COMPUTER architecture, MIDDLEWARE, RESOURCE allocation, CLOUD computing
- Abstract
In this paper, we conduct a comparative study of relatively simple yet effective scheduling policies for many-task applications where multiple users with varying numbers of tasks are actively sharing a common system infrastructure. We have implemented three different scheduling mechanisms that address fairness and user response time, respectively, in a common middleware stack called HTCaaS, which is a pilot-job-based multi-level scheduling system running on top of production-level clusters. As a representative case of our many-task applications, we have leveraged the virtual screening application, a computational technique used in the drug discovery process to select the most promising candidate drugs for in vitro testing from millions of chemical compounds, which typically requires a substantial amount of computing resources and efficient processing of docking simulations. Our comparative experimental results of different scheduling policies show how we can effectively support multiple users in a shared resource environment by balancing user satisfaction and overall system performance, provide guidelines to improve system utilization, and address additional technical challenges to support various many-task applications. [ABSTRACT FROM AUTHOR]
- Published
- 2017
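The fairness/response-time tension studied above can be illustrated by a round-robin fair-share dispatch over per-user task queues, which keeps a small submission from being starved behind a flood. A Python sketch with invented names; this is not the HTCaaS implementation.

```python
from collections import deque
from itertools import cycle

def fair_share_order(user_queues):
    """Interleave users round-robin so small submissions aren't starved."""
    queues = {u: deque(ts) for u, ts in user_queues.items()}
    order = []
    for user in cycle(list(queues)):
        if not any(queues.values()):
            break                        # all queues drained
        if queues[user]:
            order.append(queues[user].popleft())
    return order

# User A floods the system; user B submits two tasks afterwards.
queues = {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2"]}
print(fair_share_order(queues))  # a1, b1, a2, b2, a3, a4: B finishes early
```

Under first-come-first-served, B's tasks would wait behind all of A's; the round-robin order trades a little of A's response time for fairness.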
15. Increasing software development efficiency and maintainability for complex industrial systems -- A case study.
- Author
-
Lagerström, Robert, Sporrong, Ulf, and Wall, Anders
- Subjects
COMPUTER software, COMPUTER software development, COMPUTER architecture, CASE studies, ELECTRONIC systems
- Abstract
It is difficult to manage complex software systems. Thus, many research initiatives focus on how to improve software development efficiency and maintainability. However, the trend in industry is still alarming: software development projects fail, and maintenance is becoming more and more expensive. One problem could be that research has been focusing on the wrong things. Most research publications address either process improvements or architectural improvements. There are few known approaches that consider how architectural changes affect processes and vice versa. One proposed method, called the Business-Architecture-Process method, takes these aspects into consideration. In 2007 the method was tested in one case study. Findings from the 2007 case study show that the method is useful, but in need of improvements and further validation. The present paper employs the method in a second case study. The contribution of this paper is thus a second test and validation of the proposed method, together with useful method improvements for future use. [ABSTRACT FROM AUTHOR]
- Published
- 2013
16. Solving the discretised neutron diffusion equations using neural networks.
- Author
-
Phillips, Toby R.F., Heaney, Claire E., Chen, Boyang, Buchan, Andrew G., and Pain, Christopher C.
- Subjects
NEUTRON diffusion, HEAT equation, FINITE volume method, COMPUTER architecture, CENTRAL processing units
- Abstract
This paper presents a new approach which uses the tools within artificial intelligence (AI) software libraries as an alternative way of solving partial differential equations (PDEs) that have been discretised using standard numerical methods. In particular, we describe how to represent numerical discretisations arising from the finite volume and finite element methods by pre‐determining the weights of convolutional layers within a neural network. As the weights are defined by the discretisation scheme, no training of the network is required and the solutions obtained are identical (accounting for solver tolerances) to those obtained with standard codes often written in Fortran or C++. We also explain how to implement the Jacobi method and a multigrid solver using the functions available in AI libraries. For the latter, we use a U‐Net architecture which is able to represent a sawtooth multigrid method. A benefit of using AI libraries in this way is that one can exploit their built‐in technologies to enable the same code to run on different computer architectures (such as central processing units, graphics processing units or new‐generation AI processors) without any modification. In this article, we apply the proposed approach to eigenvalue problems in reactor physics where neutron transport is described by diffusion theory. For a fuel assembly benchmark, we demonstrate that the solution obtained from our new approach is the same (accounting for solver tolerances) as that obtained from the same discretisation coded in a standard way using Fortran. We then proceed to solve a reactor core benchmark using the new approach. For both benchmarks we give timings for the neural network implementation run on a CPU and a GPU, and a serial Fortran code run on a CPU. [ABSTRACT FROM AUTHOR]
- Published
- 2023
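The core trick above, encoding a standard discretisation as fixed convolutional weights so that no training is needed, is easy to demonstrate: for the 2D Poisson problem, one Jacobi sweep is a 3×3 convolution whose weights come straight from the 5-point stencil. A numpy sketch (the paper uses AI-library convolution layers and reactor-physics eigenvalue problems; the problem and sizes here are illustrative):

```python
import numpy as np

# One Jacobi sweep for -lap(u) = f on a uniform grid, written so that the
# neighbour weights form a fixed 3x3 convolution kernel. As in the paper,
# the weights are fully determined by the discretisation: no training.
KERNEL = np.array([[0.0,  0.25, 0.0],
                   [0.25, 0.0,  0.25],
                   [0.0,  0.25, 0.0]])

def jacobi_sweep(u, f, h):
    new = u.copy()
    new[1:-1, 1:-1] = (KERNEL[0, 1] * u[:-2, 1:-1] + KERNEL[2, 1] * u[2:, 1:-1] +
                       KERNEL[1, 0] * u[1:-1, :-2] + KERNEL[1, 2] * u[1:-1, 2:] +
                       0.25 * h**2 * f[1:-1, 1:-1])
    return new

n, h = 33, 1.0 / 32
u, f = np.zeros((n, n)), np.ones((n, n))   # zero Dirichlet boundary, f = 1
for _ in range(500):
    u = jacobi_sweep(u, f, h)
print(u[n // 2, n // 2])                   # centre value approaches ~0.0737
```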
17. Investigating memory prefetcher performance over parallel applications: From real to simulated.
- Author
-
Girelli, Valéria S., Moreira, Francis B., Serpa, Matheus S., Carastan‐Santos, Danilo, and Navaux, Philippe O. A.
- Subjects
MEMORY, PARALLEL programming, CACHE memory, PARALLEL processing, COMPUTER architecture
- Abstract
Memory prefetcher algorithms are widely used in processors to mitigate the performance gap between the processors and the memory subsystem. The complexities behind the architectures and prefetcher algorithms, however, not only hinder the development of accurate architecture simulators, but also hinder understanding of the prefetcher's contribution to performance, both on real hardware and in a simulated environment. In this paper, we contribute to shedding light on the memory prefetcher's role in the performance of parallel High‐Performance Computing applications, considering the prefetcher algorithms offered by both the real hardware and the simulators. We performed a careful experimental investigation, executing the NAS parallel benchmark (NPB) on a real Skylake machine as well as in a simulated environment with the ZSim and Sniper simulators, taking into account the prefetcher algorithms offered by both Skylake and the simulators. Our experimental results show that: (i) prefetching from the L3 to L2 cache presents better performance gains, (ii) the memory contention in the parallel execution constrains the prefetcher's effect, (iii) Skylake's parallel memory contention is poorly simulated by ZSim and Sniper, and (iv) Skylake's noninclusive L3 cache hinders the accurate simulation of NPB with the Sniper's prefetchers. [ABSTRACT FROM AUTHOR]
- Published
- 2021
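A toy model shows the kind of effect being measured above: a simple next-line prefetcher turns a sequential access stream from all misses into nearly all hits. The cache model below is a deliberate simplification (fully associative, LRU) and is not ZSim, Sniper, or Skylake's actual prefetcher logic.

```python
def run_trace(trace, prefetch=True, line=64, capacity=256):
    """Hit count on a tiny fully-associative LRU cache (list = LRU order)."""
    cache, hits = [], 0
    def touch(line_no, demand):
        nonlocal hits
        if line_no in cache:
            cache.remove(line_no)
            if demand:
                hits += 1            # only demand accesses count as hits
        elif len(cache) >= capacity:
            cache.pop(0)             # evict least recently used
        cache.append(line_no)
    for addr in trace:
        touch(addr // line, demand=True)
        if prefetch:
            touch(addr // line + 1, demand=False)  # fetch next line early
    return hits

stream = range(0, 64 * 1000, 64)                   # sequential access pattern
print(run_trace(stream, prefetch=False), run_trace(stream, prefetch=True))
# -> 0 999: with prefetching, every demand access after the first one hits
```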
18. A note on resource orchestration for cloud computing.
- Author
-
Ranjan, Rajiv, Buyya, Rajkumar, Nepal, Surya, and Georgakopulos, Dimitrios
- Subjects
MUSIC orchestration, CLOUD computing, COMPUTER systems, COMPUTER architecture, COMPUTER software
- Published
- 2015
19. Dynamic and hierarchical IPv6 address configuration for a mobile ad hoc network.
- Author
-
Wang, Xiaonan and Qian, Huanyan
- Subjects
INTERNET protocols, AD hoc computer networks, COMPUTER architecture, CLUSTER analysis (Statistics), COST analysis
- Abstract
SUMMARY The paper proposes a dynamic and hierarchical IPv6 address configuration scheme for a mobile ad hoc network (MANET). The scheme proposes a hierarchical architecture and combines the distributed and centralized address configuration approaches. In the architecture, a central node assigns IPv6 addresses to cluster heads that are distributed around a MANET, and distributed cluster heads assign IPv6 addresses to cluster members. A generation algorithm for clusters is proposed; it uses the number of potential cluster members as a measurement unit and minimizes the number of cluster heads. Therefore, the address configuration cost for cluster heads is reduced. A central node/cluster head uses the unicast communication mode to achieve real-time address recovery, in order to ensure that it has enough address resources for assignment. The paper also proposes a low-cost MANET merging/partitioning algorithm that guarantees that no address collision happens during the MANET merging/partitioning process. This paper analyzes the performance parameters of the proposed scheme, including the address configuration cost, the address configuration delay, and the number of MANET mergings. The analytical results show that the proposed scheme effectively reduces the address configuration cost, shortens the address configuration delay, and decreases the number of MANET mergings. Copyright © 2013 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2015
20. Dynamic Self-Repair Architectures for Defective Through-silicon Vias.
- Author
-
Joon-Sung Yang, Tae Hee Han, Kobla, Darshan, and Ju, Edward L.
- Subjects
THREE-dimensional integrated circuits, THROUGH-silicon via, MANUFACTURING processes, SYSTEMS design, COMPUTER architecture, SYSTEMS on a chip testing
- Abstract
Three-dimensional integration technology results in area savings, platform power savings, and an increase in performance. Through-silicon via (TSV) assembly and manufacturing processes can potentially introduce defects. This may result in increases in manufacturing and test costs and will cause a yield problem. To improve the yield, spare TSVs can be included to repair defective TSVs. This paper proposes a new built-in self-test feature to identify defective TSV channels. For defective TSVs, this paper also introduces dynamic self-repair architectures using code-based and hardware-mapping based repair. [ABSTRACT FROM AUTHOR]
- Published
- 2014
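The repair idea above can be modelled abstractly: once built-in self-test flags defective TSV channels, each logical signal is steered to the next good physical TSV. A behavioural Python sketch of such a remap table; the paper's code-based and hardware-mapping repair circuits are more involved, and all names here are invented.

```python
def build_remap(n_signal, n_spare, defective):
    """Map each logical TSV to the next good physical TSV (shift repair)."""
    good = [i for i in range(n_signal + n_spare) if i not in defective]
    if len(good) < n_signal:
        raise RuntimeError("more defects than spares: die is unrepairable")
    return {logical: good[logical] for logical in range(n_signal)}

# 8 signal TSVs, 2 spares; BIST found physical channels 3 and 5 defective.
print(build_remap(8, 2, defective={3, 5}))
# {0: 0, 1: 1, 2: 2, 3: 4, 4: 6, 5: 7, 6: 8, 7: 9}
```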
21. Dynamic wireless network shaping via moving cells: The nomadic nodes case.
- Author
-
Tsoulos, George, Bulakci, Ömer, Zarbouti, Dimitra, Athanasiadou, Georgia, and Kaloxylos, Alexandros
- Subjects
WIRELESS sensor networks, FIFTH generation computers, CAPITAL investments, COMPUTER architecture, ELECTROMAGNETIC wave propagation, GEODATABASES
- Abstract
Within the Fifth Generation framework of dynamic network topology and moving cells, the nomadic node (NN) concept is seen as a promising paradigm to provide coverage extension and capacity improvement on demand as well as to attain reduced capital expenditure and operational expenditure when compared with fixed-node deployment, such as microcells. This paper proposes an enabling architecture for the NN operation in the framework of a wireless network, and then analyzes the physical layer aspects of the problem and investigates the performance of NN operation in the context of a realistic multi-cellular wireless network, with the help of site-specific propagation modeling and real geographical databases. The produced results provide useful insights into the performance of the NN operation; the results show that the NN operation is beneficial both in line-of-sight and non-line-of-sight scenarios and demonstrate that the NN selection strategy, which is based on the backhaul signal-to-interference-plus-noise ratio with a 10 dB power window, offers almost identical performance with the optimum end-to-end throughput algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2017
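One plausible reading of the selection strategy highlighted above: restrict candidate nomadic nodes to those whose link power is within a 10 dB window of the strongest, then pick the one with the best backhaul SINR. The Python sketch and its values are assumptions for illustration, not the paper's exact rule.

```python
def select_nn(candidates, window_db=10.0):
    """Pick a nomadic node by backhaul SINR among candidates whose access
    power lies within `window_db` of the strongest candidate."""
    best_access = max(c["access_dbm"] for c in candidates)
    eligible = [c for c in candidates
                if c["access_dbm"] >= best_access - window_db]
    return max(eligible, key=lambda c: c["backhaul_sinr_db"])

candidates = [                               # invented example values
    {"id": "NN1", "access_dbm": -70, "backhaul_sinr_db": 8},
    {"id": "NN2", "access_dbm": -75, "backhaul_sinr_db": 15},
    {"id": "NN3", "access_dbm": -85, "backhaul_sinr_db": 30},  # outside window
]
print(select_nn(candidates)["id"])           # NN2
```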
22. A survey of techniques for architecting TLBs.
- Author
-
Mittal, Sparsh
- Subjects
CACHE memory, VIRTUAL reality, INFORMATION & communication technologies, ENERGY consumption, COMPUTER architecture
- Abstract
A translation lookaside buffer (TLB) caches virtual-to-physical address translation information and is used in systems ranging from embedded devices to high-end servers. Because the TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of the TLB is important for improving the performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects, and system engineers. [ABSTRACT FROM AUTHOR]
- Published
- 2017
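As background for the survey above: a TLB is, functionally, a small cache of virtual-page-to-frame translations, and every surveyed technique aims to raise its hit rate or cut its miss cost. A minimal LRU model in Python, behavioural only.

```python
from collections import OrderedDict

class TLB:
    """Tiny fully-associative TLB with LRU replacement (model, not hardware)."""
    def __init__(self, entries=64, page_size=4096):
        self.entries, self.page_size = entries, page_size
        self.map = OrderedDict()              # virtual page -> physical frame

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, self.page_size)
        if vpn in self.map:
            self.map.move_to_end(vpn)         # hit: refresh LRU position
        else:
            self.map[vpn] = page_table[vpn]   # miss: costly page-table walk
            if len(self.map) > self.entries:
                self.map.popitem(last=False)  # evict least recently used
        return self.map[vpn] * self.page_size + offset

tlb = TLB()
print(hex(tlb.translate(0x1234, {1: 0x42})))  # vpn 1 -> frame 0x42: 0x42234
```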
23. An evaluation of analytical queries on CPUs and coupled GPUs.
- Author
-
Luan, Hua and Chang, Lei
- Subjects
CENTRAL processing units, GRAPHICS processing units, COMPUTER architecture, COMPUTER performance, BIT rate, DATABASES
- Abstract
Recently, the mainstream hardware vendors such as Intel and AMD have made significant efforts to integrate the central processing unit (CPU) and the graphics processing unit (GPU) into a single chip, which forms a coupled CPU-GPU architecture. Data transfer between the CPU and the GPU through a Peripheral Component Interconnect Express bus is eliminated on this architecture, which provides new opportunities for the database community to optimize query processing. Because of the lack of comprehensive evaluation of database systems on coupled CPU-GPU platforms, it is difficult for academic and industry researchers to make appropriate decisions on improvement and optimization directions. In this paper, we conduct an extensive experimental study to evaluate an online analytical processing system on Intel and AMD machines. The performance difference is measured and analyzed when executing queries on integrated GPUs and multicore CPUs. The impacts of various parameters, data sizes, and optimization techniques on performance are also investigated. The results provide preliminary insights into database query and operator behaviors on state-of-the-art coupled CPU-GPU architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2017
24. Technology classification, industry, and education for Future Internet of Things.
- Author
-
Ning, Huansheng and Hu, Sha
- Subjects
INTERNET, INFORMATION technology, TECHNOLOGICAL innovations, INTERNET in education, MATHEMATICAL models, COMPUTER architecture, INDUSTRIES
- Abstract
SUMMARY The Internet of Things (IoT) is developing rapidly and becoming a hot topic around the world. On the basis of the reorganized Unit IoT and Ubiquitous IoT, two models for the Future IoT are proposed in this paper. A dimension model is established to classify the complicated IoT technologies, and a layer model is built for the Future IoT system architecture. Then, the IoT vision and a prediction of its development phases are presented. Furthermore, the view of IoT as an emerging industry is argued to be inappropriate, because IoT is a new stage of intelligentization and informatization development. Meanwhile, the necessity of training qualified personnel in colleges is introduced. Then, the relation between IoT and the science and technology system, and the subjects relevant to IoT, is analyzed. At the end, this paper raises the problem of establishing IoT as a college major and proposes some suggestions. Copyright © 2012 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2012
25. Towards an architecture for deploying elastic services in the cloud.
- Author
-
Kirschnick, Johannes, Alcaraz Calero, Jose M., Goldsack, Patrick, Farrell, Andrew, Guijarro, Julio, Loughran, Steve, Edwards, Nigel, and Wilcock, Lawrence
- Subjects
CLOUD computing, COMPUTER architecture, COMPUTER software management, WEB services, PEER-to-peer architecture (Computer networks), INFORMATION technology
- Abstract
SUMMARY Cloud computing infrastructure services enable the flexible creation of virtual infrastructures on-demand. However, the creation of infrastructures is only a part of the process for provisioning services. Other steps such as installation, deployment, configuration, monitoring and management of software components are needed to fully provide services to end-users in the cloud. This paper describes a peer-to-peer architecture to automatically deploy services on cloud infrastructures. The architecture uses a component repository to manage the deployment of these software components, enabling elasticity by using the underlying cloud infrastructure provider. The life cycle of these components is described in this paper, as well as the language for defining them. We also describe the open-source proof-of-concept implementation. Some technical information about this implementation together with some statistical results are also provided. Copyright © 2011 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2012
26. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures.
- Author
-
Haidar, Azzam, Ltaief, Hatem, YarKhan, Asim, and Dongarra, Jack
- Subjects
COMPUTER algorithms, LINEAR algebra, MULTICORE processors, COMPUTER architecture, COMPUTER memory management, NUMERICAL analysis software, GRAPH theory
- Abstract
SUMMARY The objective of this paper is to analyze the dynamic scheduling of dense linear algebra algorithms on shared-memory, multicore architectures. Current numerical libraries (e.g., linear algebra package) show clear limitations on such emerging systems mainly because of their coarse granularity tasks. Thus, many numerical algorithms need to be redesigned to better fit the architectural design of the multicore platform. The parallel linear algebra for scalable multicore architectures library developed at the University of Tennessee tackles this challenge by using tile algorithms to achieve a finer task granularity. These tile algorithms can then be represented by directed acyclic graphs, where nodes are the tasks and edges are the dependencies between the tasks. The paramount key to achieve high performance is to implement a runtime environment to efficiently schedule the execution of the directed acyclic graph across the multicore platform. This paper studies the impact on the overall performance of some parameters, both at the level of the scheduler (e.g., window size and locality) and the algorithms (e.g., left-looking and right-looking variants). We conclude that some commonly accepted rules for dense linear algebra algorithms may need to be revisited. Copyright © 2011 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2012
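The runtime idea above, tasks as DAG nodes released once their dependencies complete, reduces to dependency counting (Kahn's algorithm). A Python sketch on a tile-Cholesky-like fragment; a real runtime dispatches ready tasks to cores and adds the window-size and locality heuristics the paper studies.

```python
from collections import deque

def dag_schedule(tasks, deps):
    """tasks: names; deps: (a, b) meaning b depends on a. Returns run order."""
    pending = {t: 0 for t in tasks}        # unmet dependency count per task
    children = {t: [] for t in tasks}
    for a, b in deps:
        pending[b] += 1
        children[a].append(b)
    ready = deque(t for t, n in pending.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()                # in a runtime: dispatch to a core
        order.append(t)
        for c in children[t]:
            pending[c] -= 1
            if pending[c] == 0:            # all dependencies now satisfied
                ready.append(c)
    return order

# Factor a tile, update its column, then the trailing symmetric update.
print(dag_schedule(["POTRF", "TRSM1", "TRSM2", "SYRK"],
                   [("POTRF", "TRSM1"), ("POTRF", "TRSM2"),
                    ("TRSM1", "SYRK"), ("TRSM2", "SYRK")]))
```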
27. Performance analysis of multi-level parallelism: inter-node, intra-node and hardware accelerators.
- Author
-
Hackenberg, Daniel, Juckeland, Guido, and Brunst, Holger
- Subjects
PERFORMANCE evaluation, MULTICORE processors, PARALLEL computers, COMPUTER software development, COMPUTATIONAL complexity, COMPUTER architecture
- Abstract
SUMMARY The advent of multi-core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise in hardware accelerators, hybrid parallelism adds yet another dimension of complexity to the process of software development. The inner workings of a parallel program are usually difficult to understand and verify. This paper presents a tool for graphical program flow analysis of hardware accelerated parallel programs. It monitors the hybrid program execution to record and visualize many performance relevant events along the way. Representative real-world applications written for both IBM's Cell processor and NVIDIA's CUDA API are studied as examples. With our combined monitoring and visualization approach for hardware accelerated multi-core and multi-node systems we take the next step in tool evolution towards a highly improved level of detail, precision, and completeness. The contents of this paper are of interest to developers of hardware accelerated applications as well as performance tool architects. Copyright © 2011 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2012
28. A novel approach to user data federation in Next-Generation Networks.
- Author
-
Bartolomeo, Giovanni, Kovacikova, Tatiana, and Petersen, Françoise
- Subjects
COMPUTER network architectures, INFORMATION resources, XML (Extensible Markup Language), COMPUTER architecture, COMPUTER systems
- Abstract
User Profile Management (UPM) is an essential feature to support applications in Next-Generation Networks (NGN). Its actual realization depends on standardization of information and preferences and the ways in which user data are expressed and handled. To this end, an NGN may support different approaches; among them, the 'federated approach' seems particularly relevant, as it fits the very common scenario in which many existing services and devices already contain specific settings and preferences that are however unrelated to any other. This paper proposes a novel, totally distributed solution for data federation, where authorization, management of data localization and synchronization of federated data are thought to be part of the meta information that may be associated with each data element, rather than being external resources or built-in functions, as in legacy solutions. First, the present paper introduces the work on personalization and UPM and the NGN architecture being done at ETSI. Then, the paper illustrates the work on user data interoperability performed at the OASIS XDI TC. Finally, based on these inputs, it explains how to combine them into a novel solution (which is not part of ETSI or OASIS specifications). Copyright © 2009 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2010
29. Automatic instantiation of abstract tests on specific configurations for large critical control systems.
- Author
-
Flammini, Francesco, Mazzocca, Nicola, and Orazzo, Antonio
- Subjects
AUTOMATIC control systems, COMPUTER software testing, COMMAND & control systems, COMPUTER architecture, COMPUTER software
- Abstract
Computer-based control systems have grown in size, complexity, distribution and criticality. In this paper a methodology is presented to perform an ‘abstract testing’ of such large control systems in an efficient way: an abstract test is specified directly from system functional requirements and has to be instantiated in more test runs to cover a specific configuration, comprising any number of control entities (sensors, actuators and logic processes). Such a process is usually performed by hand for each installation of the control system, requiring a considerable time effort and being an error-prone verification activity. To automate a safe passage from abstract tests, related to the so-called generic software application, to any specific installation, an algorithm is provided, starting from a reference architecture and a state-based behavioural model of the control software. The presented approach has been applied to a railway interlocking system, demonstrating its feasibility and effectiveness in several years of testing experience. Copyright © 2008 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2009
30. Plug-and-play remote portlet publishing.
- Author
-
Wang, X. D., Yang, X., Allan, R. J., and Baker, M.
- Subjects
WEB services, PLUG & play (Computer architecture), COMPUTER architecture, COMPUTERS, INTERFACE circuits, COMPUTER network resources, REMOTE access networks
- Abstract
Web Services for Remote Portlets (WSRP) is gaining attention among portal developers and vendors to enable easy development, increased richness in functionality, pluggability, and flexibility of deployment. Whilst currently not supporting all WSRP functionalities, open-source portal frameworks could in future use WSRP Consumers to access remote portlets found from a WSRP Producer registry service. This implies that we need a central registry for the remote portlets and a more expressive WSRP Consumer interface to implement the remote portlet functions. This paper reports on an investigation into a new system architecture, which includes a Web Services repository, registry, and client interface. The Web Services repository holds portlets as remote resource producers. A new data structure for expressing remote portlets is found and published by populating a Universal Description, Discovery and Integration (UDDI) registry. A remote portlet publish and search engine for UDDI has also been developed. Finally, a remote portlet client interface was developed as a Web application. The client interface supports remote portlet features, as well as window status and mode functions. Copyright © 2007 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2007
31. Evaluation of bank-based multiport memory architecture with blocking network.
- Author
-
Inoue, Tomohiro, Hironaka, Tetsuo, Sasaki, Takahiro, Fukae, Seiji, Koide, Tetsushi, and Mattausch, Hans J.
- Subjects
COMPUTER architecture, HARBORS, BANKING industry, DATA transmission systems, TRANSISTORS, TRANSISTOR circuits
- Abstract
The bank-based multiport memory is a better approach to realizing realistic chip area and high access bandwidth than the conventional N-port memory cell approach. However, this method is unsuitable for large numbers of ports and banks, because the hardware resources of the crossbar network that connects the ports and banks grow in proportion to the product of the numbers of ports and banks. In order to solve this problem, this paper proposes a new bank-based multiport memory architecture using a blocking network instead of a crossbar network. Many blocking networks have been researched so far. However, these studies evaluated hardware resources based on the number of switches, while the compositions and circuit scale of the switches used in crossbar and blocking networks differ. Hence, this paper compares the number of transistors to show that the bank-based multiport memory using the blocking network achieves high access bandwidth with smaller hardware resources than the conventional approach. According to our results, our approach achieves the same access bandwidth with half the number of transistors, for 512 ports and 512 banks. © 2006 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 89(6): 22–33, 2006; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20205 [ABSTRACT FROM AUTHOR]
- Published
- 2006
32. From genetic to bacteriological algorithms for mutation-based testing. [Based on 'Genes and bacteria for automatic test cases optimization in the .NET environment' by Benoit Baudry, Frank Fleurey, Jean-Marc Jézéquel and Yves Le Traon, which appeared in Proceedings of the International Symposium on Software Reliability Engineering, Annapolis, MD, November 2002, pp. 195–206. © 2002 IEEE. This revised and expanded version appears here with the permission of the IEEE.]
- Author
-
Baudry, Benoit, Fleurey, Franck, Jézéquel, Jean-Marc, and Le Traon, Yves
- Subjects
GENETIC algorithms, COMPUTER algorithms, COMBINATORIAL optimization, GENETIC programming, COMPUTER programming, COMPUTER architecture
- Abstract
The level of confidence in a software component is often linked to the quality of its test cases. This quality can in turn be evaluated with mutation analysis: faults are injected into the software component (making mutants of it) to check the proportion of mutants detected ('killed') by the test cases. But while the generation of a set of basic test cases is easy, improving its quality may require prohibitive effort. This paper focuses on the issue of automating test optimization. The application of genetic algorithms would appear to be an interesting way of tackling it. The optimization problem is modelled as follows: a test case can be considered a predator, while a mutant program is analogous to a prey. The aim of the selection process is to generate test cases able to kill as many mutants as possible, starting from an initial set of predators, which is the test case set provided by the programmer. To overcome disappointing experimental results on .NET components and unit Eiffel classes, a slight variation on this idea is studied, no longer at the 'animal' level (lions killing zebras, say) but at the bacteriological level. The bacteriological level indeed better reflects the test case optimization issue: it mainly differs from the genetic one by the introduction of a memorization function and the suppression of the crossover operator. The purpose of this paper is to explain how the genetic algorithms have been adapted to fit the issue of test optimization. The resulting algorithm differs so much from genetic algorithms that it has been given another name: bacteriological algorithm. Copyright © 2005 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2005
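The two deviations from a genetic algorithm described above, a memorization function and no crossover, fit in a few lines. A Python sketch in which 'fitness' is simply the set of mutants a test case kills; the toy kill function and all names are invented for illustration.

```python
import random

def bacteriological(seed_tests, kills, mutate, rounds=100):
    """kills(t): set of mutant ids test t detects; mutate(t): new variant.
    Memorize any test that kills a not-yet-covered mutant; no crossover."""
    memory, covered = [], set()
    population = list(seed_tests)
    for _ in range(rounds):
        t = random.choice(population)
        gain = kills(t) - covered
        if gain:                       # the memorization function
            memory.append(t)
            covered |= gain
        population.append(mutate(t))   # mutation only, unlike a GA
    return memory, covered

# Toy model: a "test" is an int; it kills mutants sharing its remainder mod 5.
kills = lambda t: {m for m in range(20) if m % 5 == t % 5}
memory, covered = bacteriological([1, 2], kills,
                                  mutate=lambda t: t + random.randint(1, 3))
print(len(memory), len(covered))       # memorized tests, mutants covered
```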
33. An authentication architecture for collaboration among agents in ad hoc networks.
- Author
-
Okataku, Yasukuni, Yoshioka, Nobukazu, and Honiden, Shinichi
- Subjects
COMPUTER network security, AUTHENTICATION (Law), COMPUTER architecture, DATA protection, RANDOM numbers, ELECTRONIC surveillance
- Abstract
This paper proposes an authentication architecture for collaboration among agents in a network environment without security assurance. The architecture requires that there exist at least one secure node (oasis node). The oasis node generates the same number of authentication codes as the number of objects of authentication, using random numbers and agent information, and distributes the codes among the agents. The agents gather at the specified oasis node and obtain verification by the oasis node, based on the distributed random value and the authentication code. In the authentication architecture proposed in this paper, the random number and the authentication code are public information that can be compromised by eavesdropping, but the algorithm for generation and verification of the authentication code is not publicized. The architecture is suited for handling authentication processing in ad hoc collaboration among an unspecified number of agents. © 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 1, 87(5): 11–19, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecja.10165 [ABSTRACT FROM AUTHOR]
- Published
- 2004
34. A CORBA Commodity Grid Kit.
- Author
-
Parashar, Manish, Von Laszewski, Gregor, Verma, Snigdha, Gawor, Jarek, Keahey, Kate, and Rehn, Nell
- Subjects
COMPUTER architecture, COMPUTER software development, WEB services, COMPUTER software, COMPUTER systems
- Abstract
This paper reports on an ongoing research project aimed at designing and deploying a Common Object Resource Broker Architecture (CORBA) (ww.omg.org) Commodity Grid (CoG) Kit. The overall goal of this project is to enable the development of advanced Grid applications while adhering to state-of-the-art software engineering practices and reusing the existing Grid infrastructure. As part of this activity, we are investigating how CORBA can be used to support the development of Grid applications. In this paper, we outline the design of a CORBA CoG Kit that will provide a software development framework for building a CORBA ‘Grid domain’. We also present our experiences in developing a prototype CORBA CoG Kit that supports the development and deployment of CORBA applications on the Grid by providing them access to the Grid services provided by the Globus Toolkit. Copyright © 2002 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2002
35. A systematic review of approaches for testing concurrent programs.
- Author
-
Arora, Vinay, Bhatia, Rajesh, and Singh, Maninder
- Subjects
COMPUTER multitasking, COMPUTER software, MULTICORE processors, COMPUTER architecture, COMPUTER surveys
- Abstract
Concurrent programs are replacing sequential programs because they exploit the true capabilities of multicore architectures. The extensive use of multicore systems and multithreaded paradigms warrants more attention to the testing of concurrent programs. Testing concurrent programs is not a new field: it has been more than 40 years since the first problems related to it were addressed by researchers. The field covers various domains, which include concurrency problems, testing approaches, techniques, graphical representations, tools, and subject systems. This paper aims at providing an overview of research in the domain of testing concurrent programs by classifying it into eight categories: (a) reachability testing, (b) structural testing, (c) model-based testing, (d) mutation-based testing, (e) slicing-based testing, (f) formal methods, (g) random testing, and (h) search-based testing. The survey is focused on the techniques applied, methodologies followed, and tools used in these aforementioned approaches. Furthermore, gaps are identified in the different approaches. The paper concludes with a consolidation of various testing parameters along with future directions. Copyright © 2015 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2016
36. Monitoring and improving performance in human-computer interaction.
- Author
-
Carneiro, Davide, Pimenta, André, Gonçalves, Sérgio, Neves, José, and Novais, Paulo
- Subjects
HUMAN-computer interaction, PERFORMANCE evaluation, ECONOMIC competition, WORK environment, COMPUTER architecture
- Abstract
Monitoring an individual's performance in a task, especially in the workplace context, is becoming an increasingly interesting and controversial topic in a time in which workers are expected to produce more, better and faster. The tension caused by this competitiveness, together with the pressure of monitoring, may not work in favour of the organization's objectives. In this paper, we present an innovative approach on the problem of performance management. We build on the fact that computers are nowadays used as major work tools in many workplaces to devise a non-invasive method for distributed performance monitoring based on the observation of the worker's interaction with the computer. We then look at musical selection both as a pleasant and as an effective method for improving performance in the workplace. The proposed approach will allow team coordinators to assess and manage their co-workers' performance continuously and in real-time, using a distributed service-based architecture. Copyright © 2015 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2016
37. Accelerating Asian option pricing on many-core architectures.
- Author
-
Li, Shuo and Lin, James
- Subjects
COMPUTER architecture, COMPUTER algorithms, BLACK-Scholes model, COMPUTER programming, APPROXIMATION theory
- Abstract
In this paper, we start by looking at the algorithms and numerical methods for pricing one exotic option, the strongly path-dependent Asian option, using the Black-Scholes pricing model. We cover both geometric-average and arithmetic-average schemes, which lead us to two different numerical solutions. Next, we discuss how to implement these algorithms on the leading many-core architectures with contrasting programming models and still achieve comparable performance results. As an example, we show that a 2-year contract with 252 time steps and 1,000,000 samples can be priced in approximately one fifth of a second on two leading many-core architectures. The purpose of this paper is to understand what is required to accelerate numerically intensive algorithms such as the Asian option pricing algorithm in quantitative finance, and how to take advantage of the parallel programming features of many-core architectures to express the parallelism inherent in similar algorithms. Copyright © 2015 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2016
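For the arithmetic-average scheme above there is no closed form, so the price is a discounted Monte Carlo average over simulated Black-Scholes paths. A scalar Python sketch of exactly the loop the paper vectorizes across many-core hardware; parameters are illustrative, and far fewer samples are used than the paper's 1,000,000.

```python
import math, random

def asian_call_mc(s0, k, r, sigma, t, steps, paths):
    """Arithmetic-average Asian call under Black-Scholes, via Monte Carlo."""
    dt = t / steps
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(paths):
        s, total = s0, 0.0
        for _ in range(steps):                # one simulated price path
            s *= math.exp(drift + vol * random.gauss(0.0, 1.0))
            total += s                        # running sum for the average
        payoff_sum += max(total / steps - k, 0.0)
    return math.exp(-r * t) * payoff_sum / paths   # discounted expectation

# 2-year contract with 252 time steps, as in the paper's benchmark figure.
print(asian_call_mc(s0=100, k=100, r=0.02, sigma=0.2, t=2.0,
                    steps=252, paths=20000))
```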
38. Efficient congestion control mechanism for flow-aware networks.
- Author
-
Domżał, Jerzy, Wójcik, Robert, Chołda, Piotr, Stankiewicz, Rafał, and Jajszczyk, Andrzej
- Subjects
COMPUTER networks, TELECOMMUNICATION systems, QUALITY of service, DATA transmission systems, COMPUTER architecture, COMPUTER simulation
- Abstract
Transmission based on flows becomes more and more popular in teleinformatics networks. To guarantee proper quality of service, to enable multipath transmissions, or just to increase transmission effectiveness in a network, traffic should be sent as flows. Flow-aware networking architecture is one of the possible concepts to realize flow-based transmissions. In this paper, the efficient congestion control mechanism (ECCM) is proposed to improve transmission in flow-aware networks (FAN). The mechanism makes it possible to minimize acceptance delay of streaming flows (served with high priority) without deteriorating other transmissions in the network. It is confirmed by simulation experiments that the implementation of FAN with the ECCM mechanism is a promising solution for the Future Internet. Copyright © 2015 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2016
39. PENNANT: an unstructured mesh mini-app for advanced architecture research.
- Author
-
Ferenbaugh, Charles R.
- Subjects
HYDRODYNAMICS, COMPUTER architecture, DATA structures, COMPUTER algorithms, COMPUTER programming
- Abstract
SUMMARY This paper describes PENNANT, a mini-app that operates on general unstructured meshes (meshes with arbitrary polygons), and is designed for advanced architecture research. It contains mesh data structures and physics algorithms adapted from the Los Alamos National Laboratory radiation-hydrodynamics code FLAG and gives a sample of the typical memory access patterns of FLAG. The basic capabilities and optimization approaches of PENNANT are presented. Results are shown from sample performance experiments run on serial, multicore, and graphics processing unit implementations, giving an indication of how PENNANT can be a useful tool for studies of new architectures and programming models. Copyright © 2014 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2015
40. An iteration-based hybrid parallel algorithm for tridiagonal systems of equations on multi-core architectures.
- Author
-
Tang, Guangping, Yang, Wangdong, Li, Kenli, Ye, Yu, Xiao, Guoqing, and Li, Keqin
- Subjects
ITERATIVE methods (Mathematics), HYBRID systems, PARALLEL algorithms, MULTICORE processors, COMPUTER architecture
- Abstract
An optimized parallel algorithm is proposed to solve the problem that occurs in the complicated backward substitution of cyclic reduction when solving tridiagonal linear systems. Adopting a hybrid parallel model, this algorithm combines the cyclic reduction method and the partition method. This hybrid algorithm has a simpler backward substitution on parallel computers compared with the cyclic reduction method. In this paper, the operation count and execution time are obtained to evaluate and compare these methods. On the basis of these measured parameters, the hybrid algorithm, using the hybrid approach with a multi-threading implementation, achieves better efficiency than the other parallel methods, that is, the cyclic reduction and partition methods. In particular, the approach in this paper has the lowest scalar operation count and the shortest execution time on a multi-core computer when the size of the equations exceeds a certain dimension threshold. The hybrid parallel algorithm improves the performance of the cyclic reduction and partition methods by 19.2% and 13.2%, respectively. In addition, by comparing the single-iteration and multi-iteration hybrid parallel algorithms, it is found that increasing the iteration steps of the cyclic reduction method does not affect the performance of the hybrid parallel algorithm very much. Copyright © 2015 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2015
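For context on the hybrid above: the partition method splits the tridiagonal system into chunks, each solved independently with a serial kernel such as the Thomas algorithm, with the chunks coupled through a small reduced system (handled by cyclic reduction in the paper). The Python sketch below shows only the per-partition Thomas kernel; the coupling step is omitted.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system: a=sub-, b=main, c=super-diagonal, d=rhs.
    This is the per-partition kernel; the hybrid scheme runs one such solve
    per thread and couples partitions via cyclic reduction (not shown)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # simple backward substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# -u'' = 2 with u(0)=u(1)=0, discretised on 4 interior points (h = 0.2):
# exact solution u(x) = x(1-x), i.e. [0.16, 0.24, 0.24, 0.16].
print(thomas([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0],
             [0.08, 0.08, 0.08, 0.08]))
```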
41. A DISTRIBUTED MEDIA ACCESS PROTOCOL FOR PACKET RADIO NETWORKS AND PERFORMANCE ANALYSIS. PART 2: NETWORK SET-UP TIME AND DATA RATE.
- Author
-
Pond, Lawrence C. and Li, Victor O. K.
- Subjects
MOBILE communication systems, COMPUTER architecture, DATA transmission systems, PACKET radio transmission, CODE division multiple access, CELL phone systems
- Abstract
In this, the second part of a two-part paper, the required time for establishing a mobile packet radio network using the virtual circuit and time division multiple access protocol developed in Part 1 is analysed. Tools are developed to determine the virtual circuit and network set-up times in terms of the channel bandwidth allocated to establish and maintain the network. The tools are then extended to include the effects of user mobility. Then these results are combined with the network capacity results of Part 1 to analyse the trade-off between the data rate and set-up time of the network. Next a hierarchical architecture is proposed and the network data rate versus set-up time trade-off of this architecture is analysed using these tools. This architecture is shown to both provide a higher data rate and establish faster than flat networks of the same number of nodes. [ABSTRACT FROM AUTHOR]
- Published
- 1995
42. CAKA: a novel cache-aware K-anycast routing scheme for publish/subscribe-based information-centric network.
- Author
-
Ren, Jing, Lu, Kejie, Tang, Fei, Wang, Jin, Wang, Jianping, Wang, Sheng, and Liu, Shucheng
- Subjects
CACHE memory ,ROUTING (Computer network management) ,INFORMATION services ,COMPUTER architecture ,INFORMATION theory - Abstract
In the past few years, many publish/subscribe-based information-centric network (PS-ICN) architectures have been proposed and investigated to efficiently deliver information from content publishers to subscribers. However, most existing studies on PS-ICN have not considered how to utilize in-network caches, a common but important feature of ICN. To address this issue, this paper proposes a novel cache-aware K-anycast routing scheme, CAKA, that can significantly improve the performance of content delivery. Specifically, we choose PURSUIT, one of the most important PS-ICN architectures, and leverage its bidirectional communication procedure to (1) enable multiple publishers to send probing messages to the same subscriber and (2) allow the subscriber to retrieve content objects using K-anycast routing and network coding. In this study, we extend the PURSUIT protocol to support cache-aware K-anycast routing and design algorithms to choose multiple partially disjoint paths for probing and to select paths for content retrieval. To evaluate the performance of the proposed scheme, we develop not only a simulation testbed but also a prototype running in a realistic network environment. Our studies show that the proposed scheme can significantly reduce the average hop count for retrieving content objects, with very small overheads. Copyright © 2015 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR] (An illustrative multipath-selection sketch follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
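The paper's path-selection algorithms are not given in the abstract; a common heuristic for obtaining partially disjoint paths, shown below purely as an illustration (not CAKA's algorithm), is to run a shortest-path search repeatedly while penalizing already-used edges so later probes prefer alternative routes toward other cached replicas.

# Illustrative only: pick K partially disjoint paths by repeatedly
# running Dijkstra and penalizing edges already used.
import heapq

def dijkstra(adj, src, dst):
    """adj: {u: {v: weight}}. Returns the shortest path src->dst as a list."""
    dist = {src: 0.0}
    prev = {}
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if u == dst:
            break
        if du > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            nd = du + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd; prev[v] = u
                heapq.heappush(pq, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]; path.append(node)
    return path[::-1]

def k_partially_disjoint(adj, src, dst, k, penalty=10.0):
    """Returns k paths; mutates adj in place to discourage edge reuse."""
    paths = []
    for _ in range(k):
        p = dijkstra(adj, src, dst)
        paths.append(p)
        for u, v in zip(p, p[1:]):        # discourage, not forbid, reuse
            adj[u][v] *= penalty
    return paths

g = {"s": {"a": 1, "b": 1}, "a": {"t": 1}, "b": {"t": 2}, "t": {}}
print(k_partially_disjoint(g, "s", "t", 2))  # [['s','a','t'], ['s','b','t']]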
43. LISP controller: a centralized LISP management system for ISP networks.
- Author
-
Jeong, Taeyeol, Li, Jian, Hyun, Jonghwan, Yoo, Jae‐Hyoung, and Hong, James Won‐Ki
- Subjects
COMPUTER network protocols ,COMPUTER networks ,SCALABILITY ,PROGRAMMING languages ,COMPUTER architecture ,CONVERGENCE (Telecommunication) - Abstract
As the current Internet architecture suffers from scalability issues, the network research community has proposed alternative designs for the Internet architecture. Among the solutions that adopt the locator/identifier split paradigm, the locator/identifier separation protocol (LISP) has been considered the most promising because of its incrementally deployable design. Despite the various advantages provided by LISP, many ISPs remain reluctant to deploy LISP in their production networks because the standard LISP does not fully satisfy ISPs' requirements for LISP-enabled services. In this paper, we define ISPs' requirements for LISP-enabled commercial services and describe the limitations of the standard LISP from an ISP's perspective. We also propose the LISP controller, a centralized LISP management system, and use it to evaluate three representative ISP use cases: traffic engineering, virtual machine live migration, and vertical handover. The results show that the proposed LISP controller provides centralized management, controllability, and fast map-entry updates without any modification of the standard LISP, allowing an ISP to control and manage its LISP-enabled services while satisfying its requirements. Copyright © 2015 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR] (A toy model of the centralized mapping idea follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
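As a toy illustration of the controller idea only (hypothetical classes, not the paper's implementation): LISP separates endpoint identifiers (EIDs) from routing locators (RLOCs), so a central controller that owns the EID-to-RLOC map can push a fast map-entry update to every tunnel router when, say, a virtual machine migrates.

# Toy model of centralized EID-to-RLOC mapping management.
class ToyLispController:
    def __init__(self):
        self.mappings = {}          # EID prefix -> list of RLOCs
        self.subscribers = []       # tunnel routers to notify

    def register(self, router):
        self.subscribers.append(router)

    def update_mapping(self, eid, rlocs):
        self.mappings[eid] = rlocs
        for r in self.subscribers:         # fast map-entry update push
            r.install(eid, rlocs)

class ToyTunnelRouter:
    def __init__(self, name):
        self.name = name
        self.map_cache = {}

    def install(self, eid, rlocs):
        self.map_cache[eid] = rlocs
        print(f"{self.name}: {eid} -> {rlocs}")

ctrl = ToyLispController()
ctrl.register(ToyTunnelRouter("xTR-1"))
ctrl.update_mapping("10.1.0.0/16", ["203.0.113.1"])
ctrl.update_mapping("10.1.0.0/16", ["198.51.100.7"])   # VM live migration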
44. A new hybrid solver with two-level parallel computing for large-scale structural analysis.
- Author
-
Miao, Xinqiang, Jin, Xianlong, and Ding, Junhong
- Subjects
PARALLEL computers ,LARGE scale systems ,STRUCTURAL analysis (Engineering) ,COMPUTER storage devices ,COMPUTER architecture - Abstract
With the advancement of new processor and memory architectures, supercomputers with multicore, multinode architectures have become general tools for large-scale engineering and scientific simulations. However, the nonuniform latencies of intranode and internode communication on these machines introduce new challenges that must be addressed to achieve optimal performance. In this paper, a novel hybrid solver especially designed for supercomputers of multicore and multinode architecture is proposed. The new hybrid solver is characterized by its two-level parallel computing approach, based on strategies of two-level partitioning and two-level condensation: it distinguishes intranode from internode communication to minimize communication overheads, and it further reduces the size of the interface equation system to improve its convergence rate. Three numerical experiments in linear static structural analysis were conducted on the DAWNING-5000A supercomputer to demonstrate the validity and efficiency of the proposed method. Test results show that the proposed approach outperformed the conventional Schur complement method. Copyright © 2014 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR] (A sketch of the underlying static condensation follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
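The conventional Schur complement method that the paper benchmarks against condenses out interior unknowns and solves a smaller interface system; the NumPy sketch below shows that one-level condensation (the paper's contribution is a two-level variant, which is not reproduced here).

# One-level static condensation (Schur complement): eliminate interior
# unknowns, solve the smaller interface system, then back-substitute.
import numpy as np

def schur_solve(A, f, interior, boundary):
    Aii = A[np.ix_(interior, interior)]
    Aib = A[np.ix_(interior, boundary)]
    Abi = A[np.ix_(boundary, interior)]
    Abb = A[np.ix_(boundary, boundary)]
    fi, fb = f[interior], f[boundary]

    AiiInvAib = np.linalg.solve(Aii, Aib)       # per-subdomain solves
    AiiInvfi  = np.linalg.solve(Aii, fi)
    S = Abb - Abi @ AiiInvAib                   # interface (Schur) system
    g = fb - Abi @ AiiInvfi
    xb = np.linalg.solve(S, g)                  # smaller interface solve
    xi = AiiInvfi - AiiInvAib @ xb              # back-substitution
    x = np.empty_like(f)
    x[interior], x[boundary] = xi, xb
    return x

rng = np.random.default_rng(0)
A = rng.random((6, 6)) + 6 * np.eye(6)          # diagonally dominant, invertible
f = rng.random(6)
x = schur_solve(A, f, interior=[0, 1, 2, 3], boundary=[4, 5])
print(np.allclose(A @ x, f))                    # True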
45. Interoperating grid infrastructures with the GridWay metascheduler.
- Author
-
Carrión, Ismael Marín, Huedo, Eduardo, and Llorente, Ignacio M.
- Subjects
GRID computing ,INFRASTRUCTURE (Economics) ,COMPUTER software development ,COMPUTER architecture ,COMPUTER files ,DATA transmission systems - Abstract
This paper describes the GridWay metascheduler and presents its latest and planned developments, mainly related to interoperability and interoperation. GridWay enables large-scale, reliable, and efficient sharing of computing resources over grid middleware. To favor interoperability, it adopts a modular architecture based on drivers, which access middleware services for resource discovery and monitoring, job execution and management, and file transfer. This paper presents two new execution drivers for Basic Execution Service (BES) and Computing Resource Execution and Management (CREAM) services and introduces a remote BES interface for GridWay. This interface allows users to access GridWay's job metascheduling capabilities using the BES implementation of GridSAM. Thus, GridWay now offers end users more options for interoperability and interoperation. Copyright © 2012 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR] (A sketch of the driver-interface idea follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
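GridWay itself is implemented in C; the hypothetical Python rendering below only illustrates the driver idea: the scheduler core codes against one narrow interface, so supporting a new middleware such as BES or CREAM means supplying one more driver class rather than touching the core.

# Hypothetical driver interface in the spirit of GridWay's modular design.
from abc import ABC, abstractmethod

class ExecutionDriver(ABC):
    @abstractmethod
    def submit(self, job_description: str) -> str: ...
    @abstractmethod
    def status(self, job_id: str) -> str: ...
    @abstractmethod
    def cancel(self, job_id: str) -> None: ...

class FakeBesDriver(ExecutionDriver):
    """Stand-in for a Basic Execution Service backend."""
    def __init__(self):
        self._jobs = {}
    def submit(self, job_description):
        job_id = f"bes-{len(self._jobs)}"
        self._jobs[job_id] = "RUNNING"
        return job_id
    def status(self, job_id):
        return self._jobs[job_id]
    def cancel(self, job_id):
        self._jobs[job_id] = "CANCELLED"

driver: ExecutionDriver = FakeBesDriver()     # scheduler sees only the ABC
jid = driver.submit("<jsdl>...</jsdl>")
print(jid, driver.status(jid))                # bes-0 RUNNING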
46. Computer Architecture and FPGAs: A Learning-by-Doing Methodology for Digital-Native Students.
- Author
-
Cifredo‐Chacón, Mª De Los Ángeles, Quirós‐Olozábal, Ángel, and Guerrero‐Rodríguez, José María
- Subjects
COMPUTER architecture ,FIELD programmable gate arrays ,INTERACTIVE learning ,VHDL (Computer hardware description language) ,COMPUTER engineering education ,EDUCATION - Abstract
Purely theoretical teaching of Computer Architecture is no longer adequate: today's students ask for learning-by-doing that matches their dynamic and active character, and interactive teaching has become affordable thanks to falling prices of Field Programmable Gate Arrays (FPGAs). This paper proposes a learning-by-doing methodology for teaching Computer Architecture to first-year students who belong to a digital-native generation. The method consists in developing a whole computer from scratch while the students are introduced to hardware description languages (HDLs) and programmable logic devices. First, students design each element of the computer in the VHDL language; later, they interconnect the verified elements and test the complete computer. An FPGA-based board is used to implement the design and check its correct operation. The approach is intended for first-year students of a Computer Engineering degree, so it is their first exposure to the basics of Computer Architecture; students have a computer and an FPGA-based board available at all times. In the final exam, the design of a different computer is set, and testing and programming that computer is required to pass. The high percentage of passing students corroborates the success of the methodology: computer construction and operation are understood through hands-on work at the same time as the VHDL language and FPGA technology are introduced, and lapses in attention are avoided because students keep a dynamic role, working with their personal computer and FPGA throughout. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
47. The performance of uplink LDPC-coded multirelay cooperation based on virtual V-BLAST processing.
- Author
-
Tang, Lei, Yang, Fengfan, Zhang, Shunwai, and Luo, Lin
- Subjects
CODING theory ,VIRTUAL machine systems ,LOW density parity check codes ,COMPUTER architecture ,RADIO transmitter fading ,DECODERS & decoding - Abstract
This paper proposes an efficient uplink low-density parity-check (LDPC)-coded multirelay cooperation architecture based on virtual Vertical Bell Labs Layered Space-Time (V-BLAST) processing over a Rayleigh fading channel, in which minimum mean square error detection combined with a successive interference canceller and a belief propagation-based joint iterative decoder, built on the introduced multilayer Tanner graph, detects and decodes the corrupted received sequence at the destination. By introducing V-BLAST transmission into coded multirelay cooperation, relays send their symbol streams simultaneously, which significantly reduces transmission delay and provides higher transmission efficiency. Theoretical analysis and numerical results show that the proposed LDPC-coded cooperation scheme outperforms coded noncooperation at the same code rate and achieves a better compromise among performance, signal delay, and the encoding complexity associated with the number of relays than conventional LDPC-coded cooperation without V-BLAST transmission. This performance gain can be credited to the proposed V-BLAST processing architecture and to belief propagation-based joint iterative decoding over the introduced multilayer Tanner graph at the receiver side. Copyright © 2013 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR] (A generic MMSE-SIC detection sketch follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
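The abstract's detector combines MMSE filtering with successive interference cancellation; the sketch below is the generic textbook MMSE-SIC for real-valued BPSK layers (the paper's joint iterative LDPC decoding over a multilayer Tanner graph is not reproduced).

# Generic MMSE-SIC: filter, slice the most reliable layer, cancel, repeat.
import numpy as np

def mmse_sic(H, y, sigma2):
    """Detect BPSK symbols x in y = H x + n (real-valued model)."""
    H = H.copy(); y = y.astype(float).copy()
    remaining = list(range(H.shape[1]))
    x_hat = np.zeros(H.shape[1])
    while remaining:
        Hr = H[:, remaining]
        G = np.linalg.solve(Hr.T @ Hr + sigma2 * np.eye(len(remaining)), Hr.T)
        i = int(np.argmin(np.sum(G * G, axis=1)))  # smallest noise gain first
        k = remaining[i]
        s = 1.0 if G[i] @ y >= 0 else -1.0         # slice to +/-1
        x_hat[k] = s
        y -= H[:, k] * s                           # cancel its contribution
        remaining.pop(i)
    return x_hat

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 3))        # e.g., 3 relay streams, 8 receive dims
x = rng.choice([-1.0, 1.0], size=3)
y = H @ x + 0.1 * rng.standard_normal(8)
print(np.array_equal(mmse_sic(H, y, sigma2=0.01), x))  # True at this SNR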
48. Resource partitioning for Integrated Modular Avionics: comparative study of implementation alternatives.
- Author
-
Han, Sanghyun and Jin, Hyun‐Wook
- Subjects
RESOURCE partitioning (Ecology) ,AVIONICS industry ,COMPUTER architecture ,COMPUTER hardware description languages - Abstract
ABSTRACT Most current-generation avionics systems are based on a federated architecture, where each electronic device runs a single software module or application that collaborates with other devices through a network. This architecture makes the software development process very simple, but the hardware system becomes very complicated, and it is difficult to resolve issues of size, weight, and power efficiently. An integrated architecture can address the size, weight, and power issues and provide better software reusability, testability, and reliability by means of partitioning. Partitioning provides a framework that can transparently integrate several real-time applications on the same computing device, isolating their execution environments in terms of resources and faults. Several studies on partitioning software platforms have been reported; however, to the best of our knowledge, an extensive comparison and analysis of design and implementation alternatives has not been conducted, owing to the extreme complexity of their implementation and measurement. In this paper, we present three design alternatives for partitioning, at the user, kernel, and virtual machine monitor levels, and compare them quantitatively. In particular, we target the worldwide-standard software platform for avionics systems, Aeronautical Radio, Incorporated Specification 653 (ARINC 653). Overall, our study provides valuable design references and demonstrates the characteristics of the design alternatives. Copyright © 2013 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR] (A toy time-partitioning sketch follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
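Whatever level the partitioning is implemented at (user, kernel, or virtual machine monitor), ARINC 653 time partitioning amounts to executing each partition only inside fixed windows of a repeating major frame; the toy Python model below (schedule values invented, real schedules come from configuration tables) illustrates that contract.

MAJOR_FRAME_MS = 100
# (partition name, window start ms, window length ms) within each frame.
SCHEDULE = [("flight_ctrl", 0, 50), ("display", 50, 30), ("maint", 80, 20)]

def partition_at(t_ms: int) -> str:
    """Return the partition that owns absolute time t_ms."""
    offset = t_ms % MAJOR_FRAME_MS
    for name, start, length in SCHEDULE:
        if start <= offset < start + length:
            return name
    raise ValueError("schedule does not cover the major frame")

for t in (0, 49, 50, 85, 130):
    print(t, "->", partition_at(t))
# 0 -> flight_ctrl, 49 -> flight_ctrl, 50 -> display,
# 85 -> maint, 130 -> flight_ctrl (frame repeats every 100 ms)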
49. A superlinear speedup region for matrix multiplication.
- Author
-
Gusev, Marjan and Ristov, Sasko
- Subjects
MATRIX multiplications ,NUMBER theory ,CACHE memory ,COMPUTER architecture ,COMPUTER algorithms ,EXISTENCE theorems - Abstract
SUMMARY Modern processors are based on a multicore architecture with an increasing number of cores per processor, often designed so that some level of the cache hierarchy is shared among cores: usually the last-level cache is shared among several or all cores (e.g., L3), while each core possesses private low-level caches (e.g., L1 and L2). Superlinear speedup is possible for a matrix multiplication algorithm executed on a shared-memory multiprocessor because of the existence of a superlinear region: a region of problem sizes in which the matrices' storage requirements cause more cache misses in sequential execution than in parallel execution. This paper shows, theoretically and experimentally, that such a region exists: we provide a theoretical proof of the existence of superlinear speedup and determine the boundaries of the region where it can be achieved, and the experiments confirm the theoretical results. These results will therefore affect future software development and the exploitation of parallel hardware based on a shared-memory multiprocessor architecture. Copyright © 2013 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR] (An illustrative form of the region's bounds follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
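The paper derives the precise boundaries; the following is only a back-of-envelope picture under assumptions of mine (a uniform data split across cores and a single private cache level of capacity L per core). For n-by-n matrices with element size s, the working set of C = AB is about M(n) = 3 n^2 s, and the superlinear region is roughly the range of sizes whose working set overflows one core's cache yet fits across p of them:

\[
  \frac{3 n^{2} s}{p} \;\le\; L \;<\; 3 n^{2} s
  \quad\Longrightarrow\quad
  S(p) = \frac{T_1}{T_p} > p \ \text{is possible},
\]

because the sequential run T_1 pays cache-miss penalties that the parallel run T_p largely avoids.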
50. An Adaptive Smart Grid Management Scheme Based on the Coopetition Game Model.
- Author
-
Sungwook Kim
- Subjects
SMART power grids ,GAME theory software ,COMPUTER architecture ,COMPUTER systems management ,ENERGY consumption - Abstract
Recently, the idea of the smart grid has been gaining significant attention and has become a hot research topic. The purpose of this paper is to present a novel smart grid management scheme that uses game-theoretic principles. In the proposed scheme, power appliances in the smart grid adaptively form groups according to a non-cooperative hedonic game model, and by exploiting multi-appliance diversity, the appliances in each group are dynamically scheduled in a cooperative manner. For efficient smart grid management, the proposed coopetition game approach is dynamic and flexible, adaptively responding to current system conditions; its main feature is maximizing overall system performance while satisfying the requirements of individual appliances. Simulation results indicate that the proposed scheme achieves higher energy efficiency and better system performance than other existing schemes. [ABSTRACT FROM AUTHOR] (An illustrative group-formation sketch follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
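As an illustration of hedonic group formation only (the utility function below is invented; the paper's coopetition model is richer), appliances can repeatedly move to whichever group maximizes their own payoff until the partition is stable. Convergence is guaranteed for well-behaved preferences such as this toy one.

def run_hedonic(appliances, n_groups, utility):
    """appliances: list of ids; utility(a, group_members) -> float."""
    assignment = {a: 0 for a in appliances}      # start all in group 0
    changed = True
    while changed:                               # iterate to a stable partition
        changed = False
        for a in appliances:
            def payoff(g):
                members = [b for b in appliances
                           if assignment[b] == g and b != a] + [a]
                return utility(a, members)
            best = max(range(n_groups), key=payoff)
            if best != assignment[a]:
                assignment[a] = best
                changed = True
    return assignment

# Toy utility: appliances prefer small groups (congestion on a shared feeder).
util = lambda a, members: -len(members)
print(run_hedonic(["heater", "ev", "washer", "dryer"], 2, util))
# Stable outcome: {'heater': 1, 'ev': 1, 'washer': 0, 'dryer': 0}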