129 results for "Scalable Coherent Interface"
Search Results
2. The MuSE system: A flexible combination of on-stack execution and work-stealing
- Author
-
Leberecht, Markus
- Published
- 1999
- Full Text
- View/download PDF
3. A Study of Switch Models for the Scalable Coherent Interface
- Author
-
Wu, B., Bogaerts, A., and Skaali, B.
- Published
- 1996
- Full Text
- View/download PDF
4. Multicast performance modeling and evaluation for high-speed unidirectional torus networks
- Author
-
Oral, Sarp and George, Alan D.
- Subjects
-
TORUS, COMPUTER networks, VERSIFICATION, ALGORITHMS - Abstract
This paper evaluates the performance of various unicast-based and path-based multicast protocols for high-speed torus networks. The results of an experimental case study on a Scalable Coherent Interface (SCI) torus network are presented. Small-message latency models of these software-based multicast algorithms as well as analytical projections for larger unidirectional torus systems are also introduced. The strengths and weaknesses of selected multicast protocols are experimentally and analytically illustrated in terms of various metrics, such as startup and completion latency, CPU utilization, and link concentration and concurrency for SCI networks under various networking and multicasting scenarios.
- Published
- 2004
- Full Text
- View/download PDF
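A toy version of the kind of small-message latency model the abstract describes can be sketched as follows. It contrasts unicast-based (separately addressed) and path-based multicast on a unidirectional ring; the startup cost `t_s` and per-hop cost `t_h` are hypothetical constants, not measured SCI parameters.

```python
# Toy small-message multicast latency model on a unidirectional ring of n nodes.
# t_s (send startup) and t_h (per-hop forwarding) are assumed constants.

def hops(src: int, dst: int, n: int) -> int:
    """Hop count from src to dst on a unidirectional ring."""
    return (dst - src) % n

def unicast_multicast_latency(src, dests, n, t_s, t_h):
    """Separate addressing: one message per destination, serialized at the
    sender. Completion latency is when the last message arrives."""
    sender_busy, completion = 0.0, 0.0
    for d in dests:
        sender_busy += t_s  # each send occupies the sender for a startup period
        completion = max(completion, sender_busy + hops(src, d, n) * t_h)
    return completion

def path_based_multicast_latency(src, dests, n, t_s, t_h):
    """Path-based: a single message travels the ring, absorbed by each
    destination along the way."""
    return t_s + max(hops(src, d, n) for d in dests) * t_h
```

With n = 8, t_s = 10 and t_h = 1, delivering from node 0 to nodes 1-3 costs 33 time units with serialized unicasts but only 13 with the path-based scheme, mirroring the abstract's point that path-based protocols amortize startup overhead.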
5. Java Fast Sockets: Enabling high-speed Java communications on high performance clusters
- Author
-
Ramón Doallo, Juan Touriño, and Guillermo L. Taboada
- Subjects
Ethernet ,Java ,Computer Networks and Communications ,business.industry ,Computer science ,Serialization ,Gigabit Ethernet ,computer.software_genre ,Marshalling ,Shared memory ,Embedded system ,Operating system ,System area network ,Myrinet ,business ,computer ,Scalable Coherent Interface ,computer.programming_language - Abstract
This paper presents Java Fast Sockets (JFS), an optimized Java socket implementation for high-performance computing on clusters. Current socket libraries do not efficiently support high-speed cluster interconnects and impose substantial communication overhead. JFS overcomes these performance constraints by: (1) enabling high-speed communication on cluster networks such as Scalable Coherent Interface (SCI), Myrinet and Gigabit Ethernet; (2) avoiding the need to serialize arrays of primitive data types; (3) reducing buffering and unnecessary copies; and (4) reimplementing the protocol to boost shared memory (intra-node) communication. Its interoperability and user and application transparency allow for immediate applicability to a wide range of parallel and distributed target applications. A performance evaluation conducted on a dual-core cluster has shown experimental evidence of throughput increases on SCI, Myrinet, Gigabit Ethernet and shared memory communication. The impact of this improvement on the overall application performance of representative parallel codes has also been analyzed.
- Published
- 2008
6. Implementation of cache coherence protocol for COMA multiprocessor systems based on the scalable coherent interface
- Author
-
S. Ahmed and Mohammad Al-Rousan
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer science ,Distributed computing ,MESI protocol ,Parallel computing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,MESIF protocol ,Shared memory ,Hardware and Architecture ,Bus sniffing ,Memory architecture ,Communications protocol ,Law ,Scalable Coherent Interface ,Software ,Cache coherence - Abstract
The Scalable Coherent Interface (SCI) provides high-bandwidth, low-latency communication in systems with point-to-point links. SCI can also support cache coherence in shared memory multiprocessors. Existing Cache-Only Memory Architecture (COMA) systems are all based on communication and cache coherence protocols other than SCI, and hierarchical COMA systems generally suffer from high message latencies. In this paper, we examine the implementation of COMA multiprocessors using the SCI protocol. Implementing the SCI communication protocol in hierarchical COMA systems reduces latencies, and its cache coherence protocol reduces directory sizes. In this work, a hierarchical COMA system was modified to fit SCI. In addition, a new replacement policy is proposed in which replaced lines swap storage locations with new ones fetched into the attraction memory. Like other COMA systems, this new system still gives high performance when part of the attraction memory is left unallocated. However, its performance is only slightly influenced by the initial distribution of data across the system.
- Published
- 2004
7. K-PROCESSOR RELIABILITY OF LARGE-SCALE RING-BASED HIERARCHICAL INTERCONNECTIONS
- Author
-
O. B. AL-Jarrah, Mohammad Al-Rousan, and Moad Mowafi
- Subjects
Ring (mathematics) ,Interconnection ,Mathematics::Commutative Algebra ,General Computer Science ,Computer science ,Distributed computing ,Reliability (computer networking) ,Energy Engineering and Power Technology ,Aerospace Engineering ,Context (language use) ,Scale (descriptive set theory) ,Multiprocessing ,Security token ,Topology ,Industrial and Manufacturing Engineering ,Nuclear Energy and Engineering ,Electrical and Electronic Engineering ,Safety, Risk, Reliability and Quality ,Scalable Coherent Interface - Abstract
Connecting thousands of processors via interconnection networks based on multiple (hierarchical) rings has recently attracted increasing interest, due to the wide acceptance and success of the Scalable Coherent Interface (SCI) technology. The inherently weak reliability of the ring architecture has led interconnection designers to consider various ways to improve overall network reliability. An interesting choice is to use braided rings instead of the single (basic) rings in the hierarchy. In this paper, we present new formulas for computing the K-processor reliability of SCI ring-based hierarchical networks in the context of large-scale multiprocessor systems. The derived formulas are general and applicable to any system size with an arbitrary number of levels. The reliability of hierarchical systems based on basic and braided rings is evaluated and analyzed using the derived formulas. The results show that hierarchical systems based on braided rings significantly improve on the reliability of hierarchies constructed of basic rings. The results are not limited to systems of SCI rings; the analysis is valid for any ring architecture, such as token and slotted rings.
- Published
- 2002
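The ring-level intuition behind the abstract can be illustrated with textbook-style reliability expressions (these are generic approximations of mine, not the K-processor formulas derived in the paper): a basic ring requires every node to be operational, while a braided ring with bypass links is assumed here to tolerate one failed node.

```python
# Illustrative single-ring reliability, assuming independent node failures
# with per-node reliability p. Generic approximations, not the paper's formulas.

def basic_ring_reliability(n: int, p: float) -> float:
    """A basic unidirectional ring works only if all n nodes work."""
    return p ** n

def braided_ring_reliability(n: int, p: float) -> float:
    """A braided ring is assumed to survive up to one node failure:
    either all n nodes work, or exactly one has failed and is bypassed."""
    return p ** n + n * (1 - p) * p ** (n - 1)
```

For a 16-node ring with p = 0.99, the basic ring is operational only about 85% of the time while the braided approximation is close to 99%, matching the abstract's conclusion that braiding significantly improves ring reliability.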
8. Comparative performance analysis of directed flow control for real-time SCI
- Author
-
Robert W. Todd, Matthew C. Chidester, and Alan D. George
- Subjects
Flow control (data) ,Network architecture ,Computer Networks and Communications ,business.industry ,Computer science ,Distributed computing ,Ring network ,Throughput ,Network topology ,Packet switching ,Hybrid system ,Discrete event simulation ,Communications protocol ,Priority queue ,business ,Scalable Coherent Interface ,Computer network - Abstract
The distributed nature of routing and flow control in a register-insertion ring topology complicates priority enforcement for real-time systems. Two divergent approaches for priority enforcement for ring-based networks are reviewed: a node-oriented scheme called preemptive priority queue and a ring-wide arbitration approach dubbed TRAIN. This paper introduces a hybrid protocol named directed flow control that combines node- and ring-oriented flow control to yield greater performance. A functional comparison of the three protocols as implemented on the scalable coherent interface is presented, followed by performance results obtained through high-fidelity modeling and simulation.
- Published
- 2001
9. Evaluating the impact of locality on the performance of large-scale SCI multiprocessors
- Author
-
Mohammad Al-Rousan, LeRoy Bearnson, and James Archibald
- Subjects
Interconnection ,Computer Networks and Communications ,Computer science ,Network packet ,Distributed computing ,Locality ,Clock rate ,Ring network ,Multiprocessing ,Parallel computing ,Computer Science::Hardware Architecture ,Hardware and Architecture ,Modeling and Simulation ,Latency (engineering) ,Computer Science::Operating Systems ,Scalable Coherent Interface ,Software - Abstract
Hierarchical ring-based multiprocessor systems are attractive and enjoy several advantages over other types of systems: they ensure unique paths between nodes, simple node interfaces and simple cross-ring connections. Furthermore, employing point-to-point links allows the system to run at a high clock rate, which increases bandwidth and decreases latency. This paper investigates the performance of hierarchical ring-based shared-memory multiprocessors. Rings in the hierarchy are composed of point-to-point, unidirectional links and apply the Scalable Coherent Interface (SCI) protocol. We place special emphasis on the impact of locality on processor and interconnection design issues such as the number of outstanding requests and ring topology. We find that, in order to exploit the power of hierarchical multiprocessors, an accurate and appropriate model of locality must be used. Hierarchical multiprocessors that are well balanced (uniform) tend to provide lower latency and higher system throughput. For non-uniform systems, a high degree of locality is required for the hierarchies to perform well. However, restricting the number of outstanding transactions per processor is important for decreasing packet latency and avoiding network contention.
- Published
- 2001
10. Performance modeling and evaluation of topologies for low-latency SCI systems
- Author
-
Matthew C. Chidester, Alan D. George, and Damian M Gonzalez
- Subjects
Network architecture ,Application programming interface ,Computer Networks and Communications ,Computer science ,Distributed computing ,Network topology ,Packet switching ,Computer engineering ,Artificial Intelligence ,Hardware and Architecture ,Performance engineering ,Communications protocol ,Scalable Coherent Interface ,Software - Abstract
This paper presents an analytical performance characterization and topology comparison from a latency perspective for the scalable coherent interface (SCI). Experimental methods are used to determine constituent latency components and verify the results obtained by these analytical models as close approximations of reality. In contrast with simulative models, analytical SCI models are faster to solve, yielding accurate performance estimates very quickly, and thereby broadening the design space that can be explored. Ultimately, the results obtained here serve to identify optimum topology types for a range of system sizes based on the latency performance of common parallel application demands.
- Published
- 2001
11. SCI evaluation in multinode environments for computing and data-processing applications
- Author
-
Enrique Sanchis, Vicente González, and G. Torralba
- Subjects
Nuclear and High Energy Physics ,Hardware_MEMORYSTRUCTURES ,Windows NT ,Application programming interface ,Computer science ,business.industry ,Message passing ,Electrical engineering ,Pentium ,computer.software_genre ,Supercomputer ,Nuclear Energy and Engineering ,Shared memory ,Operating system ,Programming paradigm ,Electrical and Electronic Engineering ,business ,computer ,Scalable Coherent Interface - Abstract
The need for high-performance, low-cost computing systems motivates intense research in this field, and some very good results have been achieved. By clustering cheap personal computers together, we placed a virtual supercomputer at our disposal. Currently, PC clusters are mainly programmed using message-passing programming models. The Scalable Coherent Interface (SCI) interconnect technology opens the possibility of shared memory models. To profit from all the advantages that SCI offers, an application programming interface (API) for SCI-based systems is needed. The SISCI API provides the necessary tools for developing SCI-based system applications such as performance tests and other similar utilities. The paper presents a simple application that uses the SCI API functions and was developed to measure the total bandwidth of a cluster. Four PCs with Pentium II processors running under a Windows NT 4.0 server were connected via Dolphin's four-port switch. The application is an example of what this SCI API can offer to any user who wants to develop an SCI-based cluster application without extensive knowledge of the SCI standard.
- Published
- 2001
12. [Untitled]
- Author
-
Hyuk-Chul Kwon, Kwang Ryel Ryu, Hankook Jang, Sang-Hwa Chung, Cham-Ah Choi, and Yoojin Chung
- Subjects
Distributed shared memory ,Information retrieval ,business.industry ,Computer science ,Parallel computing ,Theoretical Computer Science ,Shared memory ,Hardware and Architecture ,Computer cluster ,Node (computer science) ,Cluster (physics) ,The Internet ,business ,Scalable Coherent Interface ,Software ,Information Systems - Abstract
This article presents an efficient parallel information retrieval (IR) system which provides fast information service for Internet users in a low-cost, high-performance PC-NOW environment. The IR system is implemented on a PC cluster based on the Scalable Coherent Interface (SCI), a powerful interconnect mechanism for both shared memory and message-passing models. In the IR system, the inverted-index file (IIF) is partitioned into pieces using a greedy declustering algorithm and distributed to the cluster nodes to be stored on each node's hard disk. For each incoming user query with multiple terms, the terms are sent to the corresponding nodes that contain the relevant pieces of the IIF, to be evaluated in parallel. The IR system is developed using a distributed shared memory (DSM) programming technique based on SCI. According to the experiments, the IR system outperforms an MPI-based IR system using Fast Ethernet as an interconnect. A speed-up of up to 5.0 was obtained with an 8-node cluster in processing each query on a 500,000-document IIF.
- Published
- 2001
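The greedy declustering step can be sketched as below, under my assumption (not necessarily the authors') that it balances posting-list sizes across nodes by assigning the largest lists first to the currently least-loaded node; the paper's actual algorithm may also account for query-term co-occurrence.

```python
# Hedged sketch of greedy declustering of an inverted-index file (IIF):
# assign each term's posting list, largest first, to the node that currently
# holds the least data, so query terms tend to spread across nodes.
import heapq

def greedy_decluster(posting_sizes: dict, num_nodes: int) -> dict:
    """Return a term -> node assignment balancing total posting-list size."""
    heap = [(0, node) for node in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    assignment = {}
    for term, size in sorted(posting_sizes.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)          # least-loaded node so far
        assignment[term] = node
        heapq.heappush(heap, (load + size, node))  # account for the new list
    return assignment
```

The terms of a multi-term query then tend to land on different nodes, so their pieces of the IIF can be searched in parallel, which is the property the article exploits.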
13. VME crate interconnection through the SCI network in large data acquisition systems
- Author
-
F.J. Mora, A. Sebastia, and Hans Muller
- Subjects
Interconnection ,Computer Networks and Communications ,business.industry ,Computer science ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Crate ,Fibre Channel ,Data acquisition ,Artificial Intelligence ,Hardware and Architecture ,Embedded system ,Conventional PCI ,business ,Scalable Coherent Interface ,Software ,Computer hardware ,VMEbus - Abstract
The possibility of using PCI Mezzanine Cards (PMCs) in VME modules opens new options for interconnecting crates by means of high-speed interconnection networks (Scalable Coherent Interface (SCI), ATM, Fibre Channel, etc.). These new technologies offer major benefits in the construction of large data acquisition networks where the number of VME crates is large. In this paper we explain how to use SCI to easily connect VME crates and take advantage of the new scalability that SCI offers.
- Published
- 2000
14. Simulative performance analysis of distributed switching fabrics for SCI-based systems
- Author
-
Alan D. George and Mushtaq A. Sarwar
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer Networks and Communications ,Computer science ,Multiprocessing ,Torus ,Parallel computing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Network topology ,Artificial Intelligence ,Hardware and Architecture ,Scalability ,Distributed switching ,Discrete event simulation ,Throughput (business) ,Scalable Coherent Interface ,Software - Abstract
This paper presents the results of a simulative performance study on 1D and 2D k-ary n-cube topologies as distributed switching fabrics for the Scalable Coherent Interface (SCI). Case studies are conducted on multiprocessor SCI networks composed of simple rings, counter-rotating rings, unidirectional and bidirectional tori, and tori with rings of uniform size. Based on a novel set of verified high-fidelity models, the results identify fundamental performance characteristics associated with each of these SCI fabrics, and tradeoffs between them, in terms of throughput and latency. Limits on scalable performance from SCI with increase in complexity and dimensionality are clarified, supporting decisions for advanced multiprocessor design.
- Published
- 2000
15. Modeling and Simulative Performance Analysis of SMP and Clustered Computer Architectures
- Author
-
Mark W. Burns, Alan D. George, and Bradley A. Wallace
- Subjects
021103 operations research ,Computer science ,Testbed ,0211 other engineering and technologies ,Parallel algorithm ,Multiprocessing ,02 engineering and technology ,Parallel computing ,Computer Graphics and Computer-Aided Design ,Modeling and Simulation ,0202 electrical engineering, electronic engineering, information engineering ,Systems architecture ,Overhead (computing) ,020201 artificial intelligence & image processing ,Scalable Coherent Interface ,Software ,Cache coherence ,Network model - Abstract
The performance characteristics of several classes of parallel computing systems are analyzed and compared using high-fidelity modeling and execution-driven simulation. Processor, bus, and network models are used to construct and simulate the architectures of symmetric multiprocessors (SMPs), clusters of uniprocessors, and clusters of SMPs. To demonstrate a typical use of the models, the performance of ten systems with one to eight processors and the Scalable Coherent Interface interconnection network is evaluated using a parallel matrix-multiplication algorithm. Because the performance of a parallel algorithm on a specific architecture depends on its communication-to-computation ratio, an analysis of communication latencies for bus transactions, cache coherence, and network transactions is used to quantify the communication overhead of each system. Low-level performance attributes are difficult to measure on experimental testbed systems and far less accurate when derived from purely analytical models, but with high-fidelity simulative models they can be readily and accurately obtained. This level of detail gives the designer the ability to rapidly prototype and evaluate the performance of parallel and distributed computing systems.
- Published
- 2000
16. Cache Coherency in SCI: Specification and a Sketch of Correctness
- Author
-
Amy P. Felty and Frank Stomp
- Subjects
Correctness ,Theoretical computer science ,Computer science ,Programming language ,CPU cache ,computer.software_genre ,Sketch ,Theoretical Computer Science ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS ,Temporal logic ,Scalable Coherent Interface ,computer ,Formal verification ,Software ,Cache coherence ,Abstraction (linguistics) - Abstract
SCI (Scalable Coherent Interface) is an IEEE standard specifying communication between multiprocessors in a shared memory model. In this paper we model part of SCI by a program written in a UNITY-like programming language. This part of SCI is formally specified in Manna and Pnueli's Linear Time Temporal Logic (LTL). We give a sketch of our proof that the program satisfies its specification. The proof has been carried out within LTL and uses history variables. Structuring of the proof has been achieved by careful formulation of lemmata and the use of auxiliary predicates as an abstraction mechanism.
- Published
- 1999
17. The implementation of a high-performance ORB over multiple network transports
- Author
-
Steve Pope and Sai-Lai Lo
- Subjects
GeneralLiterature_INTRODUCTORYANDSURVEY ,Computer Networks and Communications ,business.industry ,Computer science ,Distributed computing ,Shared memory ,Common Object Request Broker Architecture ,Null (SQL) ,ATM Adaptation Layer 5 ,Object request broker ,business ,Protocol (object-oriented programming) ,Scalable Coherent Interface ,Computer network ,Orb (optics) - Abstract
This paper describes the implementation of a high-performance Object Request Broker (ORB), omniORB2. The discussion focuses on the experience of achieving high performance by exploiting the protocol and other characteristics of the CORBA 2.0 specification. The design is also highly adaptable to a variety of network transports. The results of running the ORB over TCP/IP, shared memory, Scalable Coherent Interface (SCI) and ATM Adaptation Layer 5 (AAL5) are presented. In both null calls and bulk data transfers, the performance of omniORB2 is significantly better than that of other commercial ORBs.
- Published
- 1999
18. Data acquisition with the SCINET, a scalable-coherent-interface network
- Author
-
Harald Richter and Matthias Ohlenroth
- Subjects
Distributed shared memory ,Hardware_MEMORYSTRUCTURES ,Workstation ,business.industry ,Computer science ,Mechanical Engineering ,Throughput ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,law.invention ,Data acquisition ,Nuclear Energy and Engineering ,law ,Bandwidth (computing) ,General Materials Science ,Latency (engineering) ,business ,Scalable Coherent Interface ,Cache coherence ,Civil and Structural Engineering ,Computer network - Abstract
The goal of the SCINET project is to investigate the applicability of the Scalable Coherent Interface (SCI) for data-acquisition systems in large-scale fusion-reactor experiments. SCI is a standardized, high-speed interconnect for peripheral devices, processors, memories, PCs and workstations that provides a distributed shared memory with optional cache coherence between computing nodes. Up to 64K SCI nodes can be closely coupled in one or more rings that are concatenated via SCI switches. SCINET investigates how data-acquisition computers can be efficiently connected with each other and with their sensors and actuators by means of SCI, what topological structure the network should have, and which bandwidth and latency can be expected. Test stands were established as a sample SCI-based data-acquisition system that showed up to 45 MB/s of throughput and less than 5 μs latency for end-to-end data transfers.
- Published
- 1999
19. A prototype DAQ system for the ALICE experiment based on SCI
- Author
-
S. Polovnikov, H. Rohrig, Bernhard Skaali, D. Wormald, and L. Ingebrigtsen
- Subjects
Nuclear and High Energy Physics ,Engineering ,Hardware_MEMORYSTRUCTURES ,business.industry ,Address space ,Electrical engineering ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Data link ,Data acquisition ,Nuclear Energy and Engineering ,Electrical and Electronic Engineering ,business ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,Scalable Coherent Interface ,Computer hardware ,Computer memory ,Data rate units ,VMEbus ,Data transmission - Abstract
A prototype DAQ system for the ALICE/PHOS beam test and commissioning program is presented. The system has been taking data since August 1997 and represents one of the first applications of the Scalable Coherent Interface (SCI) as the interconnect technology for an operational DAQ system. The front-end VMEbus address space is mapped directly into the DAQ computer's memory space through SCI via PCI-SCI bridges. The DAQ computer is a commodity PC running the Linux operating system. Measurements of data transfer rate and latency for the PCI-SCI bridges in a PC-VMEbus SCI configuration are presented. An optical SCI link based on the Motorola Optobus I data link is described.
- Published
- 1998
20. The GigaRing channel
- Author
-
Steven L. Scott
- Subjects
ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION ,business.industry ,Computer science ,Master/slave ,Data_CODINGANDINFORMATIONTHEORY ,Supercomputer ,Application-specific integrated circuit ,Hardware and Architecture ,Scalability ,Electrical and Electronic Engineering ,business ,Scalable Coherent Interface ,Software ,Computer hardware ,Communication channel - Abstract
Cray's GigaRing channel provides flexible intersystem and system-to-peripheral communication for distributed supercomputer environments, sustaining data payload bandwidths on the order of a Gbyte per second.
- Published
- 1996
21. The Scalable Coherent Interface (SCI)
- Author
-
D.B. Gustavson and Qiang Li
- Subjects
Distributed database ,Computer Networks and Communications ,business.industry ,Computer science ,Distributed computing ,Local area network ,Multiprocessing ,Throughput ,Network interface ,Computer Science Applications ,Data acquisition ,Systems architecture ,Bandwidth (computing) ,Electrical and Electronic Engineering ,business ,Scalable Coherent Interface ,Computer network - Abstract
There is rapidly increasing demand for very-high-performance networked communication for workstation clusters, distributed databases, multiprocessors, industrial data acquisition and control systems, shared access to distributed data, and so on. Higher-bandwidth hardware using the traditional protocols is not sufficient. Even at 100 Mb/s, and certainly at 250 Mb/s, throughput for many applications is so limited by delays due to architecturally induced inefficiencies, such as software overheads (often hundreds of microseconds), that higher bandwidth generally raises cost without improving performance. A new approach to communication is required, one that can eliminate the delay due to software overheads, if we are to reap the full benefit of the far higher bandwidths that modern hardware can provide. The SCI solves this problem by using the distributed-shared-memory paradigm, typically offering submicrosecond delays and bandwidths currently in the range of 1250 to 8000 Mb/s per network node. The article first reviews the general properties that an appropriate system architecture should have, and introduces an architectural model, the local area multiprocessor, distinguished by its shared-memory performance and its ability to handle LAN-style distances. These desired properties are then considered in more detail, and practical design decisions are made, illustrated by the evolution of the ISO/ANSI/IEEE standard Scalable Coherent Interface (SCI) as it addressed these issues. Finally, the current status of the various SCI follow-on and support projects is reported.
- Published
- 1996
22. A simulation study of hardware-oriented DSM approaches
- Author
-
Milo Tomasevic, Veljko Milutinovic, and A. Grujic
- Subjects
Speedup ,Computer science ,Distributed computing ,Multiprocessing ,Parallel computing ,Directory ,computer.software_genre ,Theoretical Computer Science ,Non-uniform memory access ,Electrical and Electronic Engineering ,Data diffusion machine ,Distributed shared memory ,Hardware_MEMORYSTRUCTURES ,business.industry ,Cache-only memory architecture ,Uniform memory access ,Supercomputer architecture ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Physical address ,Computational Theory and Mathematics ,Shared memory ,Computer architecture ,Virtual machine ,Scalability ,Distributed memory ,business ,Scalable Coherent Interface ,computer ,Computer hardware ,Cache coherence - Abstract
Representative hardware implementations of the distributed shared memory (DSM) concept are comparatively evaluated in this study: two NUMA (Dash and SCI) and two COMA (KSR1 and DDM) architectures. The analysis is oriented towards the comparison of approaches, rather than their implementations. For that purpose, a hierarchical two-level cluster-based system with a uniform bus-based cluster structure on the first level is assumed. The DSM mechanisms of the four approaches were simulated on the second level. A simulation methodology based on synthetic address traces was applied. The comparison was carried out for a large variety and a broad range of system-oriented, application-oriented and technology-oriented parameters. The results have shown a somewhat better efficiency of COMA protocols (because of the dynamic migration of responsibility for shared data) and the large impact of the available interconnection network bandwidth on system scalability (an almost linear speedup is achieved with ring-based systems).
- Published
- 1996
23. Reliability Modeling and Analysis of SCI Topological Network
- Author
-
Hongzhe Xu, Jun Huang, and Yaoming Zhou
- Subjects
Matrix (mathematics) ,Theoretical computer science ,Computer Networks and Communications ,Computer science ,Reliability (computer networking) ,Shortest path problem ,Monte Carlo method ,Path (graph theory) ,Survivability ,Network topology ,Topology ,Scalable Coherent Interface - Abstract
The problem of reliability modeling of Scalable Coherent Interface (SCI) rings and topological networks is studied. Reliability models of three SCI rings are developed, and the factors that influence the reliability of SCI rings are studied. By calculating the shortest-path matrix and the path-quantity matrix of different types of SCI network topologies, the communication characteristics of SCI networks are obtained. The survivability of the SCI topological network is studied for node-damage and edge-damage situations.
- Published
- 2012
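The shortest-path-matrix computation the abstract mentions can be sketched with Floyd-Warshall on a unidirectional ring's adjacency matrix; the 4-node ring and single-link damage scenario in the usage note are illustrative choices of mine, not the paper's case study.

```python
# Shortest-path matrix of an SCI-style unidirectional ring via Floyd-Warshall,
# plus a crude survivability check: count (src, dst) pairs left disconnected.
INF = float("inf")

def ring_adjacency(n: int):
    """Unidirectional ring: node i has a single link to (i + 1) mod n."""
    a = [[INF] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 0
        a[i][(i + 1) % n] = 1
    return a

def shortest_paths(a):
    """All-pairs shortest-path matrix (Floyd-Warshall)."""
    n = len(a)
    d = [row[:] for row in a]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def unreachable_pairs(d) -> int:
    """Ordered (src, dst) pairs with no surviving path."""
    n = len(d)
    return sum(1 for i in range(n) for j in range(n) if d[i][j] == INF)
```

On an undamaged 4-node ring every pair is reachable; deleting the single link from node 1 to node 2 leaves 6 of the 12 ordered pairs disconnected, which illustrates why edge-damage analysis matters so much for unidirectional rings.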
24. Reliability Modeling and Analysis of SCI Rings in Avionics System
- Author
-
Huang Jun, Zhou Yao-ming, and Xu Hong-zhe
- Subjects
Hardware_MEMORYSTRUCTURES ,Interconnection matrix ,Computer science ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Avionics ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,Scalable Coherent Interface ,Reliability (statistics) ,Reliability model ,Expression (mathematics) ,Reliability engineering - Abstract
The problem of reliability modeling of the Scalable Coherent Interface (SCI) is studied. A message-based source-destination reliability expression is proposed according to the specification of the SCI Interconnection Matrix (IM) in the design of avionics systems. Reliability models of three SCI rings are developed, and the factors that influence the reliability of SCI rings are studied. The conclusions show that the proposed reliability algorithm is appropriate for the design of avionics systems.
- Published
- 2012
25. Simulations with SCI as a data carrier in data acquisition systems
- Author
-
T.I. Hulaas, T.B. Skaali, E. Rongved, E.H. Kristiansen, and J.W. Bothner
- Subjects
Physics ,Nuclear and High Energy Physics ,business.industry ,Electrical engineering ,Topology (electrical circuits) ,Network topology ,Computational science ,Data acquisition ,Nuclear Energy and Engineering ,Bandwidth (computing) ,Instrumentation (computer programming) ,Electrical and Electronic Engineering ,Latency (engineering) ,business ,Scalable Coherent Interface ,Data transmission - Abstract
Detailed simulations of processor networks based on the Scalable Coherent Interface (SCI) show that SCI is suitable as a data carrier in data acquisition systems where the total bandwidth required is in the multi-GB/s range and low latency is needed. The objective of these simulations was to find topologies with low latency and high bandwidth, but also with the cost of implementation in mind. A ring-to-ring bridge has been used as the building element for the networks. The simulations have been performed on regular k-ary n-cube topologies from a few tens of nodes up to about 500 nodes under different load conditions. Among the parameters manipulated in the simulations are the number of nodes, topology structure, number of outstanding requests and load in the system.
- Published
- 1994
26. First experience with the scalable coherent interface
- Author
-
A. Bogaerts, D. Samyn, F. Lozano-Alwmany, G. Mugnai, Hans Muller, A. Ivanov, J. Buytaert, Roberto Divia, Bernhard Skaali, and R. Keyser
- Subjects
Nuclear and High Energy Physics ,Hardware_MEMORYSTRUCTURES ,Large Hadron Collider ,Reduced instruction set computing ,Computer science ,business.industry ,Network packet ,Node (networking) ,Electrical engineering ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Computer Science::Hardware Architecture ,Packet switching ,Data acquisition ,Nuclear Energy and Engineering ,Bandwidth (computing) ,Detectors and Experimental Techniques ,Electrical and Electronic Engineering ,business ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,Scalable Coherent Interface ,Computer hardware - Abstract
The research project RD24 is studying applications of the Scalable Coherent Interface (IEEE 1596) standard for the Large Hadron Collider (LHC). The first SCI node chips from Dolphin were used to demonstrate the use and functioning of SCI's packet protocols and to measure data rates. We present results from a first, two-node SCI ringlet at CERN, based on an R3000 RISC processor node and a DMA node on an MC68040 processor bus. A diagnostic link analyzer monitors the SCI packet protocols up to full link bandwidth. In its second phase, RD24 will build a first implementation of a multi-ringlet SCI data merger.
- Published
- 1994
27. Simulation of SCI/RT data bus
- Author
-
Jing Yin, Nana Zhang, Jian Cui, and Shuguang Zhang
- Subjects
Correctness ,biology ,Computer science ,business.industry ,Real-time computing ,biology.organism_classification ,Communications system ,Ringlet ,Software ,Computer Science::Networking and Internet Architecture ,business ,Throughput (business) ,Scalable Coherent Interface ,Protocol (object-oriented programming) ,System bus - Abstract
The Scalable Coherent Interface for real time (SCI/RT) TRAIN protocol is analyzed, and a simulation model is built with the OPNET™ software. The functions of the model are described. A 16-node ringlet is simulated, and the results confirm the correctness of the simulation model.
- Published
- 2011
28. Communication performance analysis of Scalable Coherent Interface
- Author
-
Shi Guoqing, Zhang Jiandong, Wu Yong, and Pang Min
- Subjects
Interconnection ,Engineering ,business.industry ,Embedded system ,Throughput ,Avionics ,Latency (engineering) ,Petri net ,business ,Scalable Coherent Interface ,Protocol (object-oriented programming) ,System model - Abstract
In the design process of an integrated avionics system, bus performance is closely tied to the overall performance of the entire avionics system. The Scalable Coherent Interface (SCI) offers advantages such as low latency and high bandwidth, and it meets interconnection requirements both within a system and between systems, making it applicable to various key operations. This paper studies the SCI multi-ring simulation problem. Starting from the SCI protocol, it uses Petri nets to establish a switching-unit model and a system model. Simulating and solving the system model yields the relationship between system throughput and delay time.
- Published
- 2011
29. A High Performance Computing Platform Based on a Fibre-Channel Token-Routing Switch-Network
- Author
-
Jichang Kang, Wenlang Luo, Manhua Li, Cunjuan Ou Yang, and Xiaohui Zeng
- Subjects
Engineering ,Fibre Channel ,MPICH ,business.product_category ,business.industry ,Embedded system ,Message Passing Interface ,Network switch ,Routing (electronic design automation) ,Supercomputer ,business ,Communications protocol ,Scalable Coherent Interface - Abstract
A highly cost-effective Fibre-Channel Token-Routing Switch-Network is designed and developed for high performance computing, and a low-level communication library based on our user-level communication protocol, FC-VIA, is added to the widely used MPICH. The result is FC-VIA-MPI, an extended MPICH; experimental results show that FC-VIA-MPI running on the Fibre-Channel Token-Routing Switch-Network achieves better performance than ScaMPI running on an SCI network.
- Published
- 2010
30. The Scalable Coherent Interface and related standards projects
- Author
-
D.B. Gustavson
- Subjects
Hardware_MEMORYSTRUCTURES ,Optical fiber ,Computer science ,business.industry ,Network packet ,Distributed computing ,Message passing ,Multiprocessing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,law.invention ,Hardware and Architecture ,law ,Electrical and Electronic Engineering ,business ,Scalable Coherent Interface ,Software ,Cache coherence ,Computer network - Abstract
The Scalable Coherent Interface (SCI) (IEEE P1596), which provides bus services by transmitting packets on a collection of point-to-point unidirectional links, is described. Its protocols support cache coherence in a distributed shared-memory multiprocessor model, with message passing, I/O, and LAN communication taking place over fiber-optic or wire links. Several ongoing SCI-related projects that apply the SCI technology to new areas or extend it to more difficult problems are also described, and future plans are sketched.
- Published
- 1992
31. SCI network emulation tool for the development of parallel DAQ software
- Author
-
F.J. Mora and A. Sebastia
- Subjects
Nuclear and High Energy Physics ,Emulation ,Computer science ,business.industry ,Local area network ,Network emulation ,Virtualization ,computer.software_genre ,Application software ,Software ,Nuclear Energy and Engineering ,Computer engineering ,Computer architecture ,Electrical and Electronic Engineering ,Architecture ,business ,computer ,Scalable Coherent Interface - Abstract
The goal of the project is to provide an environment that facilitates the development of parallel applications over SCI (Scalable Coherent Interface, IEEE Std 1596) networks. To achieve this, we believe it is important to write applications that run independently of any particular underlying architecture. This is what we call architecture virtualisation.
- Published
- 2000
32. Implementation and analysis of nonblocking collective operations on SCI networks
- Author
-
Thomas Bemmerl, Christian Kaiser, Boris Bierbaum, and Torsten Hoefler
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer science ,business.industry ,Kernel (statistics) ,Distributed computing ,Message passing ,InfiniBand ,Software_PROGRAMMINGTECHNIQUES ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,business ,Scalable Coherent Interface ,Blocking (computing) ,Computer network - Abstract
Nonblocking collective communication operations are currently being considered for inclusion into the MPI standard and are an area of active research. The benefits of such operations are documented by several recent publications, but so far research has concentrated on InfiniBand clusters. This paper describes an implementation of nonblocking collectives for clusters with the Scalable Coherent Interface (SCI) interconnect. We use synthetic and application-kernel benchmarks to show that nonblocking collective communication functions can deliver performance enhancements on SCI systems. Our results indicate that, to realize such improvements, implementations of these nonblocking collectives should consider data transfer methods other than those usually used for the blocking versions.
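The overlap that nonblocking collectives enable can be sketched in miniature (illustrative only: a thread stands in for the interconnect's transfer engine, and `NonblockingBcast` is a hypothetical name, not the authors' API):

```python
# Sketch of the nonblocking-collective pattern: start the operation,
# compute while it proceeds, then wait for completion (MPI_Wait-style).
import threading
import time

class NonblockingBcast:
    """Hypothetical stand-in for a nonblocking broadcast request."""

    def __init__(self, data: bytes):
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._transfer, args=(data,))

    def _transfer(self, data: bytes) -> None:
        time.sleep(0.01)        # stands in for the network transfer
        self._done.set()

    def start(self):
        self._thread.start()
        return self

    def wait(self) -> None:     # completion call, as in MPI_Wait
        self._done.wait()

req = NonblockingBcast(b"payload").start()
overlap = sum(range(10_000))    # computation overlapped with the transfer
req.wait()                      # block only when the result is needed
```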
- Published
- 2009
33. Distributed-directory scheme: scalable coherent interface
- Author
-
A.T. Laundrie, Stein Gjessing, D.V. James, and Gurindar S. Sohi
- Subjects
Scheme (programming language) ,Interconnection ,Hardware_MEMORYSTRUCTURES ,General Computer Science ,Computer science ,Interface (computing) ,Parallel computing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Data structure ,Backplane ,Scalability ,Synchronization (computer science) ,computer ,Scalable Coherent Interface ,computer.programming_language - Abstract
The Scalable Coherent Interface (SCI), a local or extended computer backplane interface being defined by an IEEE standard project (P1596), is discussed. The interconnection is scalable, meaning that up to 64K processor, memory, or I/O nodes can effectively interface to a shared SCI interconnection. The SCI sharing-list structures are described, and sharing-list addition and removal are examined. Optimizations being considered to improve the performance of large system configurations are discussed. Request combining, a useful feature of linked-list coherence, is described, as are SCI's optional extensions, including synchronization using a queued-on-lock bit.
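The sharing-list addition and removal described above can be sketched as plain doubly-linked-list operations (a simplified illustration of SCI's linked-list directory idea; class and field names are hypothetical, and the distributed protocol messages are omitted):

```python
# Sketch of SCI's sharing-list directory: memory stores only a head
# pointer; caches holding a line form a doubly-linked list. A new sharer
# prepends itself at the head; a departing sharer unlinks itself
# ("rollout" in SCI terms). Names are illustrative.

class CacheEntry:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.fwd = None   # toward the tail of the sharing list
        self.back = None  # toward the head / memory

class MemoryDirectory:
    def __init__(self):
        self.head = None  # memory keeps a single pointer per line

    def add_sharer(self, entry: CacheEntry) -> None:
        entry.fwd = self.head
        if self.head is not None:
            self.head.back = entry
        self.head = entry

    def remove_sharer(self, entry: CacheEntry) -> None:
        # Unlink by patching the neighbours' pointers.
        if entry.back is None:
            self.head = entry.fwd
        else:
            entry.back.fwd = entry.fwd
        if entry.fwd is not None:
            entry.fwd.back = entry.back

d = MemoryDirectory()
a, b = CacheEntry(1), CacheEntry(2)
d.add_sharer(a)
d.add_sharer(b)       # list is now b -> a
d.remove_sharer(a)    # list is now just b
```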
- Published
- 1990
34. The Scalable Coherent Interface, IEEE P 1596, status and possible applications to data acquisition and physics
- Author
-
D.B. Gustavson
- Subjects
Nuclear and High Energy Physics ,Distributed shared memory ,Hardware_MEMORYSTRUCTURES ,biology ,Futurebus ,Bandwidth (signal processing) ,Multiprocessing ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Banyan ,biology.organism_classification ,Data acquisition ,Nuclear Energy and Engineering ,Computer engineering ,Computer architecture ,Electrical and Electronic Engineering ,Crossbar switch ,Scalable Coherent Interface - Abstract
IEEE P1596, the Scalable Coherent Interface (SCI, formerly known as SuperBus), is based on experience gained while developing Fastbus (ANSI/IEEE 960-1986, IEC 935), Futurebus (IEEE P896.x), and other modern 32-bit buses. SCI goals include a minimum bandwidth of 1 GB/s per processor in multiprocessor systems with thousands of processors; efficient support of a coherent distributed-cache image of distributed shared memory; support for repeaters which interface to existing or future buses; and support for inexpensive small rings as well as for general switched interconnections such as Banyan, Omega, or crossbar networks. Current directions are summarized, the status of work in progress is reported, and some applications in data acquisition and physics are suggested.
- Published
- 1990
35. Shared Memory Multiprocessors
- Author
-
Li Zhang and Manish Parashar
- Subjects
Single system image ,Distributed shared memory ,Hardware_MEMORYSTRUCTURES ,Shared memory ,Address space ,Computer science ,Message passing ,Multiprocessing ,Parallel computing ,Scalable Coherent Interface ,Cache coherence - Abstract
Shared memory multiprocessors are multiprocessor systems that logically implement a single global address space. The model for parallel programming based on such systems, the shared address space model, is straightforward and frees programmers from the tedious and sometimes complicated task of orchestrating all communication and synchronization through explicit message passing to access remote data. As a result, this class of multiprocessor systems has received much commercial as well as research interest. Keywords: cache coherence; physically shared memory; distributed shared memory; scalable coherent interface; single system image; code coupling
- Published
- 2007
36. Efficient Java Communication Protocols on High-speed Cluster Interconnects
- Author
-
Ramón Doallo, Juan Touriño, and Guillermo L. Taboada
- Subjects
Middleware ,Java ,Computer science ,business.industry ,strictfp ,Testbed ,Performance analysis ,Testing ,Local area network ,Access protocols ,Libraries ,Multicast protocols ,Sockets ,Transport protocols ,business ,Communications protocol ,computer ,Protocol (object-oriented programming) ,Scalable Coherent Interface ,Optimizing compilers ,computer.programming_language ,Computer network - Abstract
This is a post-peer-review, pre-copyedit version; the final authenticated version is available online at http://dx.doi.org/10.1109/LCN.2006.322110. This paper presents communication strategies for achieving efficient parallel and distributed Java applications on clusters with high-speed interconnects. Communication performance is critical for the overall cluster performance. Previous efforts at obtaining efficient Java communications have limited applicability on high-speed interconnects, as they focus on high-level APIs like RMI and ignore the particularities of these systems and their native high performance communication protocols. By relying on a custom Java socket implementation, higher degrees of performance can be achieved by exploiting high-speed interconnect facilities. Several protocol definitions are presented with the aim of obtaining high performance Java communications. Moreover, the quality of the protocol implementations and their design decisions has been thoroughly evaluated on a scalable coherent interface (SCI) and gigabit Ethernet (GbE) testbed cluster. The results of this analysis demonstrate that these Java protocols obtain results similar to native communications. (Funded by Ministerio de Educación y Ciencia, TIN2004-07797-C02 and FPU AP2004-5984.)
- Published
- 2007
37. High Performance Java Remote Method Invocation for Parallel Computing on Clusters
- Author
-
Juan Touriño, Guillermo L. Taboada, and Carlos Teijeiro
- Subjects
Source code ,Java ,Computer science ,media_common.quotation_subject ,Embedded Java ,Libraries ,Parallel computing ,Software_PROGRAMMINGTECHNIQUES ,computer.software_genre ,Sockets ,TCPIP ,Real time Java ,Parallel processing ,Emulation ,High-speed networks ,interconnection ,media_common ,computer.programming_language ,Middleware ,strictfp ,Java concurrency ,Java API for XML-based RPC ,Remote procedure call ,LAN ,Operating system ,Scalable Coherent Interface ,computer ,Protocols - Abstract
This is a post-peer-review, pre-copyedit version; the final authenticated version is available online at http://dx.doi.org/10.1109/ISCC.2007.4381536. This paper presents a more efficient Java remote method invocation (RMI) implementation for high-speed clusters. The use of Java for parallel programming on clusters is limited by the lack of efficient communication middleware and high-speed cluster interconnect support. This implementation overcomes these limitations through a more efficient Java RMI protocol based on several basic assumptions about clusters. Moreover, the use of a high performance sockets library provides direct high-speed interconnect support. The performance evaluation of this middleware on a gigabit Ethernet (GbE) and a scalable coherent interface (SCI) cluster shows experimental evidence of throughput increase. Moreover, qualitative aspects of the solution, such as transparency to the user, interoperability with other systems, and no need for source code modification, can augment the performance of existing parallel Java codes and boost the development of new high performance Java RMI applications. (Funded by Ministerio de Educación y Ciencia, TIN2004-07797-C02, and Xunta de Galicia, PGIDIT06PXIB105228PR.)
- Published
- 2007
38. High Performance Java Sockets for Parallel Computing on Clusters
- Author
-
Juan Touriño, Ramón Doallo, and Guillermo L. Taboada
- Subjects
Source code ,Java ,Computer science ,media_common.quotation_subject ,strictfp ,Embedded Java ,Parallel computing ,computer.software_genre ,Java concurrency ,Real time Java ,Remote procedure call ,Computer cluster ,Middleware (distributed applications) ,Middleware ,Operating system ,computer ,Scalable Coherent Interface ,computer.programming_language ,media_common - Abstract
The use of Java for parallel programming on clusters depends on efficient communication middleware and high-speed cluster interconnect support. Nevertheless, no current solutions fully address these issues. In this paper, a Java sockets library has been tailored to increase the efficiency of Java parallel applications on clusters. This library supports high-speed cluster interconnects, and its API has been extended to meet the requirements of a high performance Java RMI implementation and of Java parallel applications on clusters. Thus, it provides Java with a more efficient communication middleware on clusters. The performance evaluation of this middleware on a Gigabit Ethernet (GbE) and a scalable coherent interface (SCI) cluster has shown experimental evidence of throughput increase. Moreover, qualitative aspects of the solution, such as transparency to the user, interoperability with other systems, and no need for source code modifications, are decisive in boosting the performance of existing Java parallel applications and their development in high performance Java cluster computing.
- Published
- 2007
39. Non-blocking Java Communications Support on Clusters
- Author
-
Juan Touriño, Guillermo L. Taboada, and Ramón Doallo
- Subjects
Java ,Computer science ,business.industry ,Distributed computing ,strictfp ,Non-blocking I/O ,Message passing ,Communication Library ,computer.software_genre ,Java concurrency ,Real time Java ,Virtual machine ,Network Fabric ,Scalability ,Communication overhead ,Overhead (computing) ,business ,computer ,Scalable Coherent Interface ,Startup Time ,Message size ,computer.programming_language ,Computer network - Abstract
This is a post-peer-review, pre-copyedit version of an article published in Lecture Notes in Computer Science; the final authenticated version is available online at https://doi.org/10.1007/11846802_38. This paper presents communication strategies for supporting efficient non-blocking Java communication on clusters. Communication performance is critical for overall cluster performance, and non-blocking communications can reduce the communication overhead. Previous efforts to efficiently support non-blocking communication in Java led to the introduction of the Java NIO API. Although the Java NIO package addresses scalability issues by providing select()-like functionality, it lacks support for high-speed interconnects. To solve this issue, this paper introduces a non-blocking communication library that efficiently supports specialized communication hardware. This library focuses on reducing the startup communication time, avoiding unnecessary copying, and overlapping computation with communication, and it provides the basis for a Java message-passing library to be implemented on top of it. Finally, this paper evaluates the proposed approach on a Scalable Coherent Interface (SCI) and Gigabit Ethernet (GbE) testbed cluster. Experimental results show that the proposed library reduces the communication overhead and increases the overlap of computation and communication. (Funded by Ministerio de Educación y Ciencia, TIN2004-07797-C02 and AP2004-5984.)
- Published
- 2006
40. A receiver for standard IEEE 1596-1992 Scalable Coherent Interface
- Author
-
M. Fedeli and C. Vacchi
- Subjects
Engineering ,Hysteresis ,CMOS ,business.industry ,Electronic engineering ,Electrical engineering ,Dissipation ,business ,Scalable Coherent Interface ,Electronic circuit ,Voltage - Published
- 2005
41. SCI based Data Acquisition Architectures
- Author
-
J F Renardy, J. A. Bogaerts, H. Muller, and Roberto Divia
- Subjects
Nuclear and High Energy Physics ,Interconnection ,Large Hadron Collider ,Physics::Instrumentation and Detectors ,Computer science ,Interface (computing) ,Multiprocessing ,Computational science ,Data acquisition ,Nuclear Energy and Engineering ,Cyclic redundancy check ,Bandwidth (computing) ,Electronic engineering ,Detectors and Experimental Techniques ,Electrical and Electronic Engineering ,Scalable Coherent Interface - Abstract
The Scalable Coherent Interface (SCI) is an IEEE proposed standard (P1596) for interconnecting multiprocessor systems. The standard defines point-to-point connections between nodes, which can be processors, memories, or I/O devices. Networks containing a maximum of 64K nodes, with a bandwidth of 1 Gbyte/s between nodes, can be constructed. SCI is an attractive candidate to serve as a backbone for high-speed, large-volume data acquisition systems such as those required by future experiments at the proposed Large Hadron Collider (LHC) at CERN. First results for a model of a large LHC experiment containing over 1000 nodes are reported.
- Published
- 2005
42. Designing Efficient Java Communications on Clusters
- Author
-
Ramón Doallo, Guillermo L. Taboada, and Juan Touriño
- Subjects
Java ,Computer science ,Distributed computing ,strictfp ,Embedded Java ,Message passing ,Software_PROGRAMMINGTECHNIQUES ,computer.software_genre ,Java concurrency ,Real time Java ,Operating system ,computer ,Scalable Coherent Interface ,Implementation ,computer.programming_language - Abstract
This paper aims at designing communication strategies for parallel and distributed Java applications to obtain higher degrees of performance on clusters. Several specific approaches exist to increase the efficiency of Java communications, especially of high-level APIs like RMI, although their applicability is relatively limited on clusters, since the development of high performance solutions on clusters usually involves the use of the basic Java Socket interface. This paper examines the current outlook of Java Socket optimisations, involving both native and Java side issues, in order to make a design proposal named Java Fast Sockets. We have accomplished a thorough analysis of the effects of the suggested configurations and implementations on our scalable coherent interface (SCI) testbed cluster. This evaluation has demonstrated that Java communication performance on clusters can compete with native performance.
- Published
- 2005
43. SCI networking for shared-memory computing in UPC: blueprints of the GASNet SCI conduit
- Author
-
Alan D. George, B. Gordon, Sarp Oral, and H. Su
- Subjects
business.industry ,Computer science ,Message passing ,Local area network ,computer.software_genre ,Communications system ,Software ,Shared memory ,Unified Parallel C ,Programming paradigm ,Operating system ,business ,Scalable Coherent Interface ,computer ,Computer network ,computer.programming_language - Abstract
Unified Parallel C (UPC) is a programming model for shared-memory parallel computing on shared- and distributed-memory systems. The Berkeley UPC software, which operates on top of their Global Addressing Space Networking (GASNet) communication system, is a portable, high-performance implementation of UPC for large-scale clusters. The Scalable Coherent Interface (SCI), a torus-based system-area network (SAN), is known for its ability to provide very low latency transfers as well as its direct support for both shared-memory and message-passing communications. High-speed clusters constructed around SCI promise to be a potent platform for large-scale UPC applications. This work introduces the design of the core API for the new SCI conduit for GASNet and UPC, which is based on active messages (AM). Latency and bandwidth data were collected and are compared with raw SCI results and with other existing GASNet conduits. The outcome shows that the new GASNet SCI conduit is able to provide promising performance in support of UPC applications.
- Published
- 2005
44. Using OPNET to evaluate SCI as an avionics real-time network
- Author
-
Xu Yanjing, Xiong Huagang, Shao Dingrong, Jiang Zhen, and Deng Yiming
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer science ,business.industry ,Embedded system ,Node (networking) ,Network interface ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Avionics ,Real time networks ,business ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,Scalable Coherent Interface ,Computer network - Abstract
In this paper, we evaluate the scalable coherent interface (SCI) as an avionics real-time network. An OPNET-based SCI node model is established. The results of our preliminary experiments show that network load should be kept at a certain utilization in order to guarantee the timely delivery of real-time messages in avionics systems where the basic SCI ring is adopted.
- Published
- 2004
45. Pipelining and overlapping for mpi collective operations
- Author
-
J. Worringen
- Subjects
Tree (data structure) ,Software portability ,Computer science ,Distributed computing ,Factor (programming language) ,Pipeline (computing) ,Message passing ,Message Passing Interface ,Programming paradigm ,Parallel computing ,Scalable Coherent Interface ,computer ,computer.programming_language - Abstract
Collective operations are an important aspect of the currently most important message-passing programming model, MPI (message passing interface). Many MPI applications make heavy use of collective operations, which involve the active participation of a known group of processes and are usually implemented on top of MPI point-to-point message passing. Many optimizations of the underlying communication algorithms have been developed, but the vast majority of these optimizations are still based on plain MPI point-to-point message passing. While this has the advantage of portability, it often does not allow for full exploitation of the underlying interconnection network. In this paper, we present a low-level, pipeline-based optimization of one-to-many and many-to-one collective operations for the SCI (scalable coherent interface) interconnection network. The optimizations increase the performance of some operations by a factor of four compared with the generic, tree-based algorithms.
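The benefit of pipelining a one-to-many operation can be sketched with a simple timing model: splitting the message into chunks lets downstream links start forwarding before the whole message has arrived (illustrative parameters and functions, not measurements or code from the paper):

```python
# Sketch: pipelined vs. store-and-forward timing for a chain of `hops`
# links. Chunk size and per-byte link time are hypothetical.

def pipelined_time(msg_bytes, chunk_bytes, hops, link_time_per_byte):
    chunks = -(-msg_bytes // chunk_bytes)       # ceil division
    t_chunk = chunk_bytes * link_time_per_byte
    # The first chunk crosses all hops; the rest stream behind it.
    return hops * t_chunk + (chunks - 1) * t_chunk

def store_and_forward_time(msg_bytes, hops, link_time_per_byte):
    # Each hop waits for the whole message before forwarding.
    return hops * msg_bytes * link_time_per_byte

msg, hops, t = 1 << 20, 4, 1e-9                 # 1 MB, 4 hops, 1 ns/byte
speedup = store_and_forward_time(msg, hops, t) / pipelined_time(msg, 1 << 14, hops, t)
print(speedup)   # approaches `hops` as the chunk count grows
```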
- Published
- 2004
46. Extending FPGA based teaching boards into the area of distributed memory multiprocessors
- Author
-
Michael Manzke and Ross Brennan
- Subjects
Non-uniform memory access ,Distributed shared memory ,Hardware_MEMORYSTRUCTURES ,Computer architecture ,Shared memory ,Computer science ,Cache-only memory architecture ,Uniform memory access ,Distributed memory ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Scalable Coherent Interface ,Reconfigurable computing - Abstract
Reconfigurable hardware, in conjunction with soft-CPUs, has increasingly established itself in computer architecture education. In this paper we expand this approach into the area of distributed memory multiprocessor systems. Arguments that supported the introduction of reconfigurable hardware as a substitute for commodity CPUs on educational computer architecture boards are equally applicable to teaching hardware that facilitates the construction and configuration of multiprocessor systems. The IEEE Standard for the Scalable Coherent Interface (SCI) was chosen as the interconnect technology because it enables the demonstration of the most important architecture concepts in this context. This interconnect exhibits high bandwidth and low latencies, and it not only specifies a hardware Distributed Shared Memory (DSM) architecture but also defines cache coherence protocols. Consequently, an implementation of this standard allows the design of Non-Uniform Memory Access (NUMA) and cache-coherent NUMA (ccNUMA) multiprocessor systems.
- Published
- 2004
47. Performance Evaluation of Gigabit Ethernet and SCI in a Linux Cluster
- Author
-
Digamber Sonvane and Rajesh Kalmady
- Subjects
Ethernet ,Interconnection ,Network interface controller ,Computer science ,Computer cluster ,Gigabit Ethernet ,Operating system ,Local area network ,Jumbo frame ,computer.software_genre ,computer ,Scalable Coherent Interface ,Carrier Ethernet - Abstract
Clusters are now one of the most preferred architectures for building high performance computing systems. The emergence of high-speed commodity microprocessors, network technologies, and Open Source operating systems has propelled the cluster concept to an unparalleled high. Even though most clusters nowadays use LAN technologies such as Fast and Gigabit Ethernet as the interconnect, there is a growing breed of new interconnection technologies called SANs (System Area Networks) specifically designed for HPC. These new technologies boast characteristics such as high bandwidth, low latency for communications, and scalability to large numbers of nodes, all of which are essential for most HPC applications. In this paper, we compare the performance of Gigabit Ethernet (LAN) and the Scalable Coherent Interface (SAN) on a 128-processor Linux cluster. We present the raw bandwidth and latency figures of the two networks and then discuss the performance of several benchmark programs.
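Raw latency and bandwidth figures of the kind reported here are commonly extracted from ping-pong timings by fitting t(m) = L + m/B over message sizes m; a minimal sketch with hypothetical numbers (not the paper's measurements):

```python
# Sketch: derive zero-byte latency L and asymptotic bandwidth B from
# two ping-pong timings (message size in bytes, one-way time in seconds).

def fit_latency_bandwidth(m1, t1, m2, t2):
    B = (m2 - m1) / (t2 - t1)   # bytes per second
    L = t1 - m1 / B             # extrapolated zero-byte latency
    return L, B

# Hypothetical timings: 5 us for 0 bytes, plus transfer time at 100 MB/s
# for a 1 MB message.
L, B = fit_latency_bandwidth(0, 5e-6, 1 << 20, 5e-6 + (1 << 20) / 100e6)
print(L, B)
```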
- Published
- 2004
48. A comparative throughput analysis of Scalable Coherent Interface and Myrinet
- Author
-
S. Millich, Alan D. George, and Sarp Oral
- Subjects
Interconnection ,Computer science ,business.industry ,Key (cryptography) ,Bandwidth (computing) ,Topology (electrical circuits) ,Myrinet ,Network topology ,business ,Scalable Coherent Interface ,Throughput (business) ,Computer network - Abstract
It has become increasingly popular to construct large parallel computers by connecting many inexpensive nodes built with commercial-off-the-shelf (COTS) parts. These clusters can be built at a much lower cost than traditional supercomputers of comparable performance. A key decision that will greatly affect the overall performance of the cluster is the method used to connect the nodes together. Choosing the best interconnect and topology is not at all trivial since performance and cost will change as the system size is scaled. This paper presents throughput models used for the analysis and comparison of performance in two leading system area networks (SAN), Myrinet and Scalable Coherent Interface (SCI). First, analytical models for throughput are developed by determining the theoretical bandwidth of all internal buses and links that are part of the interconnect architecture. Then, experiments are conducted to measure the actual bandwidth available at each of these components, and the models are calibrated so they accurately represent the experimental results. Finally, the models are used to compare the maximum throughput of Myrinet and SCI systems with respect to system size and overall dollar cost.
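The calibrated throughput-model style described above can be sketched as a bottleneck computation: the deliverable bandwidth of a path is the minimum over its components, each scaled by a measured efficiency factor (the component values below are illustrative, not the paper's measurements):

```python
# Sketch: bottleneck throughput of a path whose components (buses, adapters,
# links) sit in series. Each component is (theoretical_MBps, measured
# calibration factor); the slowest calibrated component limits the path.

def path_throughput(components):
    return min(theoretical * eff for theoretical, eff in components)

# Hypothetical path: a 528 MB/s I/O bus at 65% efficiency, a 500 MB/s
# adapter at 80%, and a 667 MB/s network link at 75%.
print(path_throughput([(528, 0.65), (500, 0.80), (667, 0.75)]))
```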
- Published
- 2003
49. Boosting the performance of electromagnetic simulations on a PC-cluster
- Author
-
Wolfgang Karl, Martin Schulz, and Carsten Trinitis
- Subjects
Electromagnetic field ,Ethernet ,Interconnection ,Boosting (machine learning) ,business.industry ,Computer science ,High voltage ,Hardware_PERFORMANCEANDRELIABILITY ,Computer cluster ,Embedded system ,Hardware_INTEGRATEDCIRCUITS ,Electronic engineering ,business ,Engineering design process ,Scalable Coherent Interface - Abstract
One of the crucial aspects in the design process of high voltage apparatus is the precise simulation of the electrostatic and/or electromagnetic field distribution in 3D domains. This paper summarizes the results obtained on the PC cluster platform installed at ABB Corporate Research using POLOPT, a state-of-the-art parallel simulation environment for both electrostatic and electromagnetic problems. The experiments were conducted on a Linux cluster using both Ethernet and scalable coherent interface (SCI)-based interconnection technologies, allowing a study of the impact of high-speed networking technology. For both electrostatic and electromagnetic field simulations in practical high voltage engineering, high efficiency was obtained in this commodity environment. In addition, the results show that some codes, in this case the electromagnetic solver, are highly sensitive to communication performance and hence can benefit significantly from fast interconnection mechanisms, such as the given SCI.
- Published
- 2003
50. Using idle disks in a cluster as a high-performance storage system
- Author
-
J.S. Hansen and R. Lachaize
- Subjects
business.industry ,Computer science ,computer.software_genre ,Idle ,Storage area network ,Server ,Computer data storage ,Node (computer science) ,Data_FILES ,Cluster (physics) ,Operating system ,Overhead (computing) ,business ,Distributed File System ,Scalable Coherent Interface ,computer ,Computer network - Abstract
In many clusters today, the local disks of a node are only used sporadically. This paper describes the software support for sharing of disks in clusters, where the disks are distributed across the nodes in the cluster, thereby allowing them to be combined into a high-performance storage system. Compared to centralized storage servers, such an architecture allows the total I/O capacity of the cluster to scale up with the number of nodes and disks. Additionally, our software allows customizing the functionality of the remote disk access using a library of code modules. A prototype has been implemented on a cluster connected by a Scalable Coherent Interface (SCI) network and performance measurements using both raw device access and a distributed file system show that the performance is comparable to dedicated storage systems and that the overhead of the framework is moderate even during high load. Thus, the prospects are that clusters sharing disks distributed among the nodes will allow both the application processing power and total I/O capacity of the cluster to scale up with the number of nodes.
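The round-robin striping that lets such a system aggregate I/O across node disks can be sketched as a simple address mapping (illustrative only, not the authors' code; `stripe_location` is a hypothetical helper):

```python
# Sketch: map a logical block address (LBA) of the combined volume to
# the node disk holding it and the offset on that disk, round-robin.
# Consecutive blocks land on different nodes, so accesses proceed in
# parallel over the cluster network.

def stripe_location(lba: int, n_disks: int):
    """Return (disk index, local block offset) for a logical block."""
    return lba % n_disks, lba // n_disks

# With 4 node disks, blocks 0..3 go to disks 0..3 at offset 0,
# block 4 wraps back to disk 0 at offset 1, and so on.
print([stripe_location(b, 4) for b in range(6)])
```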
- Published
- 2003