132 results on '"Yeh-Ching Chung"'
Search Results
2. EduBloud: A Blockchain-based Education Cloud
- Author
-
Hongbo Zhang, Guiyan Wang, Yeh-Ching Chung, Bowen Xiao, and Wei Cai
- Subjects
Blockchain ,Computer science ,business.industry ,Reliability (computer networking) ,Throughput ,Cloud computing ,Computer security ,computer.software_genre ,Critical infrastructure ,Market fragmentation ,Isolation (database systems) ,business ,computer ,Implementation - Abstract
Cloud services for education are critical infrastructure for the smart campus. State-of-the-art education clouds are usually developed and maintained by individual schools. This isolated nature makes the data easy to tamper with and leads to malicious tampering and information fragmentation. Blockchain is a well-suited technology to address these issues. By leveraging the advantages of public, consortium, and private blockchains, we propose EduBloud, an education cloud empowered by a heterogeneous blockchain system. The system shows higher reliability, lower latency, higher data throughput, and better economic efficiency than homogeneous blockchain implementations.
- Published
- 2019
- Full Text
- View/download PDF
3. An IoT-Based Cloud-Fog Computing Platform for Creative Service Process
- Author
-
Terng-Yin Hsu, Hongji Yang, Tse-Chuan Hsu, and Yeh-Ching Chung
- Subjects
Service (systems architecture) ,Multimedia ,business.industry ,Process (engineering) ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020206 networking & telecommunications ,Cloud computing ,02 engineering and technology ,computer.software_genre ,Service process ,Fog computing ,Component (UML) ,0202 electrical engineering, electronic engineering, information engineering ,Internet of Things ,business ,computer ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
The creative service process is an innovative process that uses newly developed technologies to improve the service models currently in use. In this paper, we propose an IoT-based Cloud-Fog computing platform for the creative service process. The proposed Cloud-Fog computing platform is a distributed computing platform in which the compute, storage, and network systems of Cloud and Fog work independently or collaboratively, depending on the services performed by the platform. If a service requires resources from only the Cloud or only the Fog, then only the designated Cloud or Fog is involved; otherwise, Cloud and Fog collaborate on the service. During the collaboration, the required data may be transferred back and forth between Cloud and Fog. To demonstrate the use of the proposed platform for the creative service process, two service examples, smart classroom and city surveillance, are discussed.
- Published
- 2017
- Full Text
- View/download PDF
4. Application-Aware Traffic Redirection: A Mobile Edge Computing Implementation Toward Future 5G Networks
- Author
-
Bing-Liang Chen, Yeh-Ching Chung, Yu-Cing Luo, Jerry Chou, and Shih-Chun Huang
- Subjects
Mobile edge computing ,Edge device ,business.industry ,Computer science ,Network packet ,Cellular network ,Cloud computing ,Mobile telephony ,business ,5G ,Edge computing ,Computer network - Abstract
With the development of network technology, billions of devices access resources and services on the cloud through mobile telecommunication networks. The mobile network must handle a great number of connections and data packets, which not only consumes the limited spectrum resources and network bandwidth but also reduces the service quality of applications. To alleviate this problem, the concept of Mobile Edge Computing (MEC) was proposed by the European Telecommunications Standards Institute (ETSI) in 2014. MEC provides IT and cloud computing capabilities at the network edge to offer low-latency and high-bandwidth service. The architecture and benefits of MEC have been discussed in much recent literature, but the implementation of the underlying network is rarely discussed or evaluated in practice. In this paper, we present our prototype implementation of a MEC platform, which uses an application-aware traffic redirection mechanism at the edge network to reduce service latency and network bandwidth consumption. Our implementation is based on OAI, an open-source project for a 5G SoftRAN cellular system. To the best of our knowledge, it is also one of the few MEC solutions that have been built for 5G networks in practice.
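The core idea of application-aware redirection can be illustrated with a minimal sketch: flows destined for a registered edge-hosted application are steered to a local edge server instead of traversing the core network. The host names, addresses, and lookup rule below are illustrative assumptions, not details from the paper.

```python
# Hypothetical redirection table: application host -> local edge server address.
EDGE_APPS = {"video.example.com": "10.0.0.5"}

def redirect(dst_host, default_core_gw="192.0.2.1"):
    """Return the next hop for a flow: the edge server if the destination
    application is hosted at the edge, otherwise the core network gateway."""
    return EDGE_APPS.get(dst_host, default_core_gw)
```

Traffic matching an edge application is served locally, which is what yields the latency and bandwidth savings the paper measures.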
- Published
- 2017
- Full Text
- View/download PDF
5. A Dynamic Module Deployment Framework for M2M Platforms
- Author
-
Shih-Chun Huang, Jerry Chou, Yu-Cing Luo, Bing-Liang Chen, and Yeh-Ching Chung
- Subjects
Network congestion ,Service (systems architecture) ,Resource (project management) ,Access network ,Software deployment ,Computer science ,business.industry ,Server ,Distributed computing ,Cloud computing ,Reuse ,business - Abstract
IoT applications are built on top of M2M platforms, which provide the communication infrastructure among devices and to the clouds. Because of increasing M2M communication traffic and limited edge network bandwidth, preventing network congestion and service delay has become a crucial problem for M2M platforms. A general approach is to deploy IoT service modules in the M2M platform so that data can be pre-processed and reduced before being transmitted over the network. Moreover, the service modules often need to be deployed dynamically at various locations of the M2M platform to accommodate the mobility of devices moving across access networks and users' on-demand service requirements. However, existing M2M platforms have limited support for dynamic and automatic deployment. Therefore, the objective of our work is to build a dynamic module deployment framework in an M2M platform that manages and optimizes module deployment automatically according to user service requirements. We achieved this goal by implementing a solution that integrates an OSGi-based application framework (Kura) with an M2M platform (OM2M). By exploiting the resource reuse method in the OSGi specification, we were able to reduce module deployment time by 50~52%. Finally, a computationally efficient and near-optimal algorithm is proposed to optimize the module placement decision in our framework.
- Published
- 2017
- Full Text
- View/download PDF
6. File placement mechanisms for improving write throughputs of cloud storage services based on Ceph and HDFS
- Author
-
Yeh-Ching Chung, Tse-Chuan Hsu, Hongji Yang, and Chun-Feng Wu
- Subjects
Computer science ,Computer file ,Stub file ,020206 networking & telecommunications ,02 engineering and technology ,computer.software_genre ,File Control Block ,Self-certifying File System ,Data_FILES ,0202 electrical engineering, electronic engineering, information engineering ,Operating system ,Versioning file system ,020201 artificial intelligence & image processing ,Cloud storage ,computer ,Flash file system ,File system fragmentation - Abstract
Cloud storage services are pervasive nowadays, and many use distributed file systems as their backend storage. Research on the file-size distribution of file systems shows that they contain many small files. Therefore, this paper proposes a hybrid distributed file system based on Ceph and HDFS that delivers satisfactory write throughputs for a cloud storage system with 80–90% small files and 10–20% large files. The experimental results show that the file allocation mechanism without RAM disk caching improves the write throughputs of Ceph and HDFS by approximately 10% to 50%, while the mechanism with RAM disk caching improves write throughputs by up to 200%.
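The placement idea can be sketched as a size-based routing rule: small files go to one backend, large files to the other. The 64 MB threshold and the direction of the split below are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical size threshold; HDFS blocks are typically 64-128 MB, so files
# below a block size are treated as "small" in this sketch.
SMALL_FILE_THRESHOLD = 64 * 1024 * 1024  # bytes

def choose_backend(file_size: int) -> str:
    """Route small files to one backend and large files to the other,
    in the spirit of the paper's hybrid Ceph/HDFS allocation mechanism."""
    return "ceph" if file_size < SMALL_FILE_THRESHOLD else "hdfs"
```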
- Published
- 2017
- Full Text
- View/download PDF
7. Byzantine Fault Tolerant Optimization in Federated Cloud Computing
- Author
-
Ching-Hsien Hsu, Mahdis Moradi, Yeh-Ching Chung, Jerry Chou, and Hojjat Baghban
- Subjects
business.industry ,Computer science ,Distributed computing ,05 social sciences ,050301 education ,Information technology ,020207 software engineering ,Fault tolerance ,Cloud computing ,02 engineering and technology ,Quantum Byzantine agreement ,Software ,Cloud testing ,Server ,0202 electrical engineering, electronic engineering, information engineering ,business ,0503 education ,Byzantine fault tolerance ,Computer network - Abstract
Cloud computing is driving a revolution in the information technology industry because of its performance, accessibility, and low cost. It makes it possible to scale capacity or capabilities dynamically without investing in new infrastructure, training new personnel, or licensing new software. The federated cloud, a combination of two or more clouds, is the next logical step after the hybrid cloud, and many indicators point to growing demand for such a model. Security is a challenging issue in all cloud infrastructures, whether a single cloud or a federated cloud, and it is especially significant in distributed systems that must be highly fault tolerant. One class of algorithms for this problem is Byzantine Fault Tolerance (BFT). This paper introduces a new method that optimizes Byzantine Fault Tolerance, decreases latency, and detects the number of faults.
- Published
- 2016
- Full Text
- View/download PDF
8. DRASH: A Data Replication-Aware Scheduler in Geo-Distributed Data Centers
- Author
-
Moïse W. Convolbo, Yeh-Ching Chung, Shihyu Lu, and Jerry Chou
- Subjects
Job scheduler ,Distributed database ,business.industry ,Computer science ,Distributed computing ,Real-time computing ,Big data ,020206 networking & telecommunications ,Cloud computing ,Workload ,02 engineering and technology ,computer.software_genre ,Replication (computing) ,Data modeling ,Scheduling (computing) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Data center ,business ,computer - Abstract
Driven by the trends of Big Data and cloud computing, there is a growing demand for processing and analyzing data that are generated and stored across geo-distributed data centers. However, due to the limited network bandwidth between data centers and the growing data volume spread across different locations, it has become increasingly inefficient to aggregate data and perform computations at a single data center. An approach commonly used by data-intensive cluster computation systems, like Hadoop, is to distribute computations based on data locality so that data can be processed locally, reducing network overhead and improving performance. But limited work has been done to adapt and evaluate such techniques for geo-distributed data centers. In this paper, we propose DRASH (Data-Replication Aware Scheduler), a job scheduling algorithm that enforces data locality to prevent data transfer and exploits data replication to improve overall system performance. Our evaluation using simulations with realistic workload traces shows that DRASH outperforms other existing approaches by 16% to 60% in average job completion time and achieves greater improvements under higher data replication factors.
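The replication-aware placement rule can be sketched in a few lines: a job is scheduled on a data center that already holds a replica of its input, with ties broken by current load. This is a minimal illustration of the general technique, not DRASH's actual scoring function; the data structures are assumptions.

```python
def place_job(replica_sites, queue_len):
    """Pick a data center for a job so that no input transfer is needed.

    replica_sites: list of data centers holding a replica of the job's input.
    queue_len: dict mapping data center -> current queue length (load proxy).
    """
    # Among the sites with a local replica, choose the least-loaded one.
    return min(replica_sites, key=lambda dc: queue_len[dc])
```

More replicas mean more candidate sites, which is consistent with the paper's observation that higher replication factors yield larger improvements.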
- Published
- 2016
- Full Text
- View/download PDF
9. PROAR: A Weak Consistency Model for Ceph
- Author
-
Yongwei Wu, Yeh-Ching Chung, and Jiayuan Zhang
- Subjects
020203 distributed computing ,Weak consistency ,Computer science ,business.industry ,Distributed computing ,Hash function ,Consistency model ,020206 networking & telecommunications ,02 engineering and technology ,Commit ,Replication (computing) ,Object storage ,Computer data storage ,Node (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,Latency (engineering) ,business - Abstract
The primary-copy consistency model used in Ceph cannot satisfy users' low-latency requirements for write operations. In this paper, we propose a weak consistency model, PROAR, based on a distributed hash ring mechanism that allows clients to commit data only to the primary node and synchronize data to the replication nodes asynchronously in Ceph. With the distributed hash ring mechanism, the low-latency requirement for write operations can be met; in addition, the workload of the primary node is reduced while that of the replication nodes becomes more balanced. We have evaluated the proposed scheme on a Ceph storage system with three storage nodes. The experimental results show that PROAR reduces write overhead by about 50% compared with Ceph and balances the workload more evenly across the replication nodes.
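The weak-consistency write path can be illustrated with a toy store: a hash ring maps each object to a primary node, the write is acknowledged as soon as the primary commits, and replica updates are queued for asynchronous flushing. The class, hash choice, and replica rule below are illustrative assumptions, not PROAR's implementation.

```python
import hashlib

class HashRingStore:
    """Toy sketch of primary-commit-then-async-replicate over a hash ring."""

    def __init__(self, nodes, replicas=2):
        self.nodes = sorted(nodes)
        self.replicas = replicas
        self.data = {n: {} for n in self.nodes}
        self.pending = []  # deferred (node, key, value) replica writes

    def _primary_index(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return h % len(self.nodes)

    def write(self, key, value):
        i = self._primary_index(key)
        primary = self.nodes[i]
        self.data[primary][key] = value              # synchronous commit
        for j in range(1, self.replicas + 1):        # successors on the ring
            node = self.nodes[(i + j) % len(self.nodes)]
            self.pending.append((node, key, value))  # replicate later
        return primary                                # ack after primary only

    def flush_replicas(self):
        """Asynchronous replication step, run in the background in practice."""
        for node, key, value in self.pending:
            self.data[node][key] = value
        self.pending.clear()
```

The write latency seen by the client covers only the primary commit; the replica work happens off the critical path, which is the source of the latency reduction the paper reports.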
- Published
- 2016
- Full Text
- View/download PDF
10. Modeling of Resource Granularity and Utilization with Virtual Machine Splitting
- Author
-
Jerry Chou, Yeh-Ching Chung, Hojjat Baghban, and Ching-Hsien Hsu
- Subjects
020203 distributed computing ,Computer science ,Full virtualization ,business.industry ,Distributed computing ,Temporal isolation among virtual machines ,Cloud computing ,02 engineering and technology ,Virtualization ,computer.software_genre ,Resource (project management) ,Virtual machine ,Server ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data center ,business ,computer - Abstract
The increasing number of IT users and their need for computational power in cloud data centers lead to noticeable growth in the number of physical servers, which imposes a dramatic burden in power consumption. Virtualization is an effective method for reducing the number of physical servers while maintaining appropriate processing performance, but achieving high resource utilization remains a significant challenge, especially in data center environments. Some applications are hosted on a single large virtual machine. One way to guarantee reasonable physical server utilization is to split such an application across smaller virtual machines with sufficient aggregate computational power. Although using multiple small virtual machines instead of one large virtual machine improves physical resource utilization and reduces the number of powered-on physical machines, it incurs a penalty in the form of extra resources required to map the application onto the new virtual machines. However, existing research has not precisely quantified the extra resources and computational overhead a data center sustains when the original application is split across smaller virtual machines while preserving the characteristics of the original large virtual machine. This paper demonstrates through mathematical modeling that the physical resource providers in a cloud data center endure a penalty in terms of extra physical resources. The proposed model is relevant to both energy efficiency and physical resource utilization in cloud data centers.
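The shape of the splitting penalty can be shown with a toy model: if each VM carries a fixed per-instance overhead, splitting one large VM into k small ones costs (k - 1) extra overheads. This is an illustrative model under assumed parameters, not the paper's actual mathematical formulation.

```python
def total_resources(workload, k, vm_overhead=0.5):
    """Resources needed when a workload runs across k VMs, assuming each VM
    adds a fixed virtualization overhead (hypothetical value)."""
    return workload + k * vm_overhead

def splitting_penalty(workload, k, vm_overhead=0.5):
    """Extra resources versus hosting the workload on one large VM."""
    return total_resources(workload, k, vm_overhead) - total_resources(workload, 1, vm_overhead)
```

The penalty grows linearly in k here; the trade-off the paper studies is whether the improved bin-packing of small VMs outweighs this overhead.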
- Published
- 2016
- Full Text
- View/download PDF
11. Community-Based M2M Framework Using Smart/HetNet Gateways for Internet of Things
- Author
-
Cheng-Hsin Hsu, Yeh-Ching Chung, Yi-Lan Lin, and Wu-Chun Chung
- Subjects
Computer science ,business.industry ,Logic gate ,Telecommunications link ,Mobile computing ,Wireless ,Cloud computing ,Mobile telephony ,business ,Internet of Things ,Heterogeneous network ,Computer network - Abstract
To manage the Internet of Things in a flexible and efficient way, this paper proposes a novel M2M framework using smart/HetNet gateways. Our approach is not only compatible with the M2M standard but also enables community-based coordination among gateways and devices. With smart and HetNet gateways, various types of requirements and applications can be fulfilled and handled in a local region. Accordingly, unnecessary network usage is avoided, reducing traffic load in mobile networks. We also implement a prototype to support an IoT application scenario in which a lamp is automatically switched on when a human face is detected. The demonstration shows that our system is practical and supports subscription and notification in a community. Finally, experimental results reveal that some basic procedures benefit from shorter completion time and less uplink traffic when the devices involved in an application are within the same region.
- Published
- 2015
- Full Text
- View/download PDF
12. Message from DataCom 2015 Chairs
- Author
-
Geoffrey Fox, Yeh-Ching Chung, Hao Wang, Beniamino Di Martino, Christophe Cérin, and Weizhe Zhang
- Subjects
Signal processing ,Sociology and Political Science ,Multimedia ,Computer science ,Urban studies ,Information and Computer Science ,Computer Science Applications1707 Computer Vision and Pattern Recognition ,Information System ,computer.software_genre ,Urban Studies ,Modeling and simulation ,Computer Networks and Communication ,Human–computer interaction ,Modeling and Simulation ,Signal Processing ,Media Technology ,Information system ,computer - Published
- 2015
- Full Text
- View/download PDF
13. BiFennel: Fast Bipartite Graph Partitioning Algorithm for Big Data
- Author
-
Lyu-Wei Wang, Hung-Chang Hsiao, Wenguang Chen, Yeh-Ching Chung, and Shih-Chang Chen
- Subjects
Theoretical computer science ,Computer science ,Voltage graph ,Strength of a graph ,Butterfly graph ,Simplex graph ,law.invention ,Graph power ,law ,Line graph ,Folded cube graph ,Null graph ,Algorithm ,MathematicsofComputing_DISCRETEMATHEMATICS - Abstract
Graph computing is widely used today and requires the ability to rapidly process graphs with billions of vertices for social network analysis, bioinformatics network analysis, and semantic processing. Graph processing therefore plays a significant role in research and application development. Data for music and movie recommendation and LDA topics can be modeled as bipartite graphs and computed with graph processing engines. The most important step before graph computation is graph partitioning. Graph partitioning is a mature technology; however, most classic graph partitioning algorithms require several rounds of iterative calculation, which leads to high time complexity. Some algorithms with short partitioning times have been proposed in recent years, but they cannot be applied directly to bipartite graphs. This paper proposes a new bipartite graph partitioning algorithm, BiFennel, which effectively decreases graph processing time and network load by reducing the vertex replication factor while maintaining work balance. We implement BiFennel in a popular graph engine called PowerGraph. The performance results show that BiFennel achieves 29~55% improvement in communication cost and 21~49% improvement in overall runtime compared with Aweto.
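The Fennel family of algorithms that BiFennel builds on can be sketched as one-pass streaming partitioning: each arriving vertex goes to the partition holding most of its already-placed neighbors, penalized by that partition's current size. This is a generic illustration of the streaming technique, not BiFennel's bipartite-specific algorithm; the penalty weight is an assumed parameter.

```python
def stream_partition(vertices, edges, k, gamma=0.5):
    """One-pass greedy streaming partitioner (Fennel-style sketch).

    vertices: iterable giving the stream order; edges: (u, v) pairs;
    k: number of partitions; gamma: assumed load-penalty weight.
    """
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    parts = [set() for _ in range(k)]
    assign = {}
    for v in vertices:  # single streaming pass
        def score(i):
            # neighbors already in partition i, minus a size penalty
            return len(parts[i] & neighbors[v]) - gamma * len(parts[i])
        best = max(range(k), key=score)
        parts[best].add(v)
        assign[v] = best
    return assign
```

A single pass avoids the iterative refinement of classic partitioners, which is the source of the short partitioning times the abstract mentions.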
- Published
- 2015
- Full Text
- View/download PDF
14. Minimizing Latency of Real-Time Container Cloud for Software Radio Access Networks
- Author
-
Yeh-Ching Chung, Cheng-Hsin Hsu, Wu-Chun Chung, Shu-Ting Wang, Satyajit Padhy, Mu-Han Huang, and Chen-Nien Mao
- Subjects
Software ,Access network ,business.industry ,Computer science ,Server ,Cloud testing ,Packet processing ,Testbed ,Cellular network ,Cloud computing ,Software-defined radio ,business ,Computer network - Abstract
With the huge growth in mobile traffic, conventional Radio Access Networks (RANs) suffer from high capital and operating expenditures, especially when new cellular standards are deployed. Software and cloud RANs have been proposed, but the stringent latency requirements dictated by cellular networks, e.g., a 1 ms transmission time interval, are difficult to satisfy. We first present a real software RAN testbed based on an open-source LTE implementation. We also investigate the issue of quality assurance when deploying such software RANs in the cloud. In particular, running software RANs in the cloud leads to high latency, which may violate the latency requirements. We empirically study the problem of minimizing computational and networking latencies in a lightweight container cloud. Our experimental results show the feasibility of running software RANs in a real-time container cloud. More specifically, a feasible solution for hosting software RANs in the cloud is to adopt lightweight containers with real-time kernels and fast packet-processing networking.
- Published
- 2015
- Full Text
- View/download PDF
15. Hardware Thread-Level Speculation Performance Analysis
- Author
-
Che-Rung Lee, Michael P. Perrone, Ying-Chieh Wang, Yeh-Ching Chung, and I-Hsin Chung
- Subjects
Record locking ,Computer science ,business.industry ,Distributed computing ,Thread (computing) ,Software_PROGRAMMINGTECHNIQUES ,Lock (computer science) ,Instruction set ,Consistency (database systems) ,Embedded system ,Programming paradigm ,Performance prediction ,Overhead (computing) ,business - Abstract
This paper presents a performance analysis of hardware Thread-Level Speculation (TLS) in the IBM Blue Gene/Q computer. Unlike traditional multi-threaded programming models, which use locks to ensure the consistency of shared data, TLS is a hardware mechanism that detects and resolves memory access conflicts among threads. The model shows good performance prediction, as verified by the experiments. This study helps to quantify the potential gains from using special-purpose TLS hardware to accelerate codes that, in a strict sense, require serial processing to avoid memory conflicts. Furthermore, based on analysis and measurements of the TLS behavior and its overhead, together with a comparison against OpenMP, a strategy is proposed to help utilize this hardware feature. The results also suggest potential improvements for future TLS architectural designs.
- Published
- 2015
- Full Text
- View/download PDF
16. Distributed Metaserver Mechanism and Recovery Mechanism Support in Quantcast File System
- Author
-
Su-Shien Ho, Wenguang Chen, Jiazheng Zhou, Chun-Feng Wu, Yeh-Ching Chung, Hung-Chang Hsiao, and Ching-Hsien Hsu
- Subjects
File system ,business.industry ,Computer science ,Distributed computing ,Cloud computing ,computer.software_genre ,Replication (computing) ,Metadata ,Metaserver ,Server ,Scalability ,Computer data storage ,Data_FILES ,Distributed File System ,business ,computer - Abstract
As the need for data storage increases tremendously, distributed file systems have become the most important data storage systems in cloud computing. In distributed file system development, many researchers work to refine the architecture to provide scalability and reliability. In this work, we propose a distributed metaserver system including metaserver scale-out, metadata replication, metaserver recovery, and metaserver-management recovery mechanisms. In our experiments, the proposed system increases metadata capacity and improves reliability through its fault tolerance mechanism, while adding very little read/write overhead.
- Published
- 2015
- Full Text
- View/download PDF
17. Evaluation of Inter-Cell Interference Coordination with CAP model
- Author
-
Xibin Xu, Shih-Chang Chen, Yeh-Ching Chung, Wei Tan, Jing Wang, Zhigang Tian, Wenqi Wang, Ming Zhao, and Yida Xu
- Subjects
business.industry ,Computer science ,Distributed computing ,High availability ,Radio spectrum management ,Cellular network ,The Internet ,business ,Partition (database) ,CAP theorem ,5G ,Computer network - Abstract
More cooperation and coordination are necessary in the future UDN (Ultra-Dense Network) scenario of 5G networks, for which ICIC (Inter-Cell Interference Coordination) is a promising and typical scheme. A cooperating cellular network may be regarded as a special kind of distributed computing system, whose basic theory and toolsets can be adopted. The CAP theorem for distributed computing states that any networked shared-data system can have at most two of three desirable properties: consistency (C), high availability (A), and tolerance to network partitions (P) of the data. Since partitions cannot be avoided in such a system, consistency or availability must be forfeited to some extent. This paper extends the CAP theorem to the UDN case and redefines consistency, availability, and partition tolerance in the context of a future cellular network architecture. The ICIC process is then interpreted with the cellular CAP theorem, and an evaluation framework is developed. An improved ICIC scheme is proposed based on this framework.
- Published
- 2015
- Full Text
- View/download PDF
18. Message from the program co-chairs IEEE ICPADS 2014
- Author
-
Yeh-Ching Chung and Yanmin Zhu
- Subjects
World Wide Web ,Operations research ,Computer science - Abstract
On behalf of the 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014) Organizing Committee, we are very pleased to announce that more than three hundred researchers and contributors from around the world submitted papers to share their research results and new ideas. The objective of this conference is to provide a major international forum for scientists, engineers, and users to exchange and share their experiences, new ideas, and latest research results on all aspects of parallel and distributed computing systems.
- Published
- 2014
- Full Text
- View/download PDF
19. Correlation Aware Technique for SQL to NoSQL Transformation
- Author
-
Jen Chun Hsu, Yeh-Ching Chung, Ching-Hsien Hsu, and Shih Chang Chen
- Subjects
SQL ,Database ,Computer science ,business.industry ,View ,Data transformation ,Big data ,NoSQL ,computer.software_genre ,Database tuning ,Data-intensive computing ,Table (database) ,Data mining ,business ,computer ,computer.programming_language - Abstract
For better efficiency in parallel and distributed computing, Apache Hadoop distributes imported data randomly across data nodes. This mechanism provides some advantages for general data analysis. Following the same concept, Apache Sqoop separates each table into four parts and randomly distributes them across data nodes. However, this data placement mechanism raises a database performance concern. This paper proposes a correlation-aware method on Sqoop (CA_Sqoop) to improve data placement. By placing related data as close together as possible, it reduces data transfer cost over the network and improves database performance. CA_Sqoop also considers table correlation and size for better data locality and query efficiency. Simulation results show that the data locality of CA_Sqoop is twice as good as that of the original Apache Sqoop.
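The correlation-aware placement idea can be sketched as a co-location rule: tables that are frequently joined are placed on the same data node, and unrelated tables are spread round-robin. The pair representation and greedy rule below are illustrative assumptions, not CA_Sqoop's actual algorithm.

```python
def place_tables(tables, correlated_pairs, num_nodes):
    """Assign each table to a node, co-locating correlated (often-joined) tables.

    correlated_pairs: set of frozensets of table names that are often joined.
    """
    placement, next_node = {}, 0
    for t in tables:
        # Find an already-placed table that t is correlated with, if any.
        partner = next((p for pair in correlated_pairs if t in pair
                        for p in pair if p != t and p in placement), None)
        if partner is not None:
            placement[t] = placement[partner]     # co-locate with join partner
        else:
            placement[t] = next_node % num_nodes  # round-robin otherwise
            next_node += 1
    return placement
```

Co-located tables can be joined without shipping rows across the network, which is the locality benefit the simulation measures.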
- Published
- 2014
- Full Text
- View/download PDF
20. Performance Modeling for Hardware Thread-Level Speculation
- Author
-
Michael P. Perrone, I-Hsin Chung, Yeh-Ching Chung, Ying-Chieh Wang, and Che-Rung Lee
- Subjects
Instruction set ,Hardware thread ,business.industry ,Computer science ,Embedded system ,Performance prediction ,Overhead (computing) ,IBM ,business ,Speculation ,Computer hardware - Abstract
This paper presents a preliminary performance model for hardware Thread-Level Speculation (TLS) in the IBM Blue Gene/Q computer. The model analyzes TLS behavior and its overhead in scenarios with 0, 1, and 2 conflicts. The model shows good performance prediction and is verified with experiments. This study helps to quantify the potential gains from using special-purpose TLS hardware to accelerate codes that, in a strict sense, require serial processing to avoid memory conflicts.
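The flavor of such a conflict-count model can be shown with a toy sketch: threads run concurrently, and each conflict forces one thread's work to be re-executed serially. The cost accounting below is an illustrative assumption, not the paper's actual model.

```python
def tls_time(n_threads, work_per_thread, n_conflicts, overhead=0.0):
    """Toy execution-time model for speculative threads with rollbacks."""
    parallel_time = work_per_thread                 # all threads run concurrently
    rollback_time = n_conflicts * work_per_thread   # conflicting work re-runs serially
    return parallel_time + rollback_time + overhead

def speedup(n_threads, work_per_thread, n_conflicts, overhead=0.0):
    """Speedup over fully serial execution of all threads' work."""
    serial = n_threads * work_per_thread
    return serial / tls_time(n_threads, work_per_thread, n_conflicts, overhead)
```

Even this crude model captures the qualitative behavior: speedup degrades quickly as the conflict count grows, which is why conflict probability dominates TLS profitability.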
- Published
- 2014
- Full Text
- View/download PDF
21. Taiwan UniCloud: A Cloud Testbed with Collaborative Cloud Services
- Author
-
Yeh-Ching Chung, Kuan-Chou Lai, Wu Chun Chung, Kuan-Ching Li, Po Chi Shih, Jerry Chou, Che-Rung Lee, and Ching-Hsien Hsu
- Subjects
Cloud computing security ,Database ,Computer science ,business.industry ,Testbed ,Cloud computing ,computer.software_genre ,Single-chip Cloud Computer ,Resource (project management) ,Cloud testing ,Community cloud ,User interface ,business ,computer - Abstract
This paper introduces a prototype of Taiwan UniCloud, a community-driven hybrid cloud platform for academics in Taiwan. The goal is to leverage resources across multiple clouds in different organizations. Each self-managing cloud can join the UniCloud platform to share its resources and simultaneously benefit from other clouds' scale-out capabilities. Accordingly, resources are elastic and sharable, so that each cloud can absorb unexpected resource demands. The proposed platform provides a web portal that operates each cloud via a uniform user interface. The construction of virtual clusters with multi-core VMs is supported for parallel and distributed processing models, and an object-based storage system federates different storage providers. This paper not only presents the architectural design of Taiwan UniCloud but also evaluates its performance to demonstrate the feasibility of the current implementation. Experimental results show the feasibility of the proposed platform as well as the benefit of cloud federation.
- Published
- 2014
- Full Text
- View/download PDF
22. Dynamic Data Partitioning and Virtual Machine Mapping: Efficient Data Intensive Computation
- Author
-
Yeh-Ching Chung, Ching-Hsien Hsu, and Kenn Slagter
- Subjects
Computer science ,business.industry ,Dynamic data ,Node (networking) ,Big data ,Cloud computing ,Parallel computing ,computer.software_genre ,Task (computing) ,Virtual machine ,Programming paradigm ,Data-intensive computing ,business ,computer - Abstract
Big data refers to data so large that it exceeds the processing capabilities of traditional systems. Big data can be awkward to work with, and its storage, processing, and analysis can be problematic. MapReduce is a recent programming model that can handle big data by distributing the storage and processing of data among a large number of computers (nodes). However, this means the time required to process a MapReduce job depends on whichever node is last to complete a task. This problem is exacerbated in heterogeneous environments. In this paper, we propose a method to improve MapReduce execution in heterogeneous environments by dynamically partitioning data during the Map phase and by using virtual machine mapping in the Reduce phase to maximize resource utilization.
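The core of speed-proportional partitioning can be sketched simply: faster nodes receive proportionally more records, so all nodes finish at roughly the same time. The speed scores below are illustrative inputs; how the paper actually measures node capability is not shown here.

```python
def partition_sizes(total_records, node_speeds):
    """Split total_records across nodes in proportion to their (assumed) speeds."""
    total_speed = sum(node_speeds.values())
    sizes = {n: (total_records * s) // total_speed for n, s in node_speeds.items()}
    # Hand any integer-division remainder to the fastest nodes.
    leftover = total_records - sum(sizes.values())
    for n in sorted(node_speeds, key=node_speeds.get, reverse=True)[:leftover]:
        sizes[n] += 1
    return sizes
```

With equal-sized partitions, a node half as fast would take twice as long and straggle; proportional sizing removes that imbalance.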
- Published
- 2013
- Full Text
- View/download PDF
23. InfiniBand virtualization on KVM
- Author
-
Che-Rung Lee, Yeh-Ching Chung, and Yi-Man Ma
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer science ,business.industry ,InfiniBand ,Network virtualization ,Cloud computing ,Network interface ,Supercomputer ,computer.software_genre ,Virtualization ,Virtual machine ,Embedded system ,Operating system ,Sockets Direct Protocol ,business ,computer - Abstract
With its ability to provide on-demand service and reduce IT cost, cloud computing has become more and more popular recently. Virtualization is one of the key technologies in cloud computing; its main idea is to provide abstractions of physical resources. However, such abstraction can cause performance degradation, especially for I/O virtualization, which is usually the performance bottleneck in cloud computing. InfiniBand is a network system that provides very low latency (less than 5 us) and very high bandwidth (multiple Gbps). Due to its excellent performance, InfiniBand is commonly used in high-performance computing (HPC). In this paper, we propose Virt-IB for InfiniBand virtualization on the Kernel-based Virtual Machine (KVM). The main components of Virt-IB are the VM IB library and the Virt-IB driver. Our design processes InfiniBand APIs directly in the guest VM and communicates with the InfiniBand device indirectly to perform the real operations. The VM IB library provides the API interface and a user-level InfiniBand driver, while the Virt-IB driver provides a channel for the VM IB library to write commands to the InfiniBand device indirectly. Evaluation results show that our current work outperforms network virtualization and achieves about 50% of native InfiniBand performance.
- Published
- 2012
- Full Text
- View/download PDF
24. Value-based tiering management on heterogeneous block-level storage system
- Author
-
Chai-Hao Tsai, Jerry Chou, and Yeh-Ching Chung
- Subjects
business.industry ,Computer science ,Distributed computing ,Quality of service ,Cloud computing ,computer.software_genre ,Data access ,Server ,Operating system ,Resource allocation ,iSCSI ,business ,Integer programming ,computer ,Data migration - Abstract
As the scale of datacenters continues to grow, it is hard to keep servers homogeneous, with the same hardware and performance characteristics. Today's datacenters commonly operate several generations of servers from multiple vendors, and mix high-end and low-end devices together to deliver the required service quality at the lowest cost. However, the heterogeneous environment also complicates the management of datacenters, especially in terms of resource allocation. In this paper, we focus on the resource allocation of a tightly unified block-level storage system with SSDs and HDDs. We conduct experiments to quantify the performance of different access patterns on each type of storage device. We then formulate our resource allocation problem as an ILP (Integer Linear Program) and propose data migration algorithms based on the observations. We evaluate our solution by implementing a heterogeneous storage system consisting of HDD, SSD, and iSCSI HDD, and show that the data access response time can be reduced by 27%.
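The paper formulates allocation as an ILP, whose details the abstract does not give; the sketch below substitutes a much simpler greedy heat-based placement (the extent/heat model and all names are assumptions) purely to illustrate the tiering idea:

```python
def plan_tiering(extent_heat, ssd_capacity):
    """Greedy stand-in for the paper's ILP: rank extents by access heat and
    keep the hottest ones on SSD up to capacity; the rest go to HDD."""
    ranked = sorted(extent_heat, key=extent_heat.get, reverse=True)
    return set(ranked[:ssd_capacity]), set(ranked[ssd_capacity:])

ssd, hdd = plan_tiering({"a": 120, "b": 5, "c": 90}, ssd_capacity=2)
print(ssd, hdd)  # hottest extents 'a' and 'c' land on SSD, cold 'b' on HDD
```

A real tiering manager would also weigh migration cost and access-pattern fit per device, which is what the ILP formulation captures.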
- Published
- 2012
- Full Text
- View/download PDF
25. GPU-based cloud service for multiple sequence alignments with regular expression constraints
- Author
-
Yu-Shiang Lin, Yeh-Ching Chung, and Chun-Yuan Lin
- Subjects
Multiple sequence alignment ,Computer science ,business.industry ,Distributed computing ,Parallel algorithm ,Sequence alignment ,Cloud computing ,Parallel computing ,Dynamic programming ,CUDA ,Regular expression ,User interface ,business ,Host (network) ,Graphical user interface - Abstract
Multiple sequence alignment with constraints has become an important problem in computational biology. The concept of constrained sequence alignment was proposed to incorporate a biologist's domain knowledge into sequence alignments, such that user-specified residues/segments are aligned together in the alignment results. Over the past decade, a series of constrained multiple sequence alignment tools have been proposed in the literature. GPU-REMuSiC is the newest tool with regular expression constraints; it uses graphics processing units (GPUs) with CUDA. Experimental results show that GPU-REMuSiC can achieve a 29× speedup in overall computation time. However, the execution environment of GPU-REMuSiC must be built by hand, which is a barrier for biologists. Therefore, we design an intuitive, friendly user interface for a cloud server equipped with GPUs. Using this interface over the network, users can send input data to the remote server without cumbersome setup on the local host, and then receive the alignment results from the remote GPU-equipped cloud server.
- Published
- 2012
- Full Text
- View/download PDF
26. Arrangement Graph-Based Overlay with Replica Mechanism for File Sharing
- Author
-
Yeh-Ching Chung, Ssu-Hsuan Lu, Kuan-Chou Lai, and Kuan-Ching Li
- Subjects
business.industry ,Computer science ,Replica ,Distributed computing ,Overlay network ,Overlay ,Peer-to-peer ,computer.software_genre ,File sharing ,Graph (abstract data type) ,The Internet ,Polling ,business ,computer ,Computer network - Abstract
Over the past decade, the development of Internet technology has raised awareness of the power of Peer-to-Peer (P2P) overlay networks. How to efficiently establish and maintain overlay networks in large-scale environments is always an important issue, and ways of improving routing efficiency also attract much attention. This study proposes a replica mechanism based on the Arrangement Graph-based Overlay (AGO) and enhances the joining procedure of the AGO. The enhanced AGO reduces system overhead by cutting down the large number of polling messages in the joining process. In addition, the replica mechanism is integrated into the enhanced AGO to improve the efficiency of the searching algorithm. Experimental results on the enhanced AGO demonstrate that efficient routing and lower bandwidth consumption during communication can be realized.
- Published
- 2012
- Full Text
- View/download PDF
27. Message from the WEISS 2012 Workshop Chairs
- Author
-
Jing Chen, Da-Wei Chang, Fahim Kawsar, Hsung-Pin Chang, Bin Guo, Taehyoun Kim, Sasikumar Punnekkat, Chung-Ping Young, Giusy Di Lorenzo, Yann Han Lee, Kaori Fujinami, Li Chi Feng, Yeh-Ching Chung, Minsoo Ryu, Zonghua Gu, Alejandro Masrur, Mei Ling Chiang, Daqiang Zhang, Tatsuo Nakajima, Hamid R. Sharifzadeh, Ian McLoughlin, Koji Nakano, Seongsoo Hong, Douglas L. Maskell, and Yunheung Paek
- Subjects
Computer science ,Library science - Published
- 2012
- Full Text
- View/download PDF
28. Scalable Communication-aware Task Mapping Algorithms for Interconnected Multicore Systems
- Author
-
I-Hsin Chung, Chung-Yi Chou, Che-Rung Lee, Jiazheng Zhou, and Yeh-Ching Chung
- Subjects
Multi-core processor ,Parallel processing (DSP implementation) ,Computational complexity theory ,Computer science ,Node (networking) ,Distributed computing ,Scalability ,Algorithm design ,Parallel computing ,Supercomputer ,Time complexity ,Algorithm - Abstract
Communication-aware task mapping algorithms, which map parallel tasks onto processing nodes according to the communication patterns of applications, are essential to reduce communication time in modern high performance computing. In this paper, we design algorithms specifically for interconnected multicore systems, whose architectural properties, namely a small number of cores per node, a large number of nodes, and a large performance gap between communication within a multicore and among multicores, have brought new challenges and opportunities to the mapping problem. Let k be the number of cores per multicore and n be the number of tasks. We consider the practical case that k « n for k = 2, 4, and 6. The designed algorithms are optimal for the mapping measurement, called Maximum Interconnective Message Size (MIMS), with a time complexity of merely O(mlogm) for m communication pairs. Thus, they are highly scalable for large applications. We experimented with the algorithms on the IBM Blue Gene/P system for two synthetic benchmarks and two applications. The results show good communication performance improvement.
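The optimal MIMS algorithms themselves are not reproduced in the abstract; the toy greedy below (all names are assumptions) merely illustrates the underlying idea of sorting communication pairs by message size and co-locating the heaviest pairs on the same k-core node:

```python
def greedy_comap(pairs, k):
    """Sort (task_a, task_b, msg_size) pairs by size, descending, and try to
    place both endpoints of each heavy pair on the same k-core node so the
    largest messages stay intra-node (illustrative, not the paper's algorithm)."""
    node_of, load = {}, []
    for a, b, _size in sorted(pairs, key=lambda p: p[2], reverse=True):
        for t, other in ((a, b), (b, a)):
            if t in node_of:
                continue
            if other in node_of and load[node_of[other]] < k:
                n = node_of[other]            # join the partner's node
            else:
                load.append(0)                # open a fresh node
                n = len(load) - 1
            node_of[t] = n
            load[n] += 1
    return node_of

m = greedy_comap([(0, 1, 100), (1, 2, 50), (2, 3, 10)], k=2)
print(m)  # the heaviest pair (0, 1) shares a node
```

Sorting m pairs is the dominant cost here, which matches the O(mlogm) flavor of the paper's complexity bound, though the paper's algorithms additionally guarantee optimality for MIMS.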
- Published
- 2012
- Full Text
- View/download PDF
29. GPU Performance Enhancement via Communication Cost Reduction: Case Studies of Radix Sort and WSN Relay Node Placement Problem
- Author
-
Yeh-Ching Chung, Shih-Hsiang Lo, I-Hsin Chung, Nan-Hsi Chen, and Che-Rung Lee
- Subjects
Cost reduction ,Computer science ,Radix sort ,Graphics processing unit ,Parallel computing ,Central processing unit ,Performance improvement ,Bottleneck ,Data compression ,Data transmission - Abstract
As the computational power of the Graphics Processing Unit (GPU) increases, data transmission becomes the major performance bottleneck. In this study, we investigate two techniques, data streaming and data compression, to reduce the communication cost on GPUs. Data streaming enables overlap of communication and computation, whereas data compression reduces the size of data transferred among different memory spaces. Although both techniques increase computation cost, overall performance can still be enhanced by reducing communication cost. We demonstrate the effectiveness of the two techniques via two case studies: radix sort and 3-star, a deployment algorithm in wireless sensor networks. For radix sort, a new algorithm, which mixes the MSD and LSD algorithms and employs data streaming, is presented. Its performance is 25% faster than the fastest GPU radix sort implementation currently available in the public domain. The 3-star algorithm runs several hundred times faster than the CPU code; data streaming and data compression, combined in a hybrid CPU-GPU algorithm, provide an additional 54% performance improvement over the GPU implementation. Data compression not only reduces communication cost but also improves computation time, by which further performance enhancement can be achieved.
- Published
- 2012
- Full Text
- View/download PDF
30. Direction-aware resource discovery service in large-scale grid and cloud computing
- Author
-
Kuan-Ching Li, Wu-Chun Chung, Chin-Jung Hsu, Yeh-Ching Chung, and Kuan-Chou Lai
- Subjects
Human resource management system ,business.industry ,Computer science ,Distributed computing ,Overlay network ,Cloud computing ,computer.software_genre ,Grid ,Grid computing ,Robustness (computer science) ,Scalability ,Information system ,business ,computer - Abstract
With scalability and robustness in mind, distributed computing systems such as grids and clouds may exploit the P2P approach to enhance their performance. However, conventional techniques in P2P systems cannot be applied directly to grid systems because of the restricted types of queries for desired resources. In this paper, we consider a fully decentralized resource discovery service based on an unstructured overlay, where the major challenge is to locate desired resources without global knowledge of the shared resource information. Consequently, involving more nodes in the resource discovery scheme may incur higher network overhead. To achieve efficient resource discovery, this paper aims to alleviate the network traffic among unstructured information systems. Relying on information about resource attributes and characteristics, we propose a direction-aware resource discovery scheme to improve overall performance. Experimental results illustrate that the proposed approach is efficient and scalable compared with conventional approaches.
- Published
- 2011
- Full Text
- View/download PDF
31. PQEMU: A Parallel System Emulator Based on QEMU
- Author
-
Wei-Chung Hsu, Po-Chun Chang, Jiun-Hung Ding, and Yeh-Ching Chung
- Subjects
Multi-core processor ,Speedup ,Parallel processing (DSP implementation) ,Computer science ,business.industry ,Operating system ,Software development ,computer.software_genre ,business ,Virtual platform ,computer - Abstract
A full system emulator, such as QEMU, can provide a versatile virtual platform for software development. However, most current system emulators do not have sufficient support for multi-processor emulation to effectively utilize the underlying parallelism of today's multi-core processors. In this paper, we focus on parallelizing a system emulator and implement a prototype parallel emulator based on the widely used QEMU. Using this parallel QEMU to emulate an ARM11 MPCore platform on a quad-core Intel i7 machine with the SPLASH-2 benchmarks, we achieve a 3.8x speedup over the original QEMU design. We also evaluate and compare the performance impact of two different parallelization strategies, one with minimum sharing among emulated CPUs and one with maximum sharing.
- Published
- 2011
- Full Text
- View/download PDF
32. Coalitional game formulation for multi-channel cooperative cognitive radio networks
- Author
-
Feng-Tsun Chien, Yu-Wei Chan, Yeh-Ching Chung, Ronald Y. Chang, and Min-Kuan Chang
- Subjects
Mathematical optimization ,business.industry ,Computer science ,TheoryofComputation_GENERAL ,Grand coalition ,law.invention ,Core (game theory) ,Cognitive radio ,Transmission (telecommunications) ,Relay ,law ,Resource allocation ,Telecommunications ,business ,Solution concept ,Game theory - Abstract
In this paper, we study a coalitional game approach to resource allocation in a multi-channel cooperative cognitive radio network with multiple primary users (PUs) and secondary users (SUs). We propose to form the grand coalition by grouping all PUs and SUs into one set, where each PU can lease its spectrum to all SUs in a time-division manner while the SUs in return help the data transmission of the PUs by relaying. The grand coalition is shown to be stable in the considered scenario by proving that the solution concept of the coalitional game (the core) is nonempty. Also, optimal relay strategies for the SUs, whether to relay or to transmit their own data, are obtained so that the sum rate of all PUs and SUs is maximized. Finally, we demonstrate in simulation the benefits of the grand coalition compared to direct transmission only and to other forms of coalitions.
- Published
- 2011
- Full Text
- View/download PDF
33. Design and analysis of arrangement graph-based overlay systems for information sharing
- Author
-
Ssu-Hsuan Lu, Kuan-Chou Lai, Yeh-Ching Chung, and Kuan-Ching Li
- Subjects
business.industry ,Computer science ,Distributed computing ,Information sharing ,Overlay network ,Graph theory ,Overlay ,Peer-to-peer ,computer.software_genre ,Graph ,Scalability ,The Internet ,business ,Chord (peer-to-peer) ,computer ,Computer network - Abstract
With the continuous innovation of Internet technology, Peer-to-Peer (P2P) systems have emerged as important information-sharing systems for the widespread exchange of resources and information among thousands of users. In this study, we apply properties of arrangement graphs to design a new structured overlay system, named the Arrangement Graph-based Overlay (AGO). In such an overlay, the IDs of two adjacent nodes differ by only one digit; thus, the joining and leaving processes are easy while maintenance cost stays low. Furthermore, searching in the AGO system is efficient, adaptive, and scalable. Analyses of experimental results show that both system establishment and node searching achieve better performance than in the Chord system.
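For readers unfamiliar with arrangement graphs: the vertices of A(n, k) are the k-permutations of n symbols, and two vertices are adjacent exactly when their IDs differ in one position, which is what makes join/leave cheap in AGO. A small enumeration sketch (the function name is an assumption) for A(3, 2):

```python
from itertools import permutations

def arrangement_graph(n, k):
    """Build A(n, k): vertices are the k-permutations of {1..n}; an edge
    joins two vertices whose IDs differ in exactly one position."""
    verts = list(permutations(range(1, n + 1), k))
    edges = [(u, v) for i, u in enumerate(verts) for v in verts[i + 1:]
             if sum(a != b for a, b in zip(u, v)) == 1]
    return verts, edges

verts, edges = arrangement_graph(3, 2)
print(len(verts), len(edges))  # → 6 6  (each vertex has degree k(n-k) = 2)
```

In general A(n, k) has n!/(n-k)! vertices of degree k(n-k), so the neighbor set each overlay node must maintain stays small.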
- Published
- 2011
- Full Text
- View/download PDF
34. Scalable Communication-Aware Task Mapping Algorithms for Interconnected Multicore Systems
- Author
-
Che-Rung Lee, Jiazheng Zhou, Yeh-Ching Chung, and I-Hsin Chung
- Subjects
Multi-core processor ,Computational complexity theory ,Computer science ,Distributed computing ,Node (networking) ,Scalability ,Parallel computing ,Supercomputer ,Time complexity ,Algorithm - Abstract
Communication-aware task mapping algorithms, which map parallel tasks onto processing nodes according to the communication patterns of applications, are essential to reduce communication time in modern high performance computing. In this paper, we design algorithms specifically for interconnected multicore systems, whose architectural properties, namely a small number of cores per node, a large number of nodes, and a large performance gap between communication within a multicore and among multicores, have brought new challenges and opportunities to the mapping problem. Let k be the number of cores per multicore and n be the number of tasks. We consider the practical case that k is much smaller than n, for k = 2, 4, and 6. The designed algorithms are optimal for the mapping measurement, called Maximum Interconnective Message Size (MIMS), with a time complexity of merely O(mlogm) for m communication pairs. Thus, they are highly scalable for large applications. We experimented with the algorithms on the IBM Blue Gene/P system for two synthetic benchmarks and two applications. The results show good communication performance improvement.
- Published
- 2011
- Full Text
- View/download PDF
35. An Efficient Programming Paradigm for Shared-Memory Master-Worker Video Decoding on TILE64 Many-Core Platform
- Author
-
Kuan-Ching Li, Kuan-Chou Lai, Xuan-Yi Lin, Shau-Yin Tseng, and Yeh-Ching Chung
- Subjects
Speedup ,Computer science ,Aspect-oriented programming ,Distributed computing ,Video decoder ,Application software ,computer.software_genre ,Inductive programming ,Shared memory ,Computer architecture ,Reactive programming ,Programming paradigm ,computer ,Implementation ,Functional reactive programming - Abstract
The ubiquity of many-core architectures brings challenges in building scalable application software, dramatically changing the way applications are traditionally developed. Optimization of programs for many-core platforms is a multifaceted problem in which system and architectural factors should be taken into consideration. In this paper, we attack the problem from the aspect of the programming paradigm. We propose a hybrid producer-write plus consumer-read shared-memory programming paradigm for the implementation of a master-worker video decoder on the TILE64 many-core platform. To evaluate the scalability and performance benefits of different programming paradigms, a Motion JPEG decoder is parallelized using the master-worker structure and implemented with combinations of consumer-read and producer-write programming. Experimental results show that the proposed implementation obtains competitive speedup, scaling well with the number of available cores and achieving up to 4 times performance improvement over other implementations on the decoding of a 1080P video.
- Published
- 2011
- Full Text
- View/download PDF
36. Hardware/software co-designed accelerator for vector graphics applications
- Author
-
Hsin-Wen Wei, Yeh-Ching Chung, Yi-Cheng Chen, Hsiao-Mei Lin, Shuo-Hung Chen, and Chih-Tsun Huang
- Subjects
Hardware architecture ,Vector graphics ,Software ,Computer science ,Hardware register ,business.industry ,Embedded system ,Component-based software engineering ,Graphics processing unit ,Hardware compatibility list ,Hardware acceleration ,business ,Computer hardware - Abstract
This paper proposes a new hardware accelerator to speed up vector graphics applications on complex embedded systems. The resulting hardware accelerator is synthesized on a field-programmable gate array (FPGA) and integrated with software components. The paper also introduces a hardware/software co-verification environment that provides in-system, at-speed functional verification and performance evaluation to verify the integrated hardware/software architecture. The experimental results demonstrate that the integrated hardware accelerator is fifty times faster than a compiler-optimized software component and enables vector graphics applications to run nearly two times faster.
- Published
- 2011
- Full Text
- View/download PDF
37. A Performance Goal Oriented Processor Allocation Technique for Centralized Heterogeneous Multi-cluster Environments
- Author
-
Po-Chi Shih, Kuo-Chan Huang, Che-Rung Lee, I-Hsin Chung, and Yeh-Ching Chung
- Subjects
Goal orientation ,Computer science ,Distributed computing ,Multi cluster ,Look-ahead - Published
- 2011
- Full Text
- View/download PDF
38. A Parallel Rectangle Intersection Algorithm on GPU+CPU
- Author
-
Che-Rung Lee, I-Hsin Chung, Yeh-Ching Chung, and Shih-Hsiang Lo
- Subjects
Instruction set ,CUDA ,Coprocessor ,Speedup ,Computer science ,Parallel algorithm ,Graphics processing unit ,Algorithm design ,Parallel computing ,Rectangle ,Algorithm - Abstract
In this paper, we investigate efficient algorithms and implementations using GPU plus CPU to solve the rectangle intersection problem on a plane. The problem is to report all intersecting pairs of iso-oriented rectangles; its parallelization on GPUs poses two major computational challenges: data partitioning and the massive output. The presented algorithm, called PRI-GC (Parallel Rectangle Intersection algorithm on GPU+CPU), consists of two phases: mapping and intersection checking. In the mapping phase, rectangles are hashed into different subspaces (called cells) to avoid unnecessary intersection checks between far-apart rectangles. In the intersection-checking phase, pairs of rectangles within the same cell are examined in parallel, and the intersecting pairs are reported. Several optimization techniques, including rectangle re-ordering, output data compression/encoding, and overlapping the execution of GPU and CPU, are applied to enhance performance. We evaluated the performance of PRI-GC; the results show over 30x speedup against two well-implemented sequential algorithms on a single CPU. The effectiveness of each optimization technique for this problem was evaluated as well. Several parameters, including different degrees of rectangle coverage, different block sizes, and different cell sizes, were also varied to explore their influence on the performance of PRI-GC.
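The abstract describes the two phases only at a high level; a sequential CPU sketch of the same cell-hashing idea (the names, rectangle encoding, and grid parameter are assumptions) can be written as:

```python
from itertools import combinations

def overlaps(r, s):
    # iso-oriented rectangles given as (x1, y1, x2, y2)
    return r[0] < s[2] and s[0] < r[2] and r[1] < s[3] and s[1] < r[3]

def intersecting_pairs(rects, cell_size):
    # mapping phase: hash each rectangle into every grid cell it touches
    buckets = {}
    for i, (x1, y1, x2, y2) in enumerate(rects):
        for cx in range(int(x1 // cell_size), int(x2 // cell_size) + 1):
            for cy in range(int(y1 // cell_size), int(y2 // cell_size) + 1):
                buckets.setdefault((cx, cy), []).append(i)
    # intersection-checking phase: only pairs sharing a cell are tested
    found = set()
    for ids in buckets.values():
        for i, j in combinations(ids, 2):
            if overlaps(rects[i], rects[j]):
                found.add((min(i, j), max(i, j)))
    return found

print(intersecting_pairs([(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)], 4))  # → {(0, 1)}
```

On a GPU, each cell's pair tests would run in parallel threads, and the deduplication and output compaction are where the paper's output-compression and re-ordering optimizations come in.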
- Published
- 2011
- Full Text
- View/download PDF
39. A Locality-Aware Publish/Subscribe Scheme for High Level Architecture on Structured Peer-to-Peer Networks
- Author
-
Wei-Chao Chang, Shih-Hsiang Lo, Kuan-Ching Li, Kuan-Chou Lai, and Yeh-Ching Chung
- Subjects
Multicast ,ComputingMethodologies_SIMULATIONANDMODELING ,Computer science ,business.industry ,Distributed computing ,Locality ,Message passing ,Local area network ,Peer-to-peer ,computer.software_genre ,High-level architecture ,Server ,business ,computer ,Computer network - Abstract
High Level Architecture (HLA) is a distributed simulation architecture applied in many simulation environments. In most of these environments, the simulation entities (called federates in HLA) communicate with each other over interconnected local area networks (LANs). Because communications among federates in the same LAN have shorter latency and higher bandwidth, this paper proposes a message publish/subscribe scheme for HLA based on a structured peer-to-peer overlay that follows the principle of locality, increasing the number of affordable federates and the workload supported by the same hardware environment. Moreover, a run-time infrastructure (RTI) is implemented in accordance with HLA for performance evaluation. Experimental results show that the proposed scheme improves simulation performance.
- Published
- 2010
- Full Text
- View/download PDF
40. CORAL-M: Heuristic coding region alignment method for multiple genome sequences
- Author
-
Chuan Yi Tang, Yeh-Ching Chung, Che Lun Hung, Shu Ju Hsieh, Shih-Cheng Chang, Yaw-Ling Lin, and Chun-Yuan Lin
- Subjects
Genetics ,Multiple sequence alignment ,Sequence analysis ,Structural alignment ,Coding region ,Genomics ,Computational biology ,Biology ,Genome ,Alignment-free sequence analysis ,Homology (biology) - Abstract
Multiple sequence alignment is a scientific tool that assists the study of DNA homology, phylogeny determination, and conserved-motif identification. Various heuristic MSA methods have been presented to obtain alignments for multiple sequences. Although these alignment tools can align protein, DNA, and RNA sequences successfully, they are not as successful in aligning coding-region sequences, because the resulting alignments may not be consistent with practical observations. Therefore, we propose CORAL-M, a heuristic coding-region alignment method for multiple genome sequences, especially for coding regions. CORAL-M adopts a probabilistic filtration model and a local optimal solution to align genome sequences (codon to codon with the wobble mask rule) using sliding windows, and thus obtains a near-optimal alignment in linear time. In the experimental results, CORAL-M can be used to find potential function sites by aligning viral strains of Poliovirus 1–3, Enterovirus 71, and Coxsackievirus 16.
- Published
- 2010
- Full Text
- View/download PDF
41. A tiling-based approach for directional sensor network deployment
- Author
-
Chun-Hsien Wu and Yeh-Ching Chung
- Subjects
Engineering ,business.industry ,Visual sensor network ,Real-time computing ,Process (computing) ,Key distribution in wireless sensor networks ,Software deployment ,Polygon ,Computer Science::Networking and Internet Architecture ,Mobile wireless sensor network ,Sensor network deployment ,business ,Wireless sensor network ,Computer network - Abstract
In this paper, we propose a tiling-based wireless sensor network (WSN) deployment approach based on the polygon model for sensor nodes with directional sensing areas. In the tiling-based deployment approach, a hexagonal tile is first generated from the polygon that represents the sensing area of a given directional sensor. Then, a tiling process is applied to place the tiles over the deployment area. Both sensing coverage holes near the boundaries and obstacles are considered under the proposed approach. To evaluate the proposed deployment approach, we compare its performance with the strip-based deployment pattern approach, which is based on the sector model, in terms of sensing coverage rate and the number of sensor nodes used. The simulation results show that the sensing coverage rate of the proposed approach is higher than that of the strip-based deployment pattern approach for different types of sensor nodes on deployment areas with and without obstacles.
- Published
- 2010
- Full Text
- View/download PDF
42. Keynote 1: Computer Go Game and Competition by Yeh-Ching Chung
- Author
-
Yeh-Ching Chung
- Subjects
Non-cooperative game ,Win-win game ,Game mechanics ,Game design ,Video game development ,Operations research ,Computer science ,Game design document ,Human–computer interaction ,Video game design ,Game Developer - Published
- 2010
- Full Text
- View/download PDF
43. Pervasive health service system: insights on the development of a grid-based personal health service system
- Author
-
Ssu-Hsuan Lu, Don-Lin Yang, Kuan-Chou Lai, Kuan-Ching Li, Ming-Hsin Tsai, and Yeh-Ching Chung
- Subjects
Service (business) ,Service system ,HRHIS ,Knowledge management ,business.industry ,Mobile computing ,Health technology ,computer.software_genre ,Shared resource ,Grid computing ,Health care ,Medicine ,business ,computer - Abstract
Although the medical technologies developed in the twenty-first century have successfully increased the human life span, the pressure of modern life has brought many diseases of modern civilization and chronic illnesses. When all of these problems are tackled by hospitals, they consume a considerable amount of medical resources. Alternatively, providing health care services at home is an important way of improving personal health while saving hospital resources. In this paper, we present an ongoing project that designs and implements a pervasive health service infrastructure based on a grid system integrated with P2P resource-sharing mechanisms to provide personal health services. Personal health status is recorded, monitored, and even mined in the proposed pervasive health service system for preventive medicine. Additionally, wireless sensor equipment for mobile personal health services is integrated into the pervasive health service system, in order to construct a situation-aware, context-aware, and environment-aware mobile health service platform.
- Published
- 2010
- Full Text
- View/download PDF
44. Offloading Region Matching of Data Distribution Management with CUDA
- Author
-
Yeh-Ching Chung, Shih Hsiang Lo, and Fang Ping Pai
- Subjects
Computer graphics ,Matching (statistics) ,CUDA ,High-level architecture ,Coprocessor ,Computer science ,Computation ,Process (computing) ,Graphics processing unit ,Parallel computing - Abstract
Data distribution management (DDM) aims to reduce the transmission of irrelevant data between High Level Architecture (HLA) compliant simulators by taking their regions of interest into account (i.e., region matching). In a large-scale simulation, computation-intensive region matching has a direct impact on simulation performance. To deal with the high computation cost of region matching, the whole region-matching process is offloaded to graphics processing units (GPUs) based on the Compute Unified Device Architecture (CUDA). Two approaches are proposed to perform region matching in parallel. Several metrics, including different numbers of regions, different sizes of regions, and different distributions of regions, are used in the experimental tests. The experimental results indicate that the performance of region matching on a GPU can be improved by one to two orders of magnitude compared with that on a CPU.
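The abstract does not spell out the matching kernel; as a hedged CPU-side illustration (the region representation and names are assumptions), every (update, subscribe) region pair can be tested independently, which is exactly what makes the workload map naturally to one GPU thread per pair:

```python
def region_overlap(a, b):
    """Regions are lists of per-dimension (low, high) extents; they match
    when the extents overlap in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

def match_regions(update_regions, subscribe_regions):
    # Brute-force matching: each (update, subscribe) test is independent,
    # so on a GPU each pair test can be assigned to its own thread.
    return [(i, j)
            for i, u in enumerate(update_regions)
            for j, s in enumerate(subscribe_regions)
            if region_overlap(u, s)]

updates = [[(0, 5), (0, 5)]]
subs = [[(3, 8), (3, 8)], [(6, 9), (6, 9)]]
print(match_regions(updates, subs))  # → [(0, 0)]
```

The paper's two parallel approaches presumably differ in how these independent tests are distributed across GPU threads; the abstract does not say.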
- Published
- 2010
- Full Text
- View/download PDF
45. On the design of contribution-aware P2P streaming networks
- Author
-
Yeh-Ching Chung and Yu-Wei Chan
- Subjects
Multimedia ,business.industry ,computer.internet_protocol ,Computer science ,media_common.quotation_subject ,Problem statement ,computer.software_genre ,Transmission (telecommunications) ,Server ,Scalability ,Bandwidth (computing) ,Systems architecture ,Real Time Streaming Protocol ,Quality (business) ,business ,computer ,Computer network ,media_common - Abstract
P2P live streaming networks have recently become an emerging research topic. In such multimedia streaming networks, autonomous users cooperate with each other to provide a distributed, scalable, and cost-efficient transmission environment. However, since each user is self-interested and wants to receive high-quality video with minimum upload bandwidth, full cooperation and stable playback quality cannot be guaranteed. This paper proposes the design of a two-phase, contribution-aware mechanism for P2P live streaming networks. We first present our two-layered system architecture and the problem statements. Then, we propose a contribution-aware mechanism for identifying stable peers, together with a method for stimulating peer cooperation in such a hybrid push-pull P2P streaming platform.
- Published
- 2009
- Full Text
- View/download PDF
46. Reducing Leakage Power of JPEG Image on Asymmetric SRAM
- Author
-
Yeh-Ching Chung, Yu-Hsun Lin, and Xuan-Yi Lin
- Subjects
Random access memory ,Hardware_MEMORYSTRUCTURES ,business.industry ,Computer science ,Circuit design ,Parallel computing ,Integrated circuit design ,computer.file_format ,Huffman coding ,JPEG ,symbols.namesake ,Low-power electronics ,symbols ,Static random-access memory ,Cache ,business ,computer ,Computer hardware ,Transform coding - Abstract
Leakage power has become a key challenge and occupies an increasing portion of the total power consumption in nano-scale circuit design. There are many novel cache designs that reduce leakage power based on the characteristics of programs. One of them is Asymmetric SRAM, which can reduce cache leakage power while storing bit "0". In this paper, we propose two algorithms, a value-position-switch algorithm and a code-bit-switch algorithm, to bias JPEG images toward bit "0" on Asymmetric SRAM. The value-position-switch algorithm and the code-bit-switch algorithm can reduce the number of "1" bits in Huffman-coded data by up to 7.33% and 25.20%, respectively. The overheads of instruction count, cycle count, and power consumption for these two algorithms are negligible.
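The two algorithms operate on JPEG Huffman-coded data and are not detailed in the abstract; the generic bit-inversion idea behind biasing stored words toward "0" (all names here are assumptions, not the paper's) can be sketched as:

```python
WIDTH = 8  # assumed word width for the sketch

def bias_to_zero(word):
    """If a word holds more 1-bits than 0-bits, store its complement plus a
    1-bit invert flag, so the bits written to Asymmetric SRAM favor "0"."""
    ones = bin(word).count("1")
    if ones > WIDTH - ones:
        return (~word) & ((1 << WIDTH) - 1), 1   # inverted payload, flag set
    return word, 0

def restore(stored, flag):
    return (~stored) & ((1 << WIDTH) - 1) if flag else stored

stored, flag = bias_to_zero(0b11110111)
print(bin(stored), flag)  # one stored 1-bit instead of seven, plus the flag bit
assert restore(stored, flag) == 0b11110111
```

The paper's algorithms instead rework the Huffman value/position coding itself, so the 0-bias is achieved inside the compressed stream rather than with a per-word flag.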
- Published
- 2009
- Full Text
- View/download PDF
47. A Parallel Algorithm for Three-Profile Alignment Method
- Author
-
Yeh-Ching Chung, Chuan Yi Tang, Chun-Yuan Lin, and Che-Lun Hung
- Subjects
Dynamic programming ,Theoretical computer science ,Multiple sequence alignment ,Computational complexity theory ,Feature (computer vision) ,Computer science ,Parallel algorithm ,SUPERFAMILY ,Function (mathematics) ,Sensitivity (control systems) ,Algorithm - Abstract
Profile-profile alignment is an important technique in the computational biology field. Several profile-profile alignment methods have been proposed to improve sensitivity and alignment quality compared with sequence-sequence and profile-sequence methods. An increasing number of studies indicate that three-way alignment may provide additional information or more accurate alignment results than pair-wise alignment does. Therefore, we first propose TPA, a dynamic-programming-based three-profile alignment method that aligns three profiles simultaneously. The time and space complexities of TPA are O(n^3) and O(n^2), respectively. To reduce these complexities, we further develop a parallel version of TPA, PTPA, which achieves O(n^3/p) time and O(n^2/p) space complexity, where p is the number of processors. In case study I, the results show that PTPA can find more conserved candidates than the profile-profile alignment method (CLUSTALW). In case study II, we applied PTPA to the Feature Amplified Voting Algorithm (FAVA) to analyze the Amidohydrolase superfamily. Several amino acid residues known to be related to the function or structure of mammalian imidase were identified by PTPA-FAVA.
- Published
- 2009
- Full Text
- View/download PDF
48. Improving Processor Allocation in Heterogeneous Computing Grid through Considering Both Speed Heterogeneity and Resource Fragmentation
- Author
-
Yeh-Ching Chung, Kuo-Chan Huang, and Po-Chi Shih
- Subjects
Grid computing ,Computer science ,Distributed computing ,Fragmentation (computing) ,Processor scheduling ,Workload ,Symmetric multiprocessor system ,Relative strength ,computer.software_genre ,Allocation method ,Grid ,computer - Abstract
In a heterogeneous grid environment, two major factors severely affect overall system performance: speed heterogeneity and resource fragmentation. Moreover, the relative impact of these two factors changes with different workload and resource conditions. Processor allocation methods have to deal with this issue; however, most existing allocation methods focus on only one of the two factors. This paper first analyzes the relative strengths of different existing methods. Based on this analysis, we propose an intelligent processor allocation method that considers both the speed heterogeneity and resource fragmentation effects. Extensive simulation studies show that the proposed method effectively delivers better performance under most resource and workload conditions.
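One way to picture an allocation rule that weighs both factors is a scoring heuristic (a hypothetical illustration, not the paper's method): each candidate site is scored by its processor speed minus a penalty for the idle fragment the job would leave behind, with a tunable weight `alpha` between the two concerns.

```python
def pick_site(sites, job_size, alpha=0.5):
    """Choose a grid site for a parallel job by trading off processor
    speed against the resource fragment left behind after allocation.
    `sites` is a list of (name, relative_speed, free_processors).
    Hypothetical heuristic for illustration -- not the paper's method."""
    best, best_score = None, float("-inf")
    for name, speed, free in sites:
        if free < job_size:
            continue                              # site cannot host the job
        leftover = (free - job_size) / free       # fraction left fragmented
        score = alpha * speed - (1 - alpha) * leftover
        if score > best_score:
            best, best_score = name, score
    return best
```

With `alpha` near 1 the rule favors fast sites (speed heterogeneity dominates); near 0 it favors tight fits that minimize fragmentation, mirroring the two effects the abstract names.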
- Published
- 2009
- Full Text
- View/download PDF
49. SARIDS: A Self-Adaptive Resource Index and Discovery System
- Author
-
Yeh-Ching Chung, Kuan-Ching Li, Wu-Chun Chung, Kuan-Chou Lai, and Yi-Hsiang Lin
- Subjects
Load management ,Range query (data structures) ,Computer science ,Distributed computing ,Scalability ,Locality ,Hash function ,Load balancing (computing) ,Grid ,Shared resource - Abstract
Recently, resource-sharing systems have applied P2P techniques to provide scalable multi-attribute range queries. However, due to the heterogeneity of resources and the varying sharing policies of different providers, current P2P-based resource discovery systems may suffer from load imbalance in large-scale distributed systems. In this paper, we propose a self-adaptive resource index and discovery system (SARIDS) to achieve load balancing. SARIDS adopts a two-tier architecture based on a structured P2P overlay. Each intra-overlay is constructed by normal peers with the same attribute via a locality-preserving hash function, while the inter-overlay is constructed by super-peers with classified attributes from the different intra-overlays. SARIDS supports not only multi-attribute range queries but also self-adaptive load-balancing mechanisms both within and among the intra-overlays. The simulation results show that SARIDS is scalable and balances load efficiently even in a non-uniform peer-range environment.
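The locality-preserving hash is the piece that makes range queries cheap: unlike a uniform hash, it maps numerically close attribute values to nearby overlay IDs, so a range query touches a contiguous arc of the ring. A minimal sketch, assuming a simple linear scaling scheme (the paper's actual hash is not specified here):

```python
def locality_hash(value: float, lo: float, hi: float, ring_size: int = 2**16) -> int:
    """Map an attribute value in [lo, hi] into a structured-overlay ID
    space by linear scaling, so close values land on nearby peers.
    Simplified sketch of a locality-preserving hash (assumed scheme)."""
    frac = (value - lo) / (hi - lo)
    return min(int(frac * ring_size), ring_size - 1)

def range_query_ids(lo_q: float, hi_q: float, lo: float, hi: float,
                    ring_size: int = 2**16) -> tuple[int, int]:
    """A range query [lo_q, hi_q] maps to one contiguous ID interval,
    so only the peers on that arc of the overlay need to be visited."""
    return (locality_hash(lo_q, lo, hi, ring_size),
            locality_hash(hi_q, lo, hi, ring_size))
```

The flip side, which motivates the self-adaptive mechanisms, is that skewed value distributions concentrate load on a few peers, whereas a uniform hash would have spread it evenly.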
- Published
- 2009
- Full Text
- View/download PDF
50. Hardware Supported Multicast in 2-D Mesh InfiniBand Networks
- Author
-
Yeh-Ching Chung, Shen-En Liu, and Jiazheng Zhou
- Subjects
Protocol Independent Multicast ,Multicast ,Computer science ,business.industry ,computer.internet_protocol ,Inter-domain ,ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS ,Distance Vector Multicast Routing Protocol ,Data_CODINGANDINFORMATIONTHEORY ,Parallel computing ,Network topology ,Network simulation ,Source-specific multicast ,Internet Group Management Protocol ,Reliable multicast ,Multicast address ,IP multicast ,Xcast ,Unicast ,business ,computer ,Computer hardware ,Pragmatic General Multicast ,Computer network - Abstract
The multicast operation is useful in parallel applications. With the hardware-supported multicast of the InfiniBand Architecture (IBA), we propose a multicast scheme for m×n mesh InfiniBand networks based on the XY routing scheme. The basic concept of the proposed scheme is to find, for each switch, the union of the output ports on the paths between the source node and each destination node in a multicast group. We have implemented the proposed multicast scheme on a 2-D mesh InfiniBand network simulator. Several multicast cases with different message sizes and traffic workloads are simulated. The simulation results show that the proposed multicast scheme outperforms the corresponding unicast scheme in all simulated cases; the larger the message size, the number of multicast source nodes, and the size of the multicast group, the better the speedup that can be expected.
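The port-union construction can be sketched as follows (a simplified model assuming one switch per mesh node, with output ports named by their next-hop coordinates; not the paper's simulator): compute the XY path to each destination, then take the per-switch union of next hops, which yields the multicast tree.

```python
def xy_route(src, dst):
    """Switch-by-switch XY path from src to dst in a 2-D mesh:
    route fully along the X dimension first, then along Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

def multicast_ports(src, dests):
    """For each switch, the union of output ports (next hops) over the
    XY paths from src to every destination -- branches of the shared
    prefix are forwarded once, then split, forming the multicast tree."""
    ports = {}
    for dst in dests:
        path = xy_route(src, dst)
        for here, nxt in zip(path, path[1:]):
            ports.setdefault(here, set()).add(nxt)
    return ports
```

For source (0,0) and destinations (2,0) and (0,2), the source switch forwards on two ports while every other switch forwards on one, so each link carries the message only once.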
- Published
- 2009
- Full Text
- View/download PDF