499 results for "Yeh-Ching Chung"
Search Results
202. Improving Static Task Scheduling in Heterogeneous and Homogeneous Computing Systems.
- Author
- Chih-Hsueh Yang, PeiZong Lee, and Yeh-Ching Chung
- Published
- 2007
- Full Text
- View/download PDF
203. Efficient Parallel Algorithm for Optimal Three-Sequences Alignment.
- Author
- Chun-Yuan Lin, Chen Tai Huang, Yeh-Ching Chung, and Chuan Yi Tang
- Published
- 2007
- Full Text
- View/download PDF
204. Cooperative Localization with Pre-Knowledge Using Bayesian Network for Wireless Sensor Networks.
- Author
- Shih-Hsiang Lo, Chun-Hsien Wu, and Yeh-Ching Chung
- Published
- 2007
- Full Text
- View/download PDF
205. EduBloud: A Blockchain-based Education Cloud
- Author
- Hongbo Zhang, Guiyan Wang, Yeh-Ching Chung, Bowen Xiao, and Wei Cai
- Subjects
- Blockchain, Computer science, Reliability (computer networking), Throughput, Cloud computing, Computer security, Critical infrastructure, Market fragmentation, Isolation (database systems), Implementation
- Abstract
Cloud services for education are critical infrastructure for the smart campus. State-of-the-art education clouds are usually developed and maintained by individual schools. This isolation leaves data easy to tamper with and leads to malicious tampering and information fragmentation. Blockchain is a natural technology for addressing these issues. By leveraging the respective advantages of public, consortium, and private blockchains, we propose EduBloud, an education cloud empowered by a heterogeneous blockchain system. The system showed higher reliability, lower latency, higher data throughput, and better economic efficiency than homogeneous blockchain implementations.
- Published
- 2019
- Full Text
- View/download PDF
206. Effective multiple cancer disease diagnosis frameworks for improved healthcare using machine learning
- Author
- Yeh-Ching Chung, Youhong Zhang, Weiwei Lin, Chuntao Jiang, Xing Chen, Zhifeng Hao, and Ching-Hsien Hsu
- Subjects
- Colorectal cancer, Disease, Machine learning, Health care, Lung cancer, Multiple cancer, Cancer, Screening procedures, Clinical trial, Artificial intelligence, Electrical and Electronic Engineering, Instrumentation, Applied Mathematics, Condensed Matter Physics
- Abstract
Cancer is a non-communicable disease that progresses through uncontrolled cell growth in the body. Cancerous cells form tumors that impair the immune system and cause other biological processes to malfunction. The most common kinds of cancer are breast, prostate, leukemia, lung, and colon cancer. The presence of the disease is identified through proper diagnosis, and many screening procedures are suggested to detect the condition at different stages. Medical practitioners further analyze these electronic health records to diagnose and treat the individual. In some cases, misdiagnosis can happen due to manual error or misinterpretation of the data. To avoid these issues, this paper presents an effective computer-aided diagnosis system supported by intelligent learning models. A machine learning-based feature modeling approach is proposed to improve predictive performance. Breast, cervical, and lung cancer datasets from the University of California, Irvine repository are used for the experimental study. Supervised learning algorithms are trained and validated on the optimal features selected by the proposed system. Using 10-fold cross-validation, the trained models are evaluated with metrics such as accuracy, F-score, precision, and recall. The study attained 99.62%, 96.88%, and 98.21% accuracy on the breast, cervical, and lung cancer datasets, respectively, which demonstrates the proposed system's efficacy. Moreover, the system acts as a versatile tool for capturing patterns across clinical trials for multiple types of cancer.
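A minimal sketch of the kind of pipeline this abstract describes, using scikit-learn's bundled copy of the UCI breast cancer data: univariate feature reduction, a supervised learner, and 10-fold cross-validation scored with accuracy, precision, recall, and F-score. The random forest classifier and the choice of 10 features are assumptions for illustration, not the paper's configuration.

```python
# Hypothetical sketch of the workflow described above: feature reduction,
# a supervised learner, and 10-fold cross-validation with four metrics.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)  # UCI breast cancer data

model = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),   # reduce to 10 features (assumed)
    ("clf", RandomForestClassifier(random_state=0)),
])

scores = cross_validate(model, X, y, cv=10,
                        scoring=["accuracy", "precision", "recall", "f1"])
for metric in ("accuracy", "precision", "recall", "f1"):
    print(metric, scores[f"test_{metric}"].mean())
```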
- Published
- 2021
- Full Text
- View/download PDF
207. A Rotate-Tiling Image Composition Method for Parallel Volume Rendering on Distributed Memory Multicomputers.
- Author
- Chin-Feng Lin, Don-Lin Yang, and Yeh-Ching Chung
- Published
- 2001
- Full Text
- View/download PDF
208. An IoT-Based Cloud-Fog Computing Platform for Creative Service Process
- Author
- Terng-Yin Hsu, Hongji Yang, Tse-Chuan Hsu, and Yeh-Ching Chung
- Subjects
- Service (systems architecture), Process (engineering), Service process, Cloud computing, Fog computing, Component (UML), Internet of Things, Multimedia, Computer science
- Abstract
The creative service process is an innovative process that uses newly developed technologies to improve current service models. In this paper, we propose an IoT-based Cloud-Fog computing platform for the creative service process. The proposed platform is a distributed computing platform in which the compute, storage, and network systems of the Cloud and the Fog work independently or collaboratively, depending on the services being performed. If a service requires resources from only the Cloud or only the Fog, then only the designated side is involved; otherwise, the Cloud and the Fog collaborate, and the required data may be transferred back and forth between them. To demonstrate the use of the proposed platform for the creative service process, two service examples, smart classroom and city surveillance, are discussed.
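The routing rule in the abstract (designated Cloud or Fog alone, otherwise collaboration) can be pictured with a toy dispatcher. The service names and resource tags below are hypothetical; the platform's real interfaces are not given in the abstract.

```python
# Minimal sketch of the dispatch rule described above. Service names and
# resource tags are invented for illustration.
def dispatch(service_needs: set) -> str:
    """Route a service to Cloud, Fog, or both, by the resources it needs."""
    cloud_only = {"batch_analytics", "archive_storage"}
    fog_only = {"camera", "local_display"}
    if service_needs <= cloud_only:
        return "cloud"            # only Cloud resources are required
    if service_needs <= fog_only:
        return "fog"              # only Fog resources are required
    return "cloud+fog"            # otherwise Cloud and Fog collaborate

print(dispatch({"camera"}))                     # -> fog
print(dispatch({"camera", "batch_analytics"}))  # -> cloud+fog
```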
- Published
- 2017
- Full Text
- View/download PDF
209. Application-Aware Traffic Redirection: A Mobile Edge Computing Implementation Toward Future 5G Networks
- Author
- Bing-Liang Chen, Yeh-Ching Chung, Yu-Cing Luo, Jerry Chou, and Shih-Chun Huang
- Subjects
- Mobile edge computing, Edge computing, Edge device, Network packet, Cellular network, Cloud computing, Mobile telephony, 5G, Computer science, Computer network
- Abstract
With the development of network technology, billions of devices access resources and services on the cloud through the mobile telecommunication network. A great number of connections and data packets must be handled by the mobile network, which not only consumes the limited spectrum resources and network bandwidth but also reduces the service quality of applications. To alleviate the problem, the concept of Mobile Edge Computing (MEC) was proposed by the European Telecommunications Standards Institute (ETSI) in 2014. MEC provides IT and cloud computing capabilities at the network edge to offer low-latency and high-bandwidth service. The architecture and benefits of MEC have been discussed in much recent literature, but the implementation of the underlying network is rarely discussed or evaluated in practice. In this paper, we present a prototype implementation of a MEC platform that uses an application-aware traffic redirection mechanism at the edge network to reduce service latency and network bandwidth consumption. Our implementation is based on OAI, an open-source 5G SoftRAN cellular system. To the best of our knowledge, it is also one of the few MEC solutions that have been built for 5G networks in practice.
- Published
- 2017
- Full Text
- View/download PDF
210. A Dynamic Module Deployment Framework for M2M Platforms
- Author
- Shih-Chun Huang, Jerry Chou, Yu-Cing Luo, Bing-Liang Chen, and Yeh-Ching Chung
- Subjects
- Network congestion, Service (systems architecture), Resource (project management), Access network, Software deployment, Server, Distributed computing, Cloud computing, Reuse, Computer science
- Abstract
IoT applications are built on top of M2M platforms, which provide the communication infrastructure among devices and to the clouds. Because of increasing M2M communication traffic and limited edge network bandwidth, preventing network congestion and service delay has become a crucial problem for M2M platforms. A general approach is to deploy IoT service modules in the M2M platform so that data can be pre-processed and reduced before being transmitted over the network. Moreover, the service modules often need to be deployed dynamically at various locations of the M2M platform to accommodate the mobility of devices moving across access networks and the on-demand service requirements of users. However, existing M2M platforms have limited support for dynamic and automatic deployment. The objective of our work is therefore to build a dynamic module deployment framework for M2M platforms that manages and optimizes module deployment automatically according to user service requirements. We achieved this goal by integrating an OSGi-based application framework (Kura) with an M2M platform (OM2M). By exploiting the resource reuse method in the OSGi specification, we were able to reduce the module deployment time by 50-52%. Finally, a computationally efficient and near-optimal algorithm is proposed to optimize the module placement decision in our framework.
- Published
- 2017
- Full Text
- View/download PDF
211. HybridFS - a high performance and balanced file system framework with multiple distributed file systems
- Author
- Hongji Yang, Lidong Zhang, Yeh-Ching Chung, Tse-Chuan Hsu, Yongwei Wu, and Ruini Xue
- Subjects
- Java, Computer science, Stub file, Design rule for Camera File system, Server, Data file, Versioning file system, SSH File Transfer Protocol, Distributed File System, File system fragmentation, File system, Indexed file, Computer file, Device file, Everything is a file, Unix file types, Virtual file system, Torrent file, File Control Block, Self-certifying File System, Journaling file system, Operating system, File area network, Fork (file system)
- Abstract
In the big data era, the distributed file system is becoming more and more significant due to its scale-out capability, high availability, and high performance. Different distributed file systems may have different design goals. For example, some are designed for good small-file performance, such as GlusterFS, while others are designed for large file operations, such as the Hadoop distributed file system. With the divergence of big data applications, a distributed file system may provide good performance for some applications but fail for others; that is, no universal distributed file system produces good performance for all applications. In this paper, we propose a hybrid file system framework, HybridFS, which can deliver satisfactory performance for all applications. HybridFS is composed of multiple distributed file systems and integrates their respective advantages. On top of the underlying distributed file systems, we have designed a metadata management server that performs three functions: file placement, partial metadata store, and dynamic file migration. File placement is performed based on a decision tree. The partial metadata store holds files smaller than a few hundred bytes to increase throughput. Dynamic file migration balances the storage usage of the distributed file systems without throttling performance. We have implemented HybridFS in Java on eight nodes and chosen Ceph, HDFS, and GlusterFS as the designated distributed file systems. The experimental results show that, in the best case, HybridFS achieves up to 30% performance improvement for read/write operations over a single distributed file system. In addition, if the difference in storage usage among the distributed file systems is less than 40%, the performance of HybridFS is guaranteed, that is, there is no performance degradation.
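A toy version of the decision-tree file placement performed by the metadata management server, assuming routing purely by file size. The thresholds and backend assignments are illustrative; the paper's actual decision tree is not reproduced here.

```python
# Hypothetical size-based placement: tiny files stay in the metadata store
# (the paper's partial metadata store holds files under a few hundred bytes),
# small files go to GlusterFS, large files to HDFS, the rest to Ceph.
def place_file(size_bytes: int) -> str:
    if size_bytes < 512:                 # tiny: keep in the metadata store
        return "metadata-store"
    if size_bytes < 1 * 1024 * 1024:     # small files: GlusterFS (assumed cut)
        return "glusterfs"
    if size_bytes < 64 * 1024 * 1024:    # medium objects: Ceph (assumed cut)
        return "ceph"
    return "hdfs"                        # large files: HDFS

for size in (100, 4096, 8 * 1024 * 1024, 1 << 30):
    print(size, "->", place_file(size))
```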
- Published
- 2017
212. File placement mechanisms for improving write throughputs of cloud storage services based on Ceph and HDFS
- Author
- Yeh-Ching Chung, Tse-Chuan Hsu, Hongji Yang, and Chun-Feng Wu
- Subjects
- Computer science, Computer file, Stub file, File Control Block, Self-certifying File System, Operating system, Versioning file system, Cloud storage, Flash file system, File system fragmentation
- Abstract
Cloud storage services are pervasive nowadays, and many of them use distributed file systems as their backend storage. Research on the file-size distribution of file systems shows that file systems contain many small files. This paper therefore proposes a hybrid distributed file system based on Ceph and HDFS that delivers satisfactory write throughput for a cloud storage system with 80-90% small files and 10-20% large files. The experimental results show that the file allocation mechanism without RAM disk caching can improve the write throughput of Ceph and HDFS by approximately 10% to 50%, while the mechanism with RAM disk caching achieves up to 200% write throughput improvement.
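A hedged sketch of the RAM-disk-cached write path implied above: small writes land in a memory-backed staging directory and are flushed to the backing distributed file system later. The paths and the 1 MiB small-file threshold are assumptions.

```python
# Assumed staging layout: /dev/shm is tmpfs-backed on Linux; BACKING stands
# in for the distributed file system mount point.
import os
import shutil

RAMDISK = "/dev/shm/stage"      # memory-backed staging area (assumption)
BACKING = "/mnt/dfs"            # distributed file system mount (assumption)
SMALL = 1024 * 1024             # assumed small-file threshold

def write(name: str, data: bytes) -> str:
    target = RAMDISK if len(data) < SMALL else BACKING
    os.makedirs(target, exist_ok=True)
    path = os.path.join(target, name)
    with open(path, "wb") as f:
        f.write(data)           # small writes return once the RAM copy lands
    return path

def flush():
    """Move staged small files to the backing store in the background."""
    if not os.path.isdir(RAMDISK):
        return
    for name in os.listdir(RAMDISK):
        shutil.move(os.path.join(RAMDISK, name), os.path.join(BACKING, name))
```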
- Published
- 2017
- Full Text
- View/download PDF
213. Improving GPU Memory Performance with Artificial Barrier Synchronization
- Author
- I-Hsin Chung, Quey-Liang Kao, Che-Rung Lee, Shih-Hsiang Lo, and Yeh-Ching Chung
- Subjects
- Random access memory, Computer science, Distributed computing, Locality, Thread (computing), Parallel computing, Memory barrier, Synchronization, Instruction set, Data access, Computational Theory and Mathematics, Hardware and Architecture, CUDA, Pinned memory, Signal Processing, Synchronization (computer science), Data synchronization, Memory model
- Abstract
Barrier synchronization, an essential mechanism for a block of threads to guard data consistency, is regarded as a threat to performance. This study, however, provides a different viewpoint for barrier synchronization on GPUs: adding barrier synchronization, even when functionally unnecessary, can improve the performance of some memory-intensive applications. We explain this phenomenon using a memory contention model in which artificial barrier synchronization helps reduce memory contention and preserve data access locality. To yield practical applications, we identify a program pattern: artificial barrier synchronization can be used to synchronize the memory accesses when the data locality among threads is violated. Empirical results from three real-world applications demonstrate that artificial barrier synchronization can increase performance by 10 to 20 percent.
- Published
- 2014
- Full Text
- View/download PDF
214. Efficient and Retargetable Dynamic Binary Translation on Multicores
- Author
- Yeh-Ching Chung, Chun-Chen Hsu, Pen-Chung Yew, Jan-Jan Wu, Pangfeng Liu, Ding-Yong Hong, Chien-Min Wang, and Wei-Chung Hsu
- Subjects
- Multi-core processor, Speedup, Computer science, Binary translation, Parallel computing, Instruction set, Computational Theory and Mathematics, Hardware and Architecture, Virtual machine, Multithreading, Signal Processing, Scalability, Overhead (computing), x86, Instrumentation (computer programming)
- Abstract
Dynamic binary translation (DBT) is a core technology for many important applications such as system virtualization, dynamic binary instrumentation, and security. However, several factors often impede its performance: 1) emulation overhead before translation; 2) translation and optimization overhead; and 3) translated code quality. A further issue is retargetability, i.e., supporting guest applications from different instruction-set architectures (ISAs) on host machines that also have different ISAs, an important feature for system virtualization. In this work, we take advantage of ubiquitous multicore platforms and use a multithreaded approach to implement DBT. By running the translator and the dynamic binary optimizer on different cores with different threads, we can off-load the overhead incurred by DBT from the target applications, affording DBT more sophisticated optimization techniques as well as retargetability. Using QEMU (a popular retargetable DBT for system virtualization) and the Low-Level Virtual Machine (LLVM) as our building blocks, we demonstrated in a multithreaded DBT prototype, called Hybrid-QEMU (HQEMU), that it could improve QEMU performance by factors of 2.6x and 4.1x on the SPEC CPU2006 integer and floating-point benchmarks, respectively, for dynamic translation of x86 code to x86-64 platforms. For ARM code on x86-64 platforms, HQEMU gains a 2.5x speedup over QEMU on the SPEC CPU2006 integer benchmarks. We also address the performance scalability of multithreaded applications across ISAs. We identify two major impediments to scalability in QEMU: 1) coarse-grained locks used to protect shared data structures, and 2) inefficient emulation of atomic instructions across ISAs. We propose two techniques to mitigate these problems: 1) indirect branch translation caching (IBTC) to avoid frequent lock accesses, and 2) lightweight memory transactions to emulate atomic instructions across ISAs. Our experimental results show that, for multithreaded applications, HQEMU achieves 25x speedups over QEMU on the PARSEC benchmarks.
- Published
- 2014
- Full Text
- View/download PDF
215. ANG - a combination of Apriori and graph computing techniques for frequent itemsets mining
- Author
- Wenguang Chen, Hongji Yang, Tse-Chuan Hsu, Rui Zhang, and Yeh-Ching Chung
- Subjects
- Connected component, Apriori algorithm, Theoretical computer science, Association rule learning, Computer science, Graph, Theoretical Computer Science, Vertex (geometry), Test case, Hardware and Architecture, Trie, A priori and a posteriori, Graph (abstract data type), Software, Information Systems, FSA-Red Algorithm
- Abstract
The Apriori algorithm is one of the most well-known and widely accepted methods for association rule mining. Apriori uses a prefix tree to represent k-itemsets, generates k-itemset candidates based on the frequent (k-1)-itemsets, and determines the frequent k-itemsets by traversing the prefix tree iteratively based on the transaction records. When k is small, the execution of Apriori is very efficient. However, the execution of Apriori can be very slow when k becomes large because of the deeper recursion depth needed to determine the frequent k-itemsets. From the perspective of graph computing, the transaction records can be converted to a graph G(V, E), where V is the set of vertices of G that represents the transaction records and E is the set of edges of G that represents the relations among transaction records. Each k-itemset in the transaction records will have a corresponding connected component in G. The number of vertices in the corresponding connected component is the support of the k-itemset. Since the time to find the corresponding connected component of a k-itemset in G is constant for any k, the graph computing method is very efficient when the number of k-itemsets is relatively small. Based on Apriori and graph computing techniques, a hybrid method, called Apriori and Graph Computing (ANG), is proposed to compute the frequent itemsets. Initially, ANG uses Apriori to compute the frequent k-itemsets and then switches to the graph computing method when k becomes large (where the number of k-itemset candidates is relatively small). The experimental results show that ANG outperforms both Apriori and the graph computing method for all test cases.
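An illustrative Apriori-style support count using per-item transaction-id sets, with toy data. In ANG, the supports for large k would instead come from constant-time connected-component lookups in the transaction graph; only the candidate-generation half is sketched here.

```python
# Illustrative Apriori-style frequent-itemset mining. The transaction data
# is invented; ANG's graph-computing switch for large k is not shown.
def frequent_itemsets(transactions, minsup):
    tids = {}                                  # item -> set of transaction ids
    for t, items in enumerate(transactions):
        for it in items:
            tids.setdefault(it, set()).add(t)
    freq = {frozenset([i]): s for i, s in tids.items() if len(s) >= minsup}
    result, k = dict(freq), 1
    while freq:
        k += 1
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        freq = {}
        for c in candidates:
            support = set.intersection(*(tids[i] for i in c))
            if len(support) >= minsup:         # keep only frequent k-itemsets
                freq[c] = support
        result.update(freq)
    return {iset: len(t) for iset, t in result.items()}

print(frequent_itemsets([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}], minsup=2))
```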
- Published
- 2017
216. Byzantine Fault Tolerant Optimization in Federated Cloud Computing
- Author
- Ching-Hsien Hsu, Mahdis Moradi, Yeh-Ching Chung, Jerry Chou, and Hojjat Baghban
- Subjects
- Computer science, Distributed computing, Information technology, Fault tolerance, Cloud computing, Quantum Byzantine agreement, Software, Cloud testing, Server, Byzantine fault tolerance, Computer network
- Abstract
Cloud computing is a forthcoming revolution in the information technology industry because of its performance, accessibility, and low cost. It allows capacity and capabilities to be scaled up dynamically without investing in new infrastructure, training new personnel, or licensing new software. The federated cloud, a combination of more than one cloud, is the next logical step after the hybrid cloud, and many indicators show growing demand for such a model. Security is a challenging issue in all cloud infrastructures, whether single clouds or federated clouds, and it is especially significant in distributed systems that must be highly fault tolerant. One algorithm for this issue is Byzantine Fault Tolerance. This paper introduces a new method that optimizes Byzantine Fault Tolerance, decreases latency, and detects the number of faults.
- Published
- 2016
- Full Text
- View/download PDF
217. PROAR: A Weak Consistency Model for Ceph
- Author
- Yongwei Wu, Yeh-Ching Chung, and Jiayuan Zhang
- Subjects
- Weak consistency, Computer science, Distributed computing, Hash function, Consistency model, Commit, Replication (computing), Object storage, Computer data storage, Node (computer science), Overhead (computing), Latency (engineering)
- Abstract
The primary-copy consistency model used in Ceph cannot satisfy the low write latency required by users. In this paper, we propose a weak consistency model, PROAR, based on a distributed hash ring mechanism that allows clients to commit data only to the primary node and synchronize data to replication nodes asynchronously in Ceph. With the distributed hash ring mechanism, the low-latency requirement of write operations can be met. In addition, the workload of the primary node is reduced while that of the replication nodes becomes more balanced. We have evaluated the proposed scheme on a Ceph storage system with 3 storage nodes. The experimental results show that PROAR reduces write overhead by about 50% compared to Ceph and balances the workload across all replication nodes.
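A minimal sketch of the hash-ring write path described above: the client commits only to the primary node, and replication to the other nodes is queued for asynchronous synchronization. The node names, hash function, and queue are assumptions for illustration.

```python
# Toy hash-ring placement with asynchronous replication, in the spirit of
# PROAR. All names and the MD5-based ring position are assumptions.
import hashlib
from collections import deque

NODES = ["osd0", "osd1", "osd2"]
replication_queue = deque()          # drained by a background synchronizer

def ring_position(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % len(NODES)

def store(node: str, key: str, value: bytes):
    print(f"{node} <- {key} ({len(value)} bytes)")

def write(key: str, value: bytes, replicas: int = 3):
    p = ring_position(key)
    store(NODES[p], key, value)      # commit returns after the primary write
    for i in range(1, replicas):     # replicas sync later, off the write path
        replication_queue.append((NODES[(p + i) % len(NODES)], key, value))

write("objectA", b"payload")
```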
- Published
- 2016
- Full Text
- View/download PDF
218. Modeling of Resource Granularity and Utilization with Virtual Machine Splitting
- Author
- Jerry Chou, Yeh-Ching Chung, Hojjat Baghban, and Ching-Hsien Hsu
- Subjects
- Computer science, Distributed computing, Full virtualization, Temporal isolation among virtual machines, Cloud computing, Virtualization, Resource (project management), Virtual machine, Server, Data center
- Abstract
The increasing number of IT users and their need for computational power in cloud data centers leads to noticeable growth in physical servers, which imposes a dramatic burden in power consumption and machine count. Virtualization is a remarkable method for reducing the number of physical servers while preserving processing performance, but achieving good resource utilization remains a significant challenge, especially in data center environments. Some applications are hosted on a single large virtual machine. One way to improve physical server utilization is to split such an application across several smaller virtual machines with sufficient total computational power. Although using multiple small virtual machines instead of one large one improves physical resource utilization and reduces the number of powered-on physical machines, it incurs a penalty: extra resources are needed to map the application onto the new virtual machines. Existing research has not precisely quantified the extra resources and computational overhead a data center incurs when the original application is split across smaller virtual machines while preserving the characteristics it had on the large virtual machine. This paper demonstrates, through mathematical modeling, the penalty in extra physical resources borne by the resource providers in a cloud data center. The model is relevant to both energy efficiency and physical resource utilization in cloud data centers.
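One illustrative way (not the paper's model) to see where the splitting penalty comes from: split a VM with resource demand C into n smaller VMs, each adding a fixed per-VM overhead o for hypervisor state and duplicated runtime.

```latex
% An assumed toy model of the splitting penalty, not the paper's.
\[
  R(n) \;=\; \underbrace{C}_{\text{useful demand}}
        \;+\; \underbrace{n\,o}_{\text{per-VM overhead}},
  \qquad
  \text{penalty}(n) \;=\; R(n) - R(1) \;=\; (n-1)\,o .
\]
```

Under this assumption the penalty grows linearly with the number of splits, so the utilization gained by packing small VMs must outweigh the (n-1)o extra resources.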
- Published
- 2016
- Full Text
- View/download PDF
219. HIERARCHICAL MAPPING FOR HPC APPLICATIONS
- Author
- Jiazheng Zhou, I-Hsin Chung, Yeh-Ching Chung, and Che-Rung Lee
- Subjects
- Strongly connected component, Computational complexity theory, Spectral graph theory, ScaLAPACK, Computer science, Distributed computing, Computation, Graph theory, Parallel computing, Supercomputer, Matrix multiplication, Theoretical Computer Science, Kernel (image processing), Hardware and Architecture, Scalability, Software
- Abstract
As the high performance computing systems scale up, mapping the tasks of a parallel application onto physical processors to allow efficient communication becomes one of the critical performance issues. Existing algorithms were usually designed to map applications with regular communication patterns. Their mapping criterion usually overlooks the size of communicated messages, which is the primary factor of communication time. In addition, most of their time complexities are too high to process large scale problems. In this paper, we present a hierarchical mapping algorithm (HMA), which is capable of mapping applications with irregular communication patterns. It first partitions tasks according to their run-time communication information. The tasks that communicate with each other more frequently are regarded as strongly connected. Based on their connectivity strength, the tasks are partitioned into supernodes based on the algorithms in spectral graph theory. The hierarchical partitioning reduces the mapping algorithm complexity to achieve scalability. Finally, the run-time communication information will be used again in fine tuning to explore better mappings. With the experiments, we show how the mapping algorithm helps to reduce the point-to-point communication time for the PDGEMM, a ScaLAPACK matrix multiplication computation kernel, up to 20% and the AMG2006, a tier 1 application of the Sequoia benchmark, up to 7%.
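The spectral step that HMA builds on can be sketched in a few lines: bisect a task graph by the sign of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian), with edge weights taken from run-time communication volume. The 4-task weight matrix below is invented for illustration.

```python
# Spectral bisection of a communication graph into two "supernodes".
import numpy as np

W = np.array([[0, 9, 1, 0],        # W[i][j] = bytes exchanged by tasks i, j
              [9, 0, 1, 1],        # (hypothetical run-time measurements)
              [1, 1, 0, 8],
              [0, 1, 8, 0]], dtype=float)

L = np.diag(W.sum(axis=1)) - W     # graph Laplacian
vals, vecs = np.linalg.eigh(L)     # eigenpairs in ascending order
fiedler = vecs[:, 1]               # second-smallest eigenvector
part = fiedler >= 0                # sign splits tasks into two supernodes
print("supernode A:", np.where(part)[0], " supernode B:", np.where(~part)[0])
```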
- Published
- 2011
- Full Text
- View/download PDF
220. Community-Based M2M Framework Using Smart/HetNet Gateways for Internet of Things
- Author
- Cheng-Hsin Hsu, Yeh-Ching Chung, Yi-Lan Lin, and Wu-Chun Chung
- Subjects
- Computer science, Logic gate, Telecommunications link, Mobile computing, Wireless, Cloud computing, Mobile telephony, Internet of Things, Heterogeneous network, Computer network
- Abstract
In order to manage the Internet of Things in a flexible and efficient way, this paper proposes a novel M2M framework using smart/HetNet gateways. Our approach is not only compatible with the M2M standard but also enables community-based coordination among gateways and devices. With smart and HetNet gateways, various types of requirements and applications can be fulfilled and handled within a local region. Accordingly, unnecessary network usage is avoided, reducing traffic load in mobile networks. We also implement a prototype that sustains an IoT application scenario: a lamp is automatically switched on when a human face is detected. The demonstration shows that our system is practical and supports subscription and notification within a community. Finally, experimental results reveal that basic procedures complete in less time and with less uplink traffic when the devices involved in an application are within the same region.
- Published
- 2015
- Full Text
- View/download PDF
221. Message from DataCom 2015 Chairs
- Author
- Geoffrey Fox, Yeh-Ching Chung, Hao Wang, Beniamino Di Martino, Christophe Cérin, and Weizhe Zhang
- Subjects
- Signal processing, Sociology and Political Science, Multimedia, Computer science, Urban studies, Information and Computer Science, Computer Science Applications, Computer Vision and Pattern Recognition, Information system, Modeling and simulation, Computer Networks and Communication, Human-computer interaction, Media Technology
- Published
- 2015
- Full Text
- View/download PDF
222. Minimizing Latency of Real-Time Container Cloud for Software Radio Access Networks
- Author
- Yeh-Ching Chung, Cheng-Hsin Hsu, Wu-Chun Chung, Shu-Ting Wang, Satyajit Padhy, Mu-Han Huang, and Chen-Nien Mao
- Subjects
- Software, Access network, Computer science, Server, Cloud testing, Packet processing, Testbed, Cellular network, Cloud computing, Software-defined radio, Computer network
- Abstract
With the huge growth in mobile traffic, conventional Radio Access Networks (RANs) suffer from high capital and operating expenditures, especially when new cellular standards are deployed. Software and cloud RANs have been proposed, but the stringent latency requirements dictated by cellular networks, e.g., a 1 ms transmission time interval, are difficult to satisfy. We first present a real software RAN testbed based on an open-source LTE implementation. We then investigate quality assurance when deploying such software RANs in the cloud. In particular, running software RANs in the cloud leads to high latency, which may violate the latency requirements. We empirically study the problem of minimizing computational and networking latencies in lightweight container clouds. Our experimental results show the feasibility of running software RANs in a real-time container cloud. More specifically, a feasible way to host software RANs in the cloud is to adopt lightweight containers with real-time kernels and fast packet-processing networking.
- Published
- 2015
- Full Text
- View/download PDF
223. Hardware Thread-Level Speculation Performance Analysis
- Author
- Che-Rung Lee, Michael P. Perrone, Ying-Chieh Wang, Yeh-Ching Chung, and I-Hsin Chung
- Subjects
- Record locking, Computer science, Distributed computing, Thread (computing), Lock (computer science), Instruction set, Consistency (database systems), Embedded system, Programming paradigm, Performance prediction, Overhead (computing)
- Abstract
This paper presents a performance analysis of hardware Thread-Level Speculation (TLS) in the IBM Blue Gene/Q computer. Unlike the traditional multi-thread programming model, which uses locks to ensure the consistency of shared data, TLS is a hardware mechanism that detects and resolves memory access conflicts among threads. The model shows good performance prediction, as verified by the experiments. This study helps in understanding the potential gains from using special-purpose TLS hardware to accelerate codes that, in a strict sense, require serial processing to avoid memory conflicts. Furthermore, based on analysis and measurements of TLS behavior and overhead, together with an OpenMP comparison, a strategy is proposed to help utilize this hardware feature. The results also suggest potential improvements for future TLS architectural designs.
- Published
- 2015
- Full Text
- View/download PDF
224. Efficient data compression methods for multidimensional sparse array operations based on the EKMR scheme
- Author
- Yeh-Ching Chung, Chun-Yuan Lin, and Jen-Shiuh Liu
- Subjects
- Theoretical computer science, Computational complexity theory, Matrix multiplication, Theoretical Computer Science, Sparse array, Computational Theory and Mathematics, Hardware and Architecture, Multiplication, Karnaugh map, Algorithm, Time complexity, Software, Sparse matrix, Mathematics, Data compression
- Abstract
We have previously proposed the extended Karnaugh map representation (EKMR) scheme for multidimensional array representation. In this paper, we propose two data compression schemes, EKMR compressed row/column storage (ECRS/ECCS), for multidimensional sparse arrays based on the EKMR scheme. To evaluate the proposed schemes, we compare them to the CRS/CCS schemes, using both theoretical analysis and experimental tests. In the theoretical analysis, we analyze the CRS/CCS and ECRS/ECCS schemes in terms of time complexity, space complexity, and the range of their usability for practical applications. In the experimental tests, we compare the compression time of sparse arrays and the execution time of matrix-matrix addition and matrix-matrix multiplication based on the CRS/CCS and ECRS/ECCS schemes. The theoretical analysis and experimental results show that the ECRS/ECCS schemes are superior to the CRS/CCS schemes for all evaluated criteria, except space complexity in some cases.
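For reference, the CRS baseline that ECRS extends compresses a 2-D sparse array into row pointers, column indices, and values; the toy matrix below stands in for the multidimensional arrays handled by the EKMR-based schemes.

```python
# Standard compressed row storage (CRS) of a small sparse matrix.
import numpy as np

A = np.array([[0, 3, 0],
              [4, 0, 5],
              [0, 0, 6]])

row_ptr, col_idx, values = [0], [], []
for row in A:
    for j, v in enumerate(row):
        if v != 0:
            col_idx.append(j)
            values.append(v)
    row_ptr.append(len(values))    # nonzeros seen so far ends this row

print(row_ptr)   # [0, 1, 3, 4]
print(col_idx)   # [1, 0, 2, 2]
print(values)    # [3, 4, 5, 6]
```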
- Published
- 2003
- Full Text
- View/download PDF
225. Efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers
- Author
- Jen-Shiuh Liu, Yeh-Ching Chung, and Chun-Yuan Lin
- Subjects
- Fortran, Computer science, Intrinsic function, Computation, Matrix representation, Parallel algorithm, Parallel computing, Finite element method, Computational Theory and Mathematics, Hardware and Architecture, Signal Processing, Distributed memory, Multiplication, Karnaugh map
- Abstract
Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, atmosphere and ocean sciences, etc. In our previous work, we have proposed a scheme of extended Karnaugh map representation (EKMR) for multidimensional array representation. We have shown that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In a data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, column, and 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases.
- Published
- 2003
- Full Text
- View/download PDF
226. A differential volume rendering method with second-order difference for time-varying volume data
- Author
- Jim Z. C. Lai, Yeh-Ching Chung, Shih-Kuan Liao, and Chin-Feng Lin
- Subjects
- Pixel, Computer science, Computation, Volume rendering, Image plane, Language and Linguistics, Computer Science Applications, Rendering (computer graphics), Human-Computer Interaction, Voxel, Ray casting, Data file, Computer vision, Artificial intelligence
- Abstract
The differential volume rendering method is a ray casting based method for time-varying volume data. In the differential volume rendering method, the changed voxels between consecutive time steps are extracted to form differential files in advance. When the dataset is to be rendered, changed voxels are projected onto the image plane to determine the positions of changed pixels. Only the changed pixels, instead of all pixels on the image, are updated by casting new rays in each time step. The main overhead of the differential volume rendering method is the determination of changed pixels. In this paper, we propose a two-level differential volume rendering method, in which the determination of changed pixels is accelerated by the aid of the second-order difference. Since changed voxels in two consecutive differential files may partially overlap in the space, the projection computation spent on the overlapped area is redundant. We use this property to extract the difference of changed voxels between consecutive differential files to form the second-order difference. Based on the second-order difference, the changed pixels can be determined more efficiently. The experimental results show that the proposed method outperforms the comparative methods for all test datasets in most cases. In addition, the rendering time can be predicted once the data files are loaded in each time step.
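The set relationship behind the second-order difference can be sketched directly: given the voxels changed at two consecutive time steps (the first-order differentials), the overlap needs no re-projection, so only the symmetric difference must be processed. The voxel ids below are hypothetical stand-ins for (x, y, z) coordinates.

```python
# Toy first-order differentials for two consecutive time steps.
changed_t1 = {101, 102, 103, 200}       # voxels changed entering step t-1
changed_t2 = {102, 103, 104}            # voxels changed entering step t

overlap = changed_t1 & changed_t2       # projections already done last step
second_order = changed_t1 ^ changed_t2  # only these need new projections

print("reused projections:", sorted(overlap))
print("to project at step t:", sorted(second_order))
```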
- Published
- 2003
- Full Text
- View/download PDF
227. Evaluation of Inter-Cell Interference Coordination with CAP model
- Author
- Xibin Xu, Shih-Chang Chen, Yeh-Ching Chung, Wei Tan, Jing Wang, Zhigang Tian, Wenqi Wang, Ming Zhao, and Yida Xu
- Subjects
- Computer science, Distributed computing, High availability, Radio spectrum management, Cellular network, The Internet, Partition (database), CAP theorem, 5G, Computer network
- Abstract
More cooperation and coordination will be necessary in the future UDN (Ultra-Dense Network) scenario of 5G networks, of which ICIC (Inter-Cell Interference Coordination) is a promising and typical scheme. A cooperating cellular network can be thought of as a special kind of distributed computing system, whose basic theory and toolsets can be adopted. The CAP theorem for distributed computing states that any networked shared-data system can have at most two of three desirable properties: consistency (C), high availability (A), and tolerance to network partitions (P) of the data. Since partitions cannot be avoided in such a system, consistency or availability has to be forfeited to a certain extent. This paper extends the CAP theorem to the UDN case and redefines consistency, availability, and partition tolerance in the context of future cellular network architectures. The ICIC process is then interpreted under the cellular CAP theorem, and an evaluation framework is developed. An improved ICIC scheme is proposed based on this framework.
- Published
- 2015
- Full Text
- View/download PDF
228. Computing Traffic Information in the Cloud
- Author
- Po-Ting Wei, Tai-Chi Wang, Shih-Yu Chang, and Yeh-Ching Chung
- Abstract
Vehicular ad hoc networks have been envisioned to be useful in road safety and commercial applications. In addition, in-vehicle capabilities could be offered as a service supporting a variety of applications, for example, providing a real-time junction view of road intersections or reporting traffic status for advanced traffic light control. In this work, the authors construct a cloud service over vehicular ad hoc networks to provide event data, including captured videos and Global Positioning System (GPS) data. Moreover, the authors integrate the GPS receiver and the navigation software on the On Board Unit to create a Geographic Information System digital map and to offer a traffic safety application. The hardware is implemented on an Eeepad integrating the camera and GPS. Furthermore, a cyclic recording scheme is introduced for data transmission and query. With this design, people can obtain real-time traffic information, including traffic videos and geographical data, in the cloud.
- Published
- 2015
- Full Text
- View/download PDF
229. [Untitled]
- Author
- Don-Lin Yang, Yeh-Ching Chung, and Ching-Feng Lin
- Subjects
- Computer science, Parallel algorithm, Volume rendering, Image segmentation, Parallel computing, Computational fluid dynamics, Theoretical Computer Science, Rendering (computer graphics), Data visualization, Factorization, Hardware and Architecture, Distributed memory, Software, Information Systems
- Abstract
3-D data visualization is very useful for medical imaging and computational fluid dynamics. Volume rendering can be used to exhibit the shape and volumetric properties of 3-D objects. However, volume rendering requires a considerable amount of time to process the large volume of data. To deliver the necessary rendering rates, parallel hardware architectures such as distributed memory multicomputers offer viable solutions. The challenge is to design efficient parallel algorithms that utilize the hardware parallelism effectively. In this paper, we present two efficient parallel volume rendering algorithms, the 1D-partition and 2D-partition methods, based on the shear-warp factorization for distributed memory multicomputers. The 1D-partition method has a performance bound on the size of the volume data. If the number of processors is less than a threshold, the 1D-partition method can deliver a good rendering rate. If the number of processors is over a threshold, the 2D-partition method can be used. To evaluate the performance of these two algorithms, we implemented the proposed methods along with the slice data partitioning, volume data partitioning, and sheared volume data partitioning methods on an IBM SP2 parallel machine. Six volume data sets were used as the test samples. The experimental results show that the proposed methods outperform other comparable algorithms for all test samples. When the number of processors is over a threshold, the experimental results also demonstrate that the 2D-partition method is better than the 1D-partition method.
- Published
- 2002
- Full Text
- View/download PDF
230. Correlation Aware Technique for SQL to NoSQL Transformation
- Author
- Jen Chun Hsu, Yeh-Ching Chung, Ching-Hsien Hsu, and Shih Chang Chen
- Subjects
- SQL, Database, Computer science, View, Data transformation, Big data, NoSQL, Database tuning, Data-intensive computing, Table (database), Data mining
- Abstract
For better efficiency in parallel and distributed computing, Apache Hadoop distributes imported data randomly across data nodes. This mechanism provides some advantages for general data analysis. With the same concept, Apache Sqoop separates each table into four parts and randomly distributes them across data nodes. However, this data placement mechanism raises a database performance concern. This paper proposes a correlation-aware method on Sqoop (CA_Sqoop) to improve data placement. By gathering related data as close together as possible, it reduces data transfer cost over the network and improves performance in terms of database usage. CA_Sqoop also considers table correlation and size for better data locality and query efficiency. Simulation results show that the data locality of CA_Sqoop is twice as good as that of the original Apache Sqoop.
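A toy correlation-aware placement in the spirit of CA_Sqoop: table pairs that are most strongly correlated are co-located on the same data node, largest correlation first. The correlation scores and node count are invented.

```python
# Greedy co-location of correlated tables onto data nodes (illustrative).
correlation = {("orders", "customers"): 0.9,
               ("orders", "items"): 0.7,
               ("logs", "sessions"): 0.6}
NODES = 2
placement, load = {}, [0] * NODES

for (a, b), _ in sorted(correlation.items(), key=lambda kv: -kv[1]):
    # pick the node already holding either table, else the least-loaded one
    node = placement.get(a, placement.get(b, load.index(min(load))))
    for t in (a, b):
        if t not in placement:
            placement[t] = node
            load[node] += 1

print(placement)   # co-locates orders/customers/items; logs/sessions elsewhere
```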
- Published
- 2014
- Full Text
- View/download PDF
231. Performance Modeling for Hardware Thread-Level Speculation
- Author
- Michael P. Perrone, I-Hsin Chung, Yeh-Ching Chung, Ying-Chieh Wang, and Che-Rung Lee
- Subjects
- Instruction set, Hardware thread, Computer science, Embedded system, Performance prediction, Overhead (computing), IBM, Speculation, Computer hardware
- Abstract
This paper presents a preliminary performance model for hardware Thread-Level Speculation (TLS) in the IBM Blue Gene/Q computer. The model analyzes the TLS behavior and its overhead. We model the scenario when there are 0, 1 and 2 conflicts. The model shows good performance prediction and is verified with experiments. This study helps to understand potential gains from using special purpose TLS hardware to accelerate the performance of codes that, in a strict sense, require serial processing to avoid memory conflicts.
- Published
- 2014
- Full Text
- View/download PDF
232. Taiwan UniCloud: A Cloud Testbed with Collaborative Cloud Services
- Author
- Yeh-Ching Chung, Kuan-Chou Lai, Wu Chun Chung, Kuan-Ching Li, Po Chi Shih, Jerry Chou, Che-Rung Lee, and Ching-Hsien Hsu
- Subjects
- Cloud computing security, Database, Computer science, Testbed, Cloud computing, Single-chip Cloud Computer, Resource (project management), Cloud testing, Community cloud, User interface
- Abstract
This paper introduces a prototype of Taiwan UniCloud, a community-driven hybrid cloud platform for academics in Taiwan. The goal is to leverage resources from multiple clouds across different organizations. Each self-managing cloud can join the UniCloud platform to share its resources and simultaneously benefit from other clouds' scale-out capabilities. Accordingly, resources are elastic and sharable, so each cloud can absorb unexpected resource demands. The proposed platform provides a web portal to operate each cloud via a uniform user interface. The construction of virtual clusters with multi-core VMs is supported for parallel and distributed processing models. An object-based storage system is also delivered to federate different storage providers. This paper not only presents the architectural design of Taiwan UniCloud but also evaluates its performance to demonstrate the feasibility of the current implementation. Experimental results show the feasibility of the proposed platform as well as the benefit of cloud federation.
- Published
- 2014
- Full Text
- View/download PDF
233. A binomial tree based parallel load‐balancing method for solution‐adaptive finite element graphs on distributed memory multicomputers
- Author
- William C. Chu, Yeh-Ching Chung, Don-Lin Yang, and Ching-Jung Liao
- Subjects
- Prefix code, Fractal tree index, K-ary tree, Theoretical computer science, Trie, Segment tree, General Engineering, Gomory–Hu tree, Parallel computing, Interval tree, Search tree, Mathematics
- Abstract
In this paper, we propose a binomial tree based parallel load-balancing method (BINOTPLB) to deal with the load imbalance of solution-adaptive finite element application programs on distributed memory multicomputers. The main idea of the BINOTPLB method is first to construct a binomial tree based condensed processor graph. Based on the condensed processor graph, a prefix code tree is built. From the prefix code tree, a schedule for performing load transfer among processors can be determined by concurrently and recursively dividing the prefix code tree into two subtrees and finding a maximum matching for processors in the two subtrees until the leaves are reached. Since each leaf is a binomial tree and a binomial tree can also be divided into two equal halves of binomial trees, the approach used to determine the schedule of a prefix code tree can also be applied to the binomial trees. We have implemented the BINOTPLB method on an SP2 parallel machine and compared its performance with two load-balancing methods.
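The recursive divide-and-match idea can be sketched as follows: split the processor list in half, pair processors across the halves for one round of load transfer, then recurse into each half. The tree construction and actual load amounts from the paper are omitted.

```python
# Recursive pairing schedule: round 0 matches across the two halves,
# later rounds balance each half internally (illustrative only).
def schedule(procs):
    if len(procs) < 2:
        return []
    half = len(procs) // 2
    left, right = procs[:half], procs[half:]
    rounds = [list(zip(left, right))]          # matching across the halves
    for sub in (left, right):                  # recurse into each half
        for i, pairs in enumerate(schedule(sub)):
            if len(rounds) <= i + 1:
                rounds.append([])
            rounds[i + 1].extend(pairs)
    return rounds

for step, pairs in enumerate(schedule(list(range(8)))):
    print("round", step, pairs)
```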
- Published
- 2001
- Full Text
- View/download PDF
234. [Untitled]
- Author
- Don-Lin Yang, Yeh-Ching Chung, and Jen-Chih Yu
- Subjects
- Parallel rendering, Computer science, Volume rendering, Parallel computing, Theoretical Computer Science, Rendering (computer graphics), Hardware and Architecture, Run-length encoding, Compositing, Sort, Distributed memory, Minimum bounding rectangle, Software, Information Systems
- Abstract
In the sort-last-sparse parallel volume rendering system on distributed memory multicomputers, one can achieve a very good performance improvement in the rendering phase by increasing the number of processors. This is because each processor can render images locally without communicating with other processors. However, in the compositing phase, a processor has to exchange local images with other processors. When the number of processors exceeds a threshold, the image compositing time becomes a bottleneck. In this paper, we propose three compositing methods to efficiently reduce the compositing time in parallel volume rendering. They are the binary-swap with bounding rectangle (BSBR) method, the binary-swap with run-length encoding and static load-balancing (BSLC) method, and the binary-swap with bounding rectangle and run-length encoding (BSBRC) method. The proposed methods were implemented on an SP2 parallel machine along with the binary-swap compositing method. The experimental results show that the BSBRC method has the best performance among these four methods.
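A minimal simulation of the binary-swap region bookkeeping underlying these methods: in round s, processor p pairs with p XOR 2^s and keeps one half of its current image span, exchanging and compositing the other half. Blending and the bounding-rectangle/run-length encodings of BSBR, BSLC, and BSBRC are omitted.

```python
# Track which image span each processor owns after each binary-swap round.
def binary_swap_spans(nprocs, width):
    span = {p: (0, width) for p in range(nprocs)}   # each starts with all pixels
    rounds = nprocs.bit_length() - 1                # log2(nprocs), power of two
    for s in range(rounds):
        for p in range(nprocs):
            lo, hi = span[p]
            mid = (lo + hi) // 2
            partner = p ^ (1 << s)
            # the lower-ranked member keeps the left half, the partner the right
            span[p] = (lo, mid) if p < partner else (mid, hi)
        print(f"after round {s}: {span}")
    return span

binary_swap_spans(nprocs=4, width=1024)   # each ends owning width/4 pixels
```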
- Published
- 2001
- Full Text
- View/download PDF
235. [Untitled]
- Author
- Ching-Jung Liao, Don-Lin Yang, and Yeh-Ching Chung
- Subjects
- Prefix code, Matching (graph theory), Hardware and Architecture, Computer science, Distributed memory, Parallel computing, Load balancing (computing), Software, Graph, Finite element method, Information Systems, Theoretical Computer Science
- Abstract
In this paper, we propose a prefix code matching parallel load-balancing method (PCMPLB) to efficiently deal with the load imbalance of solution-adaptive finite element application programs on distributed memory multicomputers. The main idea of the PCMPLB method is first to construct a prefix code tree for processors. Based on the prefix code tree, a schedule for performing load transfer among processors can be determined by concurrently and recursively dividing the tree into two subtrees and finding a maximum matching for processors in the two subtrees until the leaves of the prefix code tree are reached. We have implemented the PCMPLB method on an SP2 parallel machine and compared its performance with two load-balancing methods, the directed diffusion method and the multilevel diffusion method, and five mapping methods, the AE/ORB method, the AE/MC method, the MLkP method, the PARTY library method, and the JOSTLE-MS method. An unstructured finite element graph Truss was used as a test sample. During the execution, Truss was refined five times. Three criteria, the execution time of mapping/load-balancing methods, the execution time of an application program under different mapping/load-balancing methods, and the speedups achieved by mapping/load-balancing methods for an application program, are used for the performance evaluation. The experimental results show that (1) if a mapping method is used for the initial partitioning and this mapping method or a load-balancing method is used in each refinement, the execution time of an application program under a load-balancing method is less than that of the mapping method. (2) The execution time of an application program under the PCMPLB method is less than that of the directed diffusion method and the multilevel diffusion method.
- Published
- 2000
- Full Text
- View/download PDF
236. [Untitled]
- Author
- Ching-Hsien Hsu, Chyi-Ren Dow, and Yeh-Ching Chung
- Subjects
- Unpacking, Sparse array, Hardware and Architecture, Computer science, Computation, Locality, Distributed memory, Parallel computing, Software, Information Systems, Theoretical Computer Science
- Abstract
In many scientific applications, array redistribution is usually required to enhance data locality and reduce remote memory access on distributed memory multicomputers. Since the redistribution is performed at run-time, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present efficient methods for multi-dimensional array redistribution. Building on our previous work, the basic-cycle calculation technique, we present the basic-block calculation (BBC) and complete-dimension calculation (CDC) techniques. We also developed a theoretical model to analyze the computation costs of these two techniques. The model shows that the BBC method has smaller indexing costs and performs well when the array size is small, while the CDC method has smaller packing/unpacking costs and performs well when the array size is large. When these two techniques were implemented on an IBM SP2 parallel machine along with the PITFALLS method and Prylli's method, the experimental results showed that the BBC method has the smallest execution time of the four algorithms when the array size is small, and the CDC method has the smallest execution time when the array size is large.
- Published
- 2000
- Full Text
- View/download PDF
237. [Untitled]
- Author
- Chih-Chang Chen, Ching-Jung Liao, Don-Lin Yang, and Yeh-Ching Chung
- Subjects
- Computer science, Graph partition, Parallel computing, Solver, Load balancing (computing), Partition (database), Graph, Finite element method, Theoretical Computer Science, Hardware and Architecture, Partition (number theory), Distributed memory, Heuristics, Software, Information Systems
- Abstract
To efficiently execute a finite element application program on a distributed memory multicomputer, we need to distribute nodes of a finite element graph to processors of a distributed memory multicomputer as evenly as possible and minimize the communication cost of processors. This partitioning problem is known to be NP-complete. Therefore, many heuristics have been proposed to find satisfactory sub-optimal solutions, and many graph partitioners have been developed based on them. Among these, Jostle, Metis, and Party are considered the best graph partitioners available to date. To minimize the total cut-edges, these three graph partitioners in general allow 3% to 5% load imbalance among processors; this is a tradeoff between the communication cost and the computation cost of the partitioning problem. In this paper, we propose an optimization method, the dynamic diffusion method (DDM), to balance the 3% to 5% load imbalance allowed by these three graph partitioners while minimizing the total cut-edges among partitioned modules. To evaluate the proposed method, we compare the performance of the dynamic diffusion method with the directed diffusion method and the multilevel diffusion method on an IBM SP2 parallel machine. Three 2D and two 3D irregular finite element graphs are used as test samples, and for each test sample, 3% and 5% load imbalance situations are tested. From the experimental results, we draw the following conclusions. (1) The dynamic diffusion method can improve the partition results of these three partitioners in terms of the total cut-edges and the execution time of a Laplace solver in most test cases, while the directed diffusion method and the multilevel diffusion method may fail in many cases. (2) The optimization results of the dynamic diffusion method are better than those of the directed diffusion method and the multilevel diffusion method in terms of the total cut-edges and the execution time of a Laplace solver for most test cases. (3) The dynamic diffusion method can balance the load of processors for all test cases.
- Published
- 2000
- Full Text
- View/download PDF
238. [Untitled]
- Author
-
Yeh-Ching Chung and Ching-Hsien Hsu
- Subjects
Unpacking ,Hardware and Architecture ,Computer science ,Distributed memory ,Parallel computing ,Construct (python library) ,Software ,Information Systems ,Theoretical Computer Science - Abstract
Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed memory multicomputers. Since it is performed at run-time, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present efficient algorithms for BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) and BLOCK-CYCLIC(r) to BLOCK-CYCLIC(kr) redistribution. The most significant improvement of our methods is that a processor does not need to construct the send/receive data sets for a redistribution. Based on the packing/unpacking information derived from the BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) redistribution and vice versa, a processor can pack/unpack array elements into (from) messages directly. To evaluate the performance of our methods, we have implemented them along with Thakur's methods and the PITFALLS method on an IBM SP2 parallel machine. The experimental results show that our algorithms outperform Thakur's methods and the PITFALLS method for all test samples. This result encourages us to use the proposed algorithms for array redistribution.
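The claimed improvement, packing without first materializing send/receive sets, can be illustrated with a hedged C sketch for the kr-to-r direction. It assumes an equal number of source and target processors, and the per-destination buffers and counters are hypothetical scaffolding, not the paper's data structures.

```c
/* Direct packing for a BLOCK-CYCLIC(k*r) -> BLOCK-CYCLIC(r)
 * redistribution over P processors: every local element's
 * destination follows from its position alone, so it is copied
 * straight into the right outgoing buffer -- no send/receive data
 * sets are ever materialised. */
void pack_kr_to_r(const double *local, int nlocal,
                  int me, int P, int k, int r,
                  double *out[], int cnt[]) {
    int blk = k * r;                          /* source block size */
    for (int i = 0; i < nlocal; i++) {
        /* reconstruct the global index under BLOCK-CYCLIC(k*r) */
        int g = (i / blk) * blk * P + me * blk + i % blk;
        int dst = (g / r) % P;            /* owner under CYCLIC(r) */
        out[dst][cnt[dst]++] = local[i];  /* pack directly         */
    }
}
```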
- Published
- 1998
- Full Text
- View/download PDF
239. Dynamic Data Partitioning and Virtual Machine Mapping: Efficient Data Intensive Computation
- Author
-
Yeh-Ching Chung, Ching-Hsien Hsu, and Kenn Slagter
- Subjects
Computer science ,business.industry ,Dynamic data ,Node (networking) ,Big data ,Cloud computing ,Parallel computing ,computer.software_genre ,Task (computing) ,Virtual machine ,Programming paradigm ,Data-intensive computing ,business ,computer - Abstract
Big data refers to data that is so large that it exceeds the processing capabilities of traditional systems. Big data can be awkward to work with, and its storage, processing, and analysis can be problematic. MapReduce is a recent programming model that can handle big data by distributing its storage and processing amongst a large number of computers (nodes). However, this means the time required to process a MapReduce job depends on whichever node is last to complete a task, a problem that is exacerbated by heterogeneous environments. In this paper we propose a method to improve MapReduce execution in heterogeneous environments. This is done by dynamically partitioning data during the Map phase and by using virtual machine mapping in the Reduce phase in order to maximize resource utilization.
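As a rough illustration of partitioning by node capability, a static simplification of the dynamic scheme the paper proposes, consider the following C sketch; the capacity scores are assumed inputs, not values from the paper.

```c
/* Split n input records across nodes in proportion to a measured
 * per-node capacity score, so faster nodes receive larger map
 * shares and no node becomes the straggler by construction. */
void proportional_split(long n, const double cap[], int nodes,
                        long share[]) {
    double total = 0;
    for (int i = 0; i < nodes; i++) total += cap[i];
    long assigned = 0;
    for (int i = 0; i < nodes; i++) {
        share[i] = (long)(n * (cap[i] / total));
        assigned += share[i];
    }
    share[0] += n - assigned;  /* give rounding remainder to node 0 */
}
```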
- Published
- 2013
- Full Text
- View/download PDF
242. A parallel run-time iterative load balancing algorithm for solution-adaptive finite element meshes on hypercubes
- Author
-
Yeh-Ching Chung and Ming-Lien Cheng
- Subjects
Computer science ,General Engineering ,Graph (abstract data type) ,Polygon mesh ,Parallel computing ,Hypercube ,Finite element program ,Load balancing (computing) ,Algorithm ,Finite element method - Abstract
To efficiently execute a finite element program on a hypercube, we need to map the nodes of the corresponding finite element graph to the processors of the hypercube such that each processor has approximately the same computational load and the communication among processors is minimized. If the number of nodes of a finite element graph does not increase during the execution of a program, the mapping needs to be performed only once. However, if a finite element graph is solution-adaptive, that is, the number of nodes increases discretely due to the refinement of some finite elements during execution, a run-time load balancing algorithm has to be performed many times in order to balance the computational load of processors while keeping the communication cost as low as possible. In this paper, we propose a parallel iterative load balancing algorithm (ILB) to deal with the load imbalancing problem of a solution-adaptive finite element program. The proposed algorithm has...
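For context, the classic dimension-exchange scheme on a hypercube, which run-time balancers of this kind build on, looks as follows in C. This is a generic sketch, not the paper's ILB, and exchange() is a stand-in for the real message passing.

```c
/* Dimension-exchange balancing on a dim-dimensional hypercube: in
 * step k each processor pairs with the neighbour across dimension
 * k and the pair evens out its loads. */
static int exchange(int partner, int my_load) {
    /* stand-in for the real message exchange (e.g. MPI_Sendrecv) */
    (void)partner;
    return my_load;                 /* placeholder: equal loads    */
}

int balance_hypercube(int me, int dim, int my_load) {
    for (int k = 0; k < dim; k++) {
        int partner = me ^ (1 << k);         /* flip k-th address bit */
        int peer_load = exchange(partner, my_load);
        my_load = (my_load + peer_load) / 2; /* split the difference  */
    }
    return my_load;
}
```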
- Published
- 1996
- Full Text
- View/download PDF
241. A duplication heuristic for static scheduling of tasks on distributed memory multiprocessors
- Author
-
Yeh-Ching Chung, Chia-Cheng Liu, and Jen-Shiuh Liu
- Subjects
Computer science ,Gene duplication ,General Engineering ,Heuristic programming ,Distributed memory ,Start time ,Parallel computing ,Scheduling (computing) - Abstract
A task duplication heuristic, DSH, was proposed in [11]. The underlying concept of task duplication is to duplicate some tasks on processors so that the earliest start time of tasks on those processors can be reduced, that is, tasks can be executed sooner; this leads to a shorter scheduling length. In this paper, we propose a more general task duplication heuristic, the bottom-up top-down duplication heuristic (BTDH), for static scheduling of directed acyclic graphs (DAGs) on distributed memory multiprocessors. The key difference between BTDH and DSH is the method used for duplicating tasks: BTDH allows tasks to be duplicated on processors even though the duplication will temporarily increase the earliest start time of some tasks, whereas DSH only allows duplications that reduce the earliest start time of tasks. Simulation results show that, for coarse-grain DAGs, the scheduling length of BTDH is almost the same as that of DSH. However, for medium-g...
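The earliest-start-time reasoning behind duplication heuristics can be sketched in C as below; this is an illustrative calculation under assumed arrays, not BTDH itself, whose rule also accepts duplications that temporarily raise some start times.

```c
/* Earliest start time (EST) of a task on processor p: bounded by
 * each parent's finish time plus the communication delay when the
 * parent ran on a different processor. Duplicating a parent onto p
 * zeroes that parent's delay, which is what duplication heuristics
 * exploit. */
int est_on(int p, int nparents,
           const int finish[], const int proc[], const int comm[]) {
    int est = 0;
    for (int i = 0; i < nparents; i++) {
        int ready = finish[i] + (proc[i] == p ? 0 : comm[i]);
        if (ready > est) est = ready;  /* wait for the latest parent */
    }
    return est;
}
```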
- Published
- 1995
- Full Text
- View/download PDF
242. PPT: A parallel programming tool for distributed memory multiprocessors
- Author
-
Chia-Cheng Liu, Yeh-Ching Chung, and Wu-Hsun Ho
- Subjects
business.industry ,Computer science ,Distributed computing ,Computer programming ,General Engineering ,Multiprocessing ,Parallel computing ,Data structure ,Multiprocessor scheduling ,Scheduling (computing) ,Distributed memory ,business ,Programmer ,SPMD - Abstract
Traditionally, to program a distributed memory multiprocessor, a programmer is responsible for partitioning an application program into modules or tasks, scheduling the tasks on processors, inserting communication primitives, and generating parallel code for each processor manually. As both the number of processors and the complexity of the problems to be solved increase, programming distributed memory multiprocessors becomes difficult and error-prone. In a distributed memory multiprocessor, program partitioning and scheduling play an important role in the performance of a parallel program. However, finding the best program partitioning and scheduling, so that the best performance of a parallel program on a distributed memory multiprocessor can be achieved, is not an easy task. In this paper, we present a parallel programming tool, PPT, to aid programmers in finding the best program partitioning and scheduling and to automatically generate the parallel code for the single program multiple data (SPMD) ...
- Published
- 1995
- Full Text
- View/download PDF
243. InfiniBand virtualization on KVM
- Author
-
Che-Rung Lee, Yeh-Ching Chung, and Yi-Man Ma
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer science ,business.industry ,InfiniBand ,Network virtualization ,Cloud computing ,Network interface ,Supercomputer ,computer.software_genre ,Virtualization ,Virtual machine ,Embedded system ,Operating system ,Sockets Direct Protocol ,business ,computer - Abstract
With its ability to provide on-demand service and to reduce IT costs, cloud computing has become increasingly popular. Virtualization is one of the key technologies in cloud computing; its main idea is to provide abstractions of the physical resources. However, such abstraction can cause performance degradation, especially for I/O virtualization, which is usually the performance bottleneck in cloud computing. InfiniBand is a network system that provides very low latency (less than 5 µs) and very high bandwidth (multiple Gbps). Due to its excellent performance, InfiniBand is commonly used in the high performance computing (HPC) area. In this paper, we propose Virt-IB for InfiniBand virtualization on the Kernel-based Virtual Machine (KVM). The main components of Virt-IB are the VM IB library and the Virt-IB driver. Our design processes InfiniBand APIs directly in the guest VM and communicates with the InfiniBand device indirectly to perform the real operations. The VM IB library provides the API interface and a user-level InfiniBand driver, while the Virt-IB driver provides a channel for the VM IB library to write commands to the InfiniBand device indirectly. Evaluation results show that our current work outperforms conventional network virtualization and achieves about 50% of native InfiniBand performance.
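The split between a guest-side library and a command channel to the host driver might look roughly like the following C sketch. All names here (/dev/virt-ib, vib_cmd, the opcode value) are hypothetical illustrations, not Virt-IB's actual interface.

```c
/* Guest-side shape of a paravirtual InfiniBand design: the library
 * exposes verbs-style calls, and privileged operations are written
 * through a command channel (here a device file) that the host-side
 * driver relays to the real HCA. */
#include <fcntl.h>
#include <unistd.h>

struct vib_cmd { int opcode; int qp_num; unsigned long wr_addr; };

static int vib_fd = -1;

int vib_open(void) {                     /* hypothetical channel */
    vib_fd = open("/dev/virt-ib", O_RDWR);
    return vib_fd < 0 ? -1 : 0;
}

int vib_post_send(int qp_num, unsigned long wr_addr) {
    /* forward the work-request descriptor to the host-side driver */
    struct vib_cmd c = { /*opcode=*/1, qp_num, wr_addr };
    return write(vib_fd, &c, sizeof c) == sizeof c ? 0 : -1;
}
```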
- Published
- 2012
- Full Text
- View/download PDF
244. Arrangement Graph-Based Overlay with Replica Mechanism for File Sharing
- Author
-
Yeh-Ching Chung, Ssu-Hsuan Lu, Kuan-Chou Lai, and Kuan-Ching Li
- Subjects
business.industry ,Computer science ,Replica ,Distributed computing ,Overlay network ,Overlay ,Peer-to-peer ,computer.software_genre ,File sharing ,Graph (abstract data type) ,The Internet ,Polling ,business ,computer ,Computer network - Abstract
Over the past decade, the development of Internet technology has drawn greater attention to the power of Peer-to-Peer (P2P) overlay networks. How to efficiently establish and maintain overlay networks in large-scale environments is always an important issue, and ways of improving routing efficiency also attract much attention. This study proposes a replica mechanism based on the Arrangement Graph-based Overlay (AGO) and enhances the AGO's joining procedure. The enhanced AGO reduces system overhead by eliminating a large number of polling messages during the joining process. In addition, the replica mechanism is integrated into the enhanced AGO to improve the efficiency of the search algorithm. Experimental results demonstrate that the enhanced AGO achieves efficient routing as well as lower bandwidth consumption during communication.
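For readers unfamiliar with the underlying topology: in the arrangement graph A(n,k), vertices are arrangements of k distinct symbols out of n, and two vertices are adjacent when they differ in exactly one position, giving each vertex k*(n-k) neighbours. A small self-contained C sketch (not AGO code) enumerates them.

```c
#include <stdio.h>
#include <string.h>

/* Print all neighbours of vertex v in A(n,k): for each position,
 * substitute every symbol not already used in the arrangement. */
void neighbours(const int v[], int n, int k) {
    int used[32] = {0};
    for (int i = 0; i < k; i++) used[v[i]] = 1;
    for (int pos = 0; pos < k; pos++)          /* position to change */
        for (int s = 1; s <= n; s++)           /* unused symbol      */
            if (!used[s]) {
                int w[32];
                memcpy(w, v, k * sizeof *w);
                w[pos] = s;
                for (int i = 0; i < k; i++) printf("%d ", w[i]);
                printf("\n");
            }
}

int main(void) {
    int v[] = {1, 2};        /* a vertex of A(4,2)            */
    neighbours(v, 4, 2);     /* prints 2*(4-2) = 4 neighbours */
    return 0;
}
```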
- Published
- 2012
- Full Text
- View/download PDF
245. Message from the WEISS 2012 Workshop Chairs
- Author
-
Jing Chen, Da-Wei Chang, Fahim Kawsar, Hsung-Pin Chang, Bin Guo, Taehyoun Kim, Sasikumar Punnekkat, Chung-Ping Young, Giusy Di Lorenzo, Yann Han Lee, Kaori Fujinami, Li Chi Feng, Yeh-Ching Chung, Minsoo Ryu, Zonghua Gu, Alejandro Masrur, Mei Ling Chiang, Daqiang Zhang, Tatsuo Nakajima, Hamid R. Sharifzadeh, Ian McLoughlin, Koji Nakano, Seongsoo Hong, Douglas L. Maskell, and Yunheung Paek
- Subjects
Computer science ,Library science - Published
- 2012
- Full Text
- View/download PDF
246. GPU Performance Enhancement via Communication Cost Reduction: Case Studies of Radix Sort and WSN Relay Node Placement Problem
- Author
-
Yeh-Ching Chung, Shih-Hsiang Lo, I-Hsin Chung, Nan-Hsi Chen, and Che-Rung Lee
- Subjects
Cost reduction ,Computer science ,Radix sort ,Graphics processing unit ,Parallel computing ,Central processing unit ,Performance improvement ,Bottleneck ,Data compression ,Data transmission - Abstract
As the computational power of the Graphics Processing Unit (GPU) increases, data transmission becomes the major performance bottleneck. In this study, we investigate two techniques, data streaming and data compression, to reduce the communication cost on the GPU. Data streaming enables overlap of communication and computation, whereas data compression reduces the data size transferred among different memory spaces. Although both techniques increase computation cost, overall performance can still be enhanced by reducing communication cost. We demonstrate the effectiveness of the two techniques via two case studies: radix sort and 3-star, a deployment algorithm for relay node placement in wireless sensor networks. For radix sort, a new algorithm, which mixes the MSD and LSD algorithms and employs data streaming, is presented; its performance is 25% faster than the fastest GPU radix sort implementation currently available in the public domain. For the 3-star algorithm, the GPU implementation is several hundred times faster than the CPU code, and data streaming with data compression, as a hybrid CPU-GPU algorithm, provides an additional 54% performance improvement over the GPU implementation. Data compression not only reduces communication cost but also improves computation time, by which further performance enhancement can be achieved.
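The data streaming technique amounts to chunked asynchronous copies overlapped with kernel execution on CUDA streams. A minimal host-side C sketch using the CUDA runtime API is shown below; process_chunk() stands in for a kernel launch and is not from the paper, and host memory is assumed to be pinned so the async copies actually overlap.

```c
#include <cuda_runtime.h>

static void process_chunk(float *d, size_t n, cudaStream_t s) {
    /* a real implementation would launch a kernel on stream s */
    (void)d; (void)n; (void)s;
}

/* Copy, process, and copy back the array in nstreams chunks so that
 * transfers of one chunk overlap with compute on another. Assumes
 * nstreams <= 16 and n divisible by nstreams for brevity. */
void streamed(float *h, float *d, size_t n, int nstreams) {
    cudaStream_t s[16];
    size_t chunk = n / nstreams;
    for (int i = 0; i < nstreams; i++) cudaStreamCreate(&s[i]);
    for (int i = 0; i < nstreams; i++) {
        size_t off = i * chunk, bytes = chunk * sizeof(float);
        cudaMemcpyAsync(d + off, h + off, bytes,
                        cudaMemcpyHostToDevice, s[i]);
        process_chunk(d + off, chunk, s[i]);   /* async on stream i */
        cudaMemcpyAsync(h + off, d + off, bytes,
                        cudaMemcpyDeviceToHost, s[i]);
    }
    for (int i = 0; i < nstreams; i++) {
        cudaStreamSynchronize(s[i]);
        cudaStreamDestroy(s[i]);
    }
}
```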
- Published
- 2012
- Full Text
- View/download PDF
247. Genetic copy number variants in myocardial infarction patients with hyperlipidemia
- Author
-
Yu-Ming Tsao, Ching-Hui Huang, Wei-Chung Shia, Yeh-Ching Chung, Kae-Woei Liang, Tien-Hsiung Ku, Chien-Hsun Hsia, Yung-Ming Chang, Fang-Rong Hsu, and Shih-Lan Hsu
- Subjects
Adult ,Male ,medicine.medical_specialty ,lcsh:QH426-470 ,DNA Copy Number Variations ,lcsh:Biotechnology ,Myocardial Infarction ,Genome-wide association study ,Hyperlipidemias ,Disease ,Biology ,Bioinformatics ,Polymorphism, Single Nucleotide ,Young Adult ,Internal medicine ,lcsh:TP248.13-248.65 ,Hyperlipidemia ,medicine ,Genetics ,SNP ,Humans ,Myocardial infarction ,Copy-number variation ,cardiovascular diseases ,Cause of death ,Aged ,Middle Aged ,medicine.disease ,Lipoproteins, LDL ,lcsh:Genetics ,Cholesterol ,Proceedings ,Cardiology ,Myocardial infarction complications ,Female ,Biotechnology ,Genome-Wide Association Study - Abstract
Background Cardiovascular disease is the chief cause of death in Taiwan and many other countries, of which myocardial infarction (MI) is the most serious condition. Hyperlipidemia appears to be a significant cause of myocardial infarction, because it causes atherosclerosis directly. In recent years, copy number variation (CNV) has been analyzed in genome-wide association studies of complex diseases. In this study, CNV was analyzed in blood samples and SNP arrays from 31 myocardial infarction patients with hyperlipidemia. Results We identified seven CNV regions that were significantly associated with hyperlipidemia and myocardial infarction in our patients through multistage analysis; these include CDC73, 1q42.2 (DISC1), 3p21.31 (CDCP1), 10q11.21 (RET), 12p12.3 (PIK3C2G), and 16q23.3 (CDH13). In particular, the CNV region at 10q11.21 was examined by quantitative real-time PCR, the results of which were consistent with the microarray findings. Conclusions Our preliminary results constitute an alternative method of evaluating the relationship between CNV regions and cardiovascular disease. These susceptibility CNV regions may be used as biomarkers for early-stage diagnosis of hyperlipidemia and myocardial infarction, rendering them valuable for further research and discussion.
- Published
- 2012
248. QoS-Based Job Scheduling and Resource Management Strategies for Grid Computing
- Author
-
Po-Chi Shih, Kuo-Chan Huang, and Yeh-Ching Chung
- Subjects
DRMAA ,Job scheduler ,Grid computing ,Computer science ,Distributed computing ,Quality of service ,Resource management ,Flow shop scheduling ,Dynamic priority scheduling ,computer.software_genre ,computer ,Fair-share scheduling - Abstract
This chapter elaborates on the quality of service (QoS) aspect of load sharing activities in a computational grid environment. Load sharing is achieved through appropriate job scheduling and resource allocation mechanisms. A computational grid usually consists of several geographically distant sites, each with a different amount of computing resources, and different types of grids might have different QoS requirements. In most academic or experimental grids, the computing sites volunteer to join the grid and can freely decide to quit at any time if they feel that joining brings them no benefit. Therefore, maintaining an appropriate QoS level becomes an important incentive to attract computing sites to join a grid and stay in it. This chapter explores the QoS issues in such academic and experimental grids. It first defines QoS-based performance metrics for evaluating job scheduling and resource allocation strategies; according to these metrics, appropriate grid-level load sharing strategies are developed. The developed strategies address both user-level and site-level QoS concerns. A series of simulation experiments was performed to evaluate the proposed strategies based on real and synthetic workloads.
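A QoS-aware dispatch rule of the general kind described here can be sketched in C: among sites whose estimated waiting time meets the job's QoS bound, pick the least loaded, else fall back to the global minimum. The metric names are illustrative, not the chapter's exact formulation.

```c
/* Pick a site for a job given per-site estimated waiting times and
 * the job's QoS bound on acceptable waiting time. */
int pick_site(const double est_wait[], int nsites, double qos_bound) {
    int best = 0, best_ok = -1;
    for (int i = 0; i < nsites; i++) {
        if (est_wait[i] < est_wait[best]) best = i;
        if (est_wait[i] <= qos_bound &&
            (best_ok < 0 || est_wait[i] < est_wait[best_ok]))
            best_ok = i;
    }
    return best_ok >= 0 ? best_ok : best;  /* prefer QoS-satisfying */
}
```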
- Published
- 2012
- Full Text
- View/download PDF
249. Direction-aware resource discovery service in large-scale grid and cloud computing
- Author
-
Kuan-Ching Li, Wu-Chun Chung, Chin-Jung Hsu, Yeh-Ching Chung, and Kuan-Chou Lai
- Subjects
Human resource management system ,business.industry ,Computer science ,Distributed computing ,Overlay network ,Cloud computing ,computer.software_genre ,Grid ,Grid computing ,Robustness (computer science) ,Scalability ,Information system ,business ,computer - Abstract
With scalability and robustness in mind, distributed computing systems such as grids and clouds may exploit the P2P approach to enhance their performance. However, conventional techniques in P2P systems cannot be applied directly to grid systems because only restricted sorts of queries for desired resources are supported. In this paper, we consider a fully decentralized resource discovery service based on an unstructured overlay, where the major challenge is to locate desired resources without global knowledge of the shared resource information. Involving more nodes in the resource discovery scheme may incur higher network overhead, so to achieve efficient resource discovery, this paper aims to reduce the network traffic among unstructured information systems. Relying on information about resource attributes and characteristics, we propose a direction-aware resource discovery scheme to improve overall performance. Experimental results illustrate that the proposed approach is efficient and scalable compared with conventional approaches.
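Direction-aware forwarding can be pictured as scoring neighbours against the query instead of flooding. The C sketch below assumes each neighbour advertises a small resource vector and uses a dot product as a hypothetical similarity measure; the paper's actual scheme may differ.

```c
#define DIMS 4

/* Hypothetical similarity between a neighbour's advertised resource
 * vector and the query vector. */
double score(const double adv[DIMS], const double query[DIMS]) {
    double s = 0;
    for (int i = 0; i < DIMS; i++) s += adv[i] * query[i];
    return s;
}

/* Choose the neighbour whose advertisement best matches the query;
 * the query is then forwarded along that "direction" only. */
int best_neighbour(const double adv[][DIMS], int n,
                   const double query[DIMS]) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (score(adv[i], query) > score(adv[best], query)) best = i;
    return best;
}
```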
- Published
- 2011
- Full Text
- View/download PDF
250. PQEMU: A Parallel System Emulator Based on QEMU
- Author
-
Wei-Chung Hsu, Po-Chun Chang, Jiun-Hung Ding, and Yeh-Ching Chung
- Subjects
Multi-core processor ,Speedup ,Parallel processing (DSP implementation) ,Computer science ,business.industry ,Operating system ,Software development ,computer.software_genre ,business ,Virtual platform ,computer - Abstract
A full system emulator, such as QEMU, can provide a versatile virtual platform for software development. However, most current system emulators do not have sufficient support for multi-processor emulation to effectively utilize the underlying parallelism presented by today's multi-core processors. In this paper, we focus on parallelizing a system emulator and implement a prototype parallel emulator based on the widely used QEMU. Using this parallel QEMU to emulate an ARM11 MPCore platform on a quad-core Intel i7 machine with the SPLASH-2 benchmarks, we achieve a 3.8x speedup over the original QEMU design. We have also evaluated and compared the performance impact of two different parallelization strategies, one with minimum sharing among emulated CPUs and one with maximum sharing.
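The structural change behind such parallelization is replacing the single round-robin CPU loop with one host thread per emulated core. A minimal pthreads sketch of that shape is shown below; it illustrates the threading structure only and is not PQEMU's code.

```c
#include <pthread.h>

#define NCORES 4

/* Each host thread runs one emulated CPU's fetch/translate/execute
 * loop. Shared state such as the translated-code cache must then be
 * locked or replicated -- the two sharing strategies compared above. */
static void *vcpu_loop(void *arg) {
    int core = *(int *)arg;
    for (;;) {
        /* fetch guest code, look up or translate a block, run it */
        (void)core;
        break;   /* placeholder so the sketch terminates */
    }
    return 0;
}

int main(void) {
    pthread_t t[NCORES];
    int id[NCORES];
    for (int i = 0; i < NCORES; i++) {
        id[i] = i;
        pthread_create(&t[i], 0, vcpu_loop, &id[i]);
    }
    for (int i = 0; i < NCORES; i++) pthread_join(t[i], 0);
    return 0;
}
```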
- Published
- 2011
- Full Text
- View/download PDF