10 results for "Mingxing Zhang"
Search Results
2. Achieving Sub-second Pairwise Query over Evolving Graphs
- Author
-
Hongtao Chen, Mingxing Zhang, Ke Yang, Kang Chen, Albert Zomaya, Yongwei Wu, and Xuehai Qian
- Published
- 2023
3. Rethinking Open-World Object Detection in Autonomous Driving Scenarios
- Author
-
Zeyu Ma, Yang Yang, Guoqing Wang, Xing Xu, Heng Tao Shen, and Mingxing Zhang
- Published
- 2022
4. AsymNVM
- Author
-
Kang Chen, Xuehai Qian, Mingxing Zhang, Teng Ma, Song Zhuo, and Yongwei Wu
- Subjects
Remote direct memory access, Speedup, Computer science, Distributed computing, Networking & telecommunications, Data structure, Bottleneck, Replication (computing), Server, Memory architecture, Persistent data structure
- Abstract
The byte-addressable non-volatile memory (NVM) is a promising technology since it simultaneously provides DRAM-like performance, disk-like capacity, and persistency. The current NVM deployment with byte-addressability is symmetric, where NVM devices are directly attached to servers. Due to its higher density, NVM provides much larger capacity and should be shared among servers. Unfortunately, in the symmetric setting, the availability of an NVM device is tied to the specific machine it is attached to. High availability can be achieved by replicating data to NVM on a remote machine, but this requires a full replica of the data structure in local memory, limiting the size of the working set. This paper rethinks NVM deployment and makes a case for the asymmetric byte-addressable non-volatile memory architecture, which decouples servers from persistent data storage. In the proposed AsymNVM architecture, NVM devices (i.e., back-end nodes) can be shared by multiple servers (i.e., front-end nodes) and provide recoverable persistent data structures. The asymmetric architecture, which follows the industry trend of resource disaggregation, is made possible by high-performance networks (e.g., RDMA). At the same time, AsymNVM raises a number of key problems, such as the still relatively long network latency, the persistency bottleneck, and the simple interface of the back-end NVM nodes. We build the AsymNVM framework based on this architecture, which implements: 1) high-performance persistent data structure updates; 2) NVM data management; 3) concurrency control; and 4) crash consistency and replication. The key idea for removing the persistency bottleneck is the use of an operation log that reduces stall time due to RDMA writes and enables efficient batching and caching in front-end nodes. To evaluate performance, we construct eight widely used data structures and two transaction applications based on the AsymNVM framework.
In a 10-node cluster equipped with real NVM devices, results show that AsymNVM achieves similar or better performance compared to the best possible symmetric architecture while enjoying the benefits of disaggregation. The speedup brought by the proposed optimizations is drastic: 5–12× across all benchmarks.
- Published
- 2020
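The operation-log idea described in the AsymNVM abstract above can be sketched in a few lines: the front-end buffers mutation records and flushes them to the back-end in batches, so it stalls once per batch rather than once per update. This is an illustrative Python toy, not the AsymNVM code; the class names, batch size, and the simulated RDMA back-end are invented for the sketch.

```python
# Illustrative sketch (not the AsymNVM implementation): an operation log that
# batches mutations before flushing them to a simulated remote NVM back-end.
class RemoteNVM:
    """Stand-in for a back-end NVM node reachable over RDMA."""
    def __init__(self):
        self.log = []
        self.writes = 0          # number of simulated RDMA write round trips

    def append(self, records):
        self.writes += 1         # one RDMA write persists the whole batch
        self.log.extend(records)

class OperationLog:
    def __init__(self, backend, batch_size=4):
        self.backend = backend
        self.batch_size = batch_size
        self.buffer = []

    def record(self, op, key, value):
        self.buffer.append((op, key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.backend.append(self.buffer)
            self.buffer = []

nvm = RemoteNVM()
log = OperationLog(nvm, batch_size=4)
for i in range(8):
    log.record("put", i, i * i)
log.flush()
print(nvm.writes)   # 2 round trips instead of 8
```

The point of the sketch is only the ratio: eight logical updates reach persistent storage with two round trips, which is where the reduced stall time comes from.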
5. KnightKing
- Author
-
Mingxing Zhang, Ke Yang, Kang Chen, Yong Jiang, Yang Bai, and Xiaosong Ma
- Subjects
Theoretical computer science, Computer science, Computation, Rejection sampling, Random walk, Graph, Information systems, Scalability, Programming paradigm, Data analysis, Artificial intelligence & image processing, Implementation
- Abstract
Random walk on graphs has recently gained immense popularity as a tool for graph data analytics and machine learning. Currently, random walk algorithms are developed as individual implementations and suffer from significant performance and scalability problems, especially with the dynamic nature of sophisticated walk strategies. We present KnightKing, the first general-purpose, distributed graph random walk engine. To address the unique interaction between a static graph and many dynamic walkers, it adopts an intuitive walker-centric computation model. The corresponding programming model allows users to easily specify existing or new random walk algorithms, facilitated by a new unified edge transition probability definition that applies across popular known algorithms. With KnightKing, these diverse algorithms benefit from its common distributed random walk execution engine, centered around an innovative rejection-based sampling mechanism that dramatically reduces the cost of higher-order random walk algorithms. Our evaluation confirms that KnightKing brings up to 4 orders of magnitude improvement in executing algorithms that currently can only be afforded with approximation solutions on large graphs.
- Published
- 2019
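The rejection-based sampling mechanism named in the KnightKing abstract above can be illustrated with a minimal sketch: draw a candidate edge uniformly, then accept it with probability weight / upper bound, so the full (possibly dynamic) transition distribution never has to be materialized at each walker step. This is not KnightKing's API; the edge names, weights, and bound are invented for the example.

```python
import random

# Hedged sketch of rejection sampling over edge transition weights.
def sample_edge(edges, weight_fn, upper_bound):
    """Draw edges uniformly; accept with probability weight / upper_bound."""
    while True:
        e = random.choice(edges)                        # O(1) uniform candidate
        if random.random() * upper_bound <= weight_fn(e):
            return e                                    # accepted

random.seed(0)
edges = [("a", 1.0), ("b", 0.5), ("c", 0.0)]
picks = [sample_edge(edges, lambda e: e[1], 1.0) for _ in range(1000)]
# a zero-weight edge is (essentially) always rejected
assert not any(p[0] == "c" for p in picks)
```

The acceptance test costs one random number per candidate, which is why the technique pays off for higher-order walks whose weights depend on the walker's history.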
6. GraphQ
- Author
-
Chao Wang, Yanzhi Wang, Mingxing Zhang, Niu Dimin, Rui Wang, Youwei Zhuo, and Xuehai Qian
- Subjects
Speedup, Computer science, Computation, Parallel computing, Graph, Computer hardware & architecture, Vertex (geometry), Asynchronous communication, Bounded function, Scalability, Tesseract, Execution model, Conventional memory
- Abstract
Processing-In-Memory (PIM) architectures based on recent technology advances (e.g., the Hybrid Memory Cube) demonstrate great potential for graph processing. However, existing solutions do not address the key challenge of graph processing: irregular data movement. This paper proposes GraphQ, a PIM-based graph processing architecture that improves on the recent Tesseract architecture by fundamentally eliminating irregular data movement. GraphQ draws on ideas from distributed graph processing and irregular applications to enable static and structured communication through runtime and architecture co-design. Specifically, GraphQ realizes: 1) batched and overlapped inter-cube communication by reordering the vertex processing order; 2) streamlined inter-cube communication by using heterogeneous cores for different access types. Moreover, to tackle the discrepancy between inter-cube and inter-node bandwidth, we propose a hybrid execution model that performs additional local computation during inter-node communication. This model is general enough to apply to asynchronous iterative algorithms that can tolerate bounded stale values. Putting it all together, GraphQ simultaneously maximizes intra-cube, inter-cube, and inter-node communication throughput. In a zSim-based simulation with five real-world graphs and four algorithms, GraphQ achieves an average 3.3× and maximum 13.9× speedup and 81% energy saving compared with Tesseract. We also show that increasing memory size in PIM proportionally increases compute capability: a 4-node GraphQ achieves a 98.34× speedup compared with a single node with the same memory size and a conventional memory hierarchy.
- Published
- 2019
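The "batched and overlapped inter-cube communication by reordering" described in the GraphQ abstract above boils down to grouping updates by destination owner so each cube receives one large message instead of many small irregular ones. A toy sketch (the owner function and updates are invented; real GraphQ does this in hardware/runtime co-design):

```python
from collections import defaultdict

# Toy sketch of destination-ordered, batched communication: group vertex
# updates by the cube that owns the destination vertex.
def batched_sends(updates, owner):
    """updates: iterable of (dst_vertex, value); owner: vertex -> cube id."""
    batches = defaultdict(list)
    for dst, val in updates:
        batches[owner(dst)].append((dst, val))
    return dict(batches)                 # cube id -> one batch of updates

batches = batched_sends([(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0)],
                        owner=lambda v: v % 2)
print(batches)   # {0: [(0, 1.0), (2, 3.0)], 1: [(1, 2.0), (3, 4.0)]}
```

With updates grouped this way, each destination can be serviced in one streamed transfer, which is the structured-communication property the abstract refers to.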
7. Total Factor Productivity Analysis of Industrial Processes Based on Malmquist Model
- Author
-
Ouyang Zhi, Qin Wei, Yajie Wang, Yongming Han, Mingxing Zhang, Zhiqiang Geng, and Kai Chen
- Subjects
Process (engineering), Computer science, Industrial production index, Production efficiency, Technical progress, Industrial engineering & automation, Data envelopment analysis, Production (economics), Process engineering, Total factor productivity, Malmquist index, Efficient energy use
- Abstract
Industrial production is a key factor in evaluating a nation's industrial level. This paper therefore proposes a total factor productivity analysis method based on the Malmquist model to analyze production efficiency both statically and dynamically. Using the input and output data of ethylene production plants in China, the total factor productivity index of these plants is decomposed into technical efficiency, technical progress, pure technical efficiency, and scale efficiency through the Malmquist model based on data envelopment analysis (DEA). The results indicate that the energy efficiency of ethylene production plants can be improved.
- Published
- 2019
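The decomposition named in the abstract above is the standard DEA-based Malmquist one; writing it out (with D^t the period-t distance function estimated by DEA, and inputs x, outputs y) makes the four components explicit. This is the textbook form, not a formula quoted from the paper:

```latex
M(x^{t+1}, y^{t+1}, x^{t}, y^{t})
  = \underbrace{\frac{D^{t+1}(x^{t+1}, y^{t+1})}{D^{t}(x^{t}, y^{t})}}_{\text{efficiency change (EC)}}
    \times
    \underbrace{\left[
      \frac{D^{t}(x^{t+1}, y^{t+1})}{D^{t+1}(x^{t+1}, y^{t+1})}
      \cdot
      \frac{D^{t}(x^{t}, y^{t})}{D^{t+1}(x^{t}, y^{t})}
    \right]^{1/2}}_{\text{technical progress (TC)}},
\qquad
\text{EC} = \text{PEC} \times \text{SEC}.
```

Under variable returns to scale, EC splits further into pure technical efficiency change (PEC) and scale efficiency change (SEC), giving the four components listed in the abstract.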
8. Wonderland
- Author
-
Kang Chen, Xuehai Qian, Mingxing Zhang, Chengying Huan, Yongwei Wu, and Youwei Zhuo
- Subjects
Distributed computing, Speedup, Theoretical computer science, Computer science, Disjoint sets, Computer Graphics and Computer-Aided Design, Graph, Information systems, Graph (abstract data type), Out-of-core algorithm, Abstraction, Software
- Abstract
Many important graph applications are iterative algorithms that repeatedly process the input graph until convergence. For such algorithms, graph abstraction is an important technique: although much smaller than the original graph, an abstraction can bootstrap an initial result that significantly accelerates final convergence, leading to better overall performance. However, existing graph abstraction techniques typically assume either a fully in-memory or a distributed environment, which creates many obstacles to applying them in an out-of-core graph processing system. In this paper, we propose Wonderland, a novel out-of-core graph processing system based on abstraction. Wonderland has three unique features: 1) a simple method, applicable to out-of-core systems, that allows users to extract effective abstractions from the original graph at acceptable cost and within a specific memory limit; 2) abstraction-enabled information propagation, where an abstraction serves as a bridge over the disjoint on-disk graph partitions; 3) abstraction-guided priority scheduling, where an abstraction can infer a better priority-based order for processing on-disk graph partitions. Wonderland is a significant advance over the state of the art because it not only makes graph abstraction feasible for out-of-core systems, but also broadens the applications of the concept in important ways. Evaluation shows that Wonderland achieves a drastic speedup over other state-of-the-art systems, up to two orders of magnitude in certain cases.
- Published
- 2018
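The bootstrap idea in the Wonderland abstract above can be shown with a tiny shortest-path example: first relax a small "abstraction" (a few shortcut edges) to seed the distance values, then finish on the full edge list. The graph, edges, and weights below are invented, and a real out-of-core system streams disk partitions rather than looping over an in-memory list; this only illustrates why a good seed can cut full-graph iterations.

```python
# Illustrative sketch of abstraction-bootstrapped iteration.
INF = float("inf")

def relax_until_fixed(dist, edges):
    """Bellman-Ford-style relaxation until no distance changes."""
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True

full = [(0, 1, 4), (1, 2, 1), (0, 2, 7), (2, 3, 2), (3, 4, 3), (0, 4, 20)]
abstraction = [(0, 2, 7), (2, 4, 5)]        # a few cheap shortcut edges

dist = {v: INF for v in range(5)}
dist[0] = 0
relax_until_fixed(dist, abstraction)        # bootstrap: upper-bound distances
relax_until_fixed(dist, full)               # converge to exact shortest paths
print(dist)   # {0: 0, 1: 4, 2: 5, 3: 7, 4: 10}
```

The abstraction pass gives every vertex a finite upper bound almost for free, so the expensive full-graph passes start close to the fixed point.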
9. RFP
- Author
-
Kang Chen, Maomeng Su, Mingxing Zhang, Yongwei Wu, and Zhenyu Guo
- Subjects
Distributed computing, Remote direct memory access, Computer science, Networking & telecommunications, IOPS, Data structure, Set (abstract data type), RDMA over Converged Ethernet, Operating system, Performance improvement
- Abstract
Remote Direct Memory Access (RDMA) has been widely deployed in modern data centers. However, existing usages of RDMA lead to a dilemma between performance and redesign cost: they either directly replace socket-based send/receive primitives with their RDMA counterparts (server-reply), which achieves only moderate performance improvement, or push performance further by using one-sided RDMA operations to bypass the server entirely (server-bypass), at the cost of redesigning the software. In this paper, we introduce two interesting observations about RDMA. First, RDMA has asymmetric performance characteristics, which can be used to improve server-reply's performance. Second, the performance of server-bypass is not as good as expected in many cases, because more rounds of RDMA may be needed if the server is bypassed entirely. We therefore introduce a new RDMA paradigm called the Remote Fetching Paradigm (RFP). Although RFP requires users to set several parameters to achieve the best performance, it supports legacy RPC interfaces and hence avoids the need to redesign application-specific data structures. Moreover, with proper parameters, it achieves even higher IOPS than the previous paradigms. We have designed and implemented an in-memory key-value store based on RFP to evaluate its effectiveness. Experimental results show that RFP improves performance by 1.6×–4× compared with both the server-reply and server-bypass paradigms.
- Published
- 2017
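The remote-fetching flow contrasted with server-reply and server-bypass in the abstract above can be sketched with RDMA verbs replaced by plain function calls: the client writes a request into server memory, the server computes the reply in place, and the client fetches the reply with a one-sided read instead of waiting for a server-initiated send. All names here are invented for the toy, and real RDMA semantics (completion, polling) are elided.

```python
# Toy sketch of the Remote Fetching Paradigm (RFP) flow.
class ServerMemory:
    """Stand-in for an RDMA-registered memory region on the server."""
    def __init__(self):
        self.request = None
        self.reply = None

def rdma_write_request(mem, req):
    mem.request = req                # simulated one-sided RDMA write (client)

def server_process(mem, store):
    op, key = mem.request            # server handles the request in place
    mem.reply = store.get(key) if op == "get" else None

def rdma_read_reply(mem):
    return mem.reply                 # simulated one-sided RDMA read (client)

store = {"k": 42}
mem = ServerMemory()
rdma_write_request(mem, ("get", "k"))
server_process(mem, store)
print(rdma_read_reply(mem))   # 42
```

The design point: the server stays on the critical path for computation (so legacy RPC interfaces survive), but the reply transfer is client-driven, which is where the asymmetric RDMA performance is exploited.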
10. AI: a lightweight system for tolerating concurrency bugs
- Author
-
Yongwei Wu, Shan Lu, Weimin Zheng, Jinglei Ren, Shanxiang Qi, and Mingxing Zhang
- Subjects
Atomicity, Debugging, Software testing, Computer science, Distributed computing, Concurrency, Bebugging, Thread (computing), Software quality
- Abstract
Concurrency bugs are notoriously difficult to eradicate during software testing because of their non-deterministic nature. Moreover, fixing concurrency bugs is time-consuming and error-prone. Tolerating concurrency bugs during production runs is thus an attractive complement to bug detection and testing. Unfortunately, existing bug-tolerating tools are usually either 1) constrained in the types of bugs they can handle or 2) dependent on a roll-back mechanism, which hitherto could not be achieved efficiently without hardware support. This paper presents a novel program invariant, called the Anticipating Invariant (AI), which helps anticipate bugs before any irreversible changes are made. Benefiting from this ability to anticipate bugs beforehand, our software-only system is able to forestall failures with a simple thread-stalling technique, which does not rely on execution roll-back and hence performs well. Experiments with 35 real-world concurrency bugs demonstrate that AI is capable of detecting and tolerating most types of concurrency bugs, including both atomicity and order violations. Two new bugs have been detected and confirmed by the corresponding developers. Performance evaluation with 6 representative parallel programs shows that AI incurs negligible overhead.
- Published
- 2014
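The "stall instead of roll back" idea in the AI abstract above can be illustrated with a minimal guard: before a thread enters a region that would break an anticipated invariant (here, that counter updates complete atomically), it waits until the conflicting thread has finished, so nothing ever needs undoing. The guard below is a hand-rolled stand-in for the paper's anticipating-invariant detection, not its actual mechanism.

```python
import threading
import time

class AnticipationGuard:
    """Stall a thread whose next access would violate the anticipated invariant."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = set()

    def enter(self, var):
        while True:
            with self._lock:
                if var not in self._inflight:
                    self._inflight.add(var)
                    return
            time.sleep(0)        # stall (yield) instead of rolling back later

    def leave(self, var):
        with self._lock:
            self._inflight.discard(var)

guard = AnticipationGuard()
counter = 0

def add(n):
    global counter
    for _ in range(n):
        guard.enter("counter")
        counter += 1             # the read-modify-write completes atomically
        guard.leave("counter")

t1 = threading.Thread(target=add, args=(1000,))
t2 = threading.Thread(target=add, args=(1000,))
t1.start(); t2.start(); t1.join(); t2.join()
print(counter)   # 2000
```

Because the failing interleaving is forestalled before any irreversible write, no roll-back machinery is needed, which is the property the abstract emphasizes.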
Discovery Service for Jio Institute Digital Library