Author: "Liu, Simeng" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Liu, Simeng"' showing total 5 results

1. LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices

Author: Zhao, Junchen, Song, Yurun, Liu, Simeng, Harris, Ian G., and Jyothi, Sangeetha Abdu
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Networking and Internet Architecture
Abstract: Deploying Large Language Models (LLMs) locally on mobile devices presents a significant challenge due to their extensive memory requirements. In this paper, we introduce LinguaLinked, a system for decentralized, distributed LLM inference on mobile devices. LinguaLinked enables collaborative execution of the inference task across multiple trusted devices. LinguaLinked ensures data privacy by processing information locally. LinguaLinked uses three key strategies. First, an optimized model assignment technique segments LLMs and uses linear optimization to align segments with each device's capabilities. Second, an optimized data transmission mechanism ensures efficient and structured data flow between model segments while also maintaining the integrity of the original model structure. Finally, LinguaLinked incorporates a runtime load balancer that actively monitors and redistributes tasks among mobile devices to prevent bottlenecks, enhancing the system's overall efficiency and responsiveness. We demonstrate that LinguaLinked facilitates efficient LLM inference while maintaining consistent throughput and minimal latency through extensive testing across various mobile devices, from high-end to low-end Android devices. In our evaluations, compared to the baseline, LinguaLinked achieves an inference performance acceleration of $1.11\times$ to $1.61\times$ in single-threaded settings and $1.73\times$ to $2.65\times$ with multi-threading. Additionally, runtime load balancing yields an overall inference acceleration of $1.29\times$ to $1.32\times$.
Comment: 16 pages, 8 figures
Published: 2023

2. Quantifying Overheads in Charm++ and HPX using Task Bench

Author: Wu, Nanmiao, Gonidelis, Ioannis, Liu, Simeng, Fink, Zane, Gupta, Nikunj, Mohammadiporshokooh, Karame, Diehl, Patrick, Kaiser, Hartmut, and Kale, Laxmikant V.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance
Abstract: Asynchronous Many-Task (AMT) runtime systems take advantage of multi-core architectures with light-weight threads, asynchronous executions, and smart scheduling. In this paper, we present a comparison of the AMT systems Charm++ and HPX with the mainstream MPI, OpenMP, and MPI+OpenMP libraries using the Task Bench benchmarks. Charm++ is a parallel programming language based on C++, supporting stackless tasks as well as light-weight threads asynchronously along with an adaptive runtime system. HPX is a C++ library for concurrency and parallelism, exposing a C++ standards-conforming API. First, we analyze the commonalities, differences, and advantageous scenarios of Charm++ and HPX in detail. Further, to investigate the potential overheads introduced by the tasking systems of Charm++ and HPX, we utilize an existing parameterized benchmark, Task Bench, wherein 15 different programming systems were implemented, e.g., MPI, OpenMP, MPI+OpenMP, and extend Task Bench by adding HPX implementations. We quantify the overheads of Charm++, HPX, and the mainstream libraries in different scenarios where a single task and multiple tasks are assigned to each core, respectively. We also investigate each system's scalability and its ability to hide communication latency.
Published: 2022

3. Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py

Author: Fink, Zane, Liu, Simeng, Choi, Jaemin, Diener, Matthias, and Kale, Laxmikant V.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, Computer Science - Programming Languages
Abstract: Python is rapidly becoming the lingua franca of machine learning and scientific computing. With the broad use of frameworks such as NumPy, SciPy, and TensorFlow, scientific computing and machine learning are seeing a productivity boost on systems without a requisite loss in performance. While high-performance libraries often provide adequate performance within a node, distributed computing is required to scale Python across nodes and make it genuinely competitive in large-scale high-performance computing. Many frameworks, such as Charm4Py, DaCe, Dask, Legate NumPy, mpi4py, and Ray, scale Python across nodes. However, little is known about these frameworks' relative strengths and weaknesses, leaving practitioners and scientists without enough information about which frameworks are suitable for their requirements. In this paper, we seek to narrow this knowledge gap by studying the relative performance of two such frameworks: Charm4Py and mpi4py. We perform a comparative performance analysis of Charm4Py and mpi4py using CPU- and GPU-based microbenchmarks and other representative mini-apps for scientific computing.
Comment: 7 pages, 7 figures. To appear at "Sixth International IEEE Workshop on Extreme Scale Programming Models and Middleware"
Published: 2021

4. Matrix Variate RBM Model with Gaussian Distributions

Author: Liu, Simeng, Sun, Yanfeng, Hu, Yongli, Gao, Junbin, and Yin, Baocai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Restricted Boltzmann Machine (RBM) is a particular type of random neural network model for vector data, based on the assumption of a Bernoulli distribution. For multi-dimensional and non-binary data, it is necessary to vectorize and discretize the information in order to apply the conventional RBM. It is well known that vectorization destroys the internal structure of the data, and the binary units limit performance in practice because real data are variable. To address the issue, this paper proposes a Matrix variate Gaussian Restricted Boltzmann Machine (MVGRBM) model for matrix data whose entries follow Gaussian distributions. Compared with some other RBM algorithms, MVGRBM models real-valued data better and performs well in image classification.
Comment: We think we need more mathematical derivation and experiments to support the proposed theory of the paper. In this period, it is not appropriate to publish it.
Published: 2016

5. Mixture of Bilateral-Projection Two-dimensional Probabilistic Principal Component Analysis

Author: Ju, Fujiao, Sun, Yanfeng, Gao, Junbin, Liu, Simeng, and Hu, Yongli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Probabilistic principal component analysis (PPCA) is built upon a global linear mapping, which is insufficient to model complex data variation. This paper proposes a mixture of bilateral-projection probabilistic principal component analysis model (mixB2DPPCA) on 2D data. With multiple components in the mixture, this model can be seen as a soft clustering algorithm and is capable of modeling data with complex structures. A Bayesian inference scheme is proposed, based on the variational EM (Expectation-Maximization) approach, for learning model parameters. Experiments on some publicly available databases show that mixB2DPPCA largely improves performance, yielding better reconstruction errors and recognition rates than the existing PCA-based algorithms.
Published: 2016
