Author: "Cai, Ruisi" / Publication Year Range: This year - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Cai, Ruisi"' showing total 7 results


1. Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

Author: Cai, Ruisi, Ro, Yeonju, Kim, Geon-Woo, Wang, Peihao, Bejnordi, Babak Ehteshami, Akella, Aditya, and Wang, Zhangyang
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The proliferation of large language models (LLMs) has led to the adoption of Mixture-of-Experts (MoE) architectures that dynamically leverage specialized subnetworks for improved efficiency and performance. Despite their benefits, MoE models face significant challenges during inference, including inefficient memory management and suboptimal batching, due to misaligned design choices between the model architecture and the system policies. Furthermore, the conventional approach of training MoEs from scratch is increasingly prohibitive in terms of cost. In this paper, we propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models (in contrast to "upcycling" generalist MoEs), avoiding the high costs of ground-up training. Our approach employs activation sparsity to extract experts. To compose experts, we examine the widely-adopted layer-wise router design and show its redundancy, and thus we introduce the pre-gating router decoupled from the MoE backbone that facilitates system-friendly pre-computing and lookahead scheduling, enhancing expert-aware batching and caching. Our codesign therefore addresses critical gaps on both the algorithmic and system fronts, establishing a scalable and efficient alternative for LLM inference in resource-constrained settings. Read-ME outperforms other popular open-source dense models of similar scales, achieving improvements of up to 10.1% on MMLU, and improving mean end-to-end latency up to 6.1%. Codes are available at: https://github.com/VITA-Group/READ-ME
Comment: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Published: 2024

2. Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

Author: Zhao, Xinyu, Sun, Guoheng, Cai, Ruisi, Zhou, Yukun, Li, Pingzhi, Wang, Peihao, Tan, Bowen, He, Yexiao, Chen, Li, Liang, Yi, Chen, Beidi, Yuan, Binhang, Wang, Hongyi, Li, Ang, Wang, Zhangyang, and Chen, Tianlong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: As Large Language Models (LLMs) excel across tasks and specialized domains, scaling LLMs based on existing models has garnered significant attention, which faces the challenge of decreasing performance when combining disparate models. Various techniques have been proposed for the aggregation of pre-trained LLMs, including model merging, Mixture-of-Experts, and stacking. Despite their merits, a comprehensive comparison and synergistic application of them to a diverse model zoo is yet to be adequately addressed. In light of this research gap, this paper introduces Model-GLUE, a holistic LLM scaling guideline. First, our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging, and variants of mixture. Utilizing the insights from the benchmark results, we formulate a strategy for the selection and aggregation of a heterogeneous model zoo characterizing different architectures and initialization. Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture. Finally, evidenced by our experiments on a diverse Llama-2-based model zoo, Model-GLUE shows an average performance enhancement of 5.61%, achieved without additional training. Codes are available at: https://github.com/Model-GLUE/Model-GLUE
Comment: 24 pages, 4 figures, accepted to NeurIPS 2024 Datasets and Benchmarks Track
Published: 2024

3. Flextron: Many-in-One Flexible Large Language Model

Author: Cai, Ruisi, Muralidharan, Saurav, Heinrich, Greg, Yin, Hongxu, Wang, Zhangyang, Kautz, Jan, and Molchanov, Pavlo
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Training modern LLMs is extremely resource intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. The Flextron architecture utilizes a nested elastic structure to rapidly adapt to specific user-defined latency and accuracy targets during inference with no additional fine-tuning required. It is also input-adaptive, and can automatically route tokens through its sub-networks for improved performance and efficiency. We present a sample-efficient training method and associated routing algorithms for systematically transforming an existing trained LLM into a Flextron model. We evaluate Flextron on the GPT-3 and Llama-2 family of LLMs, and demonstrate superior performance over multiple end-to-end trained variants and other state-of-the-art elastic networks, all with a single pretraining run that consumes a mere 7.63% of the tokens used in the original pretraining.
Published: 2024

4. LoCoCo: Dropping In Convolutions for Long Context Compression

Author: Cai, Ruisi, Tian, Yuandong, Wang, Zhangyang, and Chen, Beidi
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for Long Context Compression (LoCoCo). LoCoCo employs only a fixed-size Key-Value (KV) cache, and can enhance efficiency in both inference and fine-tuning stages. Diverging from prior methods that selectively drop KV pairs based on heuristics, LoCoCo leverages a data-driven adaptive fusion technique, blending previous KV pairs with incoming tokens to minimize the loss of contextual information and ensure accurate attention modeling. This token integration is achieved through injecting one-dimensional convolutional kernels that dynamically calculate mixing weights for each KV cache slot. Designed for broad compatibility with existing LLM frameworks, LoCoCo allows for straightforward "drop-in" integration without needing architectural modifications, while incurring minimal tuning overhead. Experiments demonstrate that LoCoCo maintains consistently outstanding performance across various context lengths and can achieve a high context compression rate during both inference and fine-tuning phases. During inference, we successfully compressed up to 3482 tokens into a 128-size KV cache, while retaining comparable performance to the full sequence - an accuracy improvement of up to 0.2791 compared to baselines at the same cache size. During post-training tuning, we also effectively extended the context length from 4K to 32K using a KV cache of fixed size 512, achieving performance similar to fine-tuning with entire sequences.
Published: 2024

5. Glucose-Responsive Microneedle Patch with High Insulin Loading Capacity for Prolonged Glycemic Control in Mice and Minipigs

Author: Wang, Shiqi, Yang, Changwei, Zhang, Wentao, Zhao, Sheng, You, Jiahuan, Cai, Ruisi, Wang, Hao, Bao, Yuhang, Zhang, Ying, Zhang, Juan, Ji, Kangfan, Zhang, Yuqi, Ye, Xiao, Gu, Zhen, and Yu, Jicheng
Abstract: Transdermal microneedle-mediated glucose-responsive insulin delivery systems can modulate insulin release based on fluctuations in blood glucose levels, thus maintaining normoglycemia effectively in a continuous, convenient, and minimally invasive manner. However, conventional microneedles are limited by the low drug loading capacity, making it challenging to be applied on human skin at a reasonable size for a lasting glucose-controlling effect, thus hindering their clinical translation. Here, we design a microneedle patch with a solid insulin powder core to achieve a high loading capacity of insulin (>70 wt %) as well as a glucose-sensitive polymeric shell to realize glucose-responsive insulin release. Once exposed to hyperglycemia, the formation of negatively charged glucose–boronate complexes increases the charge density of the shell matrix, leading to swelling of the shell and accelerating insulin release from the core. We have demonstrated that this glucose-responsive microneedle patch could achieve long-term regulation of blood glucose levels in both type 1 diabetic mice and minipigs (up to 48 h with patches of ∼3.5 cm² for minipigs >25 kg).
Published: 2024

6. Dual-Targeted Cascade-Responsive Prodrug Micelle System for Tumor Therapy in Vivo

Author: Dai, Liangliang, Cai, Ruisi, Li, Menghuan, Luo, Zhong, Yu, Yonglin, Chen, Weizhen, Shen, Xinkun, Pei, Yuxia, Zhao, Xiaojing, and Cai, Kaiyong
Abstract: This study reports a cascade-responsive disassemble micellar drug delivery system with dual-targeting potential (cell and mitochondria targeting), which optimizes the distribution of antitumor drugs on systemic, local, and subcellular levels to enhance antitumor efficacy. A new cationic porphyrin derivative 5-(3-hydroxy-p-(4-trimethylammonium)butoxyphenyl)-10,15,20-triphenylporphyrin chlorine (MTPP) is synthesized as a mitochondria-targeting photosensitizer. After accumulating at a tumor site, the micellar nanosystem is endocytosed by tumor cells facilitated by the folate receptor-mediated pathway. Then, the hydrophobic PDEA block would be protonated in intracellular acidic endo-/lysosomes and promote the escape of prodrug micelles from endo-/lysosome to cytoplasm, resulting in the first-stage destabilization of micelles. Subsequently, the CPT is released in response to high concentration of GSH in cytoplasm, which would greatly increase the hydrophilicity of the BOH block and initiate the complete disassembly of the polymer micelles owing to the damage of the hydrophilic–hydrophobic balance. Additionally, the released MTPP is selectively accumulated in mitochondria and activates mitochondria apoptotic pathway upon light irradiation as a result of ROS generation. Both in vitro and in vivo studies indicate that the polymeric micelle not only effectively improves the targeted delivery efficiency but also dramatically enhances the combinational antitumor efficacy while reducing the side effects associated with the laser irradiation and mitochondria-targeted tumor therapy.
Published: 2024

7. Wireless, Programmable, and Refillable Hydrogel Bioelectronics for Enhanced Diabetic Wound Healing.

Author: Du N, Fan Y, Zhang Y, Huang H, Lyu Y, Cai R, Zhang Y, Zhang T, Guan Y, and Nan K
Abstract: Diabetic wounds, characterized by complex pathogenesis and high infection rates, pose significant challenges in treatment due to prolonged recovery times and high recurrence rates, often leading to severe complications such as amputation and death. Traditional dry dressing treatments fail to address the unique microenvironment of diabetic wounds and tend to cause secondary damage due to frequent replacement. In this study, an electronic-embedding, drug-loading hydrogel bioelectronics is reported for accelerating diabetic wound healing using a combination of programmable pharmaceutical and electrostimulative approaches. Encapsulated in stretchable and biocompatible materials, this device is capable of multiple drug refilling and accelerated drug release modulated by on-board electronics. In vivo experiments on diabetic model rats confirm the device's effectiveness in promoting wound healing. This innovative approach implies the potential for improving diabetic wound management using a combination of physical, material, and pharmaceutical interventions. (© 2024 The Author(s). Advanced Science published by Wiley‐VCH GmbH.)
Published: 2024
