6,173 results on '"MURALI, P."'
Search Results
2. LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
- Author
Chitty-Venkata, Krishna Teja, Raskar, Siddhisanket, Kale, Bharat, Ferdaus, Farah, Tanikanti, Aditya, Raffenetti, Ken, Taylor, Valerie, Emani, Murali, and Vishwanath, Venkatram
- Subjects
Computer Science - Machine Learning
- Abstract
Large Language Models (LLMs) have propelled groundbreaking advancements across several domains and are commonly used for text generation applications. However, the computational demands of these complex models pose significant challenges, requiring efficient hardware acceleration. Benchmarking the performance of LLMs across diverse hardware platforms is crucial to understanding their scalability and throughput characteristics. We introduce LLM-Inference-Bench, a comprehensive benchmarking suite to evaluate the hardware inference performance of LLMs. We thoroughly analyze diverse hardware platforms, including GPUs from Nvidia and AMD and specialized AI accelerators, Intel Habana and SambaNova. Our evaluation includes several LLM inference frameworks and models from LLaMA, Mistral, and Qwen families with 7B and 70B parameters. Our benchmarking results reveal the strengths and limitations of various models, hardware platforms, and inference frameworks. We provide an interactive dashboard to help identify configurations for optimal performance for a given hardware platform.
- Published
- 2024
3. Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure
- Author
Zhang, Chaoyun, Yao, Randolph, Qin, Si, Li, Ze, Agrawal, Shekhar, Mishra, Binit R., Tran, Tri, Ma, Minghua, Lin, Qingwei, Chintalapati, Murali, and Zhang, Dongmei
- Subjects
Electrical Engineering and Systems Science - Systems and Control; Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
The presence of unhealthy nodes in cloud infrastructure signals the potential failure of machines, which can significantly impact the availability and reliability of cloud services, resulting in negative customer experiences. Effectively addressing unhealthy node mitigation is therefore vital for sustaining cloud system performance. This paper introduces Deoxys, a causal inference engine tailored to recommending mitigation actions for unhealthy nodes in cloud systems to minimize virtual machine downtime and interruptions during unhealthy events. It employs double machine learning combined with causal forest to produce precise and reliable mitigation recommendations based solely on limited observational data collected from historical unhealthy events. To enhance the causal inference model, Deoxys further incorporates a policy fallback mechanism based on model uncertainty and action overriding mechanisms to (i) improve the reliability of the system, and (ii) strike a good tradeoff between downtime reduction and resource utilization, thereby enhancing the overall system performance. After deploying Deoxys in a large-scale cloud infrastructure at Microsoft, our observations demonstrate that Deoxys reduces average VM downtime by 53% compared to a legacy policy, while leading to a 49.5% lower VM interruption rate. This substantial improvement enhances the reliability and stability of cloud platforms, resulting in a seamless customer experience.
- Published
- 2024
4. DiffusionSeeder: Seeding Motion Optimization with Diffusion for Rapid Motion Planning
- Author
Huang, Huang, Sundaralingam, Balakumar, Mousavian, Arsalan, Murali, Adithyavairavan, Goldberg, Ken, and Fox, Dieter
- Subjects
Computer Science - Robotics
- Abstract
Running optimization across many parallel seeds leveraging GPU compute has relaxed the need for a good initialization, but this can fail if the problem is highly non-convex, as all seeds could get stuck in local minima. One such setting is collision-free motion optimization for robot manipulation, where optimization converges quickly on easy problems but struggles in obstacle-dense environments (e.g., a cluttered cabinet or table). In these situations, graph-based planning algorithms are used to obtain seeds, resulting in significant slowdowns. We propose DiffusionSeeder, a diffusion-based approach that generates trajectories to seed motion optimization for rapid robot motion planning. DiffusionSeeder takes the initial depth image observation of the scene and generates high-quality, multi-modal trajectories that are then fine-tuned with a few iterations of motion optimization. We integrate DiffusionSeeder to generate the seed trajectories for cuRobo, a GPU-accelerated motion optimization method, which results in a 12x speedup on average, and a 36x speedup for more complicated problems, while achieving a 10% higher success rate in partially observed simulation environments. Our results show the effectiveness of using diverse solutions from a learned diffusion model. Physical experiments on a Franka robot demonstrate the sim2real transfer of DiffusionSeeder to the real robot, with an average success rate of 86% and planning time of 26ms, improving on cuRobo with a 51% higher success rate while also being 2.5x faster.
- Published
- 2024
5. Fastrack: Fast IO for Secure ML using GPU TEEs
- Author
Wang, Yongqin, Rajat, Rachit, Lee, Jonghyun, Tang, Tingting, and Annavaram, Murali
- Subjects
Computer Science - Cryptography and Security; Computer Science - Hardware Architecture
- Abstract
As cloud-based ML expands, ensuring data security during training and inference is critical. GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance solutions, with CPU TEEs managing data movement and GPU TEEs handling authentication and computation. However, CPU-to-GPU communication overheads significantly hinder performance, as data must be encrypted, authenticated, decrypted, and verified, increasing costs by 12.69 to 33.53 times. This results in GPU TEE inference becoming 54.12% to 903.9% slower and training 10% to 455% slower than non-TEE systems, undermining GPU TEE advantages in latency-sensitive applications. This paper analyzes Nvidia H100 TEE protocols and identifies three key overheads: 1) redundant CPU re-encryption, 2) limited authentication parallelism, and 3) unnecessary operation serialization. We propose Fastrack, optimizing with 1) direct GPU TEE communication, 2) parallelized authentication, and 3) overlapping decryption with PCI-e transmission. These optimizations cut communication costs and reduce inference/training runtime by up to 84.6%, with minimal overhead compared to non-TEE systems.
- Published
- 2024
6. Learning autonomous driving from aerial imagery
- Author
Murali, Varun, Rosman, Guy, Karaman, Sertac, and Rus, Daniela
- Subjects
Computer Science - Robotics; Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this work, we consider the problem of learning end-to-end perception to control for ground vehicles solely from aerial imagery. Photogrammetric simulators allow the synthesis of novel views through the transformation of pre-generated assets. However, they have a large setup cost, require careful collection of data, and often need human effort to create usable simulators. We use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle. These novel viewpoints can then be used for several downstream autonomous navigation applications. In this work, we demonstrate the utility of novel view synthesis through the application of training a policy for end-to-end learning from images and depth data. In a traditional real-to-sim-to-real framework, the collected data would be transformed into a visual simulator which could then be used to generate novel views. In contrast, using a NeRF allows a compact representation and the ability to optimize over the parameters of the visual simulator as more data is gathered in the environment. We demonstrate the efficacy of our method in a custom-built mini-city environment through the deployment of imitation policies on robotic cars. We additionally consider the task of place localization and demonstrate that our method is able to relocalize the car in the real world.
Comment: Presented at IROS 2024
- Published
- 2024
7. Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree
- Author
Jaggi, Harbani, Murali, Kashyap, Fleisig, Eve, and Bıyık, Erdem
- Subjects
Computer Science - Computation and Language
- Abstract
When annotators disagree, predicting the labels given by individual annotators can capture nuances overlooked by traditional label aggregation. We introduce three approaches to predicting individual annotator ratings on the toxicity of text by incorporating individual annotator-specific information: a neural collaborative filtering (NCF) approach, an in-context learning (ICL) approach, and an intermediate embedding-based architecture. We also study the utility of demographic information for rating prediction. NCF showed limited utility; however, integrating annotator history, demographics, and survey information permits both the embedding-based architecture and ICL to substantially improve prediction accuracy, with the embedding-based architecture outperforming the other methods. We also find that, if demographics are predicted from survey information, using these imputed demographics as features performs comparably to using true demographic data. This suggests that demographics may not provide substantial information for modeling ratings beyond what is captured in survey responses. Our findings raise considerations about the relative utility of different types of annotator information and provide new approaches for modeling annotators in subjective NLP tasks.
- Published
- 2024
8. Moving Faster and Reducing Risk: Using LLMs in Release Deployment
- Author
Abreu, Rui, Murali, Vijayaraghavan, Rigby, Peter C, Maddila, Chandra, Sun, Weiyan, Ge, Jun, Chinniah, Kaavya, Mockus, Audris, Mehta, Megh, and Nagappan, Nachiappan
- Subjects
Computer Science - Software Engineering
- Abstract
Release engineering has traditionally focused on continuously delivering features and bug fixes to users, but at a certain scale, it becomes impossible for a release engineering team to determine what should be released. At Meta's scale, the responsibility appropriately and necessarily falls back on the engineer writing and reviewing the code. To address this challenge, we developed models of diff risk scores (DRS) to determine how likely a diff is to cause a SEV, i.e., a severe fault that impacts end-users. Assuming that SEVs are only caused by diffs, a naive model could randomly gate X% of diffs from landing, which would automatically catch X% of SEVs on average. However, we aimed to build a model that can capture Y% of SEVs by gating X% of diffs, where Y >> X. By training the model on historical data on diffs that have caused SEVs in the past, we can predict the riskiness of an outgoing diff to cause a SEV. Diffs that are beyond a particular threshold of risk can then be gated. We have four types of gating: no gating (green), weekend gating (weekend), medium impact on end-users (yellow), and high impact on end-users (red). The input parameter for our models is the level of gating, and the outcome measure is the number of captured SEVs. Our research approaches include a logistic regression model, a BERT-based model, and generative LLMs. Our baseline regression model captures 18.7%, 27.9%, and 84.6% of SEVs while respectively gating the top 5% (weekend), 10% (yellow), and 50% (red) of risky diffs. The BERT-based model, StarBERT, only captures 0.61x, 0.85x, and 0.81x as many SEVs as the logistic regression for the weekend, yellow, and red gating zones, respectively. The generative LLMs, iCodeLlama-34B and iDiffLlama-13B, when risk-aligned, capture more SEVs than the logistic regression model in production: 1.40x, 1.52x, 1.05x, respectively.
- Published
- 2024
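The gating arithmetic described in the abstract of item 8 (gate the riskiest X% of diffs and measure the share Y% of SEVs captured, against a random baseline that captures about X%) can be illustrated with a minimal sketch. The function name `capture_rate` and the toy scores and labels below are hypothetical and are not taken from the paper or from Meta's systems.

```python
def capture_rate(risk_scores, caused_sev, gate_fraction):
    """Share of SEV-causing diffs caught when the top `gate_fraction`
    of diffs, ranked by risk score, are gated."""
    if not 0.0 <= gate_fraction <= 1.0:
        raise ValueError("gate_fraction must be between 0 and 1")
    total_sevs = sum(caused_sev)
    if total_sevs == 0:
        return 0.0
    # Number of diffs that fall inside the gated zone.
    n_gated = round(gate_fraction * len(risk_scores))
    # Indices of diffs sorted from highest to lowest risk score.
    ranked = sorted(range(len(risk_scores)),
                    key=lambda i: risk_scores[i], reverse=True)
    caught = sum(caused_sev[i] for i in ranked[:n_gated])
    return caught / total_sevs


# Hypothetical toy data: 10 diffs, 2 of which caused a SEV.
scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.05, 0.4, 0.15, 0.7, 0.25]
sev = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

x = 0.2                              # gate the riskiest 20% of diffs
y = capture_rate(scores, sev, x)     # share of SEVs captured
print(f"gated {x:.0%} of diffs, captured {y:.0%} of SEVs "
      f"(random gating would capture about {x:.0%})")
```

On this toy data, gating 20% of diffs captures one of the two SEVs, so Y = 50% against a random-gating expectation of 20%.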
9. SPINE: Online Semantic Planning for Missions with Incomplete Natural Language Specifications in Unstructured Environments
- Author
Ravichandran, Zachary, Murali, Varun, Tzes, Mariliza, Pappas, George J., and Kumar, Vijay
- Subjects
Computer Science - Robotics; Computer Science - Artificial Intelligence
- Abstract
As robots become increasingly capable, users will want to describe high-level missions and have robots fill in the gaps. In many realistic settings, pre-built maps are difficult to obtain, so execution requires exploration and mapping that are necessary and specific to the mission. Consider an emergency response scenario where a user commands a robot, "triage impacted regions." The robot must infer relevant semantics (victims, etc.) and exploration targets (damaged regions) based on priors or other context, then explore and refine its plan online. These missions are incompletely specified, meaning they imply subtasks and semantics. While many semantic planning methods operate online, they are typically designed for well specified tasks such as object search or exploration. Recently, Large Language Models (LLMs) have demonstrated powerful contextual reasoning over a range of robotic tasks described in natural language. However, existing LLM planners typically do not consider online planning or complex missions; rather, relevant subtasks are provided by a pre-built map or a user. We address these limitations via SPINE (online Semantic Planner for missions with Incomplete Natural language specifications in unstructured Environments). SPINE uses an LLM to reason about subtasks implied by the mission then realizes these subtasks in a receding horizon framework. Tasks are automatically validated for safety and refined online with new observations. We evaluate SPINE in simulation and real-world settings. Evaluation missions require multiple steps of semantic reasoning and exploration in cluttered outdoor environments of over 20,000m$^2$ area. We evaluate SPINE against competitive baselines in single-agent and air-ground teaming applications. Please find videos and software on our project page: https://zacravichandran.github.io/SPINE
- Published
- 2024
10. Characterizing Context Influence and Hallucination in Summarization
- Author
Flemings, James, Zhang, Wanrong, Jiang, Bo, Takhirov, Zafar, and Annavaram, Murali
- Subjects
Computer Science - Computation and Language; Computer Science - Machine Learning
- Abstract
Although Large Language Models (LLMs) have achieved remarkable performance in numerous downstream tasks, their ubiquity has raised two significant concerns. One is that LLMs can hallucinate by generating content that contradicts relevant contextual information; the other is that LLMs can inadvertently leak private information due to input regurgitation. Many prior works have extensively studied each concern independently, but none have investigated them simultaneously. Furthermore, auditing the influence of provided context during open-ended generation with a privacy emphasis is understudied. To this end, we comprehensively characterize the influence and hallucination of contextual information during summarization. We introduce a definition for context influence and Context-Influence Decoding (CID), and then we show that amplifying the context (by factoring out prior knowledge) and the context being out of distribution with respect to prior knowledge increases the context's influence on an LLM. Moreover, we show that context influence gives a lower bound of the private information leakage of CID. We corroborate our analytical findings with experimental evaluations that show improving the F1 ROUGE-L score on CNN-DM for LLaMA 3 by 10% over regular decoding also leads to 1.5x more influence by the context. Moreover, we empirically evaluate how context influence and hallucination are affected by (1) model capacity, (2) context size, (3) the length of the current response, and (4) different token n-grams of the context. Our code can be accessed here: https://github.com/james-flemings/context_influence.
- Published
- 2024
11. Ultrathin BIC metasurfaces based on ultralow-loss Sb2Se3 phase-change material
- Author
Xie, Zhaoyang, Li, Chi, Murali, Krishna, Yu, Haoyi, Liu, Changxu, Lu, Yiqing, Maier, Stefan A., Bhaskaran, Madhu, and Ren, Haoran
- Subjects
Physics - Optics; Condensed Matter - Materials Science; Physics - Applied Physics
- Abstract
Phase-change materials (PCMs) are increasingly recognised as promising platforms for tunable photonic devices due to their ability to modulate optical properties through solid-state phase transitions. Ultrathin and low-loss PCMs are highly valued for their fast and more effective phase transitions and applications in reconfigurable photonic chips, metasurfaces, optical modulators, sensors, photonic memories, and neuromorphic computing. However, conventional PCMs such as GST, GSST, VO2, and In3SbTe2, despite optimisation for tunable meta-optics, suffer from high intrinsic losses in the near-infrared (NIR) region, limiting their potential for high quality factor (Q-factor) resonant metasurfaces. Here we present the design and fabrication of tunable bound states in the continuum (BIC) metasurfaces using the ultralow-loss PCM Sb2Se3. Our BIC metasurfaces, only 25 nm thick, achieve high modulation depth and broad resonance tuning in the NIR with high Q-factors up to 130, without the need for additional materials. Experimentally, we employ these BIC metasurfaces to modulate photoluminescence in rare earth-doped upconversion nanoparticles, reducing the excitation power for multiphoton photoluminescence and enabling emission polarisation manipulation. This work offers a promising platform for developing active resonant metasurfaces in the NIR region, with broad applications including super resolution imaging, optical modulation, ultrafast switches, harmonic generation, colour filtering, and optical sensing.
- Published
- 2024
12. Adaptively Private Next-Token Prediction of Large Language Models
- Author
Flemings, James, Razaviyayn, Meisam, and Annavaram, Murali
- Subjects
Computer Science - Machine Learning; Computer Science - Cryptography and Security
- Abstract
As Large Language Models (LLMs) proliferate, developing privacy safeguards for these models is crucial. One popular safeguard involves training LLMs in a differentially private manner. However, such solutions are shown to be computationally expensive and detrimental to the utility of these models. Since LLMs are deployed on the cloud and thus only accessible via an API, a Machine Learning as a Service (MLaaS) provider can protect its downstream data by privatizing the predictions during the decoding process. However, the practicality of such solutions still largely lags behind DP training methods. One recent promising approach, Private Mixing of Ensemble Distributions (PMixED), avoids additive noise by sampling from the output distributions of private LLMs mixed with the output distribution of a public model. Yet, PMixED must satisfy a fixed privacy level for a given number of queries, which is difficult for an analyst to estimate before inference and, hence, does not scale. To this end, we relax the requirements to a more practical setting by introducing Adaptive PMixED (AdaPMixED), a private decoding framework based on PMixED that is adaptive to the private and public output distributions evaluated on a given input query. In this setting, we introduce a noisy screening mechanism that filters out queries with potentially expensive privacy loss, and a data-dependent analysis that exploits the divergence of the private and public output distributions in its privacy loss calculation. Our experimental evaluations demonstrate that our mechanism and analysis can reduce the privacy loss by 16x while preserving the utility over the original PMixED. Furthermore, performing 100K predictions with AdaPMixED still achieves strong utility and a reasonable data-dependent privacy loss of 5.25.
- Published
- 2024
13. RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration
- Author
Tao, Yuezhan, Ong, Dexter, Murali, Varun, Spasojevic, Igor, Chaudhari, Pratik, and Kumar, Vijay
- Subjects
Computer Science - Robotics
- Abstract
We propose a framework for active mapping and exploration that leverages Gaussian splatting for constructing information-rich maps. Further, we develop a parallelized motion planning algorithm that can exploit the Gaussian map for real-time navigation. The Gaussian map constructed onboard the robot is optimized for both photometric and geometric quality while enabling real-time situational awareness for autonomy. We show through simulation experiments that our method is competitive with approaches that use alternate information gain metrics, while being orders of magnitude faster to compute. In real-world experiments, our algorithm achieves better map quality (10% higher Peak Signal-to-Noise Ratio (PSNR) and 30% higher geometric reconstruction accuracy) than Gaussian maps constructed by traditional exploration baselines. Experiment videos and more details can be found on our project page: https://tyuezhan.github.io/RT_GuIDE/
Comment: Submitted to ICRA2025
- Published
- 2024
14. On 1-Planar Graphs with Bounded Cop-Number
- Author
Bose, Prosenjit, De Carufel, Jean-Lou, Maheshwari, Anil, and Murali, Karthik
- Subjects
Computer Science - Discrete Mathematics; Mathematics - Combinatorics
- Abstract
Cops and Robbers is a type of pursuit-evasion game played on a graph where a set of cops try to capture a single robber. The cops first choose their initial vertex positions, and later the robber chooses a vertex. The cops and robbers make their moves in alternate turns: in the cops' turn, every cop can either choose to move to an adjacent vertex or stay on the same vertex, and likewise the robber in his turn. If the cops can capture the robber in a finite number of rounds, the cops win, otherwise the robber wins. The cop-number of a graph is the minimum number of cops required to catch a robber in the graph. It has long been known that graphs embedded on surfaces (such as planar graphs and toroidal graphs) have a small cop-number. Recently, Durocher et al. [Graph Drawing, 2023] investigated the problem of cop-number for the class of $1$-planar graphs, which are graphs that can be embedded in the plane such that each edge is crossed at most once. They showed that unlike planar graphs which require just three cops, 1-planar graphs have an unbounded cop-number. On the positive side, they showed that maximal 1-planar graphs require only three cops by crucially using the fact that the endpoints of every crossing in an embedded maximal 1-planar graph induce a $K_4$. In this paper, we show that the cop-number remains bounded even under the relaxed condition that the endpoints induce at least three edges. More precisely, let an $\times$-crossing of an embedded 1-planar graph be a crossing whose endpoints induce a matching; i.e., there is no edge connecting the endpoints apart from the crossing edges themselves. We show that any 1-planar graph that can be embedded without $\times$-crossings has cop-number at most 21. Moreover, any 1-planar graph that can be embedded with at most $\gamma$ $\times$-crossings has cop-number at most $\gamma + 21$.
- Published
- 2024
15. Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines
- Author
Gao, Lei, Ziashahabi, Amir, Niu, Yue, Avestimehr, Salman, and Annavaram, Murali
- Subjects
Computer Science - Machine Learning; Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers. The next frontier is LLM personalization, where a foundation model can be fine-tuned with user/task-specific data. Given the sensitive nature of such private data, it is desirable to fine-tune these models on edge devices to improve user trust. However, fine-tuning on resource-constrained edge devices presents significant challenges due to substantial memory and computational demands, as well as limited infrastructure support. We observe that inference engines (e.g., ExecuTorch) can be repurposed for fine-tuning by leveraging zeroth-order (ZO) optimization, which uses multiple forward passes to approximate gradients. However, directly applying ZO methods on edge devices is impractical due to the high computational cost of multiple model perturbations required to achieve accuracy improvements. Based on these observations, we propose a memory- and computation-efficient LLM fine-tuning method for edge devices. Our approach has three key innovations: (1) We introduce a parallelized randomized gradient estimation (P-RGE) technique that achieves high parallel efficiency by leveraging outer-loop and inner-loop parallelization. This enables multiple function queries and forward passes to be executed in parallel, reducing training time. (2) We integrate P-RGE with parameter-efficient fine-tuning methods (e.g., LoRA) to further reduce computational and memory overhead. (3) We implement a P-RGE LoRA-FA module that fully supports fine-tuning with ExecuTorch. Our approach requires no modifications to ExecuTorch's runtime code, as it can be implemented with server-side code changes only. Experiments demonstrate that P-RGE achieves substantial runtime speedups and memory savings while improving fine-tuning accuracy, paving the way for practical deployment of LLMs in real-time, on-device applications.
Comment: Accepted at NeurIPS 2024 ENLSP-IV workshop
- Published
- 2024
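The abstract of item 15 relies on zeroth-order optimization, which approximates gradients from forward passes alone. A minimal sketch of a generic randomized gradient estimator follows, assuming a two-point (central-difference) form with Gaussian directions; the paper's exact estimator, its outer-loop and inner-loop parallelization, and its LoRA integration are not reproduced here. The names `rge_gradient` and `loss` are hypothetical.

```python
import random


def rge_gradient(f, x, num_queries=8, eps=1e-3, rng=None):
    """Estimate the gradient of f at x using only function evaluations.

    For each random direction u, the directional slope is measured with two
    evaluations, f(x + eps*u) and f(x - eps*u), and the estimate is the
    average of slope * u over all sampled directions.
    """
    rng = rng or random.Random(0)
    grad = [0.0] * len(x)
    for _ in range(num_queries):
        # Random Gaussian perturbation direction (assumption of this sketch).
        u = [rng.gauss(0.0, 1.0) for _ in x]
        x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
        x_minus = [xi - eps * ui for xi, ui in zip(x, u)]
        # Two forward passes give the slope of f along u.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * eps)
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_queries
    return grad


def loss(x):
    """Hypothetical toy objective; its true gradient is 2 * x."""
    return sum(xi * xi for xi in x)


point = [1.0, -2.0, 3.0]
estimate = rge_gradient(loss, point, num_queries=5000)
true_grad = [2.0 * xi for xi in point]
print("estimate:", [round(g, 2) for g in estimate])
print("true    :", true_grad)
```

Each query costs two forward passes and no backward pass, which is why an inference-only engine can run it. The queries are independent of one another, so they can in principle be batched or run in parallel; the estimate becomes less noisy as `num_queries` grows.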
16. AgriNeRF: Neural Radiance Fields for Agriculture in Challenging Lighting Conditions
- Author
Chopra, Samarth, Cladera, Fernando, Murali, Varun, and Kumar, Vijay
- Subjects
Computer Science - Robotics
- Abstract
Neural Radiance Fields (NeRFs) have shown significant promise in 3D scene reconstruction and novel view synthesis. In agricultural settings, NeRFs can serve as digital twins, providing critical information about fruit detection for yield estimation and other important metrics for farmers. However, traditional NeRFs are not robust to challenging lighting conditions, such as low light, extremely bright light, and varying lighting. To address these issues, this work leverages three different sensors: an RGB camera, an event camera, and a thermal camera. Our RGB scene reconstruction shows an improvement in PSNR and SSIM of +2.06 dB and +8.3%, respectively. Our cross-spectral scene reconstruction enhances downstream fruit detection by +43.0% in mAP50 and +61.1% in mAP50-95. The integration of additional sensors leads to a more robust and informative NeRF. We demonstrate that our multi-modal system yields high-quality photo-realistic reconstructions under various tree canopy covers and at different times of the day. This work results in the development of a resilient NeRF, capable of performing well in visibly degraded scenarios, as well as a learnt cross-spectral representation that is used for automated fruit detection.
Comment: 7 pages, 5 figures
- Published
- 2024
17. Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot
- Author
Sachdeva, Bhuvan, Ramjee, Pragnya, Fulari, Geeta, Murali, Kaushik, and Jain, Mohit
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. To address these, researchers released the Build Your Own expert Bot (BYOeB) platform, enabling developers to create LLM-powered chatbots with integrated expert verification. CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions. A pilot evaluation showed its potential; however, the study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of medical answers as accurate. As the knowledge base expanded with expert corrections, system performance improved by 19.02%, reducing expert workload. These insights guide the design of future LLM-powered chatbots.
Comment: The first two authors contributed equally to this research
- Published
- 2024
18. Optimal Workload Placement on Multi-Instance GPUs
- Author
Turkkan, Bekir, Murali, Pavankumar, Harsha, Pavithra, Arora, Rohan, Vanloo, Gerard, and Narayanaswami, Chandra
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
There is an urgent need to optimize the usage of Graphics Processing Units (GPUs), which have arguably become one of the most expensive and sought-after IT resources. To help with this goal, several of the current generation of GPUs support a partitioning feature called Multi-Instance GPU (MIG), which allows multiple workloads to share a GPU, albeit with some constraints. In this paper, we investigate how to optimize the placement of Large Language Model (LLM)-based AI inferencing workloads on GPUs. We first identify and present several use cases encountered in practice that require workloads to be efficiently placed or migrated to other GPUs to make room for incoming workloads. The overarching goal is to use as few GPUs as possible and to further minimize memory and compute wastage on GPUs that are utilized. We have developed two approaches to address this problem: an optimization method and a heuristic method. We benchmark these against two workload scheduling heuristics for multiple use cases. Our results show up to 2.85x improvement in the number of GPUs used and up to 70% reduction in GPU wastage over baseline heuristics. We plan to enable the SRE community to leverage our proposed method in production environments.
Comment: 14 pages
- Published
- 2024
19. Quantifying brain development in the HEALthy Brain and Child Development (HBCD) Study: The magnetic resonance imaging and spectroscopy protocol.
- Author
Dean, Douglas, Tisdall, M, Wisnowski, Jessica, Feczko, Eric, Gagoski, Borjan, Alexander, Andrew, Edden, Richard, Gao, Wei, Hendrickson, Timothy, Howell, Brittany, Huang, Hao, Humphreys, Kathryn, Riggins, Tracy, Sylvester, Chad, Weldon, Kimberly, Yacoub, Essa, Ahtam, Banu, Beck, Natacha, Banerjee, Suchandrima, Boroday, Sergiy, Caprihan, Arvind, Caron, Bryan, Carpenter, Samuel, Chang, Yulin, Chung, Ai, Cieslak, Matthew, Clarke, William, Dale, Anders, Das, Samir, Davies-Jenkins, Christopher, Dufford, Alexander, Evans, Alan, Fesselier, Laetitia, Ganji, Sandeep, Gilbert, Guillaume, Graham, Alice, Gudmundson, Aaron, Macgregor-Hannah, Maren, Harms, Michael, Hilbert, Tom, Hui, Steve, Irfanoglu, M, Kecskemeti, Steven, Kober, Tobias, Kuperman, Joshua, Lamichhane, Bidhan, Landman, Bennett, Lecour-Bourcher, Xavier, Lee, Erik, Li, Xu, MacIntyre, Leigh, Madjar, Cecile, Manhard, Mary, Mayer, Andrew, Mehta, Kahini, Moore, Lucille, Murali-Manohar, Saipavitra, Navarro, Cristian, Nebel, Mary, Newman, Sharlene, Newton, Allen, Noeske, Ralph, Norton, Elizabeth, Oeltzschner, Georg, Ongaro-Carcy, Regis, Ou, Xiawei, Ouyang, Minhui, Parrish, Todd, Pekar, James, Pengo, Thomas, Pierpaoli, Carlo, Poldrack, Russell, Rajagopalan, Vidya, Rettmann, Dan, Rioux, Pierre, Rosenberg, Jens, Salo, Taylor, Satterthwaite, Theodore, Scott, Lisa, Shin, Eunkyung, Simegn, Gizeaddis, Simmons, W, Song, Yulu, Tikalsky, Barry, Tkach, Jean, van Zijl, Peter, Vannest, Jennifer, Versluis, Maarten, Zhao, Yansong, Zöllner, Helge, Fair, Damien, Smyser, Christopher, and Elison, Jed
- Subjects
Development ,HBCD ,Infant ,MRI ,MRS ,Protocol - Abstract
The HEALthy Brain and Child Development (HBCD) Study, a multi-site prospective longitudinal cohort study, will examine human brain, cognitive, behavioral, social, and emotional development beginning prenatally and planned through early childhood. The acquisition of multimodal magnetic resonance-based brain development data is central to the study's core protocol. However, application of Magnetic Resonance Imaging (MRI) methods in this population is complicated by technical challenges and difficulties of imaging in early life. Overcoming these challenges requires an innovative and harmonized approach, combining age-appropriate acquisition protocols together with specialized pediatric neuroimaging strategies. The HBCD MRI Working Group aimed to establish a core acquisition protocol for all 27 HBCD Study recruitment sites to measure brain structure, function, microstructure, and metabolites. Acquisition parameters of individual modalities have been matched across MRI scanner platforms for harmonized acquisitions and state-of-the-art technologies are employed to enable faster and motion-robust imaging. Here, we provide an overview of the HBCD MRI protocol, including decisions of individual modalities and preliminary data. The result will be an unparalleled resource for examining early neurodevelopment which enables the larger scientific community to assess normative trajectories from birth through childhood and to examine the genetic, biological, and environmental factors that help shape the developing brain.
- Published
- 2024
20. AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes
- Author
-
Murali, Aniruddhan, Alfadel, Mahmoud, Nagappan, Meiyappan, Xu, Meng, and Sun, Chengnian
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Software Engineering ,D.2.5 ,D.2.2 - Abstract
Memory leak bugs are a major problem in C/C++ programs. They occur when memory objects are not deallocated. Developers need to manually deallocate these objects to prevent memory leaks. As such, several techniques have been proposed to automatically fix memory leaks. Although proposed approaches have merit in automatically fixing memory leaks, they present limitations. Static-based approaches attempt to trace the complete semantics of a memory object across all paths. However, they have scalability-related challenges when the target program has a large number of leaked paths. On the other hand, dynamic approaches can spell out the precise semantics of a memory object only on a single execution path (not considering multiple execution paths). In this paper, we complement prior approaches by designing and implementing a novel framework named AddressWatcher. AddressWatcher allows the semantics of a memory object to be tracked on multiple execution paths as a dynamic approach. AddressWatcher accomplishes this by using a leak database that is designed to allow storing and comparing different execution paths of a leak over several test cases. We conduct an evaluation of AddressWatcher on a benchmark of five open-source packages, namely binutils, openssh, tmux, openssl and git. In 23 out of the 50 examined memory leak bugs, AddressWatcher correctly points to a free location to fix memory leaks. Moreover, we submitted 25 new pull requests (PRs) to 12 popular open-source project repositories. These PRs targeted the resolution of memory leaks within these repositories. Among these, 21 PRs were merged, addressing 5 open GitHub issues. In fact, a critical fix prompted a new version release for the calc repository, a program used to find large primes. Furthermore, our contributions through these PRs sparked intense discussions and appreciation in various repositories such as coturn, h2o, and radare2., Comment: Accepted in Transactions in Software Engineering
- Published
- 2024
21. Eigenbasis for a weighted adjacency matrix associated with the projective geometry $B_q(n)$
- Author
-
Srinivasan, Murali K.
- Subjects
Mathematics - Combinatorics ,05E30, 51E20 - Abstract
In a recent article "Projective geometries, $Q$-polynomial structures, and quantum groups" Terwilliger (arXiv:2407.14964) defined a certain weighted adjacency matrix, depending on a free (positive real) parameter, associated with the projective geometry, and showed (among many other results) that it is diagonalizable, with the eigenvalues and their multiplicities explicitly written down, and that it satisfies the $Q$-polynomial property (with respect to the zero subspace). In this note we (i) Write down an explicit eigenbasis for this matrix. (ii) Evaluate the adjacency matrix-eigenvector products, yielding a new proof for the eigenvalues and their multiplicities. (iii) Evaluate the dual adjacency matrix-eigenvector products and directly show that the action of the dual adjacency matrix on the eigenspaces of the adjacency matrix is block-tridiagonal, yielding a new proof of the $Q$-polynomial property., Comment: arXiv admin note: substantial text overlap with arXiv:2204.05540
- Published
- 2024
22. AI-Assisted SQL Authoring at Industry Scale
- Author
-
Maddila, Chandra, Ghorbani, Negar, Jabre, Kosay, Murali, Vijayaraghavan, Kim, Edwin, Thakkar, Parth, Laptev, Nikolay Pavlovich, Harman, Olivia, Hsu, Diana, Abreu, Rui, and Rigby, Peter C.
- Subjects
Computer Science - Software Engineering ,Computer Science - Databases - Abstract
SqlCompose brings generative AI into the data analytics domain. SQL is declarative, has formal table schemas, and is often written in a non-linear manner. We address each of these challenges and develop a set of models that shows the importance of each problem. We first develop an internal SQL benchmark to perform offline tests at Meta. We evaluate how well the Public Llama model performs. We attain a BLEU score of 53% and 24% for single- and multi-line predictions, respectively. This performance is consistent with prior works on imperative languages. We then fine-tune Llama on our internal data and database schemas. SqlComposeSA substantially outperforms Llama by 16 percentage points on BLEU score. SQL is often written with multiple subqueries and in a non-sequential manner. We develop SqlComposeFIM, which is aware of the context before and after the line(s) that need to be completed. This fill-in-the-middle model outperforms SqlComposeSA by 35 percentage points. We also measure how often the models get the correct table names, and SqlComposeFIM is able to do this 75% of the time. Aside from our scientific research, we also roll out SqlComposeFIM at Meta. SqlCompose is used on a weekly basis by over 10k users, including data scientists and software engineers; fewer than 1% of users have disabled SqlCompose. We use the feedback from users to improve SqlCompose. Interesting positive themes include completing tedious or repetitive SQL clauses, suggesting boilerplate coding, and helping eliminate the need to remember difficult SQL syntax. The most significant negative theme was table and column name hallucinations, which have been reduced with the release of SqlComposeFIM. The SqlCompose models consistently outperform public and internal LLMs, despite being smaller (7 bn and 13 bn), which provides early indications that smaller specialist models can outperform larger general purpose models., Comment: 11 pages
- Published
- 2024
23. On Mitigating Code LLM Hallucinations with API Documentation
- Author
-
Jain, Nihal, Kwiatkowski, Robert, Ray, Baishakhi, Ramanathan, Murali Krishna, and Kumar, Varun
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
In this study, we address the issue of API hallucinations in various software engineering contexts. We introduce CloudAPIBench, a new benchmark designed to measure API hallucination occurrences. CloudAPIBench also provides annotations for frequencies of API occurrences in the public domain, allowing us to study API hallucinations at various frequency levels. Our findings reveal that Code LLMs struggle with low frequency APIs: for example, GPT-4o achieves only 38.58% valid low frequency API invocations. We demonstrate that Documentation Augmented Generation (DAG) significantly improves performance for low frequency APIs (increase to 47.94% with DAG) but negatively impacts high frequency APIs when using sub-optimal retrievers (a 39.02% absolute drop). To mitigate this, we propose to intelligently trigger DAG where we check against an API index or leverage Code LLMs' confidence scores to retrieve only when needed. We demonstrate that our proposed methods enhance the balance between low and high frequency API performance, resulting in more reliable API invocations (8.20% absolute improvement on CloudAPIBench for GPT-4o).
- Published
- 2024
24. FACTS About Building Retrieval Augmented Generation-based Chatbots
- Author
-
Akkiraju, Rama, Xu, Anbang, Bora, Deepak, Yu, Tan, An, Lu, Seth, Vishal, Shukla, Aaditya, Gundecha, Pritam, Mehta, Hridhay, Jha, Ashwin, Raj, Prithvi, Balasubramanian, Abhinav, Maram, Murali, Muthusamy, Guru, Annepally, Shivakesh Reddy, Knowles, Sidney, Du, Min, Burnett, Nick, Javiya, Sean, Marannan, Ashok, Kumari, Mamta, Jha, Surbhi, Dereszenski, Ethan, Chakraborty, Anupam, Ranjan, Subhash, Terfai, Amina, Surya, Anoop, Mercer, Tracey, Thanigachalam, Vinodh Kumar, Bar, Tamar, Krishnan, Sanjana, Kilaru, Samy, Jaksic, Jasmine, Algarici, Nave, Liberman, Jacob, Conway, Joey, Nayyar, Sonu, and Boitano, Justin
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots., Comment: 8 pages, 6 figures, 2 tables, Preprint submission to ACM CIKM 2024
- Published
- 2024
25. CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data
- Author
-
Zarch, Hossein Entezari, Alshabanah, Abdulla, Jiang, Chaoyi, and Annavaram, Murali
- Subjects
Computer Science - Information Retrieval ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Deep learning recommendation models (DLRMs) are at the heart of the current e-commerce industry. However, the amount of training data used to train these large models is growing exponentially, leading to substantial training hurdles. The training dataset contains two primary types of information: content-based information (features of users and items) and collaborative information (interactions between users and items). One approach to reduce the training dataset is to remove user-item interactions. But that significantly diminishes collaborative information, which is crucial for maintaining accuracy due to its inclusion of interaction histories. This loss profoundly impacts DLRM performance. This paper makes an important observation that if one can capture the user-item interaction history to enrich the user and item embeddings, then the interaction history can be compressed without losing model accuracy. Thus, this work, Collaborative Aware Data Compression (CADC), takes a two-step approach to training dataset compression. In the first step, we use matrix factorization of the user-item interaction matrix to create a novel embedding representation for both the users and items. Once the user and item embeddings are enriched by the interaction history information, the approach then applies uniform random sampling of the training dataset to drastically reduce the training dataset size while minimizing model accuracy drop. The source code of CADC is available at https://anonymous.4open.science/r/DSS-RM-8C1D/README.md.
- Published
- 2024
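The two steps named in the abstract above (matrix factorization of the user-item interaction matrix, then uniform random sampling of the training data) can be sketched roughly as follows. This is a toy illustration, not the CADC implementation: the SGD-based factorization, the hyperparameters, and the function names `factorize` and `uniform_sample` are all assumptions.

```python
import random
from typing import List, Tuple

Interaction = Tuple[int, int, float]  # (user index, item index, observed value)


def factorize(interactions: List[Interaction], n_users: int, n_items: int,
              dim: int = 4, epochs: int = 50, lr: float = 0.05,
              reg: float = 0.01, seed: int = 0):
    """Learn user and item embeddings whose dot product approximates each
    observed interaction, using plain SGD with L2 regularization."""
    rng = random.Random(seed)
    users = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    items = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, value in interactions:
            pred = sum(a * b for a, b in zip(users[u], items[i]))
            err = value - pred
            for k in range(dim):
                uk, vk = users[u][k], items[i][k]
                users[u][k] += lr * (err * vk - reg * uk)
                items[i][k] += lr * (err * uk - reg * vk)
    return users, items


def uniform_sample(interactions: List[Interaction], keep_fraction: float,
                   seed: int = 0) -> List[Interaction]:
    """Keep a uniformly random subset of the interaction records."""
    rng = random.Random(seed)
    k = max(1, round(len(interactions) * keep_fraction))
    return rng.sample(interactions, k)


data = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0), (2, 2, 0.0)]
user_emb, item_emb = factorize(data, n_users=3, n_items=3, dim=2)
compressed = uniform_sample(data, keep_fraction=0.5)
```

In this sketch the embeddings are computed from the full interaction set before sampling, which mirrors the ordering described in the abstract: the interaction history is captured in the embeddings first, and only then is the dataset reduced.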
26. CycleSAM: One-Shot Surgical Scene Segmentation using Cycle-Consistent Feature Matching to Prompt SAM
- Author
-
Murali, Aditya, Mascagni, Pietro, Mutter, Didier, and Padoy, Nicolas
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The recently introduced Segment-Anything Model (SAM) has the potential to greatly accelerate the development of segmentation models. However, directly applying SAM to surgical images has key limitations including (1) the requirement of image-specific prompts at test-time, thereby preventing fully automated segmentation, and (2) ineffectiveness due to substantial domain gap between natural and surgical images. In this work, we propose CycleSAM, an approach for one-shot surgical scene segmentation that uses the training image-mask pair at test-time to automatically identify points in the test images that correspond to each object class, which can then be used to prompt SAM to produce object masks. To produce high-fidelity matches, we introduce a novel spatial cycle-consistency constraint that enforces point proposals in the test image to rematch to points within the object foreground region in the training image. Then, to address the domain gap, rather than directly using the visual features from SAM, we employ a ResNet50 encoder pretrained on surgical images in a self-supervised fashion, thereby maintaining high label-efficiency. We evaluate CycleSAM for one-shot segmentation on two diverse surgical semantic segmentation datasets, comprehensively outperforming baseline approaches and reaching up to 50% of fully-supervised performance.
- Published
- 2024
27. Subspaces, subsets, and Motzkin paths
- Author
-
Farley, Jonathan D. and Srinivasan, Murali K.
- Subjects
Mathematics - Combinatorics - Abstract
We define a map from subspaces to Motzkin paths and show that the inverse image of every path is a disjoint union of symmetric Boolean subsets yielding an explicit symmetric Boolean decomposition of the subspace lattice.
- Published
- 2024
28. Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models
- Author
-
Yusuf, Bolaji, Baskar, Murali Karthick, Rosenberg, Andrew, and Ramabhadran, Bhuvana
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Computation and Language - Abstract
This paper explores speculative speech recognition (SSR), where we empower conventional automatic speech recognition (ASR) with speculation capabilities, allowing the recognizer to run ahead of audio. We introduce a metric for measuring SSR performance and we propose a model which does SSR by combining an RNN-Transducer-based ASR system with an audio-prefixed language model (LM). The ASR system transcribes ongoing audio and feeds the resulting transcripts, along with an audio-dependent prefix, to the LM, which speculates likely completions for the transcriptions. We experiment with a variety of ASR datasets on which we show the efficacy of our method and the feasibility of SSR as a method of reducing ASR latency., Comment: Interspeech 2024
- Published
- 2024
29. A Parameterized Algorithm for Vertex and Edge Connectivity of Embedded Graphs
- Author
-
Biedl, Therese, Bose, Prosenjit, and Murali, Karthik
- Subjects
Computer Science - Data Structures and Algorithms ,Mathematics - Combinatorics - Abstract
The problems of computing the vertex and edge connectivity of a graph are classical problems in algorithmic graph theory. The focus of this paper is on computing these parameters on embedded graphs. A typical example of an embedded graph is a planar graph which can be drawn with no edge crossings. It has long been known that vertex and edge connectivity of planar embedded graphs can be computed in linear time. Very recently, Biedl and Murali extended the techniques from planar graphs to 1-plane graphs without $\times$-crossings, i.e., crossings whose endpoints induce a matching. While the tools used were novel, they were highly tailored to 1-plane graphs, and do not provide much leeway for further extension. In this paper, we develop alternate techniques that are simpler, have wider applications to near-planar graphs, and can be used to test both vertex and edge connectivity. Our technique works for all those embedded graphs where any pair of crossing edges are connected by a path that, roughly speaking, can be covered with few cells of the drawing. Important examples of such graphs include optimal 2-planar and optimal 3-planar graphs, $d$-map graphs, $d$-framed graphs, graphs with bounded crossing number, and $k$-plane graphs with bounded number of $\times$-crossings.
- Published
- 2024
30. Extended Equivalence of Fuzzy Sets
- Author
-
Murali, Venkat and Nkonkobe, Sithembele
- Subjects
Mathematics - General Mathematics - Abstract
Preferential equality is an equivalence relation on fuzzy subsets of finite sets and is a generalization of classical equality of subsets. In this paper we introduce a tightened version of the preferential equality on fuzzy subsets and derive some important combinatorial formulae for the number of such tight fuzzy subsets of an n-element set where n is a natural number. We also offer some asymptotic results.
- Published
- 2024
31. Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions
- Author
-
Baskar, Murali Karthick, Rosenberg, Andrew, Ramabhadran, Bhuvana, Gaur, Neeraj, and Meng, Zhong
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. Recent works utilize prefixLM-type models, which directly apply speech as a prefix to LLMs for ASR. We have found that optimizing speech prefixes leads to better ASR performance and propose applying RNNT loss to perform speech prefix-tuning. This is a simple approach and does not increase the model complexity or alter the inference pipeline. We also propose language-based soft prompting to further improve with frozen LLMs. Empirical analysis on a realtime testset from 10 Indic languages demonstrates that our proposed speech prefix-tuning yields improvements with both frozen and fine-tuned LLMs. Our recognition results, averaged over the 10 Indic languages, show that the proposed prefix-tuning with RNNT loss results in a 12% relative improvement in WER over the baseline with a fine-tuned LLM. Our proposed approaches with the frozen LLM lead to a 31% relative improvement over basic soft-prompting prefixLM.
- Published
- 2024
32. AI-coupled HPC Workflow Applications, Middleware and Performance
- Author
-
Brewer, Wes, Gainaru, Ana, Suter, Frédéric, Wang, Feiyi, Emani, Murali, and Jha, Shantenu
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
AI integration is revolutionizing the landscape of HPC simulations, enhancing the importance, use, and performance of AI-driven HPC workflows. This paper surveys the diverse and rapidly evolving field of AI-driven HPC and provides a common conceptual basis for understanding AI-driven HPC workflows. Specifically, we use insights from different modes of coupling AI into HPC workflows to propose six execution motifs most commonly found in scientific applications. The proposed set of execution motifs is by definition incomplete and evolving. However, they allow us to analyze the primary performance challenges underpinning AI-driven HPC workflows. We close with a listing of open challenges, research issues, and suggested areas of investigation including the need for specific benchmarks that will help evaluate and improve the execution of AI-driven HPC workflows.
- Published
- 2024
33. RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
- Author
-
Yuan, Wentao, Duan, Jiafei, Blukis, Valts, Pumacay, Wilbert, Krishna, Ranjay, Murali, Adithyavairavan, Mousavian, Arsalan, and Fox, Dieter
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
From rearranging objects on a table to putting groceries into shelves, robots must plan precise action points to perform tasks accurately and reliably. In spite of the recent adoption of vision language models (VLMs) to control robot behavior, VLMs struggle to precisely articulate robot actions using language. We introduce an automatic synthetic data generation pipeline that instruction-tunes VLMs to robotic domains and needs. Using the pipeline, we train RoboPoint, a VLM that predicts image keypoint affordances given language instructions. Compared to alternative approaches, our method requires no real-world data collection or human demonstration, making it much more scalable to diverse environments and viewpoints. In addition, RoboPoint is a general model that enables several downstream applications such as robot navigation, manipulation, and augmented reality (AR) assistance. Our experiments demonstrate that RoboPoint outperforms state-of-the-art VLMs (GPT-4o) and visual prompting techniques (PIVOT) by 21.8% in the accuracy of predicting spatial affordance and by 30.5% in the success rate of downstream tasks. Project website: https://robo-point.github.io.
- Published
- 2024
34. Huge BPS Operators and Fluid Dynamics in $\mathcal{N}=4$ SYM
- Author
-
Kazakov, Vladimir, Murali, Harish, and Vieira, Pedro
- Subjects
High Energy Physics - Theory - Abstract
In the bulk dual of holography, huge operators correspond to sources so heavy that they fully backreact on the space-time geometry. Here we study the correlation function of three such huge operators when they are given by $1/2$ BPS operators in $\mathcal{N}=4$ SYM theory, dual to IIB Strings in $AdS_5 \times S^5$. We unveil simple matrix model representations for these correlators which we can sometimes solve analytically. For general huge operators, we translate these matrix model expressions into a $1+1$ dimensional hydrodynamical fluid problem. This fluid is integrable, thus unveiling a novel integrable sector of the $AdS/CFT$ duality in a full-fledged gravitational regime, very far from the usual free string planar regime where integrability reigns supreme. We explain how an adiabatic deformation method can be developed to yield the solution to an integrable discrete formulation of these fluids -- the rational Calogero-Moser Model -- so we can access the general three point correlation functions of generic huge $1/2$-BPS operators. Everything will be done on the gauge theory side of the duality. It would be fascinating to find the holographic dual of these matrix models and fluids., Comment: 68 pages, 25 figures
- Published
- 2024
35. Scalable Surface Micro-Texturing of LLZO Solid Electrolytes for Battery Applications
- Author
-
Go, Wooseok, Parkinson, Dilworth Y, Oropeza, Dayana, Zorba, Vassilia, Murali, Sriram S, Doeff, Marca M, and Tucker, Michael C
- Subjects
Engineering ,Materials Engineering ,Chemical Sciences ,Physical Chemistry ,Affordable and Clean Energy ,Chemical sciences - Abstract
A challenge for lithium lanthanum zirconate (LLZO)-based solid-state batteries is to increase the critical current density (CCD) to enable high current cycling. A promising strategy is to modify the LLZO surface morphology to provide a larger contact area with the Li metal. Here, a surface-textured thin LLZO electrolyte was prepared through an easily scalable process. The texturing process is a simple pressing of green LLZO tapes between micro-textured substrates. A variety of textures can be produced, depending on the type of substrate, and texturing can be on either one side or both sides. For this work, after pressing and sintering, several micro-patterns are formed on thin LLZO (∼118 μm thick). The properties of the various samples were characterized to investigate the impact of surface texturing, and the most promising ones were selected for electrochemical testing in symmetrical lithium cells and full cells. Li symmetric cells using a coarse ridge-textured LLZO exhibit ∼2.5 times increased CCD compared to planar non-textured LLZO, and a solid-state full cell shows stable cycling and improved rate performance. We believe this process offers a favorable trade-off of processing complexity vs structural optimization to maximize CCD.
- Published
- 2024
36. A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases
- Author
-
Jain, Ananya, Bhardwaj, Aviral, Murali, Kaushik, and Surani, Isha
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Large language models, notably utilizing Transformer architectures, have emerged as powerful tools due to their scalability and ability to process large amounts of data. Dosovitskiy et al. expanded this architecture to introduce Vision Transformers (ViT), extending its applicability to image processing tasks. Motivated by this advancement, we fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset containing over 100,000 frontal-view X-ray images. Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases, while using Convolutional Neural Networks (CNNs) and ResNet architectures as baseline models for comparison. Through rigorous assessment based on accuracy metrics, we identify that the pre-trained ViT model surpasses CNNs and ResNet in this multilabel classification task, highlighting its potential for accurate diagnosis of various lung conditions from chest X-ray images., Comment: 8 pages, 6 figures
- Published
- 2024
37. Automatic segmentation of Organs at Risk in Head and Neck cancer patients from CT and MRI scans
- Author
-
Quetin, Sébastien, Heschl, Andrew, Murillo, Mauricio, Murali, Rohit, Enger, Shirin A., and Maleki, Farhad
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Background and purpose: Deep Learning (DL) has been widely explored for Organs at Risk (OARs) segmentation; however, most studies have focused on a single modality, either CT or MRI, not both simultaneously. This study presents a high-performing DL pipeline for segmentation of 30 OARs from MRI and CT scans of Head and Neck (H&N) cancer patients. Materials and methods: Paired CT and MRI-T1 images from 42 H&N cancer patients alongside annotation for 30 OARs from the H&N OAR CT & MR segmentation challenge dataset were used to develop a segmentation pipeline. After cropping irrelevant regions, rigid followed by non-rigid registration of CT and MRI volumes was performed. Two versions of the CT volume, representing soft tissues and bone anatomy, were stacked with the MRI volume and used as input to an nnU-Net pipeline. Modality Dropout was used during the training to force the model to learn from the different modalities. Segmentation masks were predicted with the trained model for an independent set of 14 new patients. The mean Dice Score (DS) and Hausdorff Distance (HD) were calculated for each OAR across these patients to evaluate the pipeline. Results: This resulted in an overall mean DS and HD of 0.777 ± 0.118 and 3.455 ± 1.679, respectively, establishing the state-of-the-art (SOTA) for this challenge at the time of submission. Conclusion: The proposed pipeline achieved the best DS and HD among all participants of the H&N OAR CT and MR segmentation challenge and sets a new SOTA for automated segmentation of H&N OARs.
- Published
- 2024
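For readers unfamiliar with the two metrics reported in the abstract above, the following sketch computes a Dice score on binary masks and a symmetric Hausdorff distance on point sets. It uses the standard textbook definitions; the challenge's exact evaluation protocol (for example, percentile-based Hausdorff variants or voxel spacing) is not given in the abstract and is not assumed here. Function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice score 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks match perfectly
    return 2.0 * intersection / total


def hausdorff_distance(a: Sequence[Tuple[float, ...]],
                       b: Sequence[Tuple[float, ...]]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(x, y):
        # Largest distance from a point in x to its nearest point in y.
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))                # 0.5
print(hausdorff_distance([(0, 0), (1, 0)], [(0, 0), (3, 0)]))  # 2.0
```

Dice rewards volumetric overlap, while the Hausdorff distance penalizes the single worst boundary disagreement, which is why segmentation results are often reported with both.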
38. Challenges and Opportunities for Large-Scale Exploration with Air-Ground Teams using Semantics
- Author
-
Cladera, Fernando, Miller, Ian D., Ravichandran, Zachary, Murali, Varun, Hughes, Jason, Hsieh, M. Ani, Taylor, C. J., and Kumar, Vijay
- Subjects
Computer Science - Robotics - Abstract
One common and desirable application of robots is exploring potentially hazardous and unstructured environments. Air-ground collaboration offers a synergistic approach to addressing such exploration challenges. In this paper, we demonstrate a system for large-scale exploration using a team of aerial and ground robots. Our system uses semantics as lingua franca, and relies on fully opportunistic communications. We highlight the unique challenges from this approach, explain our system architecture and showcase lessons learned during our experiments. All our code is open-source, encouraging researchers to use it and build upon it., Comment: 6 pages, 5 figures
- Published
- 2024
39. Tunable dynamical tissue phantom for laser speckle imaging
- Author
-
Sarkar, Soumyajit, Murali, K., and Varma, Hari M.
- Subjects
Physics - Medical Physics - Abstract
We introduce a novel method to design and implement a tunable dynamical tissue phantom for laser speckle-based in-vivo blood flow imaging. This approach relies on Stochastic Differential Equations (SDE) to control a piezoelectric actuator which, upon illumination with a laser source, generates speckles of pre-defined probability density function and auto-correlation. The validation experiments show that the phantom can generate dynamic speckles that closely replicate both surface and deep-tissue blood flow for a reasonably wide range and accuracy.
- Published
- 2024
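As a rough illustration of how an SDE can produce a signal with a chosen distribution and autocorrelation, as the abstract above describes, the sketch below integrates an Ornstein-Uhlenbeck process with the Euler-Maruyama scheme. The choice of this particular SDE is an assumption made for illustration (the abstract does not say which equation drives the actuator). For an Ornstein-Uhlenbeck process the stationary distribution is Gaussian and the autocorrelation decays as exp(-theta * tau), so theta sets the decorrelation time.

```python
import math
import random
from typing import List


def simulate_ou(x0: float, theta: float, mu: float, sigma: float,
                dt: float, n_steps: int, seed: int = 0) -> List[float]:
    """Euler-Maruyama integration of dX = theta * (mu - X) dt + sigma dW.

    Returns the path including the initial value, so it has n_steps + 1 samples.
    """
    rng = random.Random(seed)
    path = [x0]
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Drift pulls x toward mu; the diffusion term scales with sqrt(dt).
        x = x + theta * (mu - x) * dt + sigma * math.sqrt(dt) * noise
        path.append(x)
    return path


# Hypothetical drive signal: faster decorrelation corresponds to larger theta.
drive = simulate_ou(x0=0.0, theta=5.0, mu=0.0, sigma=1.0, dt=0.001, n_steps=1000)
```

Fixing the seed makes the generated path reproducible, which is convenient when the same drive signal has to be replayed across repeated measurements.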
40. High-efficiency perovskite-organic blend light-emitting diodes featuring self-assembled monolayers as hole-injecting interlayers
- Author
-
Gedda, Murali, Gkeka, Despoina, Nugraha, Mohamad Insan, Scaccabarozzi, Alberto D., Yengel, Emre, Khan, Jafar I., Hamilton, Iain, Lin, Yuanbao, Deconinck, Marielle, Vaynzof, Yana, Laquai, Frédéric, Bradley, Donal D. C., and Anthopoulos, Thomas D.
- Subjects
Physics - Applied Physics ,Condensed Matter - Materials Science - Abstract
The high photoluminescence efficiency, color purity, extended gamut, and solution processability make low-dimensional hybrid perovskites attractive for light-emitting diode (PeLED) applications. However, controlling the microstructure of these materials to improve the device performance remains challenging. Here, the development of highly efficient green PeLEDs based on blends of the quasi-2D (q2D) perovskite, PEA2Cs4Pb5Br16, and the wide bandgap organic semiconductor 2,7-dioctyl[1]benzothieno[3,2-b]benzothiophene (C8-BTBT) is reported. The presence of C8-BTBT enables the formation of single-crystal-like q2D PEA2Cs4Pb5Br16 domains that are uniform and highly luminescent. Combining the PEA2Cs4Pb5Br16:C8-BTBT with self-assembled monolayers (SAMs) as hole-injecting layers (HILs) yields green PeLEDs with greatly enhanced performance characteristics, including external quantum efficiency up to 18.6%, current efficiency up to 46.3 cd/A, a luminance of 45 276 cd m^-2, and improved operational stability compared to neat PeLEDs. The enhanced performance originates from multiple synergistic effects, including enhanced hole-injection enabled by the SAM HILs, the single-crystal-like quality of the perovskite phase, and the reduced concentration of electronic defects. This work highlights perovskite:organic blends as promising systems for use in LEDs, while the use of SAM HILs creates new opportunities toward simpler and more stable PeLEDs.
- Published
- 2024
41. Photophysics of defect-passivated quasi-2D (PEA)2PbBr4 perovskite using an organic small-molecule
- Author
-
Khan, Jafar I., Gedda, Murali, Wang, Mingcong, Yengel, Emre, Kreß, Joshua A., Vaynzof, Yana, Anthopoulos, Thomas D., and Laquai, Frédéric
- Subjects
Condensed Matter - Materials Science, Physics - Optics
- Abstract
2D Ruddlesden-Popper perovskites are promising candidates for energy harvesting applications due to their tunable optical properties and excellent ambient stability. Moreover, they are solution-processable and compatible with upscalable manufacturing via various printing techniques. Unfortunately, such methods often induce large degrees of heterogeneity due to poorly controlled crystallization. Here, we address this issue by blending the well-known 2D perovskite (PEA)2PbBr4 with an organic small molecule, namely C8-BTBT, employed as an additive at different blending ratios. Using terahertz (THz) absorption and temperature-dependent photoluminescence (PL) spectroscopy, we observe that the C8-BTBT additive alters the photophysical properties while the perovskite structure in the film remains unaffected. More precisely, the inclusion of trace amounts of C8-BTBT in the hybrid films results in defect passivation at perovskite platelet boundaries and at the surfaces, as indicated by increased carrier lifetimes and substantially increased photoluminescence quantum yields (PLQY). This in turn improves the responsivity of photodetectors using the 2D perovskite as the active layer. Our study highlights a straightforward strategy for fabricating high-quality 2D perovskites via large-area processing techniques.
- Published
- 2024
42. CodeFort: Robust Training for Code Generation Models
- Author
-
Zhang, Yuhao, Wang, Shiqi, Qian, Haifeng, Wang, Zijian, Shang, Mingyue, Liu, Linbo, Gouda, Sanjay Krishna, Ray, Baishakhi, Ramanathan, Murali Krishna, Ma, Xiaofei, and Deoras, Anoop
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence
- Abstract
Code generation models are not robust to small perturbations, which often lead to incorrect generations and significantly degrade the performance of these models. Although improving the robustness of code generation models is crucial to enhancing user experience in real-world applications, existing research efforts do not address this issue. To fill this gap, we propose CodeFort, a framework to improve the robustness of code generation models, generalizing a large variety of code perturbations to enrich the training data and enabling various robust training strategies, mixing data augmentation, batch augmentation, adversarial logits pairing, and contrastive learning, all carefully designed to support high-throughput training. Extensive evaluations show that we increase the average robust pass rates of baseline CodeGen models from 14.79 to 21.74. We notably decrease the robustness drop rate from 95.02% to 54.95% against code-syntax perturbations.
- Published
- 2024
43. Adiabatic modulation of driving protocols in periodically driven quantum systems
- Author
-
Murali, Ashwin, Sarkar, Tapomoy Guha, and Bandyopadhyay, Jayendra N.
- Subjects
Quantum Physics, Condensed Matter - Quantum Gases
- Abstract
We consider a periodically driven system where the high-frequency driving protocol consists of a sequence of potentials switched on and off at different instants within a period. We explore the possibility of introducing an adiabatic modulation of the driving protocol by considering a slow evolution of the instants when the sequence of potentials is switched on/off. We examine how this influences the long-term dynamics of periodically driven quantum systems. By assuming that the slow and fast timescales in the problem can be decoupled, we derive the stroboscopic (effective) Hamiltonian for a four-step driving sequence up to the first order in perturbation theory. We then apply this approach to a rigid rotor, where the adiabatic modulation of the driving protocol is chosen to produce an evolving emergent magnetic field that interacts with the rotor's spin. We study the emergence of $\textit{diabolical points}$ and $\textit{diabolical loci}$ in the parameter space of the effective Hamiltonian. Further, we study the topological properties of the maps of the adiabatic paths in the parameter space to the eigenspace of the effective Hamiltonian. In effect, we obtain a technique to tune the topological properties of the eigenstates by selecting various adiabatic evolutions of the driving protocol characterized by different paths in the parameter space. This technique can be applied to any periodic driving protocol to achieve desirable topological effects. Comment: 10 pages, 7 figures
- Published
- 2024
44. Predictable Verification using Intrinsic Definitions
- Author
-
Murali, Adithya, Rivera, Cody, and Madhusudan, P.
- Subjects
Computer Science - Programming Languages, Computer Science - Logic in Computer Science
- Abstract
We propose a novel mechanism of defining data structures using intrinsic definitions that avoids recursion and instead utilizes monadic maps satisfying local conditions. We show that intrinsic definitions are a powerful mechanism that can capture a variety of data structures naturally. We show that they also enable a predictable verification methodology that allows engineers to write ghost code to update monadic maps and perform verification using reduction to decidable logics. We evaluate our methodology using Boogie and prove a suite of data structure manipulating programs correct. Comment: Published at PLDI 2024
- Published
- 2024
45. FaiRTT: An Empirical Approach for Enhanced RTT Fairness and Bottleneck Throughput in BBR
- Author
-
Abrol, Akshita, Mohan, Purnima Murali, and Truong-Huu, Tram
- Subjects
Computer Science - Networking and Internet Architecture
- Abstract
In next-generation networks, achieving Round-trip Time (RTT) fairness is essential for ensuring fair bandwidth distribution among diverse flow types, enhancing overall network utilization. The TCP congestion control algorithm BBR was proposed by Google to dynamically adjust sending rates in response to changing network conditions. While BBRv2 was implemented to overcome the unfairness limitation of BBRv1, it still faces intra-protocol fairness challenges in balancing the demands of high-bandwidth, long-RTT elephant flows and more frequent short-RTT mice flows. These issues lead to throughput imbalances and queue buildup, resulting in elephant flow dominance and mice flow starvation. In this paper, we first investigate the limitations of Google's BBR algorithm, specifically in the context of intra-protocol RTT fairness in beyond 5G (B5G) networks. While existing works address this limitation by adjusting the pacing rate, this eventually leads to low throughput. We hence develop the FaiRTT algorithm to resolve the problem by dynamically estimating the Bandwidth Delay Product (BDP) sending rate based on RTT measurements, focusing on equitable bandwidth allocation. By modeling the Inflight dependency on the BDP, bottleneck bandwidth, and packet departure time after every ACK, we can resolve the intra-protocol fairness while not compromising the throughput on the bottleneck link. Through extensive simulations on NS-3 and comprehensive performance evaluations, FaiRTT is shown to significantly improve the fairness index and network throughput, outperforming BBRv2 across diverse flow types. FaiRTT achieves an average throughput ratio of 1.08 between elephant and mice flows, an average fairness index of 0.98, and an average utilization of the bottleneck link of 98.78%. Comment: Accepted for IEEE ICC 2024 Workshop - DDINS
- Published
- 2024
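The two quantities the FaiRTT abstract (entry 45) relies on, the bandwidth-delay product and a fairness index, can be illustrated with a minimal sketch. It assumes the fairness index is Jain's index, which the abstract does not state; the function names and the flow values are hypothetical and are not taken from the paper.

```python
def bdp_bytes(bottleneck_bw_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: bandwidth (bits/s) * RTT (s) / 8."""
    return bottleneck_bw_bps * rtt_s / 8.0


def jain_fairness(throughputs: list[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means equal shares.

    Assumption: the abstract only says "fairness index"; Jain's index is a
    common choice and is used here purely for illustration.
    """
    n = len(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero throughput")
    total = sum(throughputs)
    return total * total / (n * squares)


if __name__ == "__main__":
    # Hypothetical flows sharing a 100 Mbit/s bottleneck link.
    print(bdp_bytes(100e6, 0.010))      # short-RTT "mice" flow
    print(bdp_bytes(100e6, 0.100))      # long-RTT "elephant" flow
    print(jain_fairness([52.0, 48.0]))  # close to 1.0 for near-equal shares
```

A long-RTT flow has a proportionally larger BDP at the same bottleneck bandwidth, which is one way to see why RTT differences translate into unequal shares.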
46. Differentially Private Next-Token Prediction of Large Language Models
- Author
-
Flemings, James, Razaviyayn, Meisam, and Annavaram, Murali
- Subjects
Computer Science - Cryptography and Security, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Ensuring the privacy of Large Language Models (LLMs) is becoming increasingly important. The most widely adopted technique to accomplish this is DP-SGD, which trains a model to guarantee Differential Privacy (DP). However, DP-SGD overestimates an adversary's capabilities by assuming white-box access to the model and, as a result, causes longer training times and larger memory usage than SGD. On the other hand, commercial LLM deployments are predominantly cloud-based; hence, adversarial access to LLMs is black-box. Motivated by these observations, we present Private Mixing of Ensemble Distributions (PMixED): a private prediction protocol for next-token prediction that utilizes the inherent stochasticity of next-token sampling and a public model to achieve Differential Privacy. We formalize this by introducing RD-mollifiers, which project each output distribution from an ensemble of fine-tuned LLMs onto a set around a public LLM's output distribution; the projected distributions are then averaged and the next token is sampled from the average. Unlike DP-SGD, which needs to consider the model architecture during training, PMixED is model agnostic, which makes PMixED a very appealing solution for current deployments. Our results show that PMixED achieves a stronger privacy guarantee than sample-level privacy and outperforms DP-SGD for privacy $\epsilon = 8$ on large-scale datasets. Thus, PMixED offers a practical alternative to DP training methods for achieving strong generative utility without compromising privacy.
- Published
- 2024
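The project, average, and sample pipeline described in the PMixED abstract (entry 46) can be sketched roughly as follows. This is only an illustration under stated assumptions: a fixed mixing weight `lam` stands in for the RD-mollifier projection, so the sketch carries no differential privacy guarantee, and all function names and probability values are hypothetical.

```python
import random


def mix_toward_public(p_private, p_public, lam):
    """Convex combination lam * private + (1 - lam) * public, per token.

    Assumption: a fixed `lam` replaces the privacy-calibrated projection,
    so this function alone provides no DP guarantee.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_private, p_public)]


def ensemble_next_token_dist(private_dists, p_public, lam):
    """Pull every ensemble member toward the public distribution, then average."""
    mixed = [mix_toward_public(p, p_public, lam) for p in private_dists]
    k = len(mixed)
    return [sum(column) / k for column in zip(*mixed)]


def sample_token(dist, rng):
    """Draw one token index from a probability vector."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token distributions over a 3-token vocabulary.
    public = [0.5, 0.3, 0.2]
    ensemble = [[0.7, 0.2, 0.1], [0.6, 0.1, 0.3]]
    dist = ensemble_next_token_dist(ensemble, public, lam=0.25)
    print(dist, sample_token(dist, random.Random(0)))
```

Smaller values of `lam` keep the output closer to the public model; how the weight is actually chosen to meet a privacy budget is not specified in the abstract and is left out here.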
47. Edge Private Graph Neural Networks with Singular Value Perturbation
- Author
-
Tang, Tingting, Niu, Yue, Avestimehr, Salman, and Annavaram, Murali
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Social and Information Networks
- Abstract
Graph neural networks (GNNs) play a key role in learning representations from graph-structured data and are demonstrated to be useful in many applications. However, the GNN training pipeline has been shown to be vulnerable to node feature leakage and edge extraction attacks. This paper investigates a scenario where an attacker aims to recover private edge information from a trained GNN model. Previous studies have employed differential privacy (DP) to add noise directly to the adjacency matrix or a compact graph representation. The added perturbations cause the graph structure to be substantially morphed, reducing the model utility. We propose a new privacy-preserving GNN training algorithm, Eclipse, that maintains good model utility while providing strong privacy protection on edges. Eclipse is based on two key observations. First, adjacency matrices in graph structures exhibit low-rank behavior. Thus, Eclipse trains GNNs with a low-rank format of the graph via singular value decomposition (SVD), rather than the original graph. Using the low-rank format, Eclipse preserves the primary graph topology and removes the remaining residual edges. Second, Eclipse adds noise to the low-rank singular values instead of the entire graph, thereby preserving graph privacy while retaining enough of the graph structure to maintain model utility. We theoretically show that Eclipse provides a formal DP guarantee on edges. Experiments on benchmark graph datasets show that Eclipse achieves a significantly better privacy-utility tradeoff compared to existing privacy-preserving GNN training methods. In particular, under strong privacy constraints ($\epsilon$ < 4), Eclipse shows significant gains in model utility of up to 46%. We further demonstrate that Eclipse also has better resilience against common edge attacks (e.g., LPA), lowering the attack AUC by up to 5% compared to other state-of-the-art baselines. Comment: Accepted at Privacy Enhancing Technologies Symposium (PETS) 2024
- Published
- 2024
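The low-rank perturbation idea in the Eclipse abstract (entry 47) can be illustrated with a short NumPy sketch: keep the leading singular triplets of the adjacency matrix and add noise only to the retained singular values. The Gaussian noise, its scale, the function name, and the toy graph are all assumptions made for illustration; the noise is not calibrated to any privacy budget, so this is not the paper's mechanism.

```python
import numpy as np


def perturbed_low_rank_adjacency(adj, rank, noise_std, seed=0):
    """Rank-`rank` reconstruction of `adj` with noisy singular values.

    Assumption: Gaussian noise with a hand-picked standard deviation; a real
    DP mechanism would calibrate the noise to a sensitivity and a budget.
    """
    rng = np.random.default_rng(seed)
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    # Perturb only the retained singular values, not the whole matrix.
    s_noisy = s[:rank] + rng.normal(0.0, noise_std, size=rank)
    # Scale each retained left singular vector by its noisy singular value.
    return (u[:, :rank] * s_noisy) @ vt[:rank, :]


if __name__ == "__main__":
    # Hypothetical undirected 4-node graph.
    adj = np.array(
        [[0, 1, 1, 0],
         [1, 0, 1, 0],
         [1, 1, 0, 1],
         [0, 0, 1, 0]],
        dtype=float,
    )
    print(perturbed_low_rank_adjacency(adj, rank=2, noise_std=0.1).round(2))
```

The result is a dense real-valued matrix; how it is turned back into a graph for GNN training is not described in the abstract and is outside this sketch.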
48. Repoformer: Selective Retrieval for Repository-Level Code Completion
- Author
-
Wu, Di, Ahmad, Wasi Uddin, Zhang, Dejiao, Ramanathan, Murali Krishna, and Ma, Xiaofei
- Subjects
Computer Science - Software Engineering, Computer Science - Computation and Language
- Abstract
Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a large proportion of the retrieved contexts proving unhelpful or harmful to code language models (code LMs). In this paper, we propose a selective RAG framework to avoid retrieval when unnecessary. To power this framework, we design a self-supervised learning approach to enable a code LM to accurately self-evaluate whether retrieval can improve its output quality and robustly leverage the potentially noisy retrieved contexts. Using this LM as both the selective RAG policy and the generation model, our framework achieves state-of-the-art repository-level code completion performance on diverse benchmarks including RepoEval, CrossCodeEval, and CrossCodeLongEval, a new long-form code completion benchmark. Meanwhile, our analyses show that selectively retrieving brings as much as 70% inference speedup in the online serving setting without harming the performance. We further demonstrate that our framework is able to accommodate different generation models, retrievers, and programming languages. These advancements position our framework as an important step towards more accurate and efficient repository-level code completion. Comment: ICML 2024
- Published
- 2024
49. Ethos: Rectifying Language Models in Orthogonal Parameter Space
- Author
-
Gao, Lei, Niu, Yue, Tang, Tingting, Avestimehr, Salman, and Annavaram, Murali
- Subjects
Computer Science - Computation and Language
- Abstract
Language models (LMs) have greatly propelled research on natural language processing. However, LMs also raise concerns regarding the generation of biased or toxic content and the potential disclosure of private information from the training dataset. In this work, we present a new efficient approach, Ethos, that rectifies LMs to mitigate toxicity and bias in outputs and avoid privacy leakage. Ethos is built on task arithmetic. However, unlike current task arithmetic algorithms, Ethos distinguishes generally beneficial knowledge from undesired knowledge when reconstructing task vectors. Specifically, Ethos first obtains a set of principal components from the pre-trained models using singular value decomposition. Then, by projecting the task vector onto principal components, Ethos identifies the principal components that encode general or undesired knowledge. Ethos performs negation using only the task vector with undesired knowledge, thereby minimizing collateral damage to general model utility. We demonstrate the efficacy of our approach on three different tasks: debiasing, detoxification, and memorization unlearning. Evaluations show Ethos is more effective in removing undesired knowledge and maintaining the overall model performance compared to current task arithmetic methods.
- Published
- 2024
50. Optimizing Latent Graph Representations of Surgical Scenes for Zero-Shot Domain Transfer
- Author
-
Satyanaik, Siddhant, Murali, Aditya, Alapatt, Deepak, Wang, Xin, Mascagni, Pietro, and Padoy, Nicolas
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Purpose: Advances in deep learning have resulted in effective models for surgical video analysis; however, these models often fail to generalize across medical centers due to domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Recently, object-centric learning has emerged as a promising approach for improved surgical scene understanding, capturing and disentangling visual and semantic properties of surgical tools and anatomy to improve downstream task performance. In this work, we conduct a multi-centric performance benchmark of object-centric approaches, focusing on Critical View of Safety assessment in laparoscopic cholecystectomy, then propose an improved approach for unseen domain generalization. Methods: We evaluate four object-centric approaches for domain generalization, establishing baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g. ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method specifically tailored for domain generalization, LG-DG, that includes a novel disentanglement loss function. Results: Our optimized approach, LG-DG, achieves an improvement of 9.28% over the best baseline approach. More broadly, we show that object-centric approaches are highly effective for domain generalization thanks to their modular approach to representation learning. Conclusion: We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods. Comment: 7 pages, 3 figures, Accepted to IPCAI 2024
- Published
- 2024
Catalog
Discovery Service for Jio Institute Digital Library