72,412 results for "Krishnamurthy, A."
Search Results
2. BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
- Author
-
Rodriguez, Juan, Jian, Xiangru, Panigrahi, Siba Smarak, Zhang, Tianyu, Feizi, Aarash, Puri, Abhay, Kalkunte, Akshay, Savard, François, Masry, Ahmed, Nayak, Shravan, Awal, Rabiul, Massoud, Mahsa, Abaskohi, Amirhossein, Li, Zichao, Wang, Suyuchen, Noël, Pierre-André, Richter, Mats Leon, Vadacchino, Saverio, Agarwal, Shubbam, Biswas, Sanket, Shanian, Sara, Zhang, Ying, Bolger, Noah, MacDonald, Kurt, Fauvel, Simon, Tejaswi, Sathwik, Sunkara, Srinivas, Monteiro, Joao, Dvijotham, Krishnamurthy DJ, Scholak, Torsten, Chapados, Nicolas, Kharagani, Sepideh, Hughes, Sean, Özsu, M., Reddy, Siva, Pedersoli, Marco, Bengio, Yoshua, Pal, Christopher, Laradji, Issam, Gella, Spandanna, Taslakian, Perouz, Vazquez, David, and Rajeswar, Sai
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, the use of multimodal models in commercial applications is often limited by scarce access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance by up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io ., Comment: The project is hosted at https://bigdocs.github.io
- Published
- 2024
3. DiffSign: AI-Assisted Generation of Customizable Sign Language Videos With Enhanced Realism
- Author
-
Krishnamurthy, Sudha, Bhat, Vimal, and Jain, Abhinav
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The proliferation of several streaming services in recent years has now made it possible for a diverse audience across the world to view the same media content, such as movies or TV shows. While translation and dubbing services are being added to make content accessible to the local audience, the support for making content accessible to people with different abilities, such as the Deaf and Hard of Hearing (DHH) community, is still lagging. Our goal is to make media content more accessible to the DHH community by generating sign language videos with synthetic signers that are realistic and expressive. Using the same signer for a given media content that is viewed globally may have limited appeal. Hence, our approach combines parametric modeling and generative modeling to generate realistic-looking synthetic signers and customize their appearance based on user preferences. We first retarget human sign language poses to 3D sign language avatars by optimizing a parametric model. The high-fidelity poses from the rendered avatars are then used to condition the poses of synthetic signers generated using a diffusion-based generative model. The appearance of the synthetic signer is controlled by an image prompt supplied through a visual adapter. Our results show that the sign language videos generated using our approach have better temporal consistency and realism than signing videos generated by a diffusion model conditioned only on text prompts. We also support multimodal prompts to allow users to further customize the appearance of the signer to accommodate diversity (e.g. skin tone, gender). Our approach is also useful for signer anonymization., Comment: Published in Proceedings of ECCV, Workshop on Assistive Computer Vision and Robotics, 2024
- Published
- 2024
4. Self-Improvement in Language Models: The Sharpening Mechanism
- Author
-
Huang, Audrey, Block, Adam, Foster, Dylan J., Rohatgi, Dhruv, Zhang, Cyril, Simchowitz, Max, Ash, Jordan T., and Krishnamurthy, Akshay
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Recent work in language modeling has raised the possibility of self-improvement, where a language model evaluates and refines its own generations to achieve higher performance without external feedback. It is impossible for this self-improvement to create information that is not already in the model, so why should we expect that this will lead to improved capabilities? We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening. Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training in order to ``sharpen'' the model to one placing large mass on high-quality sequences, thereby amortizing the expensive inference-time computation of generating good sequences. We begin by introducing a new statistical framework for sharpening in which the learner aims to sharpen a pre-trained base policy via sample access, and establish fundamental limits. Then we analyze two natural families of self-improvement algorithms based on SFT and RLHF. We find that (i) the SFT-based approach is minimax optimal whenever the initial model has sufficient coverage, but (ii) the RLHF-based approach can improve over SFT-based self-improvement by leveraging online exploration, bypassing the need for coverage. Finally, we empirically validate the sharpening mechanism via inference-time and amortization experiments. We view these findings as a starting point toward a foundational understanding that can guide the design and evaluation of self-improvement algorithms.
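For illustration, a minimal Python sketch of best-of-n style sharpening, in which the model scores its own candidates and keeps the one it verifies as best; the `generate` and `self_score` functions here are hypothetical stand-ins for sampling and self-verification, not the paper's actual algorithms or guarantees.

```python
import random

# Hypothetical stand-ins for model sampling and self-verification;
# in practice both would call the language model itself.
def generate(model, prompt):
    """Sample one candidate response from the base policy."""
    return f"response-{random.randint(0, 9)}"

def self_score(model, prompt, response):
    """Model-as-verifier: score the quality of the model's own response."""
    return random.random()

def sharpen_best_of_n(model, prompt, n=8):
    """Best-of-n sharpening: concentrate mass on self-verified high-quality outputs.
    The selected (prompt, response) pairs could then be distilled back into the
    model (e.g., via SFT), amortizing the inference-time search."""
    candidates = [generate(model, prompt) for _ in range(n)]
    return max(candidates, key=lambda r: self_score(model, prompt, r))

if __name__ == "__main__":
    print(sharpen_best_of_n(model=None, prompt="Prove that 2 + 2 = 4."))
```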
- Published
- 2024
5. RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
- Author
-
Goswami, Raktim Gautam, Krishnamurthy, Prashanth, LeCun, Yann, and Khorrami, Farshad
- Subjects
Computer Science - Robotics ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks. Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose. While images of robots inherently contain rich information about the robot's physical structures, existing methods often fail to leverage it fully, thereby limiting performance under occlusions and truncations. To address this, we introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture. Specifically, we mask the robot's joints and pre-train an encoder-predictor model to infer the joints' embeddings from surrounding unmasked regions, enhancing the encoder's understanding of the robot's physical model. The pre-trained encoder-predictor pair, along with joint angle and keypoint prediction networks, is then fine-tuned for pose and joint angle estimation. Random masking of input during fine-tuning and keypoint filtering during evaluation further improves robustness. Our method, evaluated on several datasets, achieves the best results in robot pose and joint angle estimation while being the least sensitive to occlusions and requiring the lowest execution time.
- Published
- 2024
6. Enhanced Capture Point Control Using Thruster Dynamics and QP-Based Optimization for Harpy
- Author
-
Pitroda, Shreyansh, Sihite, Eric, Liu, Taoran, Krishnamurthy, Kaushik Venkatesh, Wang, Chenghao, Salagame, Adarsh, Nemovi, Reza, Ramezani, Alireza, and Gharib, Morteza
- Subjects
Computer Science - Robotics - Abstract
Our work aims to make significant strides in understanding unexplored locomotion control paradigms based on the integration of posture manipulation and thrust vectoring. These techniques are commonly seen in nature, such as Chukar birds using their wings to run on a nearly vertical wall. In this work, we developed a capture-point-based controller integrated with a quadratic programming (QP) solver, which is used to create a thruster-assisted dynamic bipedal walking controller for our state-of-the-art Harpy platform. Harpy is a bipedal robot capable of legged-aerial locomotion using its legs and thrusters attached to its main frame. While capture point control based on centroidal models for bipedal systems has been extensively studied, the use of these thrusters in determining the capture point for a bipedal robot has not been extensively explored. The addition of these external thrust forces can lead to interesting interpretations of locomotion, such as virtual buoyancy studied in aquatic-legged locomotion. In this work, we derive a thruster-assisted bipedal walking controller based on the capture point formulation and implement it in simulation to study its performance., Comment: Submitted to ACC2025. arXiv admin note: substantial text overlap with arXiv:2406.14799, arXiv:2411.12968
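As background, the classical capture point of a linear inverted pendulum is sketched below; the paper's thruster-assisted formulation augments this with external thrust terms, so the expression here is only the standard starting point (symbols and the fixed CoM height z_0 are the usual textbook assumptions).

```latex
% Classical capture point for a linear inverted pendulum with CoM height z_0:
\[
  \xi = x + \frac{\dot{x}}{\omega}, \qquad \omega = \sqrt{\frac{g}{z_0}},
\]
% stepping onto \xi brings the pendulum to rest over the new foothold; external
% thruster forces shift this point in the thruster-assisted setting studied here.
```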
- Published
- 2024
7. Conjugate momentum based thruster force estimate in dynamic multimodal robot
- Author
-
Pitroda, Shreyansh, Sihite, Eric, Liu, Taoran, Krishnamurthy, Kaushik Venkatesh, Wang, Chenghao, Salagame, Adarsh, Nemovi, Reza, Ramezani, Alireza, and Gharib, Morteza
- Subjects
Computer Science - Robotics - Abstract
In a multimodal system that combines thruster and legged locomotion, such as our state-of-the-art Harpy platform, an accurate estimate of the thruster force is essential for dynamic locomotion. Harpy is a bipedal robot capable of legged-aerial locomotion using its legs and thrusters attached to its main frame. Thruster force can be characterized using a thrust stand, but such characterization generally does not account for working conditions such as battery voltage. In this study, we present a momentum-based thruster force estimator. One key piece of information required for the estimate is terrain knowledge, and we show estimation results both with and without it. In this work, we derive a conjugate momentum thruster force estimator and implement it on a numerical simulator that uses thruster force to perform thruster-assisted walking., Comment: Submitted to ACC 2025. arXiv admin note: text overlap with arXiv:2411.12968
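For context, the standard generalized-momentum (conjugate momentum) residual that this style of estimator builds on is sketched below; the notation is generic (M, C, g are the robot's inertia, Coriolis, and gravity terms), and the paper adapts the idea to recover thruster forces rather than arbitrary external wrenches.

```latex
% Generalized-momentum residual with p = M(q)\dot{q} and an external wrench
% entering the dynamics as J^\top F_{ext}:
\[
  r(t) = K_O \left[ p(t) - p(0)
         - \int_0^t \big( \tau + C^\top(q,\dot{q})\,\dot{q} - g(q) + r \big)\, ds \right],
\]
% r(t) behaves as a first-order low-pass estimate of J^\top F_{ext}, with
% bandwidth set by the observer gain K_O.
```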
- Published
- 2024
8. Quadratic Programming Optimization for Bio-Inspired Thruster-Assisted Bipedal Locomotion on Inclined Slopes
- Author
-
Pitroda, Shreyansh, Sihite, Eric, Krishnamurthy, Kaushik Venkatesh, Wang, Chenghao, Salagame, Adarsh, Nemovi, Reza, Ramezani, Alireza, and Gharib, Morteza
- Subjects
Computer Science - Robotics - Abstract
Our work aims to make significant strides in understanding unexplored locomotion control paradigms based on the integration of posture manipulation and thrust vectoring. These techniques are commonly seen in nature, such as Chukar birds using their wings to run on a nearly vertical wall. In this work, we formulate a quadratic program with contact constraints whose solution is passed to a whole-body controller and mapped onto the robot states, producing a thruster-assisted slope-walking controller for our state-of-the-art Harpy platform. Harpy is a bipedal robot capable of legged-aerial locomotion using its legs and thrusters attached to its main frame. The optimization-based walking controller has been used for dynamic locomotion such as slope walking, but the addition of thrusters for inclined slope walking has not been extensively explored. In this work, we derive a thruster-assisted bipedal walking controller based on quadratic programming (QP) and implement it in simulation to study its performance., Comment: Submitted to ACC2025. arXiv admin note: text overlap with arXiv:2406.14799
- Published
- 2024
9. Enabling steep slope walking on Husky using reduced order modeling and quadratic programming
- Author
-
Krishnamurthy, Kaushik Venkatesh, Sihite, Eric, Wang, Chenghao, Pitroda, Shreyansh, Salagame, Adarsh, Ramezani, Alireza, and Gharib, Morteza
- Subjects
Computer Science - Robotics ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Wing-assisted inclined running (WAIR), observed in some young birds, is an attractive maneuver that can be extended to legged aerial systems. This study proposes a control method using a modified Variable Length Inverted Pendulum (VLIP) by assuming a fixed zero moment point and thruster forces collocated at the center of mass of the pendulum. A QP MPC is used to find the optimal ground reaction forces and thruster forces to track a reference position and velocity trajectory. Simulation results show that this VLIP model maintains balance on a slope of 40 degrees, with thruster forces that can be obtained through posture manipulation. The simulation also provides insight into how the combined efforts of the thrusters and the tractive forces from the legs make WAIR possible in thruster-assisted legged systems., Comment: 6 pages, 8 figures, submitted to the Humanoids 2025 conference
- Published
- 2024
10. Optimization free control and ground force estimation with momentum observer for a multimodal legged aerial robot
- Author
-
Krishnamurthy, Kaushik Venkatesh, Wang, Chenghao, Pitroda, Shreyansh, Sihite, Eric, Ramezani, Alireza, and Gharib, Morteza
- Subjects
Computer Science - Robotics ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Legged-aerial multimodal robots can make the most of both legged and aerial systems. In this paper, we propose a control framework that bypasses heavy onboard computers by using an optimization-free Explicit Reference Governor (ERG) that incorporates external thruster forces from an attitude controller. Ground reaction forces are typically kept within friction cone constraints using costly optimization solvers; instead, the ERG framework filters the applied velocity references to ensure no slippage at the foot end. We also propose a conjugate momentum observer, a tool widely used in disturbance observation, to estimate ground reaction forces, and compare its efficacy in estimating ground reaction forces against a constrained model in a reduced-order simulation of Husky., Comment: 6 pages, 10 figures, submitted to American Control Conference 2025
- Published
- 2024
11. CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering
- Author
-
Joshi, Ishika, Shahid, Simra, Venneti, Shreeya, Vasu, Manushree, Zheng, Yantao, Li, Yunyao, Krishnamurthy, Balaji, and Chan, Gromit Yeuk-Yin
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Ensuring that large language models' (LLMs) responses align with prompt instructions is crucial for application development. Our formative study with industry professionals shows that achieving this alignment requires heavy human involvement and tedious trial and error, especially when the prompt contains many instructions. To address these challenges, we introduce CoPrompter, a framework that identifies misalignment by assessing multiple LLM responses against criteria. It proposes a method to generate evaluation criteria questions derived directly from prompt requirements and an interface to turn these questions into a user-editable checklist. Our user study with industry prompt engineers shows that CoPrompter improves the ability to identify and refine instruction alignment with prompt requirements over traditional methods, helps them understand where and how frequently models fail to follow the user's prompt requirements, and helps in clarifying their own requirements, giving them greater control over the response evaluation process. We also present design lessons that underscore our system's potential to streamline the prompt engineering process.
- Published
- 2024
12. Longitudinal Ensemble Integration for sequential classification with multimodal data
- Author
-
Susman, Aviad, Krishnamurthy, Rupak, Li, Yan Chak, Olaimat, Mohammad, Bozdag, Serdar, Varghese, Bino, Sheikh-Bahaei, Nasim, and Pandey, Gaurav
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Effectively modeling multimodal longitudinal data is a pressing need in various application areas, especially biomedicine. Despite this, few approaches exist in the literature for this problem, with most not adequately taking into account the multimodality of the data. In this study, we developed multiple configurations of a novel multimodal and longitudinal learning framework, Longitudinal Ensemble Integration (LEI), for sequential classification. We evaluated LEI's performance, and compared it against existing approaches, for the early detection of dementia, which is among the most studied multimodal sequential classification tasks. LEI outperformed these approaches due to its use of intermediate base predictions arising from the individual data modalities, which enabled their better integration over time. LEI's design also enabled the identification of features that were consistently important across time for the effective prediction of dementia-related diagnoses. Overall, our work demonstrates the potential of LEI for sequential classification from longitudinal multimodal data., Comment: 11 pages, submitted to ICLR 2025
- Published
- 2024
13. ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models
- Author
-
Srivastava, Ashutosh, Menta, Tarun Ram, Java, Abhinav, Jadhav, Avadhoot, Singh, Silky, Jandial, Surgan, and Krishnamurthy, Balaji
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Modern Text-to-Image (T2I) Diffusion models have revolutionized image editing by enabling the generation of high-quality photorealistic images. While the de facto method for performing edits with T2I models is through text instructions, this approach is non-trivial due to the complex many-to-many mapping between natural language and images. In this work, we address exemplar-based image editing -- the task of transferring an edit from an exemplar pair to a content image(s). We propose ReEdit, a modular and efficient end-to-end framework that captures edits in both text and image modalities while ensuring the fidelity of the edited image. We validate the effectiveness of ReEdit through extensive comparisons with state-of-the-art baselines and sensitivity analyses of key design choices. Our results demonstrate that ReEdit consistently outperforms contemporary approaches both qualitatively and quantitatively. Additionally, ReEdit boasts high practical applicability, as it does not require any task-specific optimization and is four times faster than the next best baseline., Comment: First three authors contributed equally to this work
- Published
- 2024
14. GRATEV2.0: Computational Tools for Real-time Analysis of High-throughput High-resolution TEM (HRTEM) Images of Conjugated Polymers
- Author
-
Gamdha, Dhruv, Fair, Ryan, Krishnamurthy, Adarsh, Gomez, Enrique, and Ganapathysubramanian, Baskar
- Subjects
Computer Science - Computational Engineering, Finance, and Science - Abstract
Automated analysis of high-resolution transmission electron microscopy (HRTEM) images is increasingly essential for advancing research in organic electronics, where precise characterization of nanoscale crystal structures is crucial for optimizing material properties. This paper introduces an open-source computational framework called GRATEV2.0 (GRaph-based Analysis of TEM), designed for real-time analysis of HRTEM data, with a focus on characterizing complex microstructures in conjugated polymers, illustrated using Poly[N-9'-heptadecanyl-2,7-carbazole-alt-5,5-(4',7'-di-2-thienyl-2',1',3'-benzothiadiazole)] (PCDTBT), a key material in organic photovoltaics. GRATEV2.0 employs fast, automated image processing algorithms, enabling rapid extraction of structural features like d-spacing, orientation, and crystal shape metrics. Gaussian process optimization rapidly identifies the user-defined parameters in the approach, reducing the need for manual parameter tuning and thus enhancing reproducibility and usability. Additionally, GRATEV2.0 is compatible with high-performance computing (HPC) environments, allowing for efficient, large-scale data processing at near real-time speeds. A unique feature of GRATEV2.0 is a Wasserstein distance-based stopping criterion, which optimizes data collection by determining when further sampling no longer adds statistically significant information. This capability optimizes the amount of time the TEM facility is used while ensuring data adequacy for in-depth analysis. Open-source and tested on a substantial PCDTBT dataset, this tool offers a powerful, robust, and accessible solution for high-throughput material characterization in organic electronics., Comment: 15 pages, 9 figures, 3 tables
- Published
- 2024
15. Interacting Large Language Model Agents. Interpretable Models and Social Learning
- Author
-
Jain, Adit and Krishnamurthy, Vikram
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Emerging Technologies ,Computer Science - Multiagent Systems ,Electrical Engineering and Systems Science - Systems and Control - Abstract
This paper develops theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making by interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and stochastic control algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from prior decisions and external inputs, they exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the sufficient conditions for rationally inattentive (bounded rationality) utility maximization and, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under two settings: (a) centrally controlled LLMAs and (b) autonomous LLMAs with incentives. Throughout the paper, we demonstrate the efficacy of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like Mistral and closed-source models like ChatGPT. The main takeaway of this paper, based on substantial empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
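The herding behavior referenced above can be reproduced with the classical binary-signal Bayesian social learning model; the Python sketch below is that textbook model (not the paper's LLMA construction) and shows how, once the public belief outweighs any single private signal, later agents ignore their own signals and a cascade forms.

```python
import math
import random

def simulate_herding(true_state=1, accuracy=0.7, n_agents=30, seed=0):
    """Classical sequential Bayesian social learning (information cascades).
    Each agent sees one private binary signal plus all previous actions, acts on
    its posterior, and later agents herd once the public belief dominates."""
    random.seed(seed)
    llr = math.log(accuracy / (1 - accuracy))  # evidence carried by one private signal
    public = 0.0                               # public log-odds of state 1, from observed actions
    actions = []
    for _ in range(n_agents):
        signal = true_state if random.random() < accuracy else 1 - true_state
        post = public + (llr if signal == 1 else -llr)
        action = 1 if post > 0 else 0 if post < 0 else signal  # ties follow own signal
        actions.append(action)
        # The action reveals the private signal only while |public| <= llr; beyond
        # that the action is predetermined and a cascade (herd) has started.
        if abs(public) <= llr:
            public += llr if action == 1 else -llr
    return actions

if __name__ == "__main__":
    print(simulate_herding())  # typically locks onto one action after a few agents
```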
- Published
- 2024
16. Simulating incompressible flows over complex geometries using the shifted boundary method with incomplete adaptive octree meshes
- Author
-
Yang, Cheng-Hau, Scovazzi, Guglielmo, Krishnamurthy, Adarsh, and Ganapathysubramanian, Baskar
- Subjects
Physics - Fluid Dynamics ,Mathematics - Numerical Analysis - Abstract
We extend the shifted boundary method (SBM) to the simulation of incompressible fluid flow using immersed octree meshes. Previous work on SBM for fluid flow primarily utilized two- or three-dimensional unstructured tetrahedral grids. Recently, octree grids have become an essential component of immersed CFD solvers, and this work addresses this gap and the associated computational challenges. We leverage an optimal (approximate) surrogate boundary constructed efficiently on incomplete and adaptive octree meshes. The resulting framework enables the simulation of the incompressible Navier-Stokes equations in complex geometries without requiring boundary-fitted grids. Simulations of benchmark tests in two and three dimensions demonstrate that the Octree-SBM framework is a robust, accurate, and efficient approach to simulating fluid dynamics problems with complex geometries.
- Published
- 2024
17. CurateGPT: A flexible language-model assisted biocuration tool
- Author
-
Caufield, Harry, Kroll, Carlo, O'Neil, Shawn T, Reese, Justin T, Joachimiak, Marcin P, Hegde, Harshad, Harris, Nomi L, Krishnamurthy, Madan, McLaughlin, James A, Smedley, Damian, Haendel, Melissa A, Robinson, Peter N, and Mungall, Christopher J
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Databases ,Quantitative Biology - Quantitative Methods - Abstract
Effective data-driven biomedical discovery requires data curation: a time-consuming process of finding, organizing, distilling, integrating, interpreting, annotating, and validating diverse information into a structured form suitable for databases and knowledge bases. Accurate and efficient curation of these digital assets is critical to ensuring that they are FAIR, trustworthy, and sustainable. Unfortunately, expert curators face significant time and resource constraints. The rapid pace of new information being published daily is exceeding their capacity for curation. Generative AI, exemplified by instruction-tuned large language models (LLMs), has opened up new possibilities for assisting human-driven curation. The design philosophy of agents combines the emerging abilities of generative AI with more precise methods. A curator's tasks can be aided by agents for performing reasoning, searching ontologies, and integrating knowledge across external sources, all efforts otherwise requiring extensive manual effort. Our LLM-driven annotation tool, CurateGPT, melds the power of generative AI together with trusted knowledge bases and literature sources. CurateGPT streamlines the curation process, enhancing collaboration and efficiency in common workflows. Compared to direct interaction with an LLM, CurateGPT's agents enable access to information beyond that in the LLM's training data and they provide direct links to the data supporting each claim. This helps curators, researchers, and engineers scale up curation efforts to keep pace with the ever-increasing volume of scientific data.
- Published
- 2024
18. Data-Efficient System Identification via Lipschitz Neural Networks
- Author
-
Wei, Shiqing, Krishnamurthy, Prashanth, and Khorrami, Farshad
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
Extracting dynamic models from data is of enormous importance in understanding the properties of unknown systems. In this work, we employ Lipschitz neural networks, a class of neural networks with a prescribed upper bound on their Lipschitz constant, to address the problem of data-efficient nonlinear system identification. Under the (fairly weak) assumption that the unknown system is Lipschitz continuous, we propose a method to estimate the approximation error bound of the trained network and the bound on the difference between the simulated trajectories by the trained models and the true system. Empirical results show that our method outperforms classic fully connected neural networks and Lipschitz regularized networks through simulation studies on three dynamical systems, and the advantage of our method is more noticeable when less data is used for training.
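A minimal PyTorch sketch of fitting a one-step dynamics model with a prescribed Lipschitz bound is shown below; it uses spectral normalization as one common way to bound per-layer Lipschitz constants, and the architecture, dimensions, and `scale` bound are illustrative assumptions rather than the paper's construction.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

# Each linear layer is spectrally normalized to be 1-Lipschitz, so with 1-Lipschitz
# ReLU activations the map x_{t+1} = f(x_t, u_t) has Lipschitz constant <= `scale`.
class LipschitzDynamics(nn.Module):
    def __init__(self, state_dim, input_dim, hidden=64, scale=5.0):
        super().__init__()
        self.scale = scale  # prescribed upper bound on the model's Lipschitz constant
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(state_dim + input_dim, hidden)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden, hidden)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden, state_dim)),
        )

    def forward(self, x, u):
        return self.scale * self.net(torch.cat([x, u], dim=-1))

# One-step prediction loss on (x_t, u_t, x_{t+1}) tuples from trajectory data
# (random tensors stand in for real data here).
model = LipschitzDynamics(state_dim=3, input_dim=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_t, u_t, x_next = torch.randn(32, 3), torch.randn(32, 1), torch.randn(32, 3)
loss = nn.functional.mse_loss(model(x_t, u_t), x_next)
opt.zero_grad(); loss.backward(); opt.step()
```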
- Published
- 2024
19. PARPHOM: PARallel PHOnon calculator for Moiré systems
- Author
-
Mandal, Shinjan, Maity, Indrajit, Krishnamurthy, H R, and Jain, Manish
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Materials Science ,Physics - Computational Physics - Abstract
The introduction of a twist between two layers of two-dimensional materials has opened up a new and exciting field of research known as twistronics. In these systems, the phonon dispersions show significant renormalization and enhanced electron-phonon interactions as a function of the twist angle. However, the large system size of the resulting moiré patterns in these systems makes phonon calculations computationally challenging. In this paper, we present PARPHOM, a powerful code package designed to address these challenges. PARPHOM enables the generation of force constants, computation of phononic band structures, and determination of density of states in twisted 2D material systems. Moreover, PARPHOM provides essential routines to investigate the finite temperature dynamics in these systems and analyze the chirality of the phonon bands. This paper serves as an introduction to PARPHOM, highlighting its capabilities and demonstrating its utility in unraveling the intricate phononic properties of twisted 2D materials., Comment: 10 pages, 7 figures
- Published
- 2024
20. Annotation Efficiency: Identifying Hard Samples via Blocked Sparse Linear Bandits
- Author
-
Jain, Adit, Pal, Soumyabrata, Choudhary, Sunav, Narayanam, Ramasuri, and Krishnamurthy, Vikram
- Subjects
Computer Science - Machine Learning - Abstract
This paper considers the problem of annotating datapoints using an expert with only a few annotation rounds in a label-scarce setting. We propose soliciting reliable feedback on difficulty in annotating a datapoint from the expert in addition to the ground truth label. Existing literature in active learning or coreset selection turns out to be less relevant to our setting since it presumes the existence of a reliable trained model, which is absent in the label-scarce regime. However, the literature on coreset selection emphasizes the presence of difficult data points in the training set to perform supervised learning in downstream tasks (Mindermann et al., 2022). Therefore, for a given fixed annotation budget of $\mathsf{T}$ rounds, we model the sequential decision-making problem of which (difficult) datapoints to choose for annotation in a sparse linear bandits framework with the constraint that no arm can be pulled more than once (blocking constraint). With mild assumptions on the datapoints, our (computationally efficient) Explore-Then-Commit algorithm BSLB achieves a regret guarantee of $\widetilde{\mathsf{O}}(k^{\frac{1}{3}} \mathsf{T}^{\frac{2}{3}} +k^{-\frac{1}{2}} \beta_k + k^{-\frac{1}{12}} \beta_k^{\frac{1}{2}}\mathsf{T}^{\frac{5}{6}})$ where the unknown parameter vector has tail magnitude $\beta_k$ at sparsity level $k$. To this end, we show offline statistical guarantees for the Lasso estimator under a mild Restricted Eigenvalue (RE) condition that is also robust to sparsity. Finally, we propose a meta-algorithm C-BSLB that does not need knowledge of the optimal sparsity parameters at a no-regret cost. We demonstrate the efficacy of our BSLB algorithm for annotation in the label-scarce setting for an image classification task on the PASCAL-VOC dataset, where we use real-world annotation difficulty scores., Comment: 31 Pages
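A rough Python sketch of an explore-then-commit scheme under the blocking constraint is given below; the exploration split, the Lasso regularization level, and the `oracle_difficulty` stub standing in for expert feedback are all illustrative assumptions, not the BSLB specification or its theoretical tuning.

```python
import numpy as np
from sklearn.linear_model import Lasso

def oracle_difficulty(x, rng):
    """Hypothetical stand-in for expert difficulty feedback: noisy sparse-linear response."""
    theta_star = np.zeros(x.shape[0])
    theta_star[:3] = 1.0  # assumed sparse ground-truth parameter
    return float(x @ theta_star + 0.1 * rng.normal())

def explore_then_commit(X, budget_T, explore_frac=0.4, alpha=0.1, seed=0):
    """Annotate each datapoint at most once (blocking constraint): explore a random
    subset, fit a sparse model of difficulty, then commit the remaining budget to
    the unseen datapoints predicted to be hardest."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    n_explore = min(int(explore_frac * budget_T), n)
    explore_idx = rng.choice(n, size=n_explore, replace=False)
    y = np.array([oracle_difficulty(X[i], rng) for i in explore_idx])
    theta_hat = Lasso(alpha=alpha).fit(X[explore_idx], y).coef_
    remaining = np.setdiff1d(np.arange(n), explore_idx)
    order = np.argsort(-(X[remaining] @ theta_hat))
    n_commit = min(budget_T - n_explore, remaining.size)
    return np.concatenate([explore_idx, remaining[order][:n_commit]])

print(explore_then_commit(np.random.default_rng(1).normal(size=(200, 20)), budget_T=40))
```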
- Published
- 2024
21. Collision Avoidance for Convex Primitives via Differentiable Optimization Based High-Order Control Barrier Functions
- Author
-
Wei, Shiqing, Khorrambakht, Rooholla, Krishnamurthy, Prashanth, Gonçalves, Vinicius Mariano, and Khorrami, Farshad
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
Ensuring the safety of dynamical systems is crucial, where collision avoidance is a primary concern. Recently, control barrier functions (CBFs) have emerged as an effective method to integrate safety constraints into control synthesis through optimization techniques. However, challenges persist when dealing with convex primitives and tasks requiring torque control, as well as the occurrence of unintended equilibria. This work addresses these challenges by introducing a high-order CBF (HOCBF) framework for collision avoidance among convex primitives. We transform nonconvex safety constraints into linear constraints by differentiable optimization and prove the high-order continuous differentiability. Then, we employ HOCBFs to accommodate torque control, enabling tasks involving forces or high dynamics. Additionally, we analyze the issue of spurious equilibria in high-order cases and propose a circulation mechanism to prevent the undesired equilibria on the boundary of the safe set. Finally, we validate our framework with three experiments on the Franka Research 3 robotic manipulator, demonstrating successful collision avoidance and the efficacy of the circulation mechanism.
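For reference, the standard high-order CBF construction that such a framework builds on is sketched below (with the paper supplying h from a differentiable optimization problem); the class-K functions alpha_i and the QP enforcement are the usual ingredients, not details specific to this work.

```latex
% Standard HOCBF construction for a safety constraint h(x) >= 0 of relative degree m:
\begin{align*}
  \psi_0(x) &= h(x), \\
  \psi_i(x) &= \dot{\psi}_{i-1}(x) + \alpha_i\!\big(\psi_{i-1}(x)\big),
  \qquad i = 1, \dots, m-1,
\end{align*}
% and the control input u (e.g., from a QP) is required to satisfy
%   \dot{\psi}_{m-1}(x, u) + \alpha_m(\psi_{m-1}(x)) >= 0,
% which renders the intersection of the sets {x : \psi_i(x) >= 0} forward invariant.
```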
- Published
- 2024
22. LEO-based Positioning: Foundations, Signal Design, and Receiver Enhancements for 6G NTN
- Author
-
Dureppagari, Harish K., Saha, Chiranjib, Krishnamurthy, Harikumar, Wang, Xiao Feng, Rico-Alvariño, Alberto, Buehrer, R. Michael, and Dhillon, Harpreet S.
- Subjects
Computer Science - Information Theory ,Electrical Engineering and Systems Science - Signal Processing - Abstract
The integration of non-terrestrial networks (NTN) into 5G new radio (NR) has opened up the possibility of developing a new positioning infrastructure using NR signals from Low-Earth Orbit (LEO) satellites. LEO-based cellular positioning offers several advantages, such as a superior link budget, higher operating bandwidth, and large forthcoming constellations. Due to these factors, LEO-based positioning, navigation, and timing (PNT) is a potential enhancement for NTN in 6G cellular networks. However, extending the existing terrestrial cellular positioning methods to LEO-based NTN positioning requires considering key fundamental enhancements. These include creating broad positioning beams orthogonal to conventional communication beams, time-domain processing at the user equipment (UE) to resolve large delay and Doppler uncertainties, and efficiently accommodating positioning reference signals (PRS) from multiple satellites within the communication resource grid. In this paper, we present the first set of design insights by incorporating these enhancements and thoroughly evaluating LEO-based positioning, considering the constraints and capabilities of the NR-NTN physical layer. To evaluate the performance of LEO-based NTN positioning, we develop a comprehensive NR-compliant simulation framework, including LEO orbit simulation, transmission (Tx) and receiver (Rx) architectures, and a positioning engine incorporating the necessary enhancements. Our findings suggest that LEO-based NTN positioning could serve as a complementary infrastructure to existing Global Navigation Satellite Systems (GNSS) and, with appropriate enhancements, may also offer a viable alternative., Comment: 7 pages, 6 figures, submitted to IEEE Communications Magazine
- Published
- 2024
23. Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity
- Author
-
Amortila, Philip, Foster, Dylan J., Jiang, Nan, Krishnamurthy, Akshay, and Mhammedi, Zakaria
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Mathematics - Optimization and Control ,Statistics - Machine Learning - Abstract
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple. However, outside of restrictive settings such as small latent spaces, the fundamental statistical requirements and algorithmic principles for reinforcement learning under latent dynamics are poorly understood. This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying latent pushforward coverability as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient observable-to-latent reductions -- that is, reductions that transform an arbitrary algorithm for the latent MDP into an algorithm that can operate on rich observations -- in two settings: one where the agent has access to hindsight observations of the latent dynamics [LADZ23], and one where the agent can estimate self-predictive latent models [SAGHCB20]. Together, our results serve as a first step toward a unified statistical and algorithmic theory for reinforcement learning under latent dynamics.
- Published
- 2024
24. Finite Sample and Large Deviations Analysis of Stochastic Gradient Algorithm with Correlated Noise
- Author
-
Yin, George and Krishnamurthy, Vikram
- Subjects
Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Systems and Control - Abstract
We analyze the finite sample regret of a decreasing step size stochastic gradient algorithm. We assume correlated noise and use a perturbed Lyapunov function as a systematic approach for the analysis. Finally we analyze the escape time of the iterates using large deviations theory.
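For concreteness, the generic recursion behind a decreasing-step-size stochastic gradient algorithm is written out below; the step-size exponent and notation are illustrative, and the correlated-noise and perturbed-Lyapunov details are specific to the paper.

```latex
% Decreasing-step-size stochastic gradient recursion (notation illustrative):
\[
  \theta_{k+1} = \theta_k - \varepsilon_k \, \nabla f(\theta_k, x_k), \qquad
  \varepsilon_k = \frac{\varepsilon_0}{(k+1)^{\gamma}}, \quad \tfrac{1}{2} < \gamma \le 1,
\]
% where {x_k} is a correlated noise sequence; a perturbed Lyapunov function
% (V(\theta) plus a small correction term) is the standard device for averaging
% out the noise correlation in the analysis.
```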
- Published
- 2024
25. Slow Convergence of Interacting Kalman Filters in Word-of-Mouth Social Learning
- Author
-
Krishnamurthy, Vikram and Rojas, Cristian
- Subjects
Computer Science - Machine Learning ,Economics - Theoretical Economics ,Electrical Engineering and Systems Science - Signal Processing - Abstract
We consider word-of-mouth social learning involving $m$ Kalman filter agents that operate sequentially. The first Kalman filter receives the raw observations, while each subsequent Kalman filter receives a noisy measurement of the conditional mean of the previous Kalman filter. The prior is updated by the $m$-th Kalman filter. When $m=2$, and the observations are noisy measurements of a Gaussian random variable, the covariance goes to zero as $k^{-1/3}$ for $k$ observations, instead of $O(k^{-1})$ in the standard Kalman filter. In this paper we prove that for $m$ agents, the covariance decreases to zero as $k^{-1/(2^m-1)}$, i.e., the learning slows down exponentially with the number of agents. We also show that by artificially weighting the prior at each time, the learning rate can be made optimal as $k^{-1}$. The implication is that in word-of-mouth social learning, artificially re-weighting the prior can yield the optimal learning rate.
- Published
- 2024
26. OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs
- Author
-
Devarakonda, Venkata Naren, Goswami, Raktim Gautam, Kaypak, Ali Umut, Patel, Naman, Khorrambakht, Rooholla, Krishnamurthy, Prashanth, and Khorrami, Farshad
- Subjects
Computer Science - Robotics - Abstract
Enabling robots to autonomously navigate unknown, complex, dynamic environments and perform diverse tasks remains a fundamental challenge in developing robust autonomous physical agents. These agents must effectively perceive their surroundings while leveraging world knowledge for decision-making. Although recent approaches utilize vision-language and large language models for scene understanding and planning, they often rely on offline processing and offboard compute, and make simplifying assumptions about the environment and perception, limiting real-world applicability. We present a novel framework for real-time onboard autonomous navigation in unknown environments that change over time by integrating multi-level abstraction in both perception and planning pipelines. Our system fuses data from multiple onboard sensors for localization and mapping and integrates it with open-vocabulary semantics to generate hierarchical scene graphs from a continuously updated semantic object map. The LLM-based planner uses these graphs to create multi-step plans that guide low-level controllers in executing navigation tasks specified in natural language. The system's real-time operation enables the LLM to adjust its plans based on updates to the scene graph and task execution status, ensuring continuous adaptation to new situations or when the current plan cannot accomplish the task, a key advantage over static or rule-based systems. We demonstrate our system's efficacy on a quadruped navigating dynamic environments, showcasing its adaptability and robustness in diverse scenarios.
- Published
- 2024
27. Measuring and Improving Persuasiveness of Large Language Models
- Author
-
Singh, Somesh, Singla, Yaman K, SI, Harini, and Krishnamurthy, Balaji
- Subjects
Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
LLMs are increasingly being used in workflows involving generating content to be consumed by humans (e.g., marketing) and also in directly interacting with humans (e.g., through chatbots). The development of such systems that are capable of generating verifiably persuasive messages presents both opportunities and challenges for society. On the one hand, such systems could positively impact domains like advertising and social good, such as addressing drug addiction, and on the other, they could be misused for spreading misinformation and shaping political opinions. To channel LLMs' impact on society, we need to develop systems to measure and benchmark their persuasiveness. With this motivation, we introduce PersuasionBench and PersuasionArena, the first large-scale benchmark and arena containing a battery of tasks to measure the persuasion ability of generative models automatically. We investigate to what extent LLMs know and leverage linguistic patterns that can help them generate more persuasive language. Our findings indicate that the persuasiveness of LLMs correlates positively with model size, but smaller models can also be made to have a higher persuasiveness than much larger models. Notably, targeted training using synthetic and natural datasets significantly enhances smaller models' persuasive capabilities, challenging scale-dependent assumptions. Our findings carry key implications for both model developers and policymakers. For instance, while the EU AI Act and California's SB-1047 aim to regulate AI models based on the number of floating point operations, we demonstrate that simple metrics like this alone fail to capture the full scope of AI's societal impact. We invite the community to explore and contribute to PersuasionArena and PersuasionBench, available at https://bit.ly/measure-persuasion, to advance our understanding of AI-driven persuasion and its societal implications.
- Published
- 2024
28. EMMA: Efficient Visual Alignment in Multi-Modal LLMs
- Author
-
Ghazanfari, Sara, Araujo, Alexandre, Krishnamurthy, Prashanth, Garg, Siddharth, and Khorrami, Farshad
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Multi-modal Large Language Models (MLLMs) have recently exhibited impressive general-purpose capabilities by leveraging vision foundation models to encode the core concepts of images into representations. These are then combined with instructions and processed by the language model to generate high-quality responses. Despite significant progress in enhancing the language component, challenges persist in optimally fusing visual encodings within the language model for task-specific adaptability. Recent research has focused on improving this fusion through modality adaptation modules but at the cost of significantly increased model complexity and training data needs. In this paper, we propose EMMA (Efficient Multi-Modal Adaptation), a lightweight cross-modality module designed to efficiently fuse visual and textual encodings, generating instruction-aware visual representations for the language model. Our key contributions include: (1) an efficient early fusion mechanism that integrates vision and language representations with minimal added parameters (less than 0.2% increase in model size), (2) an in-depth interpretability analysis that sheds light on the internal mechanisms of the proposed method; (3) comprehensive experiments that demonstrate notable improvements on both specialized and general benchmarks for MLLMs. Empirical results show that EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations. Our code is available at https://github.com/SaraGhazanfari/EMMA
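A toy PyTorch sketch of a lightweight early-fusion adapter in this spirit is shown below; the module name, dimensions, pooling, and gating are illustrative assumptions and do not reproduce EMMA's actual design.

```python
import torch
import torch.nn as nn

# Visual tokens are modulated by a pooled instruction embedding through a tiny
# low-rank projection, yielding instruction-aware visual representations with
# very few added parameters.
class VisualInstructionFusion(nn.Module):
    def __init__(self, vis_dim=1024, txt_dim=4096, rank=64):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, rank, bias=False)  # compress instruction
        self.vis_proj = nn.Linear(vis_dim, rank, bias=False)  # compress visual tokens
        self.out_proj = nn.Linear(rank, vis_dim, bias=False)  # back to visual width
        self.gate = nn.Parameter(torch.zeros(1))              # start as identity map

    def forward(self, vis_tokens, txt_tokens):
        # vis_tokens: (B, Nv, vis_dim), txt_tokens: (B, Nt, txt_dim)
        instr = self.txt_proj(txt_tokens.mean(dim=1, keepdim=True))          # (B, 1, rank)
        fused = self.out_proj(torch.tanh(self.vis_proj(vis_tokens) + instr)) # (B, Nv, vis_dim)
        return vis_tokens + self.gate * fused  # small residual update

fusion = VisualInstructionFusion()
out = fusion(torch.randn(2, 256, 1024), torch.randn(2, 32, 4096))
print(out.shape)  # torch.Size([2, 256, 1024])
```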
- Published
- 2024
29. FlashMix: Fast Map-Free LiDAR Localization via Feature Mixing and Contrastive-Constrained Accelerated Training
- Author
-
Goswami, Raktim Gautam, Patel, Naman, Krishnamurthy, Prashanth, and Khorrami, Farshad
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Map-free LiDAR localization systems accurately localize within known environments by predicting sensor position and orientation directly from raw point clouds, eliminating the need for large maps and descriptors. However, their long training times hinder rapid adaptation to new environments. To address this, we propose FlashMix, which uses a frozen, scene-agnostic backbone to extract local point descriptors, aggregated with an MLP mixer to predict sensor pose. A buffer of local descriptors is used to accelerate training by orders of magnitude, combined with metric learning or contrastive loss regularization of aggregated descriptors to improve performance and convergence. We evaluate FlashMix on various LiDAR localization benchmarks, examining different regularizations and aggregators, demonstrating its effectiveness for rapid and accurate LiDAR localization in real-world scenarios. The code is available at https://github.com/raktimgg/FlashMix.
- Published
- 2024
30. FlowBench: A Large Scale Benchmark for Flow Simulation over Complex Geometries
- Author
-
Tali, Ronak, Rabeh, Ali, Yang, Cheng-Hau, Shadkhah, Mehdi, Karki, Samundra, Upadhyaya, Abhisek, Dhakshinamoorthy, Suriya, Saadati, Marjan, Sarkar, Soumik, Krishnamurthy, Adarsh, Hegde, Chinmay, Balu, Aditya, and Ganapathysubramanian, Baskar
- Subjects
Physics - Fluid Dynamics ,Computer Science - Machine Learning ,Computer Science - Neural and Evolutionary Computing - Abstract
Simulating fluid flow around arbitrary shapes is key to solving various engineering problems. However, simulating flow physics across complex geometries remains numerically challenging and computationally resource-intensive, particularly when using conventional PDE solvers. Machine learning methods offer attractive opportunities to create fast and adaptable PDE solvers. However, benchmark datasets to measure the performance of such methods are scarce, especially for flow physics across complex geometries. We introduce FlowBench, a dataset for neural simulators with over 10K samples, which is currently larger than any publicly available flow physics dataset. FlowBench contains flow simulation data across complex geometries (parametric vs. non-parametric), spanning a range of flow conditions (Reynolds number and Grashof number), capturing a diverse array of flow phenomena (steady vs. transient; forced vs. free convection), and for both 2D and 3D. FlowBench contains over 10K data samples, with each sample the outcome of a fully resolved, direct numerical simulation using a well-validated simulator framework designed for modeling transport phenomena in complex geometries. For each sample, we include velocity, pressure, and temperature field data at 3 different resolutions and several summary statistics features of engineering relevance (such as coefficients of lift and drag, and Nusselt numbers). Additionally, we include masks and signed distance fields for each shape. We envision that FlowBench will enable evaluating the interplay between complex geometry, coupled flow phenomena, and data sufficiency on the performance of current, and future, neural PDE solvers. We enumerate several evaluation metrics to help rank order the performance of neural PDE solvers. We benchmark the performance of several baseline methods including FNO, CNO, WNO, and DeepONet.
- Published
- 2024
31. MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment
- Author
-
Devarakonda, Venkata Naren, Kaypak, Ali Umut, Yuan, Shuaihang, Krishnamurthy, Prashanth, Fang, Yi, and Khorrami, Farshad
- Subjects
Computer Science - Robotics - Abstract
LLMs have shown promising results in task planning due to their strong natural language understanding and reasoning capabilities. However, issues such as hallucinations, ambiguities in human instructions, environmental constraints, and limitations in the executing agent's capabilities often lead to flawed or incomplete plans. This paper proposes MultiTalk, an LLM-based task planning methodology that addresses these issues through a framework of introspective and extrospective dialogue loops. This approach helps ground generated plans in the context of the environment and the agent's capabilities, while also resolving uncertainties and ambiguities in the given task. These loops are enabled by specialized systems designed to extract and predict task-specific states, and flag mismatches or misalignments among the human user, the LLM agent, and the environment. Effective feedback pathways between these systems and the LLM planner foster meaningful dialogue. The efficacy of this methodology is demonstrated through its application to robotic manipulation tasks. Experiments and ablations highlight the robustness and reliability of our method, and comparisons with baselines further illustrate the superiority of MultiTalk in task planning for embodied agents., Comment: 7 pages, 3 figures
- Published
- 2024
32. EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges
- Author
-
Abramovich, Talor, Udeshi, Meet, Shao, Minghao, Lieret, Kilian, Xi, Haoran, Milner, Kimberly, Jancheska, Sofija, Yang, John, Jimenez, Carlos E., Khorrami, Farshad, Krishnamurthy, Prashanth, Dolan-Gavitt, Brendan, Shafique, Muhammad, Narasimhan, Karthik, Karri, Ramesh, and Press, Ofir
- Subjects
Computer Science - Artificial Intelligence - Abstract
Although language model (LM) agents are demonstrating growing potential in many domains, their success in cybersecurity has been limited due to simplistic design and the lack of fundamental features for this domain. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. EnIGMA introduces new Agent-Computer Interfaces (ACIs) to improve the success rate on CTF challenges. We establish the novel Interactive Agent Tool concept, which enables LM agents to run interactive command-line utilities essential for these challenges. Empirical analysis of EnIGMA on over 350 CTF challenges from three different benchmarks indicates that providing a robust set of new tools with demonstration of their usage helps the LM solve complex problems and achieves state-of-the-art results on the NYU CTF and Intercode-CTF benchmarks. Finally, we discuss insights on ACI design and agent behavior on cybersecurity tasks that highlight the need to adapt real-world tools for LM agents.
- Published
- 2024
33. Distributionally Robust Inverse Reinforcement Learning for Identifying Multi-Agent Coordinated Sensing
- Author
-
Snow, Luke and Krishnamurthy, Vikram
- Subjects
Computer Science - Machine Learning ,Computer Science - Multiagent Systems ,Electrical Engineering and Systems Science - Signal Processing - Abstract
We derive a minimax distributionally robust inverse reinforcement learning (IRL) algorithm to reconstruct the utility functions of a multi-agent sensing system. Specifically, we construct utility estimators which minimize the worst-case prediction error over a Wasserstein ambiguity set centered at noisy signal observations. We prove the equivalence between this robust estimation and a semi-infinite optimization reformulation, and we propose a consistent algorithm to compute solutions. We illustrate the efficacy of this robust IRL scheme in numerical studies to reconstruct the utility functions of a cognitive radar network from observed tracking signals.
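The general distributionally robust estimation template this abstract describes can be written as the following minimax problem; the symbols here are generic placeholders, not the paper's exact utility-reconstruction objective.

```latex
% Generic Wasserstein distributionally robust estimation template: with empirical
% observation distribution \hat{P}_N and ambiguity radius \epsilon,
\[
  \hat{\theta} \in \arg\min_{\theta} \;\;
  \sup_{Q \,:\, W(Q, \hat{P}_N) \le \epsilon}
  \; \mathbb{E}_{Q}\big[\, \ell(\theta; \xi) \,\big],
\]
% i.e., the utility estimate minimizes the worst-case prediction error over a
% Wasserstein ball centered at the noisy signal observations.
```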
- Published
- 2024
34. Combining Switching Mechanism with Re-Initialization and Anomaly Detection for Resiliency of Cyber-Physical Systems
- Author
-
Fu, Hao, Krishnamurthy, Prashanth, and Khorrami, Farshad
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
Cyber-physical systems (CPS) play a pivotal role in numerous critical real-world applications that have stringent requirements for safety. To enhance the CPS resiliency against attacks, redundancy can be integrated in real-time controller implementations by designing strategies that switch among multiple controllers. However, existing switching strategies typically overlook remediation measures for compromised controllers, opting instead to simply exclude them. Such a solution reduces the CPS redundancy since only a subset of controllers are used. To address this gap, this work proposes a multi-controller switching strategy with periodic re-initialization to remove attacks. Controllers that finish re-initialization can be reused by the switching strategy, preserving the CPS redundancy and resiliency. The proposed switching strategy is designed to ensure that at each switching moment, a controller that has just completed re-initialization is available, minimizing the likelihood of compromise. Additionally, the controller's working period decreases with the number of involved controllers, reducing the controller's exposure time to attacks. An anomaly detector is used to detect CPS attacks during the controller's working period. Upon alarm activation, the current control signal is set to a predefined value, and a switch to an alternative controller occurs at the earliest switching moment. Our switching strategy is shown to be still effective even if the anomaly detector fails to detect (stealthy) attacks.
- Published
- 2024
35. HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
- Author
-
Bhat, Vineet, Krishnamurthy, Prashanth, Karri, Ramesh, and Khorrami, Farshad
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence - Abstract
Robots interacting with humans through natural language can unlock numerous applications such as Referring Grasp Synthesis (RGS). Given a text query, RGS determines a stable grasp pose to manipulate the referred object in the robot's workspace. RGS comprises two steps: visual grounding and grasp pose estimation. Recent studies leverage powerful Vision-Language Models (VLMs) for visually grounding free-flowing natural language in real-world robotic execution. However, comparisons in complex, cluttered environments with multiple instances of the same object are lacking. This paper introduces HiFi-CS, featuring hierarchical application of Featurewise Linear Modulation (FiLM) to fuse image and text embeddings, enhancing visual grounding for complex, attribute-rich text queries encountered in robotic grasping. Visual grounding associates an object in 2D/3D space with natural language input and is studied in two scenarios: Closed and Open Vocabulary. HiFi-CS features a lightweight decoder combined with a frozen VLM and outperforms competitive baselines in closed vocabulary settings while being 100x smaller in size. Our model can effectively guide open-set object detectors like GroundedSAM to enhance open-vocabulary performance. We validate our approach through real-world RGS experiments using a 7-DOF robotic arm, achieving 90.33% visual grounding accuracy in 15 tabletop scenes. We include our codebase in the supplementary material.
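As background, the FiLM conditioning primitive that HiFi-CS applies hierarchically is the standard feature-wise affine modulation below; the generator network g over the text-query embedding is schematic.

```latex
% Feature-wise Linear Modulation (FiLM): text-conditioned per-channel scale and
% shift applied to visual features,
\[
  \mathrm{FiLM}(F_c \mid \gamma_c, \beta_c) = \gamma_c \, F_c + \beta_c,
  \qquad (\gamma, \beta) = g\big(e_{\text{text}}\big),
\]
% where F_c is the c-th channel of the image feature map and g is a small network
% over the text-query embedding e_text.
```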
- Published
- 2024
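A minimal NumPy sketch of the FiLM fusion mechanism named in this abstract: a text embedding predicts per-channel scale (gamma) and shift (beta) values that modulate image features. The shapes, the single linear projection, and the random weights are assumptions for illustration; HiFi-CS applies this kind of modulation hierarchically inside a larger model.

    import numpy as np

    rng = np.random.default_rng(0)

    def film_fuse(image_feats, text_emb, W_gamma, W_beta):
        """image_feats: (C, H, W); text_emb: (D,); W_gamma, W_beta: (C, D)."""
        gamma = W_gamma @ text_emb            # (C,) per-channel scale from the text
        beta = W_beta @ text_emb              # (C,) per-channel shift from the text
        return gamma[:, None, None] * image_feats + beta[:, None, None]

    C, H, W, D = 64, 16, 16, 32
    image_feats = rng.normal(size=(C, H, W))
    text_emb = rng.normal(size=(D,))
    fused = film_fuse(image_feats, text_emb,
                      rng.normal(size=(C, D)) * 0.1,
                      rng.normal(size=(C, D)) * 0.1)
    print(fused.shape)   # (64, 16, 16): image features modulated by the text query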
36. Possibilities for enhanced electron-phonon interactions and high-$T_c$ superconductivity in engineered bimetallic nano-structured superlattices
- Author
-
Mandal, Shinjan, Soundararajan, Shrihari, Jain, Manish, and Krishnamurthy, H. R.
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Materials Science ,Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Superconductivity - Abstract
We explore theoretically the properties of engineered bimetallic nano-structured superlattices where an array of nano-clusters of a simple (single band) metal are embedded periodically inside another simple metal with a different work function. The exploration is done using a simplified tight-binding model with Coulomb interactions included, as well as density functional theory. Taking arrays of "Ag" clusters of fixed sizes and configurations (when unrelaxed) embedded periodically in an "Au" matrix as an example, we show that a significant enhancement of electron-phonon interactions ensues, implying possibilities for high-$T_c$ superconductivity. The enhancement stems from a strong coupling, via Coulomb interactions, between the dipolar charge distribution that forms at the Au-Ag interfaces and the breathing and other modes of vibration of the light Ag atoms caged inside the heavier Au matrix. The interface dipoles form because of the interplay between the mismatch of the local potential seen by the conduction electrons localised in Wannier orbitals at the Ag and Au sites (the Ag sites being slightly repulsive relative to the Au sites) and the (long-range) Coulomb repulsion between electrons occupying these Wannier orbitals. We also discuss the DC transport in such systems., Comment: 24 pages, 17 figures
- Published
- 2024
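For readers unfamiliar with the modelling ingredients named in this abstract, a generic tight-binding Hamiltonian with long-range Coulomb interactions has the form $H = -\sum_{\langle i,j\rangle,\sigma} t_{ij}\, c^{\dagger}_{i\sigma} c_{j\sigma} + \sum_{i,\sigma} \epsilon_i\, n_{i\sigma} + \tfrac{1}{2}\sum_{i\neq j} V_{ij}\,\delta n_i\,\delta n_j$, with $V_{ij} = e^2/(4\pi\varepsilon_0 |\mathbf{r}_i - \mathbf{r}_j|)$, where $\epsilon_i$ differs on the "Ag" and "Au" sites (encoding the work-function mismatch) and $\delta n_i$ is the deviation of the Wannier-orbital occupancy from its average; the dependence of $t_{ij}$ and $V_{ij}$ on the ionic positions $\mathbf{r}_i$ is what couples the interface dipoles to the vibrational modes. This form is only indicative; the specific hoppings, site energies, and electron-phonon terms used in the paper are not reproduced here.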
37. NanoFlow: Towards Optimal Large Language Model Serving Throughput
- Author
-
Zhu, Kan, Zhao, Yilong, Zhao, Liangyu, Zuo, Gefei, Gu, Yile, Xie, Dedong, Gao, Yufei, Xu, Qinyu, Tang, Tian, Ye, Zihao, Kamahori, Keisuke, Lin, Chien-Yu, Wang, Stephanie, Krishnamurthy, Arvind, and Kasikci, Baris
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
The increasing usage of Large Language Models (LLMs) has resulted in a surging demand for planet-scale serving systems, where tens of thousands of GPUs continuously serve hundreds of millions of users. Consequently, throughput (under reasonable latency constraints) has emerged as a key metric that determines serving systems' performance. To boost throughput, various methods of inter-device parallelism (e.g., data, tensor, pipeline) have been explored. However, existing methods do not consider overlapping the utilization of different resources within a single device, leading to underutilization and sub-optimal performance. We propose NanoFlow, a novel serving framework that exploits intra-device parallelism, overlapping the usage of resources including compute, memory, and network within a single device through operation co-scheduling. To exploit intra-device parallelism, NanoFlow introduces two key innovations: first, NanoFlow splits requests into nano-batches at the granularity of operations, which breaks the dependency of sequential operations in LLM inference and enables overlapping; then, to benefit from the overlap, NanoFlow uses an operation-level pipeline with execution-unit scheduling, which partitions the device's functional units and simultaneously executes different operations in each unit. NanoFlow automates the pipeline setup using a parameter search algorithm, which makes it easy to port NanoFlow to different models. We implement NanoFlow on NVIDIA GPUs and evaluate end-to-end serving throughput on several popular models such as LLaMA-2-70B, Mixtral 8x7B, and LLaMA-3-8B. With practical workloads, NanoFlow provides a 1.91x throughput boost compared to state-of-the-art serving systems, achieving 59% to 72% of optimal throughput across the ported models.
- Published
- 2024
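A toy Python sketch of the nano-batching idea described above: a request batch is split into nano-batches so that, at any pipeline step, different nano-batches can occupy different functional units (compute, memory, network). The operation names and the simple round-robin interleaving are illustrative assumptions, not NanoFlow's actual scheduler.

    from collections import deque

    OPS = ["gemm (compute)", "attention/KV-read (memory)", "all-reduce (network)"]

    def nano_batches(requests, n_nano):
        size = max(1, len(requests) // n_nano)
        return [requests[i:i + size] for i in range(0, len(requests), size)]

    def interleaved_schedule(requests, n_nano=4):
        """Yield (nano_batch_id, op) pairs so that different nano-batches occupy
        different functional units at the same pipeline step."""
        pipelines = deque((i, deque(OPS)) for i, _ in enumerate(nano_batches(requests, n_nano)))
        schedule = []
        while pipelines:
            nb_id, ops = pipelines.popleft()
            schedule.append((nb_id, ops.popleft()))
            if ops:
                pipelines.append((nb_id, ops))
        return schedule

    for step, (nb, op) in enumerate(interleaved_schedule(list(range(16)))):
        print(f"step {step}: nano-batch {nb} -> {op}")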
38. Wavelength-Agnostic Metasurface Design for Next-Generation 2D Photodetectors
- Author
-
Jamdar, Ayush M., Rituraj, Krishnamurthy, Srini, Bhallamudi, Vidya Praveen, and Krishnan, Sivarama
- Subjects
Physics - Optics ,Physics - Applied Physics - Abstract
We explore a versatile technique for inverse designing 2D photonic crystal metasurfaces. These surfaces, known for their ability to manipulate light-matter interactions, can be precisely controlled to achieve specific functionalities. The key lies in efficiently optimizing the geometric patterns and dimensions of the metasurface. Through a composite method that exploits two well-established paradigms - Covariance Matrix Adaptation optimization and Rigorous Coupled Wave Analysis (RCWA) - we demonstrate our ability to design and optimize resonances in metaelements to achieve desired optical performance, such as near-perfect absorption at chosen wavelengths/optical modes, which otherwise proves to be challenging or even impossible with conventional inverse design implementations. We apply our method to design three-layered structures involving a monolayer absorber, a transparent metasubstrate, and a back mirror to achieve near-100% absorption at one or two chosen wavelengths. For illustration, we choose black phosphorus and a silicon metasurface to predict ~100% absorption in a monolayer at 1550 nm. The versatile technique can be applied to tailor reflectance and transmittance for any optical mode and wavelength. This computationally efficient design method paves the way for creating high-performance 2D metasurface-based devices with a variety of applications, including quantum technology components such as single-photon sensors and biphoton sources, communication systems, and non-linear light conversion.
- Published
- 2024
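The optimization loop described in this abstract can be pictured with a short sketch: an evolutionary search over a few geometry parameters, with each candidate scored by an electromagnetic solver. The solver is replaced below by a synthetic placeholder function, and the simple evolution-strategy loop stands in for full Covariance Matrix Adaptation; both are assumptions for illustration only.

    import numpy as np

    rng = np.random.default_rng(1)

    def absorption_at_target(params):
        """Placeholder for an RCWA simulation returning absorption in [0, 1] at the
        target wavelength for geometry `params` (e.g., period, radius, thickness).
        Here it is a synthetic smooth function with a known optimum."""
        target = np.array([0.55, 0.30, 0.12])
        return float(np.exp(-np.sum((params - target) ** 2) / 0.02))

    def evolve(n_iters=200, lam=16, sigma=0.05):
        x = np.array([0.4, 0.4, 0.2])        # initial geometry guess (micrometres, say)
        best = absorption_at_target(x)
        for _ in range(n_iters):
            candidates = x + sigma * rng.normal(size=(lam, x.size))
            scores = np.array([absorption_at_target(c) for c in candidates])
            if scores.max() > best:
                best, x = scores.max(), candidates[scores.argmax()]
        return x, best

    geom, absorb = evolve()
    print(f"best geometry {geom}, modelled absorption {absorb:.3f}")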
39. Achieving the Tightest Relaxation of Sigmoids for Formal Verification
- Author
-
Chevalier, Samuel, Starkenburg, Duncan, and Dvijotham, Krishnamurthy
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
In the field of formal verification, Neural Networks (NNs) are typically reformulated into equivalent mathematical programs which are optimized over. To overcome the inherent non-convexity of these reformulations, convex relaxations of nonlinear activation functions are typically utilized. Common relaxations (i.e., static linear cuts) of "S-shaped" activation functions, however, can be overly loose, slowing down the overall verification process. In this paper, we derive tuneable hyperplanes which upper and lower bound the sigmoid activation function. When tuned in the dual space, these affine bounds smoothly rotate around the nonlinear manifold of the sigmoid activation function. This approach, termed $\alpha$-sig, allows us to tractably incorporate the tightest possible, element-wise convex relaxation of the sigmoid activation function into a formal verification framework. We embed these relaxations inside of large verification tasks and compare their performance to LiRPA and $\alpha$-CROWN, a state-of-the-art verification duo.
- Published
- 2024
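A small numerical illustration of the kind of tuneable affine bound discussed in this abstract: on an interval where the sigmoid is concave, every tangent line is a valid upper bound, and sweeping the tangent point trades off slack across the interval. The parameterization by a tangent point and the brute-force validity check are a simplification of the paper's dual-space tuning, not its actual procedure.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tangent_line(alpha):
        """Affine function tangent to the sigmoid at x = alpha."""
        s = sigmoid(alpha)
        return lambda x: s + s * (1.0 - s) * (x - alpha)

    def is_upper_bound(line, lo, hi, n=10001):
        xs = np.linspace(lo, hi, n)
        return bool(np.all(line(xs) >= sigmoid(xs) - 1e-9))

    lo, hi = 0.5, 4.0                     # pre-activation bounds; sigmoid is concave throughout
    xs = np.linspace(lo, hi, 1001)
    for alpha in np.linspace(lo, hi, 5):  # the tuneable parameter: where the hyperplane touches
        line = tangent_line(alpha)
        print(f"alpha={alpha:.2f}  valid upper bound: {is_upper_bound(line, lo, hi)}  "
              f"max slack: {np.max(line(xs) - sigmoid(xs)):.4f}")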
40. In-Memory Learning Automata Architecture using Y-Flash Cell
- Author
-
Ghazal, Omar, Lan, Tian, Ojukwu, Shalman, Krishnamurthy, Komal, Yakovlev, Alex, and Shafik, Rishad
- Subjects
Computer Science - Hardware Architecture ,Computer Science - Artificial Intelligence ,Computer Science - Emerging Technologies ,Computer Science - Machine Learning - Abstract
The modern implementation of machine learning architectures faces significant challenges due to frequent data transfer between memory and processing units. In-memory computing, primarily through memristor-based analog computing, offers a promising solution to overcome this von Neumann bottleneck. In this technology, data processing and storage take place inside the memory. Here, we introduce a novel approach that utilizes floating-gate Y-Flash memristive devices manufactured with a standard 180 nm CMOS process. These devices offer attractive features, including analog tunability and moderate device-to-device variation; such characteristics are essential for reliable decision-making in ML applications. This paper uses a new machine learning algorithm, the Tsetlin Machine (TM), for an in-memory processing architecture. The TM's learning element, the Tsetlin Automaton, is mapped onto a single Y-Flash cell, with the automaton's state range transferred to the Y-Flash's conductance range. Through comprehensive simulations, the proposed hardware implementation of the learning automata, particularly for Tsetlin machines, has demonstrated enhanced scalability and on-edge learning capabilities.
- Published
- 2024
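To make the mapping concrete, here is a sketch of a textbook two-action Tsetlin Automaton whose integer state is mapped linearly onto an analog conductance window, in the spirit of storing the automaton in a single Y-Flash cell. The state count and the conductance range are assumed values, not the paper's device parameters.

    N_STATES = 200                   # states 1..N/2 -> "exclude", N/2+1..N -> "include"
    G_MIN, G_MAX = 1e-7, 1e-5        # hypothetical programmable conductance window (siemens)

    def state_to_conductance(state):
        frac = (state - 1) / (N_STATES - 1)
        return G_MIN + frac * (G_MAX - G_MIN)

    def action(state):
        return "include" if state > N_STATES // 2 else "exclude"

    def update(state, feedback):
        """feedback: +1 reward reinforces the current action, -1 penalty weakens it."""
        if action(state) == "include":
            state = state + feedback     # reward pushes deeper into "include"
        else:
            state = state - feedback     # reward pushes deeper into "exclude"
        return min(max(state, 1), N_STATES)

    s = N_STATES // 2                    # start at the decision boundary
    for fb in [+1, +1, -1, +1]:
        s = update(s, fb)
        print(action(s), f"G = {state_to_conductance(s):.2e} S")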
41. 3D Reconstruction of Protein Structures from Multi-view AFM Images using Neural Radiance Fields (NeRFs)
- Author
-
Rade, Jaydeep, Herron, Ethan, Sarkar, Soumik, Sarkar, Anwesha, and Krishnamurthy, Adarsh
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advancements in deep learning for predicting 3D protein structures have shown promise, particularly when leveraging inputs like protein sequences and Cryo-Electron microscopy (Cryo-EM) images. However, these techniques often fall short when predicting the structures of protein complexes (PCs), which involve multiple proteins. In our study, we investigate using atomic force microscopy (AFM) combined with deep learning to predict the 3D structures of PCs. AFM generates height maps that depict the PCs in various random orientations, providing rich information for training a neural network to predict the 3D structures. We then employ the pre-trained UpFusion model (which utilizes a conditional diffusion model for synthesizing novel views) to train an instance-specific NeRF model for 3D reconstruction. The performance of UpFusion is evaluated through zero-shot predictions of 3D protein structures using AFM images. The challenge, however, lies in the time-intensive and impractical nature of collecting actual AFM images. To address this, we use a virtual AFM imaging process that transforms a `PDB' protein file into multi-view 2D virtual AFM images via volume rendering techniques. We extensively validate the UpFusion architecture using both virtual and actual multi-view AFM images. Our results include a comparison of structures predicted with varying numbers of views and different sets of views. This novel approach holds significant potential for enhancing the accuracy of protein complex structure predictions with further fine-tuning of the UpFusion network.
- Published
- 2024
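A rough sketch of the "virtual AFM" step mentioned above: project a 3D point cloud (standing in for atomic coordinates parsed from a PDB file) onto a 2D grid from a random orientation and keep the tallest point per pixel, giving an AFM-like height map. The paper uses proper volume rendering; the max-height projection, the resolution, and the random rotation here are assumptions.

    import numpy as np

    rng = np.random.default_rng(3)

    def random_rotation():
        q = rng.normal(size=4)
        q /= np.linalg.norm(q)                      # random unit quaternion -> rotation matrix
        w, x, y, z = q
        return np.array([
            [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
            [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
        ])

    def virtual_afm(points, res=64):
        pts = points @ random_rotation().T                      # view the complex from a random side
        xy = pts[:, :2]
        xy = (xy - xy.min(0)) / (xy.max(0) - xy.min(0) + 1e-9)  # normalise footprint to [0, 1]^2
        ij = np.minimum((xy * res).astype(int), res - 1)
        height = np.full((res, res), pts[:, 2].min())           # background at the lowest height
        np.maximum.at(height, (ij[:, 0], ij[:, 1]), pts[:, 2])  # keep the tallest point per pixel
        return height

    cloud = rng.normal(size=(5000, 3))                          # stand-in for atomic coordinates
    print(virtual_afm(cloud).shape)                             # (64, 64) height map for one view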
42. AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs
- Author
-
Arshad, Muhammad Arbab, Jubery, Talukder Zaki, Roy, Tirtho, Nassiri, Rim, Singh, Asheesh K., Singh, Arti, Hegde, Chinmay, Ganapathysubramanian, Baskar, Balu, Aditya, Krishnamurthy, Adarsh, and Sarkar, Soumik
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Plant stress phenotyping traditionally relies on expert assessments and specialized models, limiting scalability in agriculture. Recent advances in multimodal large language models (LLMs) offer potential solutions to this challenge. We present AgEval, a benchmark comprising 12 diverse plant stress phenotyping tasks, to evaluate these models' capabilities. Our study assesses zero-shot and few-shot in-context learning performance of state-of-the-art models, including Claude, GPT, Gemini, and LLaVA. Results show significant performance improvements with few-shot learning, with F1 scores increasing from 46.24% to 73.37% in 8-shot identification for the best-performing model. Few-shot examples from other classes in the dataset have negligible or negative impacts, although having an example of the exact category increases performance by 15.38%. We also quantify the consistency of model performance across different classes within each task, finding that the coefficient of variation (CV) ranges from 26.02% to 58.03% across models, implying that subject-matter expertise on 'difficult' classes is needed to achieve reliable performance. AgEval establishes baseline metrics for multimodal LLMs in agricultural applications, offering insights into their promise for enhancing plant stress phenotyping at scale. Benchmark and code can be accessed at: https://anonymous.4open.science/r/AgEval/
- Published
- 2024
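The two summary numbers quoted in this abstract (F1 and the coefficient of variation across classes) can be reproduced on toy data in a few lines; the class names and labels below are fabricated purely to show the computation.

    import numpy as np

    def per_class_f1(y_true, y_pred, classes):
        scores = []
        for c in classes:
            tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
            fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
            fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
            prec = tp / (tp + fp) if tp + fp else 0.0
            rec = tp / (tp + fn) if tp + fn else 0.0
            scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        return np.array(scores)

    y_true = ["rust", "blight", "healthy", "rust", "blight", "healthy", "rust", "blight"]
    y_pred = ["rust", "blight", "rust",    "rust", "healthy", "healthy", "rust", "blight"]
    classes = sorted(set(y_true))

    f1 = per_class_f1(y_true, y_pred, classes)
    cv = 100 * f1.std() / f1.mean()        # coefficient of variation across classes, in percent
    print(dict(zip(classes, f1.round(2))), f"macro F1 {f1.mean():.2f}", f"CV {cv:.1f}%")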
43. New design paradigm for highly efficient and low noise photodetector
- Author
-
Chowdhury, Sagar, Rituraj, Krishnamurthy, Srini, and Bhallamudi, Vidya Praveen
- Subjects
Physics - Applied Physics ,Physics - Optics ,Quantum Physics - Abstract
Achieving high quantum efficiency (QE) with a low dark count is essential for highly sensitive photodetectors (PDs), including single photon avalanche detectors (SPADs). However, high QE requires a thicker absorber region, which leads to high dark current and noise, which in turn affect the detectivity of PDs and the photodetection efficiency and dark count of SPADs. The holy grail of photodetector and avalanche photodiode design is to achieve the highest QE with the thinnest absorber while still enabling large avalanche gain as needed. We have developed a new design paradigm which exploits the coupling between dielectric Mie resonances and transverse propagating waves in thin layers. The Mie resonance launches the incident light at an angle in an ultrathin absorber, and when coupled to transverse waves, the light propagates laterally and is fully absorbed owing to the longer optical path. Consequently, with an appropriate choice of materials for a chosen wavelength, high absorption (~90%) within an absorber typically <100 nm thick is possible. For illustration, we apply our approach to design a Si-based detector operating at 810 nm and an InGaAs-based detector operating at 1550 nm and predict that the dark current at room temperature is reduced by at least two orders of magnitude. In addition, the lateral distances are often a few microns, and hence these designs can potentially enable avalanching for a large optical gain., Comment: 6 figures
- Published
- 2024
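A back-of-the-envelope calculation clarifies why the lateral propagation described above matters: with Beer-Lambert absorption A = 1 - exp(-alpha*L), a ~100 nm vertical pass absorbs little, while a laterally guided path of a few microns in the same film absorbs almost everything. The absorption coefficient below is an assumed round number, not a value from the paper.

    import math

    alpha_per_um = 1.0          # assumed absorption coefficient of the thin absorber (1/micron)

    def absorbed_fraction(path_um):
        return 1.0 - math.exp(-alpha_per_um * path_um)

    print(f"normal incidence through 100 nm: {absorbed_fraction(0.1):.1%}")
    print(f"laterally guided over 5 microns: {absorbed_fraction(5.0):.1%}")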
44. Few-Shot Transfer Learning for Individualized Braking Intent Detection on Neuromorphic Hardware
- Author
-
Lutes, Nathan, Nadendla, Venkata Sriram Siddhardh, and Krishnamurthy, K.
- Subjects
Computer Science - Neural and Evolutionary Computing ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Signal Processing - Abstract
Objective: This work explores the use of a few-shot transfer learning method to train and implement a convolutional spiking neural network (CSNN) on a BrainChip Akida AKD1000 neuromorphic system-on-chip for developing individual-level, instead of traditionally used group-level, models using electroencephalographic data. Main Results: We demonstrate the efficacy of the above methodology for developing individual-specific braking intention predictive models by rapidly adapting the group-level model in as few as three training epochs while achieving at least 90% accuracy, true positive rate, and true negative rate. Further, results show the energy efficiency of the neuromorphic hardware through a power reduction of over 97% with only a 1.3x increase in latency when using the Akida AKD1000 processor for network inference compared to an Intel Xeon central processing unit. Similar results were obtained in a subsequent ablation study using a subset of five out of 19 channels., Comment: Journal of NeuroEngineering Submission
- Published
- 2024
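A minimal PyTorch sketch of the individualization step described above: start from a group-level network and adapt only its output head on a small amount of one driver's EEG data for a few epochs. The architecture, the shapes, and the decision to freeze all but the final layer are assumptions for illustration; the paper trains a convolutional spiking network and deploys it on Akida hardware, which is not modelled here.

    import torch
    import torch.nn as nn

    group_model = nn.Sequential(            # stand-in for the pretrained group-level network
        nn.Conv1d(19, 16, kernel_size=7), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(16, 2),
    )

    for p in group_model.parameters():      # freeze the shared feature extractor...
        p.requires_grad = False
    head = group_model[-1]
    for p in head.parameters():             # ...and adapt only the classification head
        p.requires_grad = True

    x = torch.randn(24, 19, 256)            # 24 individual trials, 19 channels, 256 samples
    y = torch.randint(0, 2, (24,))          # braking-intent labels
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(3):                  # "as few as three training epochs"
        opt.zero_grad()
        loss = loss_fn(group_model(x), y)
        loss.backward()
        opt.step()
        print(f"epoch {epoch}: loss {loss.item():.3f}")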
45. Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
- Author
-
Huang, Audrey, Zhan, Wenhao, Xie, Tengyang, Lee, Jason D., Sun, Wen, Krishnamurthy, Akshay, and Foster, Dylan J.
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Language model alignment methods, such as reinforcement learning from human feedback (RLHF), have led to impressive advances in language model capabilities, but existing techniques are limited by a widely observed phenomenon known as overoptimization, where the quality of the language model plateaus or degrades over the course of the alignment process. Overoptimization is often attributed to overfitting to an inaccurate reward model, and while it can be mitigated through online data collection, this is infeasible in many settings. This raises a fundamental question: Do existing offline alignment algorithms make the most of the data they have, or can their sample-efficiency be improved further? We address this question with a new algorithm for offline alignment, $\chi^2$-Preference Optimization ($\chi$PO). $\chi$PO is a one-line change to Direct Preference Optimization (DPO; Rafailov et al., 2023), which only involves modifying the logarithmic link function in the DPO objective. Despite this minimal change, $\chi$PO implicitly implements the principle of pessimism in the face of uncertainty via regularization with the $\chi^2$-divergence -- which quantifies uncertainty more effectively than KL-regularization -- and provably alleviates overoptimization, achieving sample-complexity guarantees based on single-policy concentrability -- the gold standard in offline reinforcement learning. $\chi$PO's simplicity and strong guarantees make it the first practical and general-purpose offline alignment algorithm that is provably robust to overoptimization.
- Published
- 2024
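The "one-line change" framing can be made concrete with a generic DPO-style loss that takes the link function as an argument: DPO applies a log link to the policy/reference probability ratio, and $\chi$PO swaps in a different link on that same ratio. The mixed link written below (log r + r) is one reading of the $\chi^2$-plus-KL regularization and should be treated as an assumption, as should the toy log-probabilities.

    import torch
    import torch.nn.functional as F

    def preference_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta, link):
        """Generic DPO-style loss: -log sigmoid(beta * (link(r_w) - link(r_l)))
        with r = pi(y|x) / pi_ref(y|x) handled in log space."""
        log_ratio_w = logp_w - ref_logp_w
        log_ratio_l = logp_l - ref_logp_l
        margin = link(log_ratio_w) - link(log_ratio_l)
        return -F.logsigmoid(beta * margin).mean()

    log_link = lambda log_r: log_r                       # DPO: phi(r) = log r
    mixed_link = lambda log_r: log_r + torch.exp(log_r)  # assumed chi-PO-style link: log r + r

    logp_w, logp_l = torch.tensor([-4.0]), torch.tensor([-6.0])
    ref_w, ref_l = torch.tensor([-5.0]), torch.tensor([-5.0])
    for name, link in [("DPO", log_link), ("chi-PO-style", mixed_link)]:
        print(name, preference_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1, link=link).item())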
46. SENTAUR: Security EnhaNced Trojan Assessment Using LLMs Against Undesirable Revisions
- Author
-
Bhandari, Jitendra, Sadhukhan, Rajat, Krishnamurthy, Prashanth, Khorrami, Farshad, and Karri, Ramesh
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Hardware Architecture - Abstract
A globally distributed IC supply chain brings risks due to untrusted third parties. The risks span inadvertent use of hardware Trojan (HT)-inserted third-party Intellectual Property (3P-IP) or Electronic Design Automation (EDA) flows. HTs can introduce stealthy behavior, prevent an IC from working as intended, or leak sensitive data via side channels. To counter HTs, rapidly examining HT scenarios is a key requirement. While Trust-Hub benchmarks are a good starting point to assess defenses, they encompass only a small subset of manually created HTs within the expanse of HT designs. Further, the HTs may disappear during synthesis. We propose SENTAUR, a large language model (LLM) framework that generates a suite of legitimate HTs for a Register Transfer Level (RTL) design by learning its specifications, descriptions, and natural language descriptions of HT effects. Existing tools and benchmarks are limited; they need a learning period to construct an ML model to mimic the threat model and are difficult to reproduce. SENTAUR can swiftly produce HT instances by leveraging LLMs without any learning period and can sanitize the HTs, facilitating their rapid assessment. Evaluation of SENTAUR involved generating effective, synthesizable, and practical HTs from TrustHub and elsewhere, investigating the impacts of payloads/triggers at the RTL. While our evaluation focused on HT insertion, SENTAUR can generalize to automatically transform an RTL code to have defined functional modifications.
- Published
- 2024
47. SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition
- Author
-
Goswami, Raktim Gautam, Patel, Naman, Krishnamurthy, Prashanth, and Khorrami, Farshad
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics - Abstract
Large-scale LiDAR mapping and localization leverage place recognition techniques to mitigate odometry drift, ensuring accurate mapping. These techniques utilize scene representations from LiDAR point clouds to identify previously visited sites within a database. Local descriptors, assigned to each point within a point cloud, are aggregated to form a scene representation for the point cloud. These descriptors are also used to re-rank the retrieved point clouds based on geometric fitness scores. We propose SALSA, a novel, lightweight, and efficient framework for LiDAR place recognition. It consists of a Sphereformer backbone that uses radial window attention to enable information aggregation for sparse distant points, an adaptive self-attention layer to pool local descriptors into tokens, and a multi-layer-perceptron Mixer layer for aggregating the tokens to generate a scene descriptor. The proposed framework outperforms existing methods on various LiDAR place recognition datasets in terms of both retrieval and metric localization while operating in real time.
- Published
- 2024
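A schematic of the aggregation stage described in this abstract: per-point local descriptors are pooled into a handful of tokens with an attention matrix and then combined into a single normalised scene descriptor for retrieval. The dimensions, the single-head attention with random "learned" queries, and the toy mixing step are simplifications of SALSA's adaptive self-attention and Mixer layers.

    import numpy as np

    rng = np.random.default_rng(7)

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def pool_to_scene_descriptor(local_desc, n_tokens=8):
        """local_desc: (N, D) per-point descriptors -> (n_tokens * D,) scene vector."""
        N, D = local_desc.shape
        queries = rng.normal(size=(n_tokens, D)) * 0.1       # token queries (learned in practice)
        attn = softmax(queries @ local_desc.T / np.sqrt(D))  # (n_tokens, N) attention weights
        tokens = attn @ local_desc                           # (n_tokens, D) pooled tokens
        tokens = tokens - tokens.mean(axis=0, keepdims=True) # toy stand-in for the Mixer layer
        v = tokens.reshape(-1)
        return v / (np.linalg.norm(v) + 1e-9)                # L2-normalised for retrieval

    desc = rng.normal(size=(4096, 32))            # 4096 points, 32-dim local descriptors
    print(pool_to_scene_descriptor(desc).shape)   # (256,) scene descriptor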
48. Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing
- Author
-
Jignasu, Anushrut, Marshall, Kelly O., Mishra, Ankush Kumar, Rillo, Lucas Nerone, Ganapathysubramanian, Baskar, Balu, Aditya, Hegde, Chinmay, and Krishnamurthy, Adarsh
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
G-code (Geometric code) or RS-274 is the most widely used computer numerical control (CNC) and 3D printing programming language. G-code provides machine instructions for the movement of the 3D printer, especially for the nozzle, stage, and extrusion of material in extrusion-based additive manufacturing. Currently, there is no large repository of curated CAD models along with their corresponding G-code files for additive manufacturing. To address this issue, we present Slice-100K, a first-of-its-kind dataset of over 100,000 G-code files, along with their tessellated CAD models, LVIS (Large Vocabulary Instance Segmentation) categories, geometric properties, and renderings. We build our dataset from triangulated meshes derived from the Objaverse-XL and Thingi10K datasets. We demonstrate the utility of this dataset by finetuning GPT-2 on a subset of the dataset for G-code translation from a legacy G-code format (Sailfish) to a more modern, widely used format (Marlin). Slice-100K will be the first step in developing a multimodal foundation model for digital manufacturing., Comment: Replaced "SLICE-100K" with "Slice-100K", added acknowledgements, and updated main figure to better capture shadows
- Published
- 2024
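To show how the G-code translation task can be framed for a causal language model like GPT-2, here is a sketch that pairs a source-dialect chunk with its target-dialect counterpart behind a separator token, yielding plain-text examples for fine-tuning. The snippet contents, the separator, and the loss-masking note are placeholders and assumptions, not real Sailfish/Marlin pairs or the dataset's actual format.

    SEP = "\n### MARLIN ###\n"          # hypothetical separator between source and target dialects

    def make_examples(sailfish_chunks, marlin_chunks):
        """Each example is one string: legacy G-code, separator, modern G-code."""
        return [src + SEP + tgt for src, tgt in zip(sailfish_chunks, marlin_chunks)]

    sailfish = ["G1 X10.0 Y5.0 E0.40 F1800 ; legacy-dialect move (placeholder)"]
    marlin = ["G1 X10.0 Y5.0 E0.40 F1800 ; modern-dialect move (placeholder)"]

    for example in make_examples(sailfish, marlin):
        print(example)
    # Each string would then be tokenized and used to fine-tune GPT-2 with the usual
    # causal language-modelling objective, ideally masking the loss on the source side.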
49. Adversarial Robustness of VAEs across Intersectional Subgroups
- Author
-
Ramanaik, Chethan Krishnamurthy, Roy, Arjun, and Ntoutsi, Eirini
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Despite advancements in Autoencoders (AEs) for tasks like dimensionality reduction, representation learning and data generation, they remain vulnerable to adversarial attacks. Variational Autoencoders (VAEs), with their probabilistic approach to disentangling latent spaces, show stronger resistance to such perturbations compared to deterministic AEs; however, their resilience against adversarial inputs is still a concern. This study evaluates the robustness of VAEs against non-targeted adversarial attacks by optimizing minimal sample-specific perturbations to cause maximal damage across diverse demographic subgroups (combinations of age and gender). We investigate two questions: whether there are robustness disparities among subgroups, and what factors contribute to these disparities, such as data scarcity and representation entanglement. Our findings reveal that robustness disparities exist but are not always correlated with the size of the subgroup. By using downstream gender and age classifiers and examining latent embeddings, we highlight the vulnerability of subgroups like older women, who are prone to misclassification due to adversarial perturbations pushing their representations toward those of other subgroups.
- Published
- 2024
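The attack setup described in this abstract can be sketched as a small optimization: find a norm-bounded input perturbation that maximally displaces the encoder's latent code for one sample. The (untrained) encoder, the L2 budget, and the latent-displacement objective below are illustrative assumptions rather than the paper's exact formulation.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 16))
    for p in encoder.parameters():
        p.requires_grad_(False)                      # attack the input, not the model

    x = torch.rand(1, 1, 28, 28)
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=1e-2)
    budget = 0.5                                     # maximum L2 norm of the perturbation

    with torch.no_grad():
        z_clean = encoder(x)

    for step in range(200):
        opt.zero_grad()
        damage = -((encoder(x + delta) - z_clean) ** 2).sum()   # maximise latent displacement
        damage.backward()
        opt.step()
        with torch.no_grad():                        # project back onto the L2 ball
            norm = delta.norm()
            if norm > budget:
                delta.mul_(budget / norm)

    print(f"latent shift under the budgeted perturbation: "
          f"{float((encoder(x + delta) - z_clean).norm()):.3f}")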
50. A study on the performance of Agro based stocks-An Evidence from select listed companies
- Author
-
Krishnamurthy, A. and Suresha, B.
- Published
- 2019
- Full Text
- View/download PDF