428,707 results on '"Krishna, A."'
Search Results
2. Influence of integrated nutrient management practices on yield and economics of fodders in custard apple based horti-pastoral system
- Author
Jhonsonraju, S., Krishna, A., Madhavilata, A., Chaitanya, T., and Sekhar, Ch. Chandra
- Published
- 2023
- Full Text
- View/download PDF
3. Variation in seed germination of Dysoxylum binectariferum: An endangered medicinal tree species, from different Indian seed sources
- Author
Hosur, Suraj R., Krishna, A., Jagadish, M.R., and Vasudeva, R.
- Published
- 2023
- Full Text
- View/download PDF
4. Primary Beam Chromaticity in HIRAX: I. Characterization from Simulations and Power Spectrum Implications
- Author
Sampath, Ajith, Crichton, Devin, Moodley, Kavilan, Chiang, H. Cynthia, Acedo, Eloy De Lera, Dlamini, Simthembile, Gaddam, Sindhu, Gerodias, Kit M., Gueuning, Quentin, Gupta, N., Hitz, Pascal, Madhusudhan, Aditya Krishna Karigiri, Krishna, Shreyam Parth, Mugundhan, V., Retana-Montenegro, Edwin, Saliwanchik, Benjamin R. B., Santos, Mario G., and Walters, Anthony
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics
- Abstract
The Hydrogen Intensity and Real-time Analysis eXperiment (HIRAX) is an upcoming radio interferometric telescope designed to constrain dark energy through 21cm intensity mapping of Baryon Acoustic Oscillations (BAO). Instrumental systematics must be controlled and carefully characterized to measure the 21cm power spectrum with fidelity and achieve high-precision constraints on the cosmological parameters. The chromaticity of the primary beam is one such complicated systematic: it can leak the power of spectrally smooth foregrounds beyond the ideal horizon limits because of the complex spatial and spectral structures of the sidelobes and the mainlobe. This paper studies the chromaticity of the HIRAX Stokes I primary beam and its effects on accurate measurements of the 21cm power spectrum. To investigate the effect of chromaticity on the 21cm power spectrum, we present a physically motivated beam modeling technique, which uses a flexible basis derived from traditional optics that can account for higher-order radial and azimuthal structures in the primary beam. We investigate the impact of imperfect knowledge of the mainlobe and sidelobe chromaticity in power spectrum space by subtracting a simple foreground model from simulated snapshot visibilities to recover the HI power spectrum. Additionally, we find that modeling up to the octupolar azimuthal order feature (fourth-order angular variation) in the primary beam is sufficient to reduce the leakage outside the wedge with minimal bias. Comment: 20 pages, 12 figures. Prepared for submission to the Astrophysical Journal (ApJ)
- Published
- 2024
5. Monitoring Krishna Flows in upper Krishna Basin to Forecast Reservoir Inflows Down Stream
- Author
Kumar, Tvnar, Venugopal, K., Krishna, A. Radha, and Srinivasu, N.
- Published
- 2021
6. Evidence for Local Symmetry Breaking in the Skyrmion-Hosting Ni2In-type Hexagonal Compounds
- Author
Singh, Anupam K., Singh, Sanjay, Dubey, Krishna K., Devi, Parul, Das, Pritam, Etter, Martin, Grendal, Ola. G., Dejoie, Catherine, Fitch, Andrew, Senyshyn, Anatoliy, Lee, Seung-Cheol, Bhattacharjee, Satadeep, and Pandey, Dhananjai
- Subjects
Condensed Matter - Materials Science
- Abstract
The Dzyaloshinskii-Moriya interaction (DMI) plays a crucial role in stabilizing exotic, topologically stable skyrmion spin textures in noncentrosymmetric crystals. The recent discovery of biskyrmions and skyrmions in globally centrosymmetric crystals has raised debate about the role of the DMI in causing these spin textures, since the DMI vanishes in such crystal structures. Theoretical studies, on the other hand, suggest a non-vanishing DMI when there is local inversion symmetry breaking in an otherwise globally centrosymmetric crystal structure. Motivated by such theoretical predictions, we present here the results of a systematic crystal structure study of two skyrmion-hosting Ni2In-type centrosymmetric hexagonal compounds, MnNiGa and MnPtGa, using the atomic pair distribution function (PDF) technique. Our results provide information about structural correlations in the short-range (SR), medium-range (MR), and long-range (LR) regimes simultaneously. The analysis of the experimental PDFs, obtained from high-flux, high-energy, high-Q synchrotron x-ray powder diffraction patterns, reveals that the local SR structure of both MnNiGa and MnPtGa corresponds to the noncentrosymmetric trigonal space group P3m1, while the structure in the MR+LR regimes remains hexagonal in the centrosymmetric P6₃/mmc space group. These findings are also supported by DFT calculations. Our results, in conjunction with previous theoretical predictions, provide a rationale for the genesis of skyrmions in centrosymmetric materials in terms of a non-vanishing DMI due to local inversion symmetry breaking. We believe that our findings will encourage a systematic search for skyrmionic textures and other topological phenomena in the vast family of centrosymmetric materials.
- Published
- 2024
7. VisionArena: 230K Real World User-VLM Conversations with Preference Labels
- Author
Chou, Christopher, Dunlap, Lisa, Mashita, Koki, Mandal, Krishna, Darrell, Trevor, Stoica, Ion, Gonzalez, Joseph E., and Chiang, Wei-Lin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
With the growing adoption and capabilities of vision-language models (VLMs) comes the need for benchmarks that capture authentic user-VLM interactions. In response, we create VisionArena, a dataset of 230K real-world conversations between users and VLMs. Collected from Chatbot Arena, an open-source platform where users interact with VLMs and submit preference votes, VisionArena spans 73K unique users, 45 VLMs, and 138 languages. Our dataset contains three subsets: VisionArena-Chat, 200K single- and multi-turn conversations between a user and a VLM; VisionArena-Battle, 30K conversations comparing two anonymous VLMs with user preference votes; and VisionArena-Bench, an automatic benchmark of 500 diverse user prompts that efficiently approximate the live Chatbot Arena model rankings. Additionally, we highlight the types of questions asked by users, the influence of response style on preference, and areas where models often fail. We find that open-ended tasks like captioning and humor are highly style-dependent, and that current VLMs struggle with spatial reasoning and planning tasks. Lastly, we show that finetuning the same base model on VisionArena-Chat outperforms finetuning on Llava-Instruct-158K, with a 17-point gain on MMMU and a 46-point gain on the WildVision benchmark. Dataset available at https://huggingface.co/lmarena-ai
- Published
- 2024
8. A stress tensor for asymptotically flat spacetime
- Author
Bhambure, Jay and Krishna, Hare
- Subjects
High Energy Physics - Theory, General Relativity and Quantum Cosmology
- Abstract
In this article, we propose a procedure for calculating the boundary stress tensor of a gravitational theory in asymptotically flat spacetime. As a case study, the stress tensor correctly reproduces the Brown-York charges for the Kerr black hole, i.e., its mass and angular momentum. In asymptotically flat spacetime, there are asymptotic symmetries called BMS symmetries. We also compute the charges associated with these symmetries using the proposed stress tensor. These asymptotic charges can be compared with those obtained by the Wald-Zoupas method. Our result for the stress tensor can be interpreted as the expectation value of the boundary stress tensor. Comment: 25 pages
- Published
- 2024
9. TURBOATTENTION: Efficient Attention Approximation For High Throughputs LLMs
- Author
Kang, Hao, Bharadwaj, Srikant, Hensman, James, Krishna, Tushar, Ruhle, Victor, and Rajmohan, Saravan
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture
- Abstract
Large language model (LLM) inference demands a significant amount of computation and memory, especially in the key attention mechanism. While techniques such as quantization and acceleration algorithms like FlashAttention have improved the efficiency of overall inference, they address different aspects of the problem: quantization focuses on weight-activation operations, while FlashAttention improves execution but requires high-precision formats. Recent key-value (KV) cache quantization reduces memory bandwidth but still needs floating-point dequantization for the attention operation. We present TurboAttention, a comprehensive approach to enable quantized execution of attention that simultaneously addresses both memory and computational efficiency. Our solution introduces two key innovations: FlashQ, a headwise attention quantization technique that enables both compression of the KV cache and quantized execution of activation-activation multiplication, and Sparsity-based Softmax Approximation (SAS), which eliminates the need for dequantization to FP32 during the exponentiation operation in attention. Experimental results demonstrate that TurboAttention achieves a 1.2-1.8x speedup in attention, reduces the KV cache size by over 4.4x, and enables up to 2.37x maximum throughput over the FP16 baseline while outperforming state-of-the-art quantization and compression techniques across various datasets and models.
- Published
- 2024
10. Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming
- Author
Gao, Ziqi, Huang, Weikai, Zhang, Jieyu, Kembhavi, Aniruddha, and Krishna, Ranjay
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
DALL-E and Sora have gained attention by producing implausible images, such as "astronauts riding a horse in space." Despite the proliferation of text-to-vision models that have inundated the internet with synthetic visuals, from images to 3D assets, current benchmarks predominantly evaluate these models on real-world scenes paired with captions. We introduce Generate Any Scene, a framework that systematically enumerates scene graphs representing a vast array of visual scenes, spanning realistic to imaginative compositions. Generate Any Scene leverages 'scene graph programming', a method for dynamically constructing scene graphs of varying complexity from a structured taxonomy of visual elements. This taxonomy includes numerous objects, attributes, and relations, enabling the synthesis of an almost infinite variety of scene graphs. Using these structured representations, Generate Any Scene translates each scene graph into a caption, enabling scalable evaluation of text-to-vision models through standard metrics. We conduct extensive evaluations across multiple text-to-image, text-to-video, and text-to-3D models, presenting key findings on model performance. We find that DiT-backbone text-to-image models align more closely with input captions than UNet-backbone models. Text-to-video models struggle with balancing dynamics and consistency, while both text-to-video and text-to-3D models show notable gaps in human preference alignment. We demonstrate the effectiveness of Generate Any Scene by conducting three practical applications leveraging captions generated by Generate Any Scene: 1) a self-improving framework where models iteratively enhance their performance using generated data, 2) a distillation process to transfer specific strengths from proprietary models to open-source counterparts, and 3) improvements in content moderation by identifying and generating challenging synthetic data.
- Published
- 2024
11. Relative hyperbolicity of ascending HNN extension of groups
- Author
Krishna, Swathi
- Subjects
Mathematics - Group Theory, Mathematics - Geometric Topology, 20F65
- Abstract
We prove that for a finitely generated group G with a free factor system and an injective endomorphism that preserves the free factor system, the ascending HNN extension of G is hyperbolic relative to a collection of maximal parabolic subgroups. As a corollary, we see that if an injective endomorphism of a finite-rank free group F is exponentially growing, the ascending HNN extension of F is relatively hyperbolic. Comment: 19 pages
- Published
- 2024
12. SAT: Spatial Aptitude Training for Multimodal Language Models
- Author
Ray, Arijit, Duan, Jiafei, Tan, Reuben, Bashkirova, Dina, Hendrix, Rose, Ehsani, Kiana, Kembhavi, Aniruddha, Plummer, Bryan A., Krishna, Ranjay, Zeng, Kuo-Hao, and Saenko, Kate
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Robotics
- Abstract
Spatial perception is a fundamental component of intelligence. While many studies highlight that large multimodal language models (MLMs) struggle to reason about space, they only test for static spatial reasoning, such as categorizing the relative positions of objects. Meanwhile, real-world deployment requires dynamic capabilities like perspective-taking and egocentric action recognition. As a roadmap to improving spatial intelligence, we introduce SAT, Spatial Aptitude Training, which goes beyond static relative object position questions to more dynamic tasks. SAT contains 218K question-answer pairs for 22K synthetic scenes across a training and a testing set. Generated using a photo-realistic physics engine, our dataset can be arbitrarily scaled and easily extended to new actions, scenes, and 3D assets. We find that even MLMs that perform relatively well on static questions struggle to accurately answer dynamic spatial questions. Further, we show that SAT instruction-tuning data improves not only dynamic spatial reasoning on SAT, but also zero-shot performance on existing real-image spatial benchmarks: 23% on CVBench, 8% on the harder BLINK benchmark, and 18% on VSR. When instruction-tuned on SAT, our 13B model matches larger proprietary MLMs like GPT4-V and Gemini-3-1.0 in spatial reasoning. Our data/code is available at http://arijitray1993.github.io/SAT/ . Comment: Project webpage: http://arijitray1993.github.io/SAT/
- Published
- 2024
13. A hybrid Finite Element and Material Point Method for modeling liquefaction-induced tailings dam failures
- Author
Sordo, Brent, Rathje, Ellen, and Kumar, Krishna
- Subjects
Physics - Geophysics
- Abstract
This paper presents a hybrid Finite Element Method (FEM) and Material Point Method (MPM) approach for modeling liquefaction-induced tailings dam failures from initiation through runout. We apply this method to simulate the 1978 Mochikoshi tailings dam failure, which occurred due to seismic loading and liquefaction during an earthquake. Our approach leverages FEM to capture the initial failure mechanism and MPM to simulate the subsequent runout, exploiting the strengths of each method in its respective phase of the failure process. We investigate the impact of the FEM-to-MPM transfer time on runout results and identify an optimal transfer window. This window begins when liquefaction reaches a critical depth to fully trigger the failure and ends before excessive mesh deformation occurs. Our findings demonstrate that the properties of the liquefied tailings significantly influence runout predictions. Notably, we achieve runout distances comparable to the case history only when incorporating additional strain-softening beyond the initial liquefaction-induced strength reduction. Our results demonstrate that the hybrid FEM-MPM method effectively models tailings dam failures associated with complex failure mechanisms and large runouts. This approach offers a promising tool for predicting the runout of seismic liquefaction-induced tailings dam failures, improving risk assessment and mitigation strategies in tailings dam management.
- Published
- 2024
14. PAFFA: Premeditated Actions For Fast Agents
- Author
Krishna, Shambhavi, Chen, Zheng, Kumar, Vaibhav, Huang, Xiaojiang, Li, Yingjie, Yang, Fan, and Li, Xiang
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Modern AI assistants have made significant progress in natural language understanding and API/tool integration, with emerging efforts to incorporate diverse interfaces (such as web interfaces) for enhanced scalability and functionality. However, current approaches that rely heavily on repeated LLM-driven HTML parsing are computationally expensive and error-prone, particularly when handling dynamic web interfaces and multi-step tasks. To overcome these challenges, we introduce PAFFA (Premeditated Actions For Fast Agents), a framework designed to enhance web interaction capabilities through an Action API Library of reusable, verified browser interaction functions. By pre-computing interaction patterns and employing two core methodologies ("Dist-Map" for task-agnostic element distillation and "Unravel" for incremental page-wise exploration), PAFFA reduces inference calls by 87% while maintaining robust performance even as website structures evolve. This framework accelerates multi-page task execution and offers a scalable solution to advance autonomous web agent research. Comment: 9 pages
- Published
- 2024
15. Maya: An Instruction Finetuned Multilingual Multimodal Model
- Author
Alam, Nahid, Kanjula, Karthik Reddy, Guthikonda, Surya, Chung, Timothy, Vegesna, Bala Krishna S, Das, Abhipsha, Susevski, Anthony, Chan, Ryan Sze-Yin, Uddin, S M Iftekhar, Islam, Shayekh Bin, Santhosh, Roshan, A, Snegha, Sharma, Drishti, Liu, Chen, Chaturvedi, Isha, Winata, Genta Indra, S, Ashvanth., Mukherjee, Snehanshu, and Aji, Alham Fikri
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to understand low-resource languages and cultural nuances in a manner free from toxicity. To address these limitations, we introduce Maya, an open-source Multimodal Multilingual model. Our contributions are threefold: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; 2) a thorough analysis of toxicity within the LLaVA dataset, followed by the creation of a novel toxicity-free version across eight languages; and 3) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.
- Published
- 2024
16. ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
- Author
Zhang, Jieyu, Xue, Le, Song, Linxin, Wang, Jun, Huang, Weikai, Shu, Manli, Yan, An, Ma, Zixian, Niebles, Juan Carlos, Savarese, Silvio, Xiong, Caiming, Chen, Zeyuan, Krishna, Ranjay, and Xu, Ran
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
With the rise of multimodal applications, instruction data has become critical for training multimodal language models capable of understanding complex image-based queries. Existing practices rely on powerful but costly large language models (LLMs) or multimodal language models (MLMs) to produce instruction data. These models are often prone to hallucinations and licensing issues, and the generation process is often hard to scale and interpret. In this work, we present a programmatic approach that employs scene graphs as symbolic representations of images and human-written programs to systematically synthesize vision-centric instruction data. Our approach ensures the interpretability and controllability of the data generation process and scales efficiently while maintaining factual accuracy. By implementing a suite of 24 single-image and 14 multi-image instruction generators, together with a scene graph generation pipeline, we build a scalable, cost-effective system, ProVision, which produces diverse question-answer pairs concerning objects, attributes, relations, depth, etc., for any given image. Applying it to the Visual Genome and DataComp datasets, we generate over 10 million instruction data points, ProVision-10M, and leverage them in both the pretraining and instruction tuning stages of MLMs. When adopted in the instruction tuning stage, our single-image instruction data yields up to a 7% improvement on the 2D split and 8% on the 3D split of CVBench, along with a 3% increase in performance on QBench2, RealWorldQA, and MMMU. Our multi-image instruction data leads to an 8% improvement on Mantis-Eval. Incorporating our data in both the pre-training and fine-tuning stages of xGen-MM-4B leads to an average improvement of 1.6% across 11 benchmarks. Comment: code: https://github.com/JieyuZ2/ProVision dataset: https://huggingface.co/datasets/Salesforce/ProVision-10M
- Published
- 2024
17. Enhancing Fenton-like Photo-degradation and Electrocatalytic Oxygen Evolution Reaction (OER) in Fe-doped Copper Oxide (CuO) Catalysts
- Author
Baral, Suresh Chandra, Sasmal, Dilip, Datta, Sayak, Ram, Mange, Haldar, Krishna Kanta, Mekki, A., and Sen, Somaditya
- Subjects
Physics - Applied Physics, Condensed Matter - Materials Science, Physics - Chemical Physics
- Abstract
Although hydrogen generation by water electrolysis is the cheapest of all available sources, water splitting still occurs with sluggish kinetics, which is a challenging barrier to large-scale H₂ production. Moreover, research is still underway to understand the oxygen evolution reaction (OER) and to design catalysts with improved OER performance. Herein, we report the synthesis, characterization, and OER performance of iron-doped copper oxide (CuO) as a low-cost catalyst for water oxidation. The OER occurs at about 1.49 V versus the RHE with a Tafel slope of 69 mV/dec in 1 M KOH solution. The overpotential of 338 mV at 10 mA/cm² is among the lowest reported for copper-based materials. The catalyst can deliver a stable current density of >10 mA/cm² for more than 10 hours. Additionally, wastewater treatment, particularly of synthetic dye wastewater, is vital for preventing water scarcity and adverse effects on human health and ecotoxicology. The as-synthesized catalysts are also used for Fenton-like photo-degradation, under low-power visible household LED lights, of simulated wastewater containing methylene blue (MB), one of the most commonly used industrial dyes. Almost complete degradation of the MB dye is achieved within 50 minutes of visible light irradiation, with a first-order rate constant of 0.0973/min. This dual functionality can open new pathways for non-noble, highly efficient, and robust catalysts for OER and wastewater treatment.
- Published
- 2024
18. TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action
- Author
Ma, Zixian, Zhang, Jianguo, Liu, Zhiwei, Zhang, Jieyu, Tan, Juntao, Shu, Manli, Niebles, Juan Carlos, Heinecke, Shelby, Wang, Huan, Xiong, Caiming, Krishna, Ranjay, and Savarese, Silvio
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
While open-source multi-modal language models perform well on simple question answering tasks, they often fail on complex questions that require multiple capabilities, such as fine-grained recognition, visual grounding, and reasoning, and that demand multi-step solutions. We present TACO, a family of multi-modal large action models designed to improve performance on such complex, multi-step, and multi-modal tasks. During inference, TACO produces chains-of-thought-and-action (CoTA), executes intermediate steps by invoking external tools such as OCR, depth estimation, and a calculator, then integrates both the thoughts and action outputs to produce coherent responses. To train TACO, we create a large dataset of over 1M synthetic CoTA traces generated with GPT-4o and Python programs. We then experiment with various data filtering and mixing techniques and obtain a final subset of 293K high-quality CoTA examples. This dataset enables TACO to learn complex reasoning and action paths, surpassing existing models trained on instruction tuning data with only direct answers. Our model TACO outperforms the instruction-tuned baseline across 8 benchmarks, achieving a 3.6% improvement on average, with gains of up to 15% in MMVet tasks involving OCR, mathematical reasoning, and spatial reasoning. Training on high-quality CoTA traces sets a new standard for complex multi-modal reasoning, highlighting the need for structured, multi-step instruction tuning in advancing open-source multi-modal models' capabilities.
- Published
- 2024
19. A Multi-physics Model of Flow from Coronary Angiography: Insights into Microvascular Function
- Author
Yang, Haizhou, Zhang, Jiyang, Assi, Ismael, Nallamothu, Brahmajee K, Garikipati, Krishna, and Figueroa, C. Alberto
- Subjects
Computer Science - Computational Engineering, Finance, and Science
- Abstract
Coronary Artery Disease (CAD) and Coronary Microvascular Disease (CMD) can lead to insufficient blood flow to the myocardium and affect millions of people globally. Coronary angiography, one of the most commonly used imaging modalities, offers valuable information that assists in diagnosing these diseases. However, this information is not fully understood or utilized in current clinical practice. In this study, a 3D-0D coupled multi-physics computational fluid dynamics (CFD) model was developed and calibrated to simulate and better understand the process of contrast injection and washout during clinical angiography. A contrast intensity profile (CIP) was introduced to capture the dynamics of coronary angiography data. Additionally, a sensitivity study was conducted to assess the influence of various coronary artery model parameters on the CIP. The results demonstrate that the calibrated 3D-0D coupled multi-physics models are physiologically meaningful and produce accurate hemodynamic results. The sensitivity study further reveals that resistance has a greater impact on the CIP than capacitance, with higher resistance amplifying this effect. Comment: 21 pages, 12 figures
- Published
- 2024
20. Nonlocality-Assisted Enhancement of Error-Free Communication in Noisy Classical Channels
- Author
Agarwal, Kunika, Naik, Sahil Gopalkrishna, Chakraborty, Ananya, Sen, Samrat, Ghosal, Pratik, Paul, Biswajit, Banik, Manik, and Patra, Ram Krishna
- Subjects
Quantum Physics
- Abstract
The zero-error capacity of a noisy classical channel quantifies its ability to transmit information with absolute certainty, i.e., without any error. Unlike Shannon's standard channel capacity, which remains unaffected by pre-shared correlations, the zero-error capacity can be enhanced through nonlocal correlations. In this work, we investigate the zero-error communication utility of such correlations arising in the 2-2-m Bell scenario, where two parties each have two inputs and m possible outcomes per input. For all m ≥ 2, we construct examples of noisy classical channels with zero zero-error capacity that, when assisted by extremal 2-2-m nonlocal correlations, can transmit one bit of information. While nonlocal correlations arising from quantum entangled states cannot achieve a positive zero-error capacity for these channels, they significantly enhance the probability of successfully transmitting a classical bit in a single use. Extending this analysis to the 2-m-2 Bell scenario, we identify channels with zero zero-error capacity that can nonetheless perfectly transmit log m bits of information when assisted by the corresponding extremal nonlocal correlations. Our findings underscore the versatile utility of Bell nonlocal correlations in achieving zero-error communication. Comment: 4.5 + 11 pages; comments are welcome
- Published
- 2024
21. NVILA: Efficient Frontier Visual Language Models
- Author
Liu, Zhijian, Zhu, Ligeng, Shi, Baifeng, Zhang, Zhuoyang, Lou, Yuming, Yang, Shang, Xi, Haocheng, Cao, Shiyi, Gu, Yuxian, Li, Dacheng, Li, Xiuyu, Fang, Yunhao, Chen, Yukang, Hsieh, Cheng-Yu, Huang, De-An, Cheng, An-Chieh, Nath, Vishwesh, Hu, Jinyi, Liu, Sifei, Krishna, Ranjay, Xu, Daguang, Wang, Xiaolong, Molchanov, Pavlo, Kautz, Jan, Yin, Hongxu, Han, Song, and Lu, Yao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy. Building on top of VILA, we improve its model architecture by first scaling up the spatial and temporal resolutions, and then compressing visual tokens. This "scale-then-compress" approach enables NVILA to efficiently process high-resolution images and long videos. We also conduct a systematic investigation to enhance the efficiency of NVILA throughout its entire lifecycle, from training and fine-tuning to deployment. NVILA matches or surpasses the accuracy of many leading open and proprietary VLMs across a wide range of image and video benchmarks. At the same time, it reduces training costs by 4.5X, fine-tuning memory usage by 3.4X, pre-filling latency by 1.6-2.2X, and decoding latency by 1.2-2.8X. We will soon make our code and models available to facilitate reproducibility.
- Published
- 2024
22. Delay-Doppler Signal Processing with Zadoff-Chu Sequences
- Author
Mattu, Sandesh Rao, Khan, Imran Ali, Khammammetti, Venkatesh, Dabak, Beyza, Mohammed, Saif Khan, Narayanan, Krishna, and Calderbank, Robert
- Subjects
Electrical Engineering and Systems Science - Signal Processing, Computer Science - Information Theory
- Abstract
Much of the engineering behind current wireless systems has focused on designing an efficient, high-throughput downlink to support human-centric communication such as video streaming and internet browsing. This paper looks ahead to the design of the uplink, anticipating the emergence of machine-type communication (MTC) and the confluence of sensing, communication, and distributed learning. We demonstrate that grant-free multiple access is possible even in the presence of highly time-varying channels. Our approach provides a pathway to standards adoption, since it is built on enhancing the 2-step random access procedure, which is already part of the 5G NR standard. This 2-step procedure uses Zadoff-Chu (ZC) sequences as preambles that point to radio resources, which are then used to upload data. We also use ZC sequences as preambles/pilots, but we process signals in the Delay-Doppler (DD) domain rather than the time domain. We demonstrate that it is possible to detect multiple preambles in the presence of mobility and delay spread using a receiver with no knowledge of the channel other than the worst-case delay and Doppler spreads. Our approach depends on the mathematical properties of ZC sequences in the DD domain. We derive a closed-form expression for ZC pilots in the DD domain, characterize the possible self-ambiguity functions, and determine the magnitude of the possible cross-ambiguity functions. These mathematical properties enable detection of multiple pilots through the solution of a compressed sensing problem. The columns of the compressed sensing matrix are the translates of individual ZC pilots in delay and Doppler. We show that the columns of the design matrix satisfy a coherence property that makes it possible to detect multiple preambles in a single Zak-OTFS subframe using One-Step Thresholding (OST), a low-complexity algorithm. Comment: To be submitted to the IEEE for possible publication
- Published
- 2024
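The Zak-OTFS entry above builds on Zadoff-Chu (ZC) preambles. As background only, the sketch below generates odd-length ZC sequences with the standard formula x_u[n] = exp(-jπun(n+1)/N) and checks, in the ordinary time domain rather than the paper's Delay-Doppler domain, the two properties that make them attractive preambles: zero periodic autocorrelation away from lag 0, and a cross-correlation of constant magnitude √N between different roots when N is prime. The length N = 139 and the roots 1 and 2 are arbitrary illustrative choices, and this is not the paper's detector.

```python
import numpy as np


def zadoff_chu(root: int, length: int) -> np.ndarray:
    """Odd-length Zadoff-Chu sequence x_u[n] = exp(-j*pi*u*n*(n+1)/N)."""
    if length % 2 == 0:
        raise ValueError("this sketch only handles odd lengths")
    if np.gcd(root, length) != 1:
        raise ValueError("root must be coprime with the sequence length")
    n = np.arange(length)
    return np.exp(-1j * np.pi * root * n * (n + 1) / length)


def periodic_xcorr(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Circular cross-correlation r[k] = sum_n a[n] * conj(b[(n - k) mod N]), computed via FFT."""
    return np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b)))


if __name__ == "__main__":
    N = 139  # a prime length, chosen for illustration
    x1, x2 = zadoff_chu(1, N), zadoff_chu(2, N)
    auto = periodic_xcorr(x1, x1)
    cross = periodic_xcorr(x1, x2)
    print("constant amplitude:", np.allclose(np.abs(x1), 1.0))
    print("peak autocorrelation:", round(abs(auto[0])))  # equals N
    print("max off-peak autocorrelation:", np.abs(auto[1:]).max())  # zero up to rounding
    print("max |cross| / sqrt(N):", np.abs(cross).max() / np.sqrt(N))  # 1 for prime N
```

The FFT route is simply a compact way to evaluate all N circular lags at once; a direct double loop gives the same values.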
23. Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding
- Author
-
Naveen, Vakada, Sridhar, Arvind Krishna, Guo, Yinyi, and Visser, Erik
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
This paper presents a comprehensive chatbot system designed to handle a wide range of audio-related queries by integrating multiple specialized audio processing models. The proposed system uses an intent classifier, trained on a diverse audio query dataset, to route queries about audio content to expert models such as Automatic Speech Recognition (ASR), Speaker Diarization, Music Identification, and Text-to-Audio generation. A 3.8B-parameter LLM then takes inputs from an Audio Context Detection (ACD) module, which extracts audio event information from the audio, and post-processes the text-domain outputs from the expert models to compute the final response to the user. We evaluated the system on custom audio tasks and the sound subset of the MMAU benchmark. The custom datasets were motivated by target use cases not covered in industry benchmarks and included ACD-timestamp-QA (Question Answering) and ACD-temporal-QA datasets to evaluate timestamp and temporal reasoning questions, respectively. First, we determined that a BERT-based intent classifier outperforms a few-shot LLM intent classifier in routing queries. Experiments further show that our approach significantly improves accuracy on some custom tasks compared to state-of-the-art Large Audio Language Models and outperforms models in the 7B parameter size range on the sound test set of the MMAU benchmark, thereby offering an attractive option for on-device deployment.
- Published
- 2024
24. A novel approach to differential expression analysis of co-occurrence networks for small-sampled microbiome data
- Author
-
Gadhia, Nandini, Smyrnakis, Michalis, Liu, Po-Yu, Blake, Damer, Hay, Melanie, Nguyen, Anh, Richards, Dominic, Xia, Dong, and Krishna, Ritesh
- Subjects
Quantitative Biology - Quantitative Methods, 94C15, 92-08, J.3, I.2
- Abstract
Graph-based machine learning methods are useful tools in the identification and prediction of variation in genetic data. In particular, the comprehension of phenotypic effects at the cellular level is an accelerating research area in pharmacogenomics. In this article, a novel graph-theoretic approach is proposed to infer a co-occurrence network from 16S microbiome data. The approach is specialised to handle datasets containing a small number of samples. Small datasets exacerbate the significant challenges faced by biological data, which exhibit properties such as sparsity, compositionality, and complexity of interactions. Methodologies are also proposed to enrich and statistically filter the inferred networks. The utility of the proposed method lies in the fact that it extracts an informative network from small-sampled data that is not only feature-rich, but also biologically meaningful and statistically significant. Although specialised for small data sets, which are abundant, it can be generally applied to any small-sampled dataset, and can also be extended to integrate multi-omics data. The proposed methodology is tested on a data set of chickens vaccinated against and challenged by the protozoan parasite Eimeria tenella. The raw genetic reads are processed, and networks inferred to describe the ecosystems of the chicken intestines under three different stages of disease progression. Analysis of the expression of network features derives biologically intuitive conclusions from purely statistical methods. For example, there is a clear evolution in the distribution of node features in line with the progression of the disease. The distributions also reveal clusters of species interacting mutualistically and parasitically, as expected. Moreover, a specific sub-network is found to persist through all experimental conditions, representative of a persistent microbiome.
Comment: 12 pages, 7 figures, under review for a special issue of ACM/IEEE TCBB journal
- Published
- 2024
25. Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
- Author
-
Bigverdi, Mahtab, Luo, Zelun, Hsieh, Cheng-Yu, Shen, Ethan, Chen, Dongping, Shapiro, Linda G., and Krishna, Ranjay
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Multimodal language models (MLMs) still face challenges in fundamental visual perception tasks where specialized models excel. Tasks requiring reasoning about 3D structures benefit from depth estimation, and reasoning about 2D object instances benefits from object detection. Yet, MLMs cannot produce intermediate depth maps or bounding boxes to reason over. Finetuning MLMs on relevant data doesn't generalize well and outsourcing computation to specialized vision tools is too compute-intensive and memory-inefficient. To address this, we introduce Perception Tokens, intrinsic image representations designed to assist reasoning tasks where language is insufficient. Perception tokens act as auxiliary reasoning tokens, akin to chain-of-thought prompts in language models. For example, in a depth-related task, an MLM augmented with perception tokens can reason by generating a depth map as tokens, enabling it to solve the problem effectively. We propose AURORA, a training method that augments MLMs with perception tokens for improved reasoning over visual inputs. AURORA leverages a VQVAE to transform intermediate image representations, such as depth maps, into a tokenized format, together with bounding box tokens, which are then used in a multi-task training framework. AURORA achieves notable improvements across counting benchmarks: +10.8% on BLINK, +11.3% on CVBench, and +8.3% on SEED-Bench, outperforming finetuning approaches in generalization across datasets. It also improves on relative depth: over +6% on BLINK. With perception tokens, AURORA expands the scope of MLMs beyond language-based reasoning, paving the way for more effective visual reasoning capabilities.
- Published
- 2024
26. Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models
- Author
-
Mackraz, Natalie, Sivakumar, Nivedha, Khorshidi, Samira, Patel, Krishna, Theobald, Barry-John, Zappella, Luca, and Apostoloff, Nicholas
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, finding that fairness in pre-trained masked language models has limited effect on the fairness of models when adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy models in real-world systems. In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon and Llama models are strongly correlated (rho >= 0.94) with biases when the same models are zero- and few-shot prompted, using a pronoun co-reference resolution task. Further, we find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior (rho >= 0.92), and when few-shot length and stereotypical composition are varied (rho >= 0.97). Our findings highlight the importance of ensuring fairness in pre-trained LLMs, especially when they are later used to perform downstream tasks via prompt adaptation.
- Published
- 2024
27. Many-MobileNet: Multi-Model Augmentation for Robust Retinal Disease Classification
- Author
-
Wang, Hao, Zhu, Wenhui, Dong, Xuanzhao, Chen, Yanxi, Li, Xin, Qiu, Peijie, Chen, Xiwen, Vasa, Vamsi Krishna, Xiong, Yujian, Dumitrascu, Oana M., Razi, Abolfazl, and Wang, Yalin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this work, we propose Many-MobileNet, an efficient model fusion strategy for retinal disease classification using a lightweight CNN architecture. Our method addresses key challenges such as overfitting and limited dataset variability by training multiple models with distinct data augmentation strategies and different model complexities. Through this fusion technique, we achieved robust generalization in data-scarce domains while balancing computational efficiency with feature extraction capabilities.
- Published
- 2024
28. SAVER: A Toolbox for Sampling-Based, Probabilistic Verification of Neural Networks
- Author
-
Sivaramakrishnan, Vignesh, Kalagarla, Krishna C., Devonport, Rosalyn, Pilipovsky, Joshua, Tsiotras, Panagiotis, and Oishi, Meeko
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
We present a neural network verification toolbox to 1) assess the probability of satisfaction of a constraint, and 2) synthesize a set expansion factor to achieve the probability of satisfaction. Specifically, the toolbox establishes, with a user-specified level of confidence, whether the output of the neural network for a given input distribution is likely to be contained within a given set. Should the tool determine that the given set cannot satisfy the likelihood constraint, the tool also implements an approach outlined in this paper to alter the constraint set to ensure that the user-defined satisfaction probability is achieved. The toolbox comprises sampling-based approaches which exploit the properties of signed distance functions to define set containment.
Comment: 7 pages, 8 figures, submitted to the 28th ACM International Conference on Hybrid Systems: Computation and Control
- Published
- 2024
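To illustrate the general idea behind sampling-based probabilistic verification (this is not the SAVER toolbox's API or its actual algorithm), the sketch below estimates the probability that a model's output lands in a Euclidean ball, sizing the sample with a two-sided Hoeffding bound, and uses the ball's signed distance function to find a radius expansion that covers a target fraction of the sampled outputs. The identity "network", the Gaussian input distribution, and every function name are hypothetical stand-ins. The expansion only covers an empirical fraction of the samples; turning it into a formal probability guarantee would need an additional statistical argument.

```python
import math

import numpy as np


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Samples needed so |p_hat - p| < epsilon with probability >= 1 - delta (two-sided Hoeffding)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def ball_sdf(y: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """Signed distance to a Euclidean ball: negative inside, zero on the boundary, positive outside."""
    return np.linalg.norm(y - center, axis=-1) - radius


def estimate_containment(f, sample_inputs, center, radius, epsilon=0.05, delta=1e-3, seed=0):
    """Monte Carlo estimate of P(f(X) in ball), plus the signed distance of every sampled output."""
    rng = np.random.default_rng(seed)
    n = hoeffding_sample_size(epsilon, delta)
    outputs = f(sample_inputs(rng, n))
    d = ball_sdf(outputs, center, radius)
    return float(np.mean(d <= 0.0)), d


def expansion_for_target(d: np.ndarray, target: float) -> float:
    """Smallest radius increase e >= 0 such that at least `target` of the samples satisfy d - e <= 0."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    k = math.ceil(target * len(d)) - 1  # index of the ceil(target * n)-th smallest distance
    # Growing a ball's radius by e lowers its signed distance by exactly e.
    return max(0.0, float(np.sort(d)[k]))


if __name__ == "__main__":
    def network(x):  # hypothetical stand-in for a trained network
        return x

    def gaussian_inputs(rng, n):  # hypothetical input distribution
        return rng.standard_normal((n, 2))

    p_hat, d = estimate_containment(network, gaussian_inputs, np.zeros(2), 1.0)
    print(f"estimated containment probability: {p_hat:.3f}")  # exact value is 1 - exp(-1/2)
    e = expansion_for_target(d, 0.9)
    print(f"expand the radius by {e:.3f} to contain 90% of the samples")
```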
29. Correspondence and Inverse Correspondence for Input/Output Logic and Region-Based Theories of Space
- Author
-
De Domenico, Andrea, Farjami, Ali, Manoorkar, Krishna, Palmigiano, Alessandra, Panettiere, Mattia, and Wang, Xiaolong
- Subjects
Computer Science - Logic in Computer Science, Mathematics - Logic
- Abstract
We further develop the algebraic approach to input/output logic initiated in \cite{wollic22}, where subordination algebras and a family of their generalizations were proposed as a semantic environment of various input/output logics. In particular: we extend the modal characterizations of a finite number of well-known conditions on normative and permission systems, as well as on subordination, precontact, and dual precontact algebras developed in \cite{de2024obligations}, to those corresponding to the infinite class of {\em clopen-analytic inequalities} in a modal language consisting both of positive and of negative unary modal operators; we characterize the syntactic shape of first-order conditions on algebras endowed with subordination, precontact, and dual precontact relations which guarantees these conditions to be the first-order correspondents of axioms in the modal language above; we introduce algorithms for computing the first-order correspondents of modal axioms on algebras endowed with subordination, precontact, and dual precontact relations, and conversely, for computing the modal axioms of which the conditions satisfying the suitable syntactic shape are the first-order correspondents; finally, we extend Celani's dual characterization results between subordination lattices and subordination spaces to a wider environment which also encompasses precontact and dual precontact relations, and relative to an infinite class of first-order conditions relating subordination, precontact and dual precontact relations on distributive lattices. The modal characterizations established in the present paper pave the way to establishing faithful embeddings for infinite classes of input/output logics, and hence to their implementation in LogiKEy, Isabelle/HOL, Lean, or other interactive systems.
- Published
- 2024
30. Negative Token Merging: Image-based Adversarial Feature Guidance
- Author
-
Singh, Jaskirat, Li, Lindsey, Shi, Weijia, Krishna, Ranjay, Choi, Yejin, Koh, Pang Wei, Cohen, Michael F., Gould, Stephen, Zheng, Liang, and Zettlemoyer, Luke
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Text-based adversarial guidance using a negative prompt has emerged as a widely adopted approach to steer diffusion models away from producing undesired concepts. While useful, performing adversarial guidance using text alone can be insufficient to capture complex visual concepts or avoid specific visual elements like copyrighted characters. In this paper, for the first time, we explore an alternate modality in this direction by performing adversarial guidance directly using visual features from a reference image or other images in a batch. We introduce negative token merging (NegToMe), a simple but effective training-free approach which performs adversarial guidance through images by selectively pushing apart matching visual features between reference and generated images during the reverse diffusion process. By simply adjusting the reference used, NegToMe enables a diverse range of applications. Notably, when using other images in the same batch as reference, we find that NegToMe significantly enhances output diversity (e.g., racial, gender, visual) by guiding features of each image away from others. Similarly, when used w.r.t. copyrighted reference images, NegToMe reduces visual similarity to copyrighted content by 34.57%. NegToMe is simple to implement using just a few lines of code, incurs only marginally higher (<4%) inference time, and is compatible with different diffusion architectures, including those like Flux, which don't natively support the use of a negative prompt. Code is available at https://negtome.github.io
- Published
- 2024
31. Tracing Hierarchical Star Formation out to Kiloparsec Scales in Nearby Spiral Galaxies with UVIT
- Author
-
Shashank, Gairola, Subramanian, Smitha, M., Sreedevi, Menon, Shyam H, Mondal, Chayan, Krishna, Sriram, Das, Mousumi, and Subramaniam, Annapurni
- Subjects
Astrophysics - Astrophysics of Galaxies
- Abstract
Molecular clouds fragment under the action of supersonic turbulence & gravity, which results in a scale-free hierarchical distribution of star formation (SF) within galaxies. Recent studies suggest that the hierarchical distribution of SF in nearby galaxies shows a dependence on host galaxy properties. In this context, we study the nature of hierarchical SF from a few tens of pc up to several kpc in 4 nearby spiral galaxies NGC1566, NGC5194, NGC5457 & NGC7793, by leveraging the large FoV & high-resolution FUV+NUV observations from the UltraViolet Imaging Telescope (UVIT). Using the two-point correlation function, we infer that the young star-forming clumps (SFCs) in the galaxies are arranged in a fractal-like hierarchical distribution, but only up to a maximum scale ($l_{corr}$), which ranges from 0.5 kpc to 3.1 kpc. The flocculent spiral NGC7793 has $\sim$5 times smaller $l_{corr}$ than the 3 grand design spirals, possibly due to its lower mass, low-pressure environment & lack of strong spiral arms. $l_{corr}$ being much smaller than the galaxy size suggests that the SF hierarchy does not extend to the full galaxy size & it is likely an effect set by multiple physical mechanisms in the galaxy. The hierarchical distribution of SFCs dissipates within 10 to 50 Myr, signifying their migration away from their birthplaces over time. Our results suggest that the global hierarchical properties of SF in galaxies are not universal & significant variations exist in the local & global hierarchy parameters of a galaxy. This study also demonstrates the capabilities of UVIT in characterizing the SF hierarchy in nearby galaxies.
In the future, a bigger sample can be employed to further understand the role of large-scale galaxy properties (morphology, environment) & physical processes (feedback, turbulence, shear & ISM conditions) in determining the non-universal hierarchical properties of SF in galaxies.
Comment: 18 pages, 11 figures, Accepted for publication in Astronomy and Astrophysics (A&A), after language correction
- Published
- 2024
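The UVIT entry above relies on the two-point correlation function. The sketch below shows one common estimator, the natural estimator w(r) = (DD/RR) * [N_R(N_R - 1)] / [N_D(N_D - 1)] - 1, built from pair counts in distance bins. The abstract does not say which estimator the authors used, so this choice, the synthetic clustered catalogue, and the bin edges are all assumptions. A positive w at small separations signals clustering relative to a uniform random catalogue.

```python
import numpy as np


def pair_counts(points: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Count unique pairs (i < j) whose separation falls in each distance bin."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)  # upper triangle: each pair once, no self-pairs
    counts, _ = np.histogram(dist[i, j], bins=edges)  # separations outside the edges are ignored
    return counts


def natural_estimator(data: np.ndarray, randoms: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """w(r) = (DD / RR) * [N_R (N_R - 1)] / [N_D (N_D - 1)] - 1; inf/NaN where RR is empty."""
    nd, nr = len(data), len(randoms)
    dd = pair_counts(data, edges).astype(float)
    rr = pair_counts(randoms, edges).astype(float)
    norm = (nr * (nr - 1)) / (nd * (nd - 1))  # corrects for the different catalogue sizes
    with np.errstate(divide="ignore", invalid="ignore"):
        return dd / rr * norm - 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical clustered catalogue: 20 tight clumps of 15 points in a unit square.
    centers = rng.uniform(0.0, 1.0, size=(20, 2))
    data = (centers[:, None, :] + 0.01 * rng.standard_normal((20, 15, 2))).reshape(-1, 2)
    randoms = rng.uniform(0.0, 1.0, size=(600, 2))  # unclustered reference catalogue
    edges = np.array([0.0, 0.02, 0.05, 0.1, 0.2])
    for lo, hi, w in zip(edges[:-1], edges[1:], natural_estimator(data, randoms, edges)):
        print(f"{lo:.2f}-{hi:.2f}: w = {w:.2f}")
```

The all-pairs distance matrix keeps the sketch short but costs O(N^2) memory; larger catalogues would normally use a tree-based pair counter.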
32. Effect of 2009 major SSW event on the mesospheric CO2 cooling
- Author
-
Kumar, Akash, Krishna, MV Sunil, and Ranjan, Alok K
- Subjects
Physics - Space Physics
- Abstract
Carbon dioxide (CO2), an important trace species that is gradually increasing in the atmosphere due to anthropogenic activities, causes enhanced warming in the lower atmosphere. The increased concentration of CO2 in the upper atmosphere results in enhanced radiative cooling rates leading to the contraction of the upper atmosphere. Due to its long lifetime and large vertical gradient, CO2 concentration is also influenced by large dynamic events. We report a startling case of variability in CO2 density and its infrared radiative cooling rates in the mesosphere and lower thermosphere during a major sudden stratospheric warming (SSW) event. A counter-intuitive connection between CO2 density and resulting CO2 radiative cooling has been observed during the 2009 major SSW event. The behaviour of CO2 cooling rates during such a dramatic event draws attention to our current understanding of CO2 infrared cooling variation and its connection to changes in CO2 concentration. The significance of temperature and atomic oxygen variability in the observed cooling patterns despite changes in CO2 concentration is also highlighted.
Comment: 10 pages, 8 figures
- Published
- 2024
33. SSDM 2.0: Time-Accurate Speech Rich Transcription with Non-Fluencies
- Author
-
Lian, Jiachen, Zhou, Xuanru, Ezzes, Zoe, Vonk, Jet, Morin, Brittany, Baquirin, David, Mille, Zachary, Tempini, Maria Luisa Gorno, and Anumanchipalli, Gopala Krishna
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Speech is a hierarchical collection of text, prosody, emotions, dysfluencies, etc. Automatic transcription of speech that goes beyond text (words) is an underexplored problem. We focus on transcribing speech along with non-fluencies (dysfluencies). The current state-of-the-art pipeline SSDM suffers from complex architecture design, training complexity, and significant shortcomings in the local sequence aligner, and it does not explore in-context learning capacity. In this work, we propose SSDM 2.0, which tackles those shortcomings via four main contributions: (1) We propose a novel \textit{neural articulatory flow} to derive highly scalable speech representations. (2) We developed a \textit{full-stack connectionist subsequence aligner} that captures all types of dysfluencies. (3) We introduced a mispronunciation prompt pipeline and consistency learning module into LLM to leverage dysfluency \textit{in-context pronunciation learning} abilities. (4) We curated Libri-Dys and open-sourced the current largest-scale co-dysfluency corpus, \textit{Libri-Co-Dys}, for future research endeavors. In clinical experiments on pathological speech transcription, we tested SSDM 2.0 using nfvPPA corpus primarily characterized by \textit{articulatory dysfluencies}. Overall, SSDM 2.0 outperforms SSDM and all other dysfluency transcription models by a large margin. See our project demo page at \url{https://berkeley-speech-group.github.io/SSDM2.0/}.
- Published
- 2024
34. Cross-Domain Recommendation Meets Large Language Models
- Author
-
Vajjala, Ajay Krishna, Meher, Dipak, Zhu, Ziwei, and Rosenblum, David S.
- Subjects
Computer Science - Information Retrieval
- Abstract
Cross-domain recommendation (CDR) has emerged as a promising solution to the cold-start problem faced by single-domain recommender systems. However, existing CDR models rely on complex neural architectures, large datasets, and significant computational resources, making them less effective in data-scarce scenarios or when simplicity is crucial. In this work, we leverage the reasoning capabilities of large language models (LLMs) and explore their performance in the CDR domain across multiple domain pairs. We introduce two novel prompt designs tailored for CDR and demonstrate that LLMs, when prompted effectively, outperform state-of-the-art CDR baselines across various metrics and domain combinations in the rating prediction and ranking tasks. This work bridges the gap between LLMs and recommendation systems, showcasing their potential as effective cross-domain recommenders.
Comment: 12 pages
- Published
- 2024
35. Observation of a non-reciprocal skyrmion Hall effect of hybrid chiral skyrmion tubes in synthetic antiferromagnetic multilayers
- Author
-
Dohi, Takaaki, Bhukta, Mona, Kammerbauer, Fabian, Bharadwaj, Venkata Krishna, Zarzuela, Ricardo, Sud, Aakanksha, Syskaki, Maria-Andromachi, Tran, Duc Minh, Wintz, Sebastian, Weigand, Markus, Finizio, Simone, Raabe, Jörg, Frömter, Robert, Sinova, Jairo, and Kläui, Mathias
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
- Abstract
Topological spin textures in magnetic materials beyond two-dimensional skyrmions have attracted attention for electronics beyond CMOS technologies. In particular, three-dimensional (3D) topological spin textures are promising due to the expected complex non-linear dynamics as well as high static and dynamic thermal stability. In multilayer heterostructures, a hybrid chiral skyrmion tube is a well-known example of a 3D topological spin texture, exhibiting an intriguing chirality transition along the thickness direction. This transition progresses from left-handed to right-handed Néel-type chirality, passing through a Bloch-type intermediate state. Such an exotic spin configuration potentially exhibits distinctly different dynamics from that of the common skyrmion tube that exhibits a homogeneous chirality; yet these dynamics have not been ascertained so far. Here we reveal the distinct features of current-induced dynamics that result from the hybrid chiral skyrmion tube structure in synthetic antiferromagnetic (SyAFM) multilayers. Strikingly, the SyAFM hybrid chiral skyrmion tubes exhibit a non-reciprocal skyrmion Hall effect in the flow regime. The non-reciprocity can even be tuned by the degree of magnetic compensation in the SyAFM systems. Our theoretical modeling qualitatively corroborates that the non-reciprocity stems from the dynamic oscillation of skyrmion helicity during its current-induced motion. The findings highlight the critical role of the internal degrees of freedom of these complex skyrmion tubes for their current-induced dynamics.
- Published
- 2024
36. Field tunable plasmonic lenses for optical microscopy
- Author
-
Wadhwa, Divyansh, Singh, Gurharinder, and Balasubramanian, Krishna Bharadwaj
- Subjects
Physics - Optics
- Abstract
This study examines the behavior and tunability of plasmonic lenses created from arrays of nanoslits, applicable in sub-wavelength optical microscopy and other high-resolution imaging systems. We performed simulations in COMSOL Multiphysics to assess power flow and focal shifts in plasmonic lenses with differing slit designs, refractive indices, and angular distributions. The findings indicate that the confinement can be regulated by adjusting these parameters.
Comment: 5 pages, 4 images, reformatted some images and captions, and updated the text accordingly, results, theory and all else remain same
- Published
- 2024
37. Detecting cosmological recombination lines with a non-ideal antenna -- a first step to practical realization
- Author
-
Krishna, Dhashin and Rao, Mayuri Sathyanarayana
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Cosmology and Nongalactic Astrophysics
- Abstract
Photons emitted during the formation of primordial hydrogen and helium atoms over the Epoch of Recombination are expected to be preserved as additive distortions to the Cosmic Microwave Background (CMB) spectrum. The 'ripple'-like spectral features from Cosmological Recombination Radiation (CRR) have never been detected, and are expected to be 9 orders of magnitude fainter than the CMB. The Array of Precision Spectrometers for the Epoch of Recombination (APSERa) is an upcoming ground-based experiment to detect the CRR signal over 2-6 GHz. While astrophysical foregrounds may be theoretically separated from the CRR signal using their inherently different spectral characteristics, instrument-generated systematics present a practical problem. We present a first-ever study of detecting the CRR lines in the presence of a non-ideal antenna, adopting a toy model for antenna beam chromaticity. Using Euclidean distance and the Pearson correlation coefficient as metrics to distinguish between CRR signal presence and absence in a simulation pipeline, we demonstrate that it is indeed possible to detect the signal using a chromatic antenna. Furthermore, we show that there are different tolerances to the antenna non-ideality based on the type of chromaticity, observing location, and LST. These can inform antenna and experiment design for a practical detection.
Comment: 14 pages, 18 figures, Submitted to the Journal of Astrophysics and Astronomy
- Published
- 2024
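The APSERa entry above compares simulated spectra using Euclidean distance and the Pearson correlation coefficient. The sketch below computes both metrics on a toy residual over the 2-6 GHz band. The sinusoidal "ripple" template, its 0.5 GHz period, and the noise level are hypothetical, and the toy signal is vastly stronger relative to the noise than a real CRR signal would be. With the template present, the residual should sit closer to the template and correlate with it more strongly.

```python
import numpy as np


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L2 norm of the difference between two spectra sampled on the same frequency grid."""
    return float(np.linalg.norm(a - b))


def pearson_r(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two spectra."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    freq_ghz = np.linspace(2.0, 6.0, 512)             # band quoted in the abstract
    template = np.sin(2 * np.pi * freq_ghz / 0.5)     # hypothetical ripple template
    noise = 0.3 * rng.standard_normal(freq_ghz.size)  # hypothetical post-fit residual noise
    for label, residual in [("signal present", template + noise), ("signal absent", noise)]:
        print(f"{label}: distance = {euclidean_distance(residual, template):.2f}, "
              f"r = {pearson_r(residual, template):.3f}")
```

Pearson correlation ignores overall offset and scale, while Euclidean distance does not, which is one reason to look at both.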
38. Kirchhoff's analogy for a planar ferromagnetic rod
- Author
-
Avatar, G. R. Krishna Chand and Dabade, Vivekanand
- Subjects
Condensed Matter - Materials Science, Condensed Matter - Soft Condensed Matter, Mathematical Physics, Mathematics - Analysis of PDEs, 74B20, 74F15, 74G60
- Abstract
Kirchhoff's kinetic analogy relates the equilibrium solutions of an elastic rod or strip to the motion of a spinning top. In this analogy, time is replaced by the arc length parameter in the phase portrait to determine the equilibrium configurations of the rod. Predicted equilibrium solutions from the phase portrait for specific boundary value problems, as well as certain localized solutions, have been experimentally observed. In this study, we employ the kinetic analogy to investigate the equilibrium solutions of planar soft ferromagnetic rods subjected to transverse and longitudinal external magnetic fields. Our analysis reveals a subcritical pitchfork bifurcation in the phase portrait of a ferromagnetic rod subjected to a transverse external magnetic field as the axial load is decreased continuously from a large compressive load. Similarly, a supercritical pitchfork bifurcation is observed in the case of a longitudinal external magnetic field. We predict equilibrium configurations for a free-standing soft ferromagnetic elastic rod and for the same rod subjected to canonical boundary conditions. Furthermore, we observe novel localized equilibrium solutions arising from homoclinic and heteroclinic orbits, which are absent in the phase portraits of purely elastic rods.
Comment: Submitted to Journal of Applied Mechanics (ASME)
- Published
- 2024
39. Consolidating and Developing Benchmarking Datasets for the Nepali Natural Language Understanding Tasks
- Author
-
Nyachhyon, Jinu, Sharma, Mridul, Thapa, Prajwal, and Bal, Bal Krishna
- Subjects
Computer Science - Computation and Language
- Abstract
The Nepali language has distinct linguistic features, especially its complex script (Devanagari script), morphology, and various dialects, which pose a unique challenge for natural language processing (NLP) evaluation. While the Nepali Language Understanding Evaluation (Nep-gLUE) benchmark provides a foundation for evaluating models, it remains limited in scope, covering only four tasks. This restricts its utility for comprehensive assessments of NLP models. To address this limitation, we introduce eight new datasets, creating a new benchmark, the Nepali Language Understanding Evaluation (NLUE) benchmark, which covers a total of 12 tasks for evaluating the performance of models across a diverse set of Natural Language Understanding (NLU) tasks. The added tasks include single-sentence classification, similarity and paraphrase tasks, and Natural Language Inference (NLI) tasks. On evaluating the models using the added tasks, we observe that the existing models fall short in handling complex NLU tasks effectively. This expanded benchmark sets a new standard for evaluating, comparing, and advancing models, contributing significantly to the broader goal of advancing NLP research for low-resource languages.
- Published
- 2024
40. Monotonicity of limit wave speed of periodic traveling wave solutions via Abelian integral
- Author
-
Patra, Krishna and Rao, Chidella Srinivasa
- Subjects
Mathematics - Analysis of PDEs
- Abstract
In this article, we investigate the monotonicity of the limit wave speed of periodic traveling wave solutions for a perturbed generalized KdV equation via Abelian integrals. We answer an open problem outlined by Yan et al. (2014) and the conjecture proposed by Ouyang et al. (2022). Geometric singular perturbation theory allows for the reduction of a three-dimensional dynamical system to a near-Hamiltonian planar system. Furthermore, utilizing the monotonic behavior of the ratio of Abelian integrals, we develop a method to show the existence of at most one isolated periodic traveling wave, with a much simpler proof than that in Yan et al. (2014). Finally, we present numerical simulations that perfectly match the theoretical outcomes.
Comment: 22 pages, 5 figures
- Published
- 2024
41. Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation
- Author
-
Ali, Nurshat Fateh, Mohtasim, Md. Mahdi, Mosharrof, Shakil, and Krishna, T. Gopi
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
- Abstract
This research presents and compares multiple approaches to automate the generation of literature reviews using several Natural Language Processing (NLP) techniques and retrieval-augmented generation (RAG) with a Large Language Model (LLM). The ever-increasing number of research articles poses a huge challenge for manual literature review, which has increased the demand for automation. The primary objective of this research is to develop a system capable of automatically generating literature reviews from PDF files alone. To meet this objective, we evaluate the effectiveness of several NLP strategies: the frequency-based method (spaCy), the transformer model (Simple T5), and retrieval-augmented generation (RAG) with a Large Language Model (GPT-3.5-turbo). The SciTLDR dataset is chosen for this research experiment, and three distinct techniques are utilized to implement three different systems for auto-generating the literature reviews. ROUGE scores are used to evaluate all three systems. Based on the evaluation, the Large Language Model GPT-3.5-turbo achieved the highest ROUGE-1 score, 0.364. The transformer model comes second and spaCy comes last. Finally, a graphical user interface is created for the best system, based on the large language model.
Comment: Keywords: T5, SpaCy, Large Language Model, GPT, ROUGE, Literature Review, Natural Language Processing, Retrieval-augmented generation
- Published
- 2024
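The literature-review entry above reports ROUGE-1 scores. ROUGE-1 measures unigram overlap between a candidate and a reference text; the sketch below is a simplified version with clipped counts, lowercase alphanumeric tokens, and no stemming. The abstract does not say which ROUGE implementation or preprocessing the authors used, so scores from this sketch need not match the reported 0.364.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of ASCII letters and digits (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    """Unigram precision, recall and F1 with clipped counts."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per token
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge_1("the cat sat on the mat", "the cat is on the mat"))
```

For the usage example, five of the six candidate unigrams match the reference (counting "the" twice), so precision, recall, and F1 are all 5/6.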
42. Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment
- Author
-
Chen, Dongping, Chen, Ruoxi, Pu, Shu, Liu, Zhaoyi, Wu, Yanru, Chen, Caixi, Liu, Benlin, Huang, Yue, Wan, Yao, Zhou, Pan, and Krishna, Ranjay
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
Many real-world user queries (e.g. "How to make egg fried rice?") could benefit from systems capable of generating responses with both textual steps and accompanying images, similar to a cookbook. Models designed to generate interleaved text and images face challenges in ensuring consistency within and across these modalities. To address these challenges, we present ISG, a comprehensive evaluation framework for interleaved text-and-image generation. ISG leverages a scene graph structure to capture relationships between text and image blocks, evaluating responses on four levels of granularity: holistic, structural, block-level, and image-specific. This multi-tiered evaluation allows for a nuanced assessment of consistency, coherence, and accuracy, and provides interpretable question-answer feedback. In conjunction with ISG, we introduce a benchmark, ISG-Bench, encompassing 1,150 samples across 8 categories and 21 subcategories. This benchmark dataset includes complex language-vision dependencies and golden answers to evaluate models effectively on vision-centric tasks such as style transfer, a challenging area for current models. Using ISG-Bench, we demonstrate that recent unified vision-language models perform poorly on generating interleaved content. While compositional approaches that combine separate language and image models show a 111% improvement over unified models at the holistic level, their performance remains suboptimal at both block and image levels. To facilitate future work, we develop ISG-Agent, a baseline agent employing a "plan-execute-refine" pipeline to invoke tools, achieving a 122% performance improvement.
- Published
- 2024
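The abstract describes representing a response as a graph of text and image blocks and scoring it at several levels through question-answer checks. The sketch below is purely hypothetical: the `Block` and `SceneGraph` classes, the relation names, and the "fraction of yes answers" scoring rule are illustrative assumptions, not ISG's actual data format or metric. It only shows how such a graph and per-level aggregation could be organized.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One block of an interleaved response (hypothetical structure)."""
    block_id: str
    kind: str      # "text" or "image"
    content: str   # the text itself, or an image path/placeholder


@dataclass
class SceneGraph:
    """Blocks plus (source_id, relation, target_id) edges between them."""
    blocks: list
    edges: list = field(default_factory=list)


def structure_matches(graph, expected_kinds):
    """Structural check: does the block sequence follow the requested layout?"""
    return [b.kind for b in graph.blocks] == list(expected_kinds)


def level_scores(verdicts):
    """Average yes/no judge verdicts per evaluation level.

    `verdicts` maps a level name to a list of booleans, one per generated
    question; the result is the fraction of 'yes' answers at each level
    (an assumed scoring rule, for illustration only).
    """
    return {level: sum(v) / len(v) for level, v in verdicts.items() if v}
```

For example, a four-block recipe answer (text, image, text, image) with "illustrated_by" edges from each text step to its image passes the structural check for the layout `["text", "image", "text", "image"]`, and block-level verdicts `[True, False]` average to 0.5.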
43. Experimental investigation of coherence contributions to a nonequilibrium thermodynamic process in a driven quantum system
- Author
-
Shende, Krishna, Dorai, Kavita, and Arvind
- Subjects
Quantum Physics
- Abstract
The work done when a system at thermal equilibrium is externally driven by a unitary control parameter leads to irreversible entropy production. The entropy produced can be split into two contributions: coherence generation and a population mismatch between the target equilibrium state and the final state actually reached. We experimentally explored this out-of-equilibrium process on an NMR quantum processor and studied the contribution of coherence to irreversible entropy generation. We verified a generalized Clausius inequality, which states that irreversible entropy production is bounded from below. Comment: 6 pages, 5 figures
- Published
- 2024
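The split into a coherence term and a population term can be made concrete with a standard decomposition from the quantum-thermodynamics literature (an assumption here; the paper's exact definitions may differ). For a unitary drive, the entropy production is the relative entropy Σ = D(ρ_τ‖ρ_eq) = D(Δ(ρ_τ)‖ρ_eq) + C(ρ_τ), where Δ dephases in the eigenbasis of the final Hamiltonian and C(ρ) = S(Δ(ρ)) − S(ρ) is the relative entropy of coherence. The sketch checks this numerically for a hypothetical qubit rotated about x, with the same Hamiltonian before and after the drive so that ΔF = 0 and Σ = βW.

```python
import numpy as np


def entropy(rho):
    """von Neumann entropy S(rho) = -Tr(rho ln rho), in nats."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]                      # convention: 0 ln 0 = 0
    return float(-np.sum(p * np.log(p)))


def log_posdef(m):
    """Matrix logarithm of a positive-definite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(np.log(w)) @ v.conj().T


def rel_entropy(rho, sigma):
    """Quantum relative entropy D(rho||sigma) = -S(rho) - Tr(rho ln sigma)."""
    return float(-entropy(rho) - np.trace(rho @ log_posdef(sigma)).real)


def gibbs(h, beta):
    """Thermal state exp(-beta H) / Z for a Hermitian Hamiltonian h."""
    w, v = np.linalg.eigh(h)
    p = np.exp(-beta * (w - w.min()))     # shift for numerical stability
    p /= p.sum()
    return v @ np.diag(p) @ v.conj().T


beta = 1.0
sz = np.diag([1.0, -1.0]).astype(complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
h = 0.5 * sz                              # hypothetical Hamiltonian, unchanged by the drive
theta = np.pi / 3                         # hypothetical drive: rotation about x by theta
u = np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * sx

rho0 = gibbs(h, beta)                     # initial thermal state
rho_t = u @ rho0 @ u.conj().T             # state actually reached after the drive
rho_eq = gibbs(h, beta)                   # target equilibrium state
dephased = np.diag(np.diag(rho_t))        # dephasing in the eigenbasis of h (diagonal here)

sigma = rel_entropy(rho_t, rho_eq)              # total entropy production
population = rel_entropy(dephased, rho_eq)      # population-mismatch term
coherence = entropy(dephased) - entropy(rho_t)  # coherence term
work = np.trace(h @ rho_t).real - np.trace(h @ rho0).real

print(sigma, population + coherence, beta * work)
```

The identity holds because ρ_eq is diagonal in the same basis that Δ dephases, so Tr(ρ_τ ln ρ_eq) = Tr(Δ(ρ_τ) ln ρ_eq); both terms are non-negative, which is one way to see that the entropy production is bounded from below.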
44. RealSeal: Revolutionizing Media Authentication with Real-Time Realism Scoring
- Author
-
Radharapu, Bhaktipriya and Krishna, Harish
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
- Abstract
The growing threat of deepfakes and manipulated media necessitates a radical rethinking of media authentication. Existing methods for watermarking synthetic data fall short because the watermarks can be easily removed or altered, and current deepfake detection algorithms do not achieve perfect accuracy. Provenance techniques, which rely on metadata to verify content origin, do not address the fundamental problem of staged or fake media. This paper proposes a paradigm shift in media authentication: watermarking real content at its source rather than watermarking synthetic data. Our approach uses multisensory inputs and machine learning to assess the realism of content in real time and across different contexts. We propose embedding a robust realism score in the image metadata, transforming how images are trusted and circulated. By combining established principles of human reasoning about reality, rooted in firmware and hardware security, with the reasoning capabilities of contemporary machine learning systems, we develop a holistic approach that analyzes information from multiple perspectives. This ambitious blue-sky approach aims to push the boundaries of media authenticity and trust and, by drawing on recent technical advances and interdisciplinary research, to establish a new standard for verifying the authenticity of digital media. Comment: Best Paper Award, Blue Sky Track at the 26th ACM International Conference on Multimodal Interaction, Nov 2024, San Jose, Costa Rica
- Published
- 2024
- Full Text
- View/download PDF
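The abstract proposes a "robust realism score" embedded in image metadata but gives no format. One way to make such a score tamper-evident is to bind it to a hash of the exact image bytes and authenticate the record with a device-held key. The sketch below is a hypothetical illustration: the field names, the JSON record, and the use of a symmetric HMAC key are all assumptions, not the paper's design. A hardware deployment would more likely use an asymmetric signature from a secure element so that verifiers never hold the signing key.

```python
import hashlib
import hmac
import json


def seal(image_bytes, realism_score, device_key):
    """Build a metadata record binding a realism score to the exact image bytes."""
    record = {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "realism_score": round(float(realism_score), 4),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["tag"] = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return record


def verify(image_bytes, record, device_key):
    """Return True only if neither the image nor its score changed since sealing."""
    body = {k: v for k, v in record.items() if k != "tag"}
    if body.get("sha256") != hashlib.sha256(image_bytes).hexdigest():
        return False                       # image bytes were altered
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("tag", ""))
```

Changing either the image bytes or the stored score invalidates the record, which addresses the "easily removed or altered" weakness only in the sense of making tampering detectable, not preventable.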
45. OPMOS: Ordered Parallel Multi-Objective Shortest-Path
- Author
-
Gold, Leo, Bienkowski, Adam, Sidoti, David, Pattipati, Krishna, and Khan, Omer
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Hardware Architecture, Computer Science - Data Structures and Algorithms, Computer Science - Performance
- Abstract
The Multi-Objective Shortest-Path (MOS) problem asks for the set of Pareto-optimal paths from a start node to a destination node in a multi-attribute graph. To solve this NP-hard problem, prior work uses heuristic, A*-style multi-objective search algorithms. A generalized MOS algorithm maintains a "frontier" of partial paths at each node and processes them in order so that Pareto-optimal paths to the goal node are generated. The algorithm becomes computationally intractable as the number of objectives increases, because the number of non-dominated partial paths, and with it the number of Pareto-optimal solutions, grows rapidly. While prior work has focused on algorithmic methods to reduce this complexity, we tackle the challenge by exploiting parallelism through an algorithm-architecture approach. The key insight is that MOS algorithms rely on ordered execution of partial paths to maintain high work efficiency. The OPMOS framework proposed here unlocks ordered parallelism and efficiently exploits concurrent execution of multiple paths in MOS. Experimental evaluation on the NVIDIA GH200 Superchip shows the performance-scaling potential of OPMOS in work efficiency and parallelism on a real-world ship-routing application. Comment: 15 pages
- Published
- 2024
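The frontier-plus-ordered-processing idea can be illustrated with a sequential, Martins-style multi-objective label-setting search. This is a generic baseline under stated assumptions (non-negative edge costs, no A* heuristic, lexicographic ordering); it is not OPMOS, whose contribution is executing this ordered work in parallel. The function and graph format are hypothetical.

```python
import heapq
import itertools


def dominates(a, b):
    """True if cost vector a is no worse than b in every objective and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def covered(labels, cost):
    """True if some permanent label equals or dominates `cost`."""
    return any(c == cost or dominates(c, cost) for c in labels)


def pareto_paths(graph, source, target, num_objectives):
    """Sequential multi-objective label-setting search (Martins-style).

    graph maps node -> list of (neighbor, cost_tuple) with non-negative costs.
    Partial paths are popped in lexicographic cost order, so a label that is
    made permanent can never be dominated by one popped later.
    Returns (cost_tuple, path) pairs for the Pareto-optimal source-target paths.
    """
    tie = itertools.count()                    # tie-breaker so paths are never compared
    heap = [((0,) * num_objectives, next(tie), [source])]
    permanent = {}                             # node -> non-dominated costs (the "frontier")
    results = []
    while heap:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        labels = permanent.setdefault(node, [])
        if covered(labels, cost):
            continue                           # dominated partial path: prune it
        labels.append(cost)
        if node == target:
            results.append((cost, path))
            continue
        for nbr, w in graph.get(node, []):
            new = tuple(c + x for c, x in zip(cost, w))
            if not covered(permanent.get(nbr, []), new):
                heapq.heappush(heap, (new, next(tie), path + [nbr]))
    return results
```

The ordering is what keeps the search work-efficient: processing labels out of order would make some permanent labels wrong, forcing re-checks, which is the sequential dependency a parallel scheme has to preserve.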
46. One Diffusion to Generate Them All
- Author
-
Le, Duong H., Pham, Tuan, Lee, Sangho, Clark, Christopher, Kembhavi, Aniruddha, Mandt, Stephan, Krishna, Ranjay, and Lu, Jiasen
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
We introduce OneDiffusion, a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding across diverse tasks. It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps, and also handles tasks such as image deblurring and upscaling, as well as reverse processes such as depth estimation and segmentation. OneDiffusion also supports multi-view generation, camera pose estimation, and instant personalization from sequential image inputs. The model takes a simple yet effective approach: it treats all tasks as frame sequences with varying noise scales during training, so that any frame can act as a conditioning image at inference time. This unified training framework removes the need for specialized architectures, supports scalable multi-task training, and adapts to any resolution, improving both generalization and scalability. Experiments show competitive performance across generation and prediction tasks, including text-to-image, multi-view generation, ID preservation, depth estimation, and camera pose estimation, despite a relatively small training dataset. Our code and checkpoint are freely available at https://github.com/lehduong/OneDiffusion. Comment: the first two authors contributed equally
- Published
- 2024
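The idea of "frame sequences with varying noise scales" can be sketched as forward diffusion with an independent noise level per frame: during training each frame gets its own random level, and at inference any frame can be held clean (level 0) to act as the condition. The cosine schedule, array layout, and function names below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def alpha_bar(t):
    """Hypothetical cosine schedule: t in [0, 1], 0 = clean, 1 = (almost) pure noise."""
    return np.cos(0.5 * np.pi * t) ** 2


def noise_frames(frames, t, rng):
    """Forward-diffuse each frame with its own noise level t[i].

    frames: array of shape (num_frames, H, W, C); t: array of shape (num_frames,).
    Returns the noisy frames and the Gaussian noise that was added.
    """
    a = alpha_bar(t).reshape(-1, 1, 1, 1)
    eps = rng.standard_normal(frames.shape)
    return np.sqrt(a) * frames + np.sqrt(1.0 - a) * eps, eps


def training_levels(num_frames, rng):
    """Training: every frame gets an independently sampled noise level."""
    return rng.uniform(0.0, 1.0, size=num_frames)


def inference_levels(num_frames, cond_idx, t):
    """Inference: conditioning frames are kept clean (t = 0); the others start at t."""
    levels = np.full(num_frames, float(t))
    levels[list(cond_idx)] = 0.0
    return levels
```

Because training already covers every combination of clean and noisy frames, the same network can, for instance, treat an RGB frame as the condition to predict a depth frame, or the reverse, simply by choosing which frames get level 0.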
47. Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
- Author
-
Sajith, Aryan and Kathala, Krishna Chaitanya Rao
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
This study investigates the relative impact of training data quality versus quantity on the performance of small language models (SLMs), using the TinyStories dataset for empirical analysis. We analyzed dataset variations in size (25% and 50% of the original size) and in duplication (controlled rates of 25%, 50%, 75%, and 100%). Model performance was evaluated using validation loss, accuracy, and perplexity. The results indicate that training data quality plays a more significant role in the overall performance of SLMs, especially given the scale of this experiment. Minimal duplication improved model accuracy (+0.87% at 25% duplication) without significantly increasing perplexity (+0.52% going from 0% to 25% duplication), but excessive duplication led to pronounced performance degradation (a 40% drop in accuracy at 100% duplication). The implications extend beyond model performance: training large-scale models imposes significant financial and computational burdens, which can be prohibitive for organizations, individuals, and the public at large, especially in developing countries. In addition, the energy consumption of large-scale training raises environmental concerns. Understanding the relative importance of data quality versus quantity could help democratize AI technology, making advanced models more accessible and sustainable for all. Comment: 10 pages, 4 figures
- Published
- 2024
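The dataset variants and the perplexity metric can be sketched as follows. The abstract does not define "duplication rate", so the definition used here (a randomly chosen `rate` share of documents is repeated once, making the dataset grow by that share) is an assumption; the function names are hypothetical. Perplexity is the exponential of the mean per-token negative log-likelihood, measured in nats when natural logarithms are used.

```python
import math
import random


def subsample(docs, fraction, seed=0):
    """Quantity variant: keep a random `fraction` of the documents."""
    rng = random.Random(seed)
    return rng.sample(docs, int(len(docs) * fraction))


def duplicate(docs, rate, seed=0):
    """Duplication variant (assumed definition): repeat a random `rate` share
    of the documents once, then shuffle the result."""
    rng = random.Random(seed)
    extra = rng.sample(docs, int(len(docs) * rate))
    out = list(docs) + extra
    rng.shuffle(out)
    return out


def perplexity(mean_nll_nats):
    """Perplexity = exp(mean per-token negative log-likelihood in nats)."""
    return math.exp(mean_nll_nats)
```

Under this definition, 100% duplication means every story appears exactly twice, so the model sees no new content while the number of training steps doubles.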
48. Development of Pre-Trained Transformer-based Models for the Nepali Language
- Author
-
Thapa, Prajwal, Nyachhyon, Jinu, Sharma, Mridul, and Bal, Bal Krishna
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Transformer-based pre-trained language models have dominated Natural Language Processing (NLP) for some time. However, Nepali, spoken by approximately 32 million people worldwide, remains significantly underrepresented in this domain, primarily because of the scarcity of monolingual corpora and the limited resources available for the language. Existing efforts have concentrated on basic encoder-based models, leaving decoder-based architectures largely unexplored. To address this gap, we collected 27.5 GB of Nepali text, approximately 2.4 times larger than any previously available Nepali corpus. Using this data, we pre-trained three models, BERT, RoBERTa, and GPT-2, exclusively for Nepali. We also performed instruction tuning and explored its potential for monolingual Nepali data, providing a foundation for future research. Our models outperformed the existing best model by 2 points on the Nep-gLUE benchmark, scoring 95.60, and also outperformed existing models on text generation tasks, demonstrating improvements in both understanding and generating Nepali text.
- Published
- 2024
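Encoder models such as BERT and RoBERTa are pre-trained with masked language modeling, while GPT-2 uses next-token prediction. The abstract does not give the authors' masking settings; the sketch below follows the standard recipe from the original BERT paper (select about 15% of positions; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged), which is an assumption about what was used here. `MASK_ID` and the `-100` ignore label are hypothetical placeholders.

```python
import random

MASK_ID = 4     # hypothetical id of the [MASK] token
IGNORE = -100   # label for positions that do not contribute to the loss


def mask_tokens(ids, vocab_size, rng, mask_prob=0.15):
    """BERT-style masking: select ~mask_prob of positions; of those, replace
    80% with [MASK], 10% with a random token, and leave 10% unchanged."""
    inputs, labels = list(ids), [IGNORE] * len(ids)
    for i, tok in enumerate(ids):
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok                # the model must recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # else: keep the original token unchanged
    return inputs, labels
```

Calling this afresh every time a sequence is batched gives RoBERTa-style dynamic masking; applying it once during preprocessing gives BERT's original static masking.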
49. Capacitive Touch Sensor Modeling With a Physics-informed Neural Network and Maxwell's Equations
- Author
-
Mo, Ganyong, Narayanan, Krishna Kumar, Castells-Rufas, David, and Carrabina, Jordi
- Subjects
Physics - Computational Physics, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Signal Processing
- Abstract
Maxwell's equations are the fundamental equations governing electric and magnetic field interactions, and they play a crucial role in designing and optimizing sensor systems such as capacitive touch sensors, which are widely used in automotive switches and smartphones. Ensuring robust functionality and stability of these sensors in dynamic environments requires deep domain expertise and computationally intensive multi-physics simulations. This paper introduces an approach that uses a surrogate model based on a Physics-Informed Neural Network (PINN) to accelerate the design process. The PINN solves the governing electrostatic equations describing the interaction between a finger and a capacitive sensor. Its inputs include spatial coordinates from a 3D domain encompassing the finger, sensor, and PCB, along with the finger distance. By incorporating the electrostatic equations directly into the network's loss function, the model captures the underlying physics. The trained model thus serves as a surrogate sensor model on which inference can be run in seconds for different experimental setups, without running new simulations. Results on unseen test cases show the significant potential of PINNs to accelerate the development and design optimization of capacitive touch sensors. Comment: ESM'2024 (The 38th annual European Simulation and Modelling Conference)
- Published
- 2024
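"Incorporating the electrostatic equations directly into the loss function" means penalizing the PDE residual at interior collocation points alongside the boundary-condition error. The sketch below shows that loss structure under simplifying assumptions: a charge-free region with uniform permittivity (so the equation reduces to Laplace's equation ∇²φ = 0), Dirichlet boundary values, and a central finite-difference Laplacian standing in for the automatic differentiation a real PINN would use. `phi` is any callable mapping points of shape (N, 3) to potentials of shape (N,), standing in for the network; the names are hypothetical.

```python
import numpy as np


def laplacian_fd(phi, pts, h=1e-3):
    """Central-difference Laplacian of a scalar field phi at points of shape (N, 3).

    A real PINN would differentiate the network with autodiff; finite
    differences keep this sketch dependency-free.
    """
    center = phi(pts)
    total = np.zeros(len(pts))
    for d in range(3):
        step = np.zeros(3)
        step[d] = h
        total += (phi(pts + step) - 2.0 * center + phi(pts - step)) / h**2
    return total


def pinn_loss(phi, interior, boundary, boundary_values, w_bc=1.0):
    """Physics-informed loss: mean squared Laplace residual at interior points
    plus a weighted Dirichlet boundary-condition term."""
    pde = np.mean(laplacian_fd(phi, interior) ** 2)
    bc = np.mean((phi(boundary) - boundary_values) ** 2)
    return pde + w_bc * bc
```

A harmonic field such as φ = x² − y² gives a near-zero loss, while φ = x² + y² + z² (whose Laplacian is 6 everywhere) gives a PDE term of about 36. In the actual sensor geometry the permittivity differs between the finger, sensor, and PCB, so the residual would be ∇·(ε∇φ) rather than the plain Laplacian.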
50. Radio Halo Detection in MWA Data using Deep Neural Networks and Generative Data Augmentation
- Author
-
Mishra, Ashutosh K., Tolley, Emma, Krishna, Shreyam Parth, and Kneib, Jean-Paul
- Subjects
Astrophysics - Astrophysics of Galaxies
- Abstract
Detecting diffuse radio emission, such as radio halos, in galaxy clusters is crucial for understanding large-scale structure formation in the universe. Traditional methods, which rely on X-ray and Sunyaev-Zeldovich (SZ) cluster pre-selection, introduce biases that limit our understanding of the full population of diffuse radio sources. In this work, we offer a possible resolution of this astrophysical tension by developing a machine learning (ML) framework capable of unbiased detection of diffuse emission from a limited real dataset, such as those from the Murchison Widefield Array (MWA). For the first time, we generate radio halo images using Wasserstein Generative Adversarial Networks (WGANs) and Denoising Diffusion Probabilistic Models (DDPMs), and use them to train a neural network classifier that is independent of pre-selection methods. The halo images generated by DDPMs are of higher quality than those produced by WGANs. The diffusion-supported classifier with a multi-head attention block achieved the best average validation accuracy, 95.93% over 10 runs, using 36 clusters for training and 10 for testing, without further hyperparameter tuning. Using our classifier, we rediscovered 9 of 12 halos (75% detection rate) from the MeerKAT Galaxy Cluster Legacy Survey (MGCLS) Catalogue and 5 of 8 halos (63% detection rate) from the Planck Sunyaev-Zeldovich Catalogue 2 (PSZ2) within the GaLactic and Extragalactic All-sky MWA (GLEAM) survey. In addition, we identify 11 potential new halos, minihalos, or candidates in the COSMOS field using XMM-Chandra-detected clusters in GLEAM data. This work demonstrates the potential of ML for unbiased detection of diffuse emission and provides labeled datasets for further study. Comment: 16 pages, submitted to MNRAS
- Published
- 2024
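The best classifier is credited to a multi-head attention block, but the abstract does not describe the architecture. As a generic illustration (an assumption, not the authors' network), the sketch below implements standard scaled dot-product multi-head self-attention over a sequence of feature vectors, which could, for example, be flattened patches of a radio image after a convolutional stem. The weight matrices are hypothetical placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Multi-head self-attention.

    x: (seq_len, d_model); wq, wk, wv, wo: (d_model, d_model);
    d_model must be divisible by num_heads.
    Returns the output (seq_len, d_model) and attention weights (heads, seq, seq).
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                    # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ wo, weights
```

Each head attends over all positions with its own projection, so different heads can weight different spatial regions, which suits faint, spatially extended emission such as halos; a classifier would typically pool the output and apply a final linear layer.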
Catalog
Discovery Service for Jio Institute Digital Library