524,481 results on '"Reddy AT"'
Search Results
2. Advanced Underwater Image Quality Enhancement via Hybrid Super-Resolution Convolutional Neural Networks and Multi-Scale Retinex-Based Defogging Techniques
- Author
-
Gogireddy, Yugandhar Reddy and Gogireddy, Jithendra Reddy
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
This study addresses underwater image degradation caused by light scattering, absorption, and fog-like particles, which lead to low resolution and poor visibility. We propose a hybrid strategy that combines Multi-Scale Retinex (MSR) defogging methods with Super-Resolution Convolutional Neural Networks (SRCNN) to address these problems. The Retinex algorithm mimics human visual perception to reduce uneven lighting and fogging, while the SRCNN component improves the spatial resolution of underwater photos. Through the combination of these methods, we are able to enhance the clarity, contrast, and colour restoration of underwater images, offering a reliable way to improve image quality in difficult underwater conditions. The research conducts extensive experiments on real-world underwater datasets to further illustrate the efficacy of the suggested approach. In terms of sharpness, visibility, and feature retention, quantitative evaluation using metrics such as the Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) demonstrates notable advances over conventional techniques. In real-time underwater applications like marine exploration, underwater robotics, and autonomous underwater vehicles, where clear and high-resolution imaging is crucial for operational success, the combination of deep learning and conventional image processing techniques offers a computationally efficient framework with superior results.
- Published
- 2024
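The PSNR metric named in the abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized grayscale images.

    Images are given as nested lists of pixel intensities (an assumption
    made for this sketch); max_value is the largest possible intensity.
    """
    # Flatten both images into one list of pixels each.
    ref = [p for row in reference for p in row]
    dis = [p for row in distorted for p in row]
    if not ref or len(ref) != len(dis):
        raise ValueError("images must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, dis)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    # PSNR = 10 * log10(MAX^2 / MSE)
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the enhanced image is closer to the reference; identical images give an infinite value.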
3. A New Class of Three Nucleon Forces and their Implications
- Author
-
Cirigliano, V., Dawid, M., Dekens, W., and Reddy, S.
- Subjects
Nuclear Theory, Astrophysics - High Energy Astrophysical Phenomena, High Energy Physics - Lattice, High Energy Physics - Phenomenology
- Abstract
We identify a new class of three-nucleon forces that arises in the low-energy effective theory of nuclear interactions including pions. We estimate their contribution to the energy of neutron and nuclear matter and find that it can be as important as the leading-order three-nucleon forces previously considered in the literature. The magnitude of this force is set by the strength of the coupling of pions to two nucleons and is presently not well constrained by experiments. The implications for nuclei, nuclear matter, and the equation of state of neutron matter are briefly discussed.
- Comment
6 pages, 4 figures
- Published
- 2024
4. Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age
- Author
-
AlDahoul, Nouar, Tan, Myles Joshua Toledo, Kasireddy, Harishwar Reddy, and Zaki, Yasir
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Technologies for recognizing facial attributes like race, gender, age, and emotion have several applications, such as surveillance, advertising content, sentiment analysis, and the study of demographic trends and social behaviors. Analyzing demographic characteristics based on images and analyzing facial expressions have several challenges due to the complexity of humans' facial attributes. Traditional approaches have employed CNNs and various other deep learning techniques, trained on extensive collections of labeled images. While these methods demonstrated effective performance, there remains potential for further enhancements. In this paper, we propose to utilize vision language models (VLMs) such as generative pre-trained transformer (GPT), GEMINI, large language and vision assistant (LLAVA), PaliGemma, and Microsoft Florence2 to recognize facial attributes such as race, gender, age, and emotion from images with human faces. Various datasets like FairFace, AffectNet, and UTKFace have been utilized to evaluate the solutions. The results show that VLMs are competitive with, if not superior to, traditional techniques. Additionally, we propose "FaceScanPaliGemma", a fine-tuned PaliGemma model, for race, gender, age, and emotion recognition. The results show an accuracy of 81.1%, 95.8%, 80%, and 59.4% for race, gender, age group, and emotion classification, respectively, outperforming the pre-trained version of PaliGemma, other VLMs, and SotA methods. Finally, we propose "FaceScanGPT", which is a GPT-4o model to recognize the above attributes when several individuals are present in the image using a prompt engineered for a person with specific facial and/or physical attributes. The results underscore the superior multitasking capability of FaceScanGPT to detect the individual's attributes like hair cut, clothing color, postures, etc., using only a prompt to drive the detection and recognition tasks.
- Comment
52 pages, 13 figures
- Published
- 2024
5. Relational Weight Optimization for Enhancing Team Performance in Multi-Agent Multi-Armed Bandits
- Author
-
Kotturu, Monish Reddy, Movahed, Saniya Vahedian, Robinette, Paul, Jerath, Kshitij, Redlich, Amanda, and Azadeh, Reza
- Subjects
Computer Science - Multiagent Systems
- Abstract
We introduce an approach to improve team performance in a Multi-Agent Multi-Armed Bandit (MAMAB) framework using Fastest Mixing Markov Chain (FMMC) and Fastest Distributed Linear Averaging (FDLA) optimization algorithms. The multi-agent team is represented using a fixed relational network and simulated using the Coop-UCB2 algorithm. The edge weights of the communication network directly impact the time taken to reach distributed consensus. Our goal is to shrink the timescale on which the convergence of the consensus occurs to achieve optimal team performance and maximize reward. Through our experiments, we show that the convergence to team consensus occurs slightly faster in large constrained networks.
- Comment
Accepted for publication in Modeling, Estimation, and Control Conference (MECC) 2024
- Published
- 2024
6. Pi in the Sky: Neutron Stars with Exceptionally Light QCD Axions
- Author
-
Kumamoto, Mia, Huang, Junwu, Drischler, Christian, Baryakhtar, Masha, and Reddy, Sanjay
- Subjects
High Energy Physics - Phenomenology, Astrophysics - High Energy Astrophysical Phenomena, Nuclear Theory
- Abstract
We present a comprehensive study of axion condensed neutron stars that arise in models of an exceptionally light axion that couples to quantum chromodynamics (QCD). These axions solve the strong-charge-parity (CP) problem, but have a mass-squared lighter than that due to QCD by a factor of $\varepsilon<1$. Inside dense matter, the axion potential is altered, and much of the matter in neutron stars resides in the axion condensed phase where the strong-CP parameter $\theta =\pi$ and CP remains a good symmetry. In these regions, masses and interactions of nuclei are modified, in turn changing the equation of state (EOS), structure and phenomenology of the neutron stars. We take first steps toward the study of the EOS of neutron star matter at $\theta =\pi$ within chiral effective field theory and use relativistic mean field theory to deduce the resulting changes to nuclear matter and the neutron star low-density EOS. We derive constraints on the exceptionally light axion parameter space based on observations of the thermal relaxation of accreting neutron stars, isolated neutron star cooling, and pulsar glitches, excluding the region up to $5 \times 10^{-7} \lesssim \varepsilon \lesssim 0.2$ for $m_a \gtrsim 2\times 10^{-9}\,{\rm eV}$. We comment on potential changes to the neutron star mass-radius relationship, and discuss the possibility of novel, nuclear-density compact objects with $\theta =\pi$ that are stabilized not by gravity but by the axion potential.
- Comment
45 pages, 19 figures
- Published
- 2024
7. Turn-by-Turn Indoor Navigation for the Visually Impaired
- Author
-
Srinivasaiah, Santosh, Nekkanti, Sai Kumar, and Nedhunuri, Rohith Reddy
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Navigating indoor environments presents significant challenges for visually impaired individuals due to complex layouts and the absence of GPS signals. This paper introduces a novel system that provides turn-by-turn navigation inside buildings using only a smartphone equipped with a camera, leveraging multimodal models, deep learning algorithms, and large language models (LLMs). The smartphone's camera captures real-time images of the surroundings, which are then sent to a nearby Raspberry Pi capable of running on-device LLMs, multimodal models, and deep learning algorithms to detect and recognize architectural features, signage, and obstacles. The interpreted visual data is then translated into natural language instructions by an LLM running on the Raspberry Pi, and these instructions are sent back to the user, offering intuitive and context-aware guidance via audio prompts. This solution requires minimal workload on the user's device, preventing it from being overloaded and offering compatibility with all types of devices, including those incapable of running AI models. This approach enables the client to not only run advanced models but also ensure that the training data and other information do not leave the building. Preliminary evaluations demonstrate the system's effectiveness in accurately guiding users through complex indoor spaces, highlighting its potential for widespread application.
- Published
- 2024
8. Knowledge Graph Enhanced Language Agents for Recommendation
- Author
-
Guo, Taicheng, Liu, Chaochun, Wang, Hai, Mannam, Varun, Wang, Fang, Chen, Xin, Zhang, Xiangliang, and Reddy, Chandan K.
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Multiagent Systems
- Abstract
Language agents have recently been used to simulate human behavior and user-item interactions for recommendation systems. However, current language agent simulations do not understand the relationships between users and items, leading to inaccurate user profiles and ineffective recommendations. In this work, we explore the utility of Knowledge Graphs (KGs), which contain extensive and reliable relationships between users and items, for recommendation. Our key insight is that the paths in a KG can capture complex relationships between users and items, eliciting the underlying reasons for user preferences and enriching user profiles. Leveraging this insight, we propose Knowledge Graph Enhanced Language Agents (KGLA), a framework that unifies language agents and KG for recommendation systems. In the simulated recommendation scenario, we position the user and item within the KG and integrate KG paths as natural language descriptions into the simulation. This allows language agents to interact with each other and discover sufficient rationale behind their interactions, making the simulation more accurate and aligned with real-world cases, thus improving recommendation performance. Our experimental results show that KGLA significantly improves recommendation performance (with a 33%-95% boost in NDCG@1 among three widely used benchmarks) compared to the previous best baseline method.
- Published
- 2024
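The NDCG@1 figure quoted in the abstract above refers to normalized discounted cumulative gain truncated at the top-ranked item. A minimal sketch of the standard NDCG@k definition follows; it is not the KGLA evaluation code, and the function names and the list-of-relevances input format are assumptions made for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items in ranked order."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

With binary relevance labels, `ndcg_at_k(relevances, 1)` is 1.0 when the top-ranked item is relevant and 0.0 otherwise, so NDCG@1 averaged over users measures how often the first recommendation is a hit.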
9. Infogent: An Agent-Based Framework for Web Information Aggregation
- Author
-
Reddy, Revanth Gangi, Mukherjee, Sagnik, Kim, Jeonghwan, Wang, Zhenhailong, Hakkani-Tur, Dilek, and Ji, Heng
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Despite seemingly performant web agents on the task-completion benchmarks, most existing methods evaluate the agents based on a presupposition: the web navigation task consists of a linear sequence of actions with an end state that marks task completion. In contrast, our work focuses on web navigation for information aggregation, wherein the agent must explore different websites to gather information for a complex query. We consider web information aggregation from two different perspectives: (i) Direct API-driven Access relies on a text-only view of the Web, leveraging external tools such as Google Search API to navigate the web and a scraper to extract website contents. (ii) Interactive Visual Access uses screenshots of the webpages and requires interaction with the browser to navigate and access information. Motivated by these diverse information access settings, we introduce Infogent, a novel modular framework for web information aggregation involving three distinct components: Navigator, Extractor and Aggregator. Experiments on different information access settings demonstrate Infogent beats an existing SOTA multi-agent search framework by 7% under Direct API-Driven Access on FRAMES, and improves over an existing information-seeking web agent by 4.3% under Interactive Visual Access on AssistantBench.
- Comment
Preprint
- Published
- 2024
10. Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play
- Author
-
Li, Sha, Reddy, Revanth Gangi, Nguyen, Khanh Duy, Wang, Qingyun, Fung, May, Han, Chi, Han, Jiawei, Natarajan, Kartik, Voss, Clare R., and Ji, Heng
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Complex news events, such as natural disasters and socio-political conflicts, require swift responses from the government and society. Relying on historical events to project the future is insufficient as such events are sparse and do not cover all possible conditions and nuanced situations. Simulation of these complex events can help better prepare and reduce the negative impact. We develop a controllable complex news event simulator guided by both the event schema representing domain knowledge about the scenario and user-provided assumptions representing case-specific conditions. As event dynamics depend on the fine-grained social and cultural context, we further introduce a geo-diverse commonsense and cultural norm-aware knowledge enhancement component. To enhance the coherence of the simulation, apart from the global timeline of events, we take an agent-based approach to simulate the individual character states, plans, and actions. By incorporating the schema and cultural norms, our generated simulations achieve much higher coherence and appropriateness and are received favorably by participants from a humanitarian assistance organization.
- Comment
Accepted as EMNLP 2024 Demo
- Published
- 2024
11. Ergodic Risk Sensitive Control of Markovian Multiclass Many-Server Queues with Abandonment
- Author
-
Anugu, Sumith Reddy and Pang, Guodong
- Subjects
Mathematics - Probability, Mathematics - Optimization and Control
- Abstract
We study the optimal scheduling problem for a Markovian multiclass queueing network with abandonment in the Halfin-Whitt regime, under the long run average (ergodic) risk sensitive cost criterion. The objective is to prove asymptotic optimality for the optimal control arising from the corresponding ergodic risk sensitive control (ERSC) problem for the limiting diffusion. In particular, we show that the optimal ERSC value associated with the diffusion-scaled queueing process converges to that of the limiting diffusion in the asymptotic regime. The challenge that ERSC poses is that one cannot express the ERSC cost as an expectation over the mean empirical measure associated with the queueing process, unlike in the usual case of a long run average (ergodic) cost. We develop a novel approach by exploiting the variational representations of the limiting diffusion and the Poisson-driven queueing dynamics, which both involve certain auxiliary controls. The ERSC costs for both the diffusion-scaled queueing process and the limiting diffusion can be represented as the integrals of an extended running cost over a mean empirical measure associated with the corresponding extended processes using these auxiliary controls. For the lower bound proof, we exploit the connections of the ERSC problem for the limiting diffusion with a two-person zero-sum stochastic differential game. We also make use of the mean empirical measures associated with the extended limiting diffusion and diffusion-scaled processes with the auxiliary controls. One major technical challenge in both lower and upper bound proofs is to establish the tightness of the aforementioned mean empirical measures for the extended processes. We identify nearly optimal controls appropriately in both cases so that the existing ergodicity properties of the limiting diffusion and diffusion-scaled queueing processes can be used.
- Published
- 2024
12. Can a Machine Distinguish High and Low Amount of Social Creak in Speech?
- Author
-
Laukkanen, Anne-Maria, Kadiri, Sudarsana Reddy, Narayanan, Shrikanth, and Alku, Paavo
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound
- Abstract
Objectives: Increased prevalence of social creak, particularly among female speakers, has been reported in several studies. The study of social creak has been previously conducted by combining perceptual evaluation of speech with conventional acoustical parameters such as the harmonic-to-noise ratio and cepstral peak prominence. In the current study, machine learning (ML) was used to automatically distinguish speech with a low amount of social creak from speech with a high amount of social creak. Methods: The amount of creak in continuous speech samples produced in Finnish by 90 female speakers was first perceptually assessed by two voice specialists. Based on their assessments, the speech samples were divided into two categories (low vs. high amount of creak). Using the speech signals and their creak labels, seven different ML models were trained. Three spectral representations were used as features for each model. Results: The results show that the best performance (accuracy of 71.1%) was obtained by the following two systems: an Adaboost classifier using the mel-spectrogram feature and a decision tree classifier using the mel-frequency cepstral coefficient feature. Conclusions: The study of social creak is becoming increasingly popular in sociolinguistic and vocological research. The conventional human perceptual assessment of the amount of creak is laborious and therefore ML technology could be used to assist researchers studying social creak. The classification systems reported in this study could be considered as baselines in future ML-based studies on social creak.
- Comment
Accepted in Journal of Voice
- Published
- 2024
13. KatzBot: Revolutionizing Academic Chatbot for Enhanced Communication
- Author
-
Kumar, Sahil, Paikar, Deepa, Vutukuri, Kiran Sai, Ali, Haider, Ainala, Shashidhar Reddy, Krishnan, Aditya Murli, and Zhang, Youshan
- Subjects
Computer Science - Computation and Language
- Abstract
Effective communication within universities is crucial for addressing the diverse information needs of students, alumni, and external stakeholders. However, existing chatbot systems often fail to deliver accurate, context-specific responses, resulting in poor user experiences. In this paper, we present KatzBot, an innovative chatbot powered by KatzGPT, a custom Large Language Model (LLM) fine-tuned on domain-specific academic data. KatzGPT is trained on two university-specific datasets: 6,280 sentence-completion pairs and 7,330 question-answer pairs. KatzBot outperforms established open-source LLMs, achieving higher accuracy and domain relevance. KatzBot offers a user-friendly interface, significantly enhancing user satisfaction in real-world applications. The source code is publicly available at https://github.com/AiAI-99/katzbot.
- Published
- 2024
14. Are Language Model Logits Calibrated?
- Author
-
Lovering, Charles, Krumdick, Michael, Lai, Viet Dac, Kumar, Nilesh, Reddy, Varshini, Koncel-Kedziorski, Rik, and Tanner, Chris
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Some information is factual (e.g., "Paris is in France"), whereas other information is probabilistic (e.g., "the coin flip will be a [Heads/Tails]."). We believe that good Language Models (LMs) should understand and reflect this nuance. Our work investigates this by testing if LMs' output probabilities are calibrated to their textual contexts. We define model "calibration" as the degree to which the output probabilities of candidate tokens are aligned with the relative likelihood that should be inferred from the given context. For example, if the context concerns two equally likely options (e.g., heads or tails for a fair coin), the output probabilities should reflect this. Likewise, context that concerns non-uniformly likely events (e.g., rolling a six with a die) should also be appropriately captured with proportionate output probabilities. We find that even in simple settings the best LMs (1) are poorly calibrated, and (2) have systematic biases (e.g., preferred colors and sensitivities to word orderings). For example, gpt-4o-mini often picks the first of two options presented in the prompt regardless of the options' implied likelihood, whereas Llama-3.1-8B picks the second. Our other consistent finding is mode-collapse: Instruction-tuned models often over-allocate probability mass on a single option. These systematic biases introduce non-intuitive model behavior, making models harder for users to understand.
- Comment
10 pages (main), 24 pages (appendix), under review
- Published
- 2024
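The notion of calibration described in the abstract above, output probabilities matching the likelihoods implied by the context, can be illustrated with a small sketch. The abstract does not state which distance the authors use; total variation distance is chosen here purely as an assumption, and the function name and dictionary inputs are hypothetical.

```python
def total_variation(expected, predicted):
    """Total variation distance between two discrete distributions.

    Both arguments map option names to probabilities. A value of 0.0 means
    the predicted probabilities match the context-implied ones exactly;
    1.0 means the two distributions place their mass on disjoint options.
    """
    options = set(expected) | set(predicted)
    return 0.5 * sum(
        abs(expected.get(option, 0.0) - predicted.get(option, 0.0))
        for option in options
    )


# Fair coin: the context implies 50/50, but a mode-collapsed model
# (hypothetical numbers) puts most of its mass on one option.
implied = {"Heads": 0.5, "Tails": 0.5}
model_output = {"Heads": 0.9, "Tails": 0.1}
miscalibration = total_variation(implied, model_output)
```

Here a larger `miscalibration` value indicates a larger gap between what the context implies and what the model outputs.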
15. Coexistent Topological and Chiral Phonons in Chiral RhGe: An ab initio study
- Author
-
Reddy, P. V. Sreenivasa and Guo, Guang-Yu
- Subjects
Condensed Matter - Materials Science
- Abstract
The CoSi-family of materials hosts unconventional multifold chiral fermions, such as spin-1 and spin-3/2 fermions, leading to intriguing phenomena like long Fermi arc surface states and exotic transport properties, as shown by electronic structure calculations. Interest in the phonon behavior of chiral materials is growing in condensed matter physics due to their unique characteristics, including topological phonons, protected surface states and the chiral nature of phonons with non-zero angular momentum. This chiral behavior also enables phonon modes to generate magnetic moments. Therefore, investigating the chiral phonon behavior in chiral CoSi-family materials could provide innovative opportunities in the development of phononic devices. In this study, we explore the topological and chiral phonon behavior in chiral RhGe using first-principles calculations. RhGe hosts multiple double-Weyl points in both its acoustic and optical phonon branches, including spin-1 Weyl points at the $\Gamma$ point and charge-2 Dirac points at the R point in the Brillouin zone (BZ). The topological nature of the phonons in RhGe is revealed by the presence of topologically protected nontrivial phonon surface states and corresponding iso-frequency contours observed in the (001) and (111) surface BZ. Furthermore, phonon angular momentum calculations confirm the chiral nature of phonons in RhGe, with some phonon modes exhibiting finite magnetic moments. Our findings thus indicate that the coexistence of topological and chiral phonon modes in chiral RhGe not only deepens our understanding of the phonon behavior in the chiral CoSi-family but also opens new pathways for developing advanced materials and devices.
- Comment
11 pages, 6 figures
- Published
- 2024
16. Towards Safer Heuristics With XPlain
- Author
-
Karimi, Pantea, Pirelli, Solal, Kakarla, Siva Kesava Reddy, Beckett, Ryan, Segarra, Santiago, Li, Beibin, Namyar, Pooria, and Arzani, Behnaz
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Networking and Internet Architecture, Computer Science - Performance
- Abstract
Many problems that cloud operators solve are computationally expensive, and operators often use heuristic algorithms (that are faster and scale better than optimal) to solve them more efficiently. Heuristic analyzers enable operators to find when and by how much their heuristics underperform. However, these tools do not provide enough detail for operators to mitigate the heuristic's impact in practice: they only discover a single input instance that causes the heuristic to underperform (and not the full set), and they do not explain why. We propose XPlain, a tool that extends these analyzers and helps operators understand when and why their heuristics underperform. We present promising initial results that show such an extension is viable.
- Published
- 2024
17. Geometry-influenced cooling performance of lithium-ion battery
- Author
-
Dubey, Dwijendra, Mishra, A., Ghosh, Subrata, Reddy, M. V., and Pandey, Ramesh
- Subjects
Physics - Applied Physics
- Abstract
Battery geometry (shape and size) is one of the important parameters that govern battery capacity and thermal behavior. Under dynamic conditions or during operation, the performance of batteries becomes much more complex. Herein, the changes in thermal behavior of a lithium-ion battery (LIB) caused by altering the geometry, i.e., the length to diameter ratio (l/d), are investigated. The geometries considered are named large geometry (LG), datum geometry (DG) and small geometry (SG), with l/d ratios of 5.25, 3.61, and 2.38, respectively. A three-dimensional (3D) multi-partition thermal model is adopted, and the numerical results are validated against published experimental data. For three different cooling approaches (radial, both-tab and mixed cooling), the average battery temperature and temperature heterogeneity are thoroughly examined considering heat transfer coefficients (h) of 50 and 100 W/m²K at discharge rates of 1, 2 and 3C. Among these, the minimum average battery temperature is exhibited by DG, the minimum radial temperature heterogeneity is obtained from LG, and substantial outperformance in terms of faster cooling rate is identified for SG, irrespective of the cooling approach employed.
- Comment
39 pages, 12 Figures
- Published
- 2024
18. Utilizing Spatiotemporal Data Analytics to Pinpoint Outage Location
- Author
-
Mandati, Reddy, Chen, Po-Chen, Anderson, Vladyslav, Sapkota, Bishwa, Warren, Michael Jarrell, Besharati, Bobby, Agarwal, Ankush, and Johnston III, Samuel
- Subjects
Computer Science - Databases, Computer Science - Computers and Society
- Abstract
Understanding the exact fault location in the post-event analysis is the key to improving the accuracy of outage management. Unfortunately, the fault location is not generally well documented during the restoration process, creating a big challenge for post-event analysis. By utilizing various data source systems, including outage management system (OMS) data, asset geospatial information system (GIS) data, and vehicle location data, this paper creates a novel method to pinpoint the outage location accurately to create additional insights for distribution operations and performance teams during the post-event analysis.
- Published
- 2024
19. Integrating Artificial Intelligence Models and Synthetic Image Data for Enhanced Asset Inspection and Defect Identification
- Author
-
Mandati, Reddy, Anderson, Vladyslav, Chen, Po-chen, Agarwal, Ankush, Dokic, Tatjana, Barnard, David, Finn, Michael, Cromer, Jesse, Mccauley, Andrew, Tutaj, Clay, Dave, Neha, Besharati, Bobby, Barnett, Jamie, and Krall, Timothy
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
In the past, utilities relied on in-field inspections to identify asset defects. Recently, utilities have started using drone-based inspections to enhance the field-inspection process. We consider a vast repository of drone images, providing a wealth of information about asset health and potential issues. However, making the collected imagery data useful for automated defect detection requires significant manual labeling effort. We propose a novel solution that combines synthetic asset defect images with manually labeled drone images. This solution has several benefits: it improves the performance of defect detection, reduces the number of hours spent on manual labeling, and enables the generation of realistic images of rare defects where not enough real-world data is available. We employ a workflow that combines 3D modeling tools such as Maya and Unreal Engine to create photorealistic 3D models and 2D renderings of defective assets and their surroundings. These synthetic images are then integrated into our training pipeline, augmenting the real data. This study implements an end-to-end Artificial Intelligence solution to detect assets and asset defects from the combined imagery repository. The unique contribution of this research lies in the application of advanced computer vision models and the generation of photorealistic 3D renderings of defective assets, aiming to transform the asset inspection process. Our asset detection model achieved an accuracy of 92 percent, and we achieved a performance lift of 67 percent when introducing approximately 2,000 synthetic images of 2k resolution. In our tests, the defect detection model achieved an accuracy of 73 percent across two batches of images. Our analysis demonstrated that synthetic data can be successfully used in place of real-world manually labeled data to train a defect detection model.
- Published
- 2024
20. Data Adaptive Few-shot Multi Label Segmentation with Foundation Model
- Author
-
Reddy, Gurunath, Shanbhag, Dattesh, and Anand, Deepa
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The high cost of obtaining accurate annotations for image segmentation and localization makes the use of one and few shot algorithms attractive. Several state-of-the-art methods for few-shot segmentation have emerged, including text-based prompting for the task, but they suffer from sub-optimal performance for medical images. Leveraging sub-pixel level features of existing Vision Transformer (ViT) based foundation models for identifying similar regions of interest (RoI) based on a single template image has been shown to be very effective for one shot segmentation and localization in medical images across modalities. However, such methods rely on the assumption that the template image and test image are well matched and that simple correlation is sufficient to obtain correspondences. In practice, however, such an approach can fail to generalize to clinical data because of patient pose changes and inter-protocol variations even within a single modality, or fail to extend to 3D data using a single template image. Moreover, for multi-label tasks, the RoI identification has to be performed sequentially. In this work, we propose foundation model (FM) based adapters for single label, multi-label localization and segmentation to address these concerns. We demonstrate the efficacy of the proposed method for multiple segmentation and localization tasks for both 2D and 3D data as well as clinical data with different poses and evaluate against the state of the art few shot segmentation methods.
- Published
- 2024
21. GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes
- Author
-
Powers, Maximus, Mavani, Umang, Jonala, Harshitha Reddy, Tiwari, Ansh, and Wei, Hua
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, I.2.7
- Abstract
The detection of bias in natural language processing (NLP) is a critical challenge, particularly with the increasing use of large language models (LLMs) in various domains. This paper introduces GUS-Net, an innovative approach to bias detection that focuses on three key types of biases: (G)eneralizations, (U)nfairness, and (S)tereotypes. GUS-Net leverages generative AI and automated agents to create a comprehensive synthetic dataset, enabling robust multi-label token classification. Our methodology enhances traditional bias detection methods by incorporating the contextual encodings of pre-trained models, resulting in improved accuracy and depth in identifying biased entities. Through extensive experiments, we demonstrate that GUS-Net outperforms state-of-the-art techniques, achieving superior performance in terms of accuracy, F1-score, and Hamming Loss. The findings highlight GUS-Net's effectiveness in capturing a wide range of biases across diverse contexts, making it a valuable tool for social bias detection in text. This study contributes to the ongoing efforts in NLP to address implicit bias, providing a pathway for future research and applications in various fields. The Jupyter notebooks used to create the dataset and model are available at: https://github.com/Ethical-Spectacle/fair-ly/tree/main/resources. Warning: This paper contains examples of harmful language, and reader discretion is recommended.
- Published
- 2024
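The GUS-Net abstract above reports Hamming Loss for multi-label token classification. As a minimal sketch of that metric only (not the authors' evaluation code), the snippet below computes the fraction of label slots predicted incorrectly. The three-column layout (G, U, S) and the toy labels are assumptions made for illustration.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total if total else 0.0


# Hypothetical toy data: 3 tokens, each with binary labels for (G, U, S).
y_true = [[1, 0, 0], [0, 0, 1], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [0, 0, 0]]
loss = hamming_loss(y_true, y_pred)  # one wrong slot out of nine
```

Lower is better: a loss of 0 means every label of every token was predicted correctly.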
22. From CAD to URDF: Co-Design of a Jet-Powered Humanoid Robot Including CAD Geometry
- Author
-
Vanteddu, Punith Reddy, Nava, Gabriele, Bergonti, Fabio, L'Erario, Giuseppe, Paolino, Antonello, and Pucci, Daniele
- Subjects
Computer Science - Robotics - Abstract
Co-design optimization strategies usually rely on simplified robot models extracted from CAD. While these models are useful for optimizing geometrical and inertial parameters for robot control, they might overlook important details essential for prototyping the optimized mechanical design. For instance, they may not account for mechanical stresses exerted on the optimized geometries and the complexity of assembly-level design. In this paper, we introduce a co-design framework aimed at improving both the control performance and mechanical design of our robot. Specifically, we identify the robot links that significantly influence control performance. The geometric characteristics of these links are parameterized and optimized using a multi-objective evolutionary algorithm to achieve optimal control performance. Additionally, an automated Finite Element Method (FEM) analysis is integrated into the framework to filter solutions not satisfying the required structural safety margin. We validate the framework by applying it to enhance the mechanical design for flight performance of the jet-powered humanoid robot iRonCub., Comment: IROS 2024
- Published
- 2024
23. MorCode: Face Morphing Attack Generation using Generative Codebooks
- Author
-
PN, Aravinda Reddy, Ramachandra, Raghavendra, Venkatesh, Sushma, Rao, Krothapalli Sreenivasa, Mitra, Pabitra, and Krishna, Rakesh
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Face recognition systems (FRS) can be compromised by face morphing attacks, which blend textural and geometric information from multiple facial images. The rapid evolution of generative AI, especially Generative Adversarial Networks (GANs) and diffusion models, in which encoded images are interpolated, has made it possible to generate high-quality face morphing images. In this work, we present MorCode, a novel method for automatic face morphing generation, which leverages a contemporary encoder-decoder architecture conditioned on codebook learning to generate high-quality morphing images. Extensive experiments were performed on the newly constructed morphing dataset with five state-of-the-art morphing generation techniques, using both digital and print-scan data. The attack potential of the proposed morphing generation technique, MorCode, was benchmarked using three different face recognition systems. The obtained results indicate that the proposed MorCode has the highest attack potential when compared with five state-of-the-art morphing generation methods on both digital and print-scan data.
- Published
- 2024
24. Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
- Author
-
Wu, Tong, Zhang, Shujian, Song, Kaiqiang, Xu, Silei, Zhao, Sanqiang, Agrawal, Ravi, Indurthi, Sathish Reddy, Xiang, Chong, Mittal, Prateek, and Zhou, Wenxuan
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Large Language Models (LLMs) are susceptible to security and safety threats, such as prompt injection, prompt extraction, and harmful requests. One major cause of these vulnerabilities is the lack of an instruction hierarchy. Modern LLM architectures treat all inputs equally, failing to distinguish between and prioritize various types of instructions, such as system messages, user prompts, and data. As a result, lower-priority user prompts may override more critical system instructions, including safety protocols. Existing approaches to achieving instruction hierarchy, such as delimiters and instruction-based training, do not address this issue at the architectural level. We introduce the Instructional Segment Embedding (ISE) technique, inspired by BERT, to modern large language models, which embeds instruction priority information directly into the model. This approach enables models to explicitly differentiate and prioritize various instruction types, significantly improving safety against malicious prompts that attempt to override priority rules. Our experiments on the Structured Query and Instruction Hierarchy benchmarks demonstrate an average robust accuracy increase of up to 15.75% and 18.68%, respectively. Furthermore, we observe an improvement in instruction-following capability of up to 4.1% evaluated on AlpacaEval. Overall, our approach offers a promising direction for enhancing the safety and effectiveness of LLM architectures., Comment: Preprint
- Published
- 2024
25. Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection
- Author
-
PN, Aravinda Reddy, Ramachandra, Raghavendra, Rao, Krothapalli Sreenivasa, and Rathod, Pabitra Mitra Vinod
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Deepfakes pose a critical threat to biometric authentication systems by generating highly realistic synthetic media. Existing multimodal deepfake detectors often struggle to adapt to diverse data and rely on simple fusion methods. To address these challenges, we propose Gumbel-Rao Monte Carlo Bi-modal Neural Architecture Search (GRMC-BMNAS), a novel architecture search framework that employs Gumbel-Rao Monte Carlo sampling to optimize multimodal fusion. It refines the Straight-Through Gumbel-Softmax (STGS) method by reducing variance with Rao-Blackwellization, stabilizing network training. Using a two-level search approach, the framework optimizes the network architecture, parameters, and performance. Crucial features are efficiently identified from backbone networks, while within the cell structure, a weighted fusion operation integrates information from various sources. Varying parameters such as the temperature and the number of Monte Carlo samples yields an architecture that maximizes classification performance and generalisation capability. Experimental results on the FakeAVCeleb and SWAN-DF datasets demonstrate an impressive AUC of 95.4%, achieved with minimal model parameters.
- Published
- 2024
26. Structured Spatial Reasoning with Open Vocabulary Object Detectors
- Author
-
Nejatishahidin, Negar, Vongala, Madhukar Reddy, and Kosecka, Jana
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Reasoning about spatial relationships between objects is essential for many real-world robotic tasks, such as fetch-and-delivery, object rearrangement, and object search. The ability to detect and disambiguate different objects and identify their location is key to successful completion of these tasks. Several recent works have used powerful Vision and Language Models (VLMs) to unlock this capability in robotic agents. In this paper we introduce a structured probabilistic approach that integrates rich 3D geometric features with state-of-the-art open-vocabulary object detectors to enhance spatial reasoning for robotic perception. The approach is evaluated and compared against the zero-shot performance of state-of-the-art VLMs on spatial reasoning tasks. To enable this comparison, we annotate spatial clauses in the real-world RGB-D Active Vision Dataset [1] and conduct experiments on this and the synthetic Semantic Abstraction [2] dataset. Results demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art open-source VLMs at grounding spatial relations by more than 20%.
- Published
- 2024
27. Application of Manifold Learning to Selection of Different Galaxy Populations and Scaling Relation Analysis
- Author
-
Sanjaripour, Sogol, Hemmati, Shoubaneh, Mobasher, Bahram, Canalizo, Gabriela, Barish, Barry, Shivaei, Irene, Coil, Alison L., Chartab, Nima, Jafariyazani, Marziye, Reddy, Naveen A., and Azadi, Mojegan
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
The growing volume of data produced by large astronomical surveys necessitates the development of efficient analysis techniques capable of effectively managing high-dimensional datasets. This study addresses this need by demonstrating some applications of manifold learning and dimensionality reduction techniques, specifically the Self-Organizing Map (SOM), on the optical+NIR SED space of galaxies, with a focus on sample comparison, selection biases, and predictive power using a small subset. To this end, we utilize a large photometric sample from the five CANDELS fields and a subset with spectroscopic measurements from the KECK MOSDEF survey in two redshift bins at $z\sim1.5$ and $z\sim2.2$. We trained a SOM with the photometric data and mapped the spectroscopic data onto it as our study case. We found that MOSDEF targets do not cover all SED shapes existing in the SOM. Our findings reveal that Active Galactic Nuclei (AGN) within the MOSDEF sample are mapped onto the more massive regions of the SOM, confirming previous studies and known selection biases towards higher-mass, less dusty galaxies. Furthermore, SOMs were utilized to map measured spectroscopic features, examining the relationship between metallicity variations and galaxy mass. Our analysis confirmed that more massive galaxies exhibit lower [OIII]/H$\beta$ and [OIII]/[OII] ratios and higher H$\alpha$/H$\beta$ ratios, consistent with the known mass-metallicity relation. These findings highlight the effectiveness of SOMs in analyzing and visualizing complex, multi-dimensional datasets, emphasizing their potential in data-driven astronomical studies.
- Published
- 2024
28. Robots in the Middle: Evaluating LLMs in Dispute Resolution
- Author
-
Tan, Jinzhe, Westermann, Hannes, Pottanigari, Nikhil Reddy, Šavelka, Jaromír, Meeùs, Sébastien, Godet, Mia, and Benyekhlef, Karim
- Subjects
Computer Science - Human-Computer Interaction ,Computer Science - Computation and Language - Abstract
Mediation is a dispute resolution method featuring a neutral third party (mediator) who intervenes to help the individuals resolve their dispute. In this paper, we investigate to what extent large language models (LLMs) are able to act as mediators. We investigate whether LLMs are able to analyze dispute conversations, select suitable intervention types, and generate appropriate intervention messages. Using a novel, manually created dataset of 50 dispute scenarios, we conduct a blind evaluation comparing LLMs with human annotators across several key metrics. Overall, the LLMs showed strong performance, even outperforming our human annotators across dimensions. Specifically, in 62% of the cases, the LLMs chose intervention types that were rated as better than or equivalent to those chosen by humans. Moreover, in 84% of the cases, the intervention messages generated by the LLMs were rated as better than or equal to the intervention messages written by humans. LLMs likewise performed favourably on metrics such as impartiality, understanding and contextualization. Our results demonstrate the potential of integrating AI in online dispute resolution (ODR) platforms.
- Published
- 2024
29. UnSeGArmaNet: Unsupervised Image Segmentation using Graph Neural Networks with Convolutional ARMA Filters
- Author
-
Reddy, Kovvuri Sai Gopal, Saran, Bodduluri, Adityaja, A. Mudit, Shigwan, Saurabh J., Kumar, Nitin, and Mukherjee, Snehasis
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The data-hungry approach of supervised classification drives the interest of the researchers toward unsupervised approaches, especially for problems such as medical image segmentation, where labeled data are difficult to get. Motivated by the recent success of Vision Transformers (ViT) in various computer vision tasks, we propose an unsupervised segmentation framework with a pre-trained ViT. Moreover, by harnessing the graph structure inherent within the image, the proposed method achieves a notable performance in segmentation, especially in medical images. We further introduce a modularity-based loss function coupled with an Auto-Regressive Moving Average (ARMA) filter to capture the inherent graph topology within the image. Finally, we observe that employing Scaled Exponential Linear Unit (SELU) and SiLU (Swish) activation functions within the proposed Graph Neural Network (GNN) architecture enhances the performance of segmentation. The proposed method provides state-of-the-art performance (even comparable to supervised methods) on benchmark image segmentation datasets such as ECSSD, DUTS, and CUB, as well as challenging medical image segmentation datasets such as KVASIR, CVC-ClinicDB, ISIC-2018. The GitHub repository of the code is available at https://github.com/ksgr5566/UnSeGArmaNet., Comment: Accepted at BMVC-2024. arXiv admin note: text overlap with arXiv:2405.06057
- Published
- 2024
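The UnSeGArmaNet abstract above mentions a modularity-based loss over the image graph. The abstract does not give the loss itself, so the following is only a sketch of the standard graph modularity score that such losses are typically built on, not the authors' formulation. The function name, the adjacency-list-of-lists input and the toy graph are assumptions.

```python
def modularity(adj, labels):
    """Standard modularity Q of a partition of an undirected graph.

    adj    -- symmetric adjacency matrix as a list of lists (edge weights)
    labels -- community label for each node
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the total edge weight
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected under random wiring
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_merged = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
```

A segmentation loss would usually minimise the negative of a differentiable (soft-assignment) version of this score; that relaxation is not shown here.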
30. VolDen: a tool to extract number density from the column density of filamentary molecular clouds
- Author
-
K, Ashesh A., Eswaraiah, Chakali, Reddy, P Ujwal, and Wang, Jia-Wei
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
Gas volume density is one of the critical parameters, along with dispersions in magnetic field position angles and non-thermal gas motions, for estimating the magnetic field strength using the Davis-Chandrasekhar-Fermi (DCF) relation or through its modified versions for a given region of interest. We present VolDen, a novel Python-based algorithm to extract the number density map from the column density map for an elongated interstellar filament. VolDen uses the workflow of RadFil to prepare the radial profiles across the spine. The user has to input the column density map and pre-computed spine along with the essential RadFil parameters (such as distance to the filament, the distance between two consecutive radial profile cuts, etc.) to extract the radial column density profiles. The thickness and volume density values are then calculated by modeling the column density profiles with a Plummer-like profile and introducing a cloud boundary condition. The cloud boundary condition was verified through an accompanying N-PDF column density analysis. In this paper, we discuss the workflow of VolDen and apply it to two filamentary clouds. We chose LDN 1495 as our primary target owing to its nearby distance and elongated morphology. In addition, the distant filament RCW 57A is chosen as the secondary target to compare our results with the published results. Upon publication, a complete tutorial of VolDen and the codes will be available via GitHub., Comment: 14 pages, 10 figures, and Accepted for publication in Journal of Astrophysics and Astronomy (JOAA)
- Published
- 2024
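The VolDen abstract above refers to modeling column density profiles with a Plummer-like profile. The abstract does not state the exact parameterization VolDen fits, so the form below is an assumption: it is the Plummer-like model commonly used for filaments, with central number density $n_c$, flattening radius $R_{\rm flat}$, power-law index $p$, and inclination $i$ to the plane of the sky.

```latex
% Volume density as a function of cylindrical radius r
n(r) = \frac{n_c}{\left[1 + \left(r / R_{\rm flat}\right)^{2}\right]^{p/2}}

% Column density obtained by integrating n(r) along the line of sight
N(r) = A_p \, \frac{n_c \, R_{\rm flat}}{\left[1 + \left(r / R_{\rm flat}\right)^{2}\right]^{(p-1)/2}},
\qquad
A_p = \frac{1}{\cos i} \int_{-\infty}^{\infty} \frac{du}{\left(1 + u^{2}\right)^{p/2}}

% Central volume density recovered from the fitted peak column density N_0 = N(0)
n_c = \frac{N_0}{A_p \, R_{\rm flat}}
```

Under this form, fitting $N(r)$ for $N_0$, $R_{\rm flat}$ and $p$ is enough to recover the central volume density, which is the kind of column-to-volume conversion the abstract describes.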
31. Synthetic Generation of Dermatoscopic Images with GAN and Closed-Form Factorization
- Author
-
Mekala, Rohan Reddy, Pahde, Frederik, Baur, Simon, Chandrashekar, Sneha, Diep, Madeline, Wenzel, Markus, Wisotzky, Eric L., Yolcu, Galip Ümit, Lapuschkin, Sebastian, Ma, Jackie, Eisert, Peter, Lindvall, Mikael, Porter, Adam, and Samek, Wojciech
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
In the realm of dermatological diagnoses, where the analysis of dermatoscopic and microscopic skin lesion images is pivotal for the accurate and early detection of various medical conditions, the costs associated with creating diverse and high-quality annotated datasets have hampered the accuracy and generalizability of machine learning models. We propose an innovative unsupervised augmentation solution that harnesses Generative Adversarial Network (GAN) based models and associated techniques over their latent space to generate controlled semiautomatically-discovered semantic variations in dermatoscopic images. We created synthetic images to incorporate the semantic variations and augmented the training data with these images. With this approach, we were able to increase the performance of machine learning models and set a new benchmark amongst non-ensemble based models in skin lesion classification on the HAM10000 dataset; and used the observed analytics and generated models for detailed studies on model explainability, affirming the effectiveness of our solution., Comment: This preprint has been submitted to the Workshop on Synthetic Data for Computer Vision (SyntheticData4CV 2024 is a side event on 18th European Conference on Computer Vision 2024). This preprint has not undergone peer review or any post-submission improvements or corrections
- Published
- 2024
32. CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints
- Author
-
Atmakuru, Anirudh, Nainani, Jatin, Bheemreddy, Rohith Siddhartha Reddy, Lakkaraju, Anirudh, Yao, Zonghai, Zamani, Hamed, and Chang, Haw-Shiuan
- Subjects
Computer Science - Computation and Language - Abstract
Evaluating the creativity of large language models (LLMs) in story writing is difficult because LLM-generated stories could seemingly look creative but be very similar to some existing stories in their huge and proprietary training corpus. To overcome this challenge, we introduce a novel benchmark dataset with varying levels of prompt specificity: CS4 (Comparing the Skill of Creating Stories by Controlling the Synthesized Constraint Specificity). By increasing the number of requirements/constraints in the prompt, we can increase the prompt specificity and hinder LLMs from retelling high-quality narratives in their training data. Consequently, CS4 empowers us to indirectly measure the LLMs' creativity without human annotations. Our experiments on LLaMA, Gemma, and Mistral not only highlight the creativity challenges LLMs face when dealing with highly specific prompts but also reveal that different LLMs perform very differently under different numbers of constraints and achieve different balances between the model's instruction-following ability and narrative coherence. Additionally, our experiments on OLMo suggest that Learning from Human Feedback (LHF) can help LLMs select better stories from their training data but has limited influence in boosting LLMs' ability to produce creative stories that are unseen in the training corpora. The benchmark is released at https://github.com/anirudhlakkaraju/cs4_benchmark.
- Published
- 2024
33. AraSync: Precision Time Synchronization in Rural Wireless Living Lab
- Author
-
Nadim, Md, Islam, Taimoor Ul, Reddy, Salil, Zhang, Tianyi, Meng, Zhibo, Afzal, Reshal, Babu, Sarath, Ahmad, Arsalan, Qiao, Daji, Arora, Anish, and Zhang, Hongwei
- Subjects
Computer Science - Networking and Internet Architecture ,Computer Science - Performance - Abstract
Time synchronization is a critical component in network operation and management, and it is also required by Ultra-Reliable, Low-Latency Communications (URLLC) in next-generation wireless systems such as those of 5G, 6G, and Open RAN. In this context, we design and implement AraSync as an end-to-end time synchronization system in the ARA wireless living lab to enable advanced wireless experiments and applications involving stringent time constraints. We make use of Precision Time Protocol (PTP) at different levels to achieve synchronization accuracy in the order of nanoseconds. Along with fiber networks, AraSync enables time synchronization across the AraHaul wireless x-haul network consisting of long-range, high-capacity mmWave and microwave links. In this paper, we present the detailed design and implementation of AraSync, including its hardware and software components and the PTP network topology. Further, we experimentally characterize the performance of AraSync from spatial and temporal dimensions. Our measurement and analysis of the clock offset and mean path delay show the impact of the wireless channel and weather conditions on the PTP synchronization accuracy., Comment: 8 pages, 10 figures, accepted in ACM WiNTECH 2024 (The 18th ACM Workshop on Wireless Network Testbeds, Experimental evaluation & Characterization 2024)
- Published
- 2024
34. IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages?
- Author
-
Aravapalli, Akhilesh, Marreddy, Mounika, Oota, Subba Reddy, Mamidi, Radhika, and Gupta, Manish
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Transformer-based models have revolutionized the field of natural language processing. To understand why they perform so well and to assess their reliability, several studies have focused on questions such as: Which linguistic properties are encoded by these models, and to what extent? How robust are these models in encoding linguistic properties when faced with perturbations in the input text? However, these studies have mainly focused on BERT and the English language. In this paper, we investigate similar questions regarding encoding capability and robustness for 8 linguistic properties across 13 different perturbations in 6 Indic languages, using 9 multilingual Transformer models (7 universal and 2 Indic-specific). To conduct this study, we introduce a novel multilingual benchmark dataset, IndicSentEval, containing approximately 47K sentences. Surprisingly, our probing analysis of surface, syntactic, and semantic properties reveals that while almost all multilingual models demonstrate consistent encoding performance for English, they show mixed results for Indic languages. As expected, Indic-specific multilingual models capture linguistic properties in Indic languages better than universal models. Intriguingly, universal models broadly exhibit better robustness compared to Indic-specific models, particularly under perturbations such as dropping both nouns and verbs, dropping only verbs, or keeping only nouns. Overall, this study provides valuable insights into probing and perturbation-specific strengths and weaknesses of popular multilingual Transformer-based models for different Indic languages. We make our code and dataset publicly available [https://tinyurl.com/IndicSentEval]., Comment: 23 pages, 11 figures
- Published
- 2024
35. Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments
- Author
-
Baddam, Vasanth Reddy, Gumussoy, Suat, Boker, Almuatazbellah, and Eldardiry, Hoda
- Subjects
Computer Science - Multiagent Systems ,Computer Science - Machine Learning - Abstract
Many real-world problems, such as controlling swarms of drones and urban traffic, naturally lend themselves to modeling as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods often suffer from scalability challenges, primarily due to the introduction of communication among agents. Consequently, a key challenge lies in adapting the success of deep learning in single-agent RL to the multi-agent setting. In response to this challenge, we propose an approach that fundamentally reimagines multi-agent environments. Unlike conventional methods that model each agent individually with separate networks, our approach, the Bottom Up Network (BUN), adopts a unique perspective. BUN treats the collective of multi-agents as a unified entity while employing a specialized weight initialization strategy that promotes independent learning. Furthermore, we dynamically establish connections among agents using gradient information, enabling coordination when necessary while maintaining these connections as limited and sparse to effectively manage the computational budget. Our extensive empirical evaluations across a variety of cooperative multi-agent scenarios, including tasks such as cooperative navigation and traffic control, consistently demonstrate BUN's superiority over baseline methods with substantially reduced computational costs., Comment: 13 pages, 24 figures
- Published
- 2024
36. VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
- Author
-
Kazemnejad, Amirhossein, Aghajohari, Milad, Portelance, Eva, Sordoni, Alessandro, Reddy, Siva, Courville, Aaron, and Roux, Nicolas Le
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, value networks face challenges in predicting the expected cumulative rewards accurately in complex reasoning tasks, often leading to high-variance updates and suboptimal performance. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps. To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates, bypassing the need for large value networks. Our method consistently outperforms PPO and other RL-free baselines across the MATH and GSM8K datasets with fewer gradient updates (up to 9x) and less wall-clock time (up to 3.0x). These results emphasize the importance of accurate credit assignment in RL finetuning of LLMs and demonstrate VinePPO's potential as a superior alternative.
- Published
- 2024
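The VinePPO abstract above describes replacing a learned value network with unbiased Monte Carlo-based estimates. The abstract gives no implementation details, so the snippet below is only a sketch of the general idea under stated assumptions: the value of a state is estimated as the mean return of several independent rollouts from it, and a step's advantage is the reward plus the change in estimated value. All names (`mc_value`, `mc_advantage`, `toy_rollout`) and the toy environment are hypothetical.

```python
import random


def mc_value(state, rollout, k, rng):
    """Estimate a state's value as the mean return of k independent rollouts."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(rollout(state, rng) for _ in range(k)) / k


def mc_advantage(reward, v_curr, v_next, gamma=1.0):
    """One-step advantage computed from two Monte Carlo value estimates."""
    return reward + gamma * v_next - v_curr


def toy_rollout(p_success, rng):
    # Hypothetical environment: the "state" is simply the probability that
    # completing the episode from here ends with a final reward of 1.
    return 1.0 if rng.random() < p_success else 0.0


rng = random.Random(0)
v_before = mc_value(0.4, toy_rollout, 2000, rng)  # value before a reasoning step
v_after = mc_value(0.7, toy_rollout, 2000, rng)   # value after the step
adv = mc_advantage(0.0, v_before, v_after)        # positive: the step helped
```

The estimate is unbiased because each rollout return is an independent sample of the true return; its variance shrinks as `k` grows, at the cost of more sampling.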
37. Optimal One- and Two-Sided Multi-level ASK Modulation for RIS-Assisted Noncoherent Communication Systems
- Author
-
Mukhopadhyay, Srijika, Reddy, Badri Ramanjaneya, Dash, Soumya P., Alexandropoulos, George C., and Aissa, Sonia
- Subjects
Electrical Engineering and Systems Science - Signal Processing - Abstract
In this paper, we analyze the performance of one- and two-sided amplitude shift keying (ASK) modulations in single-input single-output wireless communication aided by a reconfigurable intelligent surface (RIS). Two scenarios are considered for the channel conditions: a blocked direct channel between the transmitter and the receiver, and an unblocked one. For the receiver, a noncoherent maximum likelihood detector is proposed, which detects the transmitted data signal based on statistical knowledge of the channel. The system's performance is then evaluated by deriving the symbol error probability (SEP) for both scenarios using the proposed noncoherent receiver structures. We also present a novel optimization framework to obtain the optimal one- and two-sided ASK modulation schemes that minimize the SEP under constraints on the available average transmit power for both the blocked and unblocked direct channel scenarios. Our extensive numerical investigations showcase that the considered RIS-aided communication system achieves superior error performance with both derived SEP-optimal ASK modulation schemes as compared to respective traditional ASK modulation. It is also demonstrated that, between the two proposed modulation schemes, the two-sided one yields the best SEP. The error performance is further analyzed for different system parameters, providing a comprehensive performance investigation of RIS-assisted noncoherent wireless communication systems., Comment: 13 pages, 6 figures
- Published
- 2024
38. An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos
- Author
-
Reddy, Arun, Shah, Ketul, Rivera, Corban, Paul, William, De Melo, Celso M., and Chellappa, Rama
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we explore the possibility of using synthetically generated data for video-based gesture recognition with large pre-trained models. We consider whether these models have sufficiently robust and expressive representation spaces to enable "training-free" classification. Specifically, we utilize various state-of-the-art video encoders to extract features for use in k-nearest neighbors classification, where the training data points are derived from synthetic videos only. We compare these results with another training-free approach -- zero-shot classification using text descriptions of each gesture. In our experiments with the RoCoG-v2 dataset, we find that using synthetic training videos yields significantly lower classification accuracy on real test videos compared to using a relatively small number of real training videos. We also observe that video backbones that were fine-tuned on classification tasks serve as superior feature extractors, and that the choice of fine-tuning data has a substantial impact on k-nearest neighbors performance. Lastly, we find that zero-shot text-based classification performs poorly on the gesture recognition task, as gestures are not easily described through natural language., Comment: Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II (SPIE Defense + Commercial Sensing, 2024)
- Published
- 2024
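The gesture recognition abstract above describes "training-free" classification: features from a frozen video encoder are classified with k-nearest neighbors against synthetic training examples. The sketch below illustrates only that classification step. The cosine similarity metric, the two-dimensional feature vectors and the gesture names are assumptions for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_feats, train_labels, k=3):
    """Label a query by majority vote among its k most similar training features."""
    ranked = sorted(
        range(len(train_feats)),
        key=lambda i: cosine(query, train_feats[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features standing in for encoder outputs on synthetic videos.
synthetic_feats = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
synthetic_labels = ["wave", "wave", "wave", "halt", "halt"]

pred_a = knn_predict([0.7, 0.3], synthetic_feats, synthetic_labels)
pred_b = knn_predict([0.2, 0.8], synthetic_feats, synthetic_labels)
```

No parameters are fitted here, which is what makes the approach training-free: accuracy depends entirely on how well the encoder's features separate the gestures.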
39. Domain adaptation in application to gravitational lens finding
- Author
-
Parul, Hanna, Gleyzer, Sergei, Reddy, Pranath, and Toomey, Michael W.
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics ,Astrophysics - Cosmology and Nongalactic Astrophysics - Abstract
The next decade is expected to see a tenfold increase in the number of strong gravitational lenses, driven by new wide-field imaging surveys. To discover these rare objects, efficient automated detection methods need to be developed. In this work, we assess the performance of three domain adaptation techniques -- Adversarial Discriminative Domain Adaptation (ADDA), Wasserstein Distance Guided Representation Learning (WDGRL), and Supervised Domain Adaptation (SDA) -- in enhancing lens-finding algorithms trained on simulated data when applied to observations from the Hyper Suprime-Cam Subaru Strategic Program. We find that WDGRL combined with an ENN-based encoder provides the best performance in an unsupervised setting and that supervised domain adaptation is able to enhance the model's ability to distinguish between lenses and common similar-looking false positives, such as spiral galaxies, which is crucial for future lens surveys., Comment: 10 pages, 5 figures. Submitted to ApJ. Comments are welcome!
- Published
- 2024
40. The AURORA Survey: An Extraordinarily Mature, Star-forming Galaxy at $z\sim 7$
- Author
-
Shapley, Alice E., Sanders, Ryan L., Topping, Michael W., Reddy, Naveen A., Pahl, Anthony J., Oesch, Pascal A., Berg, Danielle A., Bouwens, Rychard J., Brammer, Gabriel, Carnall, Adam C., Cullen, Fergus, Davé, Romeel, Dunlop, James S., Ellis, Richard S., Schreiber, N. M. Förster, Furlanetto, Steven R., Glazebrook, Karl, Illingworth, Garth D., Jones, Tucker, Kriek, Mariska, McLeod, Derek J., McLure, Ross J., Narayanan, Desika, Pettini, Max, Schaerer, Daniel, Stark, Daniel P., Steidel, Charles C., Tang, Mengtao, Clarke, Leonardo, Donnan, Callum T., and Kehoe, Emily
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
We present the properties of a massive, large, dusty, metal-rich, star-forming galaxy at z_spec=6.73. GOODSN-100182 was observed with JWST/NIRSpec as part of the AURORA survey, and is also covered by public multi-wavelength HST and JWST imaging. While the large mass of GOODSN-100182 (~10^10 M_sun) was indicated prior to JWST, NIRCam rest-optical imaging now reveals the presence of an extended disk (r_eff~1.5 kpc). In addition, the NIRSpec R~1000 spectrum of GOODSN-100182 includes the detection of a large suite of rest-optical nebular emission lines ranging in wavelength from [OII]3727 up to [NII]6583. The ratios of Balmer lines suggest significant dust attenuation (E(B-V)_gas=0.40+0.10/-0.09), consistent with the red rest-UV slope inferred for GOODSN-100182 (beta=-0.50+/-0.09). The star-formation rate based on dust-corrected H-alpha emission is log(SFR(H-alpha)/ M_sun/yr)=2.02+0.13/-0.14, well above the z~7 star-forming main sequence in terms of specific SFR. Strikingly, the ratio of [NII]6583/H-alpha emission suggests almost solar metallicity, as does the ratio ([OIII]5007/H-beta)/([NII]6583/H-alpha) and the detection of the faint [FeII]4360 emission feature, whereas the [OIII]5007/[OII]3727 ratio suggests roughly 50% solar metallicity. Overall, the excitation and ionization properties of GOODSN-100182 more closely resemble those of typical star-forming galaxies at z~2-3 rather than z~7. Based on public spectroscopy of the GOODS-N field, we find that GOODSN-100182 resides within a significant galaxy overdensity, and is accompanied by a spectroscopically-confirmed neighbor galaxy. GOODSN-100182 demonstrates the existence of mature, chemically-enriched galaxies within the first billion years of cosmic time, whose properties must be explained by galaxy formation models., Comment: 16 pages, 13 figures, submitted to ApJ
- Published
- 2024
41. Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis
- Author
-
Reddy, Shukesh, Poddar, Nishit, Das, Srijan, and Das, Abhijit
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we explore Self-supervised Learning (SSL) as an auxiliary task to blend the texture-based local descriptors into feature modelling for efficient face analysis. Combining a primary task and a self-supervised auxiliary task is beneficial for robust representation. Therefore, we used the SSL task of masked auto-encoder (MAE) as an auxiliary task to reconstruct texture features such as local patterns along with the primary task for robust and unbiased face analysis. We tested our hypothesis on three major paradigms of face analysis: face attribute analysis, face-based emotion analysis, and deepfake detection. Our experimental results show that better feature representation can be gleaned from our proposed model for fair and bias-less face analysis.
- Published
- 2024
42. Measuring Software Development Waste in Open-Source Software Projects
- Author
-
Varanasi, Dhiraj SM, D, Divij, Karre, Sai Anirudh, and Reddy, Y Raghu
- Subjects
Computer Science - Software Engineering - Abstract
Software Development Waste (SDW) is defined as any resource-consuming activity that does not add value to the client or the organization developing the software. SDW impacts the overall efficiency and productivity of a software project as the scale and size of the project grow. Although engineering leaders usually put in effort to minimize waste, the lack of definitive measures to track and manage SDW is a cause of concern. To address this gap, we propose five measures, namely Stale Forks, Project Diversification Index, PR Rejection Rate, Backlog Inversion Index, and Feature Fulfillment Rate, to potentially identify three types of SDW: unused artifacts, building the wrong feature/product, and mismanagement of backlog. We apply these measures on ten open-source projects and share our observations to apply them in practice for managing SDW., Comment: 9 pages, This manuscript is a pre-publication version of the paper that was published at IEEE SEAA 2024
- Published
- 2024
43. Mitigating Selection Bias with Node Pruning and Auxiliary Options
- Author
-
Choi, Hyeong Kyu, Xu, Weijie, Xue, Chi, Eckman, Stephanie, and Reddy, Chandan K.
- Subjects
Computer Science - Artificial Intelligence - Abstract
Large language models (LLMs) often show unwarranted preference for certain choice options when responding to multiple-choice questions, posing significant reliability concerns in LLM-automated systems. To mitigate this selection bias problem, previous solutions utilized debiasing methods to adjust the model's input and/or output. Our work, in contrast, investigates the model's internal representation of the selection bias. Specifically, we introduce a novel debiasing approach, Bias Node Pruning (BNP), which eliminates the linear layer parameters that contribute to the bias. Furthermore, we present Auxiliary Option Injection (AOI), a simple yet effective input modification technique for debiasing, which is compatible even with black-box LLMs. To provide a more systematic evaluation of selection bias, we review existing metrics and introduce Choice Kullback-Leibler Divergence (CKLD), which addresses the insensitivity of the commonly used metrics to label imbalance. Experiments show that our methods are robust and adaptable across various datasets when applied to three LLMs.
- Published
- 2024
44. Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
- Author
-
Sidhu, Mankeerat, Chopra, Hetarth, Blume, Ansel, Kim, Jeonghwan, Reddy, Revanth Gangi, and Ji, Heng
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this paper, we introduce SearchDet, a training-free long-tail object detection framework that significantly enhances open-vocabulary object detection performance. SearchDet retrieves a set of positive and negative images of an object to ground, embeds these images, and computes an input image-weighted query which is used to detect the desired concept in the image. Our proposed method is simple and training-free, yet achieves over 48.7% mAP improvement on ODinW and 59.1% mAP improvement on LVIS compared to state-of-the-art models such as GroundingDINO. We further show that our approach of basing object detection on a set of Web-retrieved exemplars is stable with respect to variations in the exemplars, suggesting a path towards eliminating costly data annotation and training procedures.
- Published
- 2024
45. Transitioning Together: Collaborative Work in Adolescent Chronic Illness Management
- Author
-
Zehrung, Rachael, Reddy, Madhu, and Chen, Yunan
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Adolescents with chronic illnesses need to learn self-management skills in preparation for the transition from pediatric to adult healthcare, which is associated with negative health outcomes for youth. However, few studies have explored how adolescents in a pre-transition stage practice self-management and collaborative management with their parents. Through interviews with 15 adolescents (aged 15-17), we found that adolescents managed mundane self-care tasks and experimented with lifestyle changes to be more independent, which sometimes conflicted with their parents' efforts to ensure their safety. Adolescents and their parents also performed shared activities that provided adolescents with the opportunity to learn and practice self-management skills. Based on our findings, we discuss considerations for technology design to facilitate transition and promote parent-adolescent collaboration in light of these tensions., Comment: 24 pages. CSCW 2024
- Published
- 2024
46. Towards sub-millisecond latency real-time speech enhancement models on hearables
- Author
-
Dementyev, Artem, Reddy, Chandan K. A., Wisdom, Scott, Chatlani, Navin, Hershey, John R., and Lyon, Richard F.
- Subjects
Computer Science - Sound ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Low latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement using a computationally efficient minimum-phase FIR filter, enabling sample-by-sample processing to achieve mean algorithmic latency of 0.32 ms to 1.25 ms. With a single microphone, we observe a mean SI-SDRi of 4.1 dB. The approach shows generalization with a DNSMOS increase of 0.2 on unseen audio recordings. We use a lightweight LSTM-based model of 644k parameters to generate FIR taps. We benchmark that our system can run on low-power DSP with 388 MIPS and mean end-to-end latency of 3.35 ms. We provide a comparison with baseline low-latency spectral masking techniques. We hope this work will enable a better understanding of latency and can be used to improve the comfort and usability of hearables.
- Published
- 2024
47. Stacking and Analyzing $z\approx 2$ MOSDEF Galaxies by Spectral Types: Implications for Dust Geometry and Galaxy Evolution
- Author
-
Lorenz, Brian, Kriek, Mariska, Shapley, Alice E., Sanders, Ryan L., Coil, Alison L., Leja, Joel, Mobasher, Bahram, Nelson, Erica, Price, Sedona H., Reddy, Naveen A., Runco, Jordan N., Suess, Katherine A., Shivaei, Irene, Siana, Brian, and Weisz, Daniel R.
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
We examine star-formation and dust properties for a sample of 660 galaxies at $1.37\leq z\leq 2.61$ in the MOSDEF survey by dividing them into groups with similarly-shaped spectral energy distributions (SEDs). For each group, we combine the galaxy photometry into a finely-sampled composite SED, and stack their spectra. This method enables the study of more complete galaxy samples, including galaxies with very faint emission lines. We fit these composite SEDs with Prospector to measure the stellar attenuation and SED-based star-formation rates (SFRs). We also derive emission-line properties from the spectral stacks, including Balmer decrements, dust-corrected SFRs, and metallicities. We find that stellar attenuation correlates most strongly with mass, while nebular attenuation correlates strongly with both mass and SFR. Furthermore, the excess of nebular compared to stellar attenuation correlates most strongly with SFR. The highest SFR group has 2 mag of excess nebular attenuation. Our results are consistent with a model in which star-forming regions become more dusty as galaxy mass increases. To explain the increasing excess nebular attenuation, we require a progressively larger fraction of star formation to occur in highly-obscured regions with increasing SFR. This highly-obscured star formation could occur in dusty clumps or central starbursts. Additionally, as each galaxy group represents a different evolutionary stage, we study their locations on the UVJ and SFR-mass diagrams. As mass increases, metallicity and dust attenuation increase, while sSFR decreases. However, the most massive group moves towards the quiescent region of the UVJ diagram, while showing less obscuration, potentially indicating removal of dust., Comment: 21 pages, 7 figures, accepted for publication in ApJ
- Published
- 2024
48. Detecting and Measuring Confounding Using Causal Mechanism Shifts
- Author
-
Reddy, Abbavaram Gowtham and Balasubramanian, Vineeth N
- Subjects
Computer Science - Artificial Intelligence - Abstract
Detecting and measuring confounding effects from data is a key challenge in causal inference. Existing methods frequently assume causal sufficiency, disregarding the presence of unobserved confounding variables. Causal sufficiency is both unrealistic and empirically untestable. Additionally, existing methods make strong parametric assumptions about the underlying causal generative process to guarantee the identifiability of confounding variables. Relaxing the causal sufficiency and parametric assumptions and leveraging recent advancements in causal discovery and confounding analysis with non-i.i.d. data, we propose a comprehensive approach for detecting and measuring confounding. We consider various definitions of confounding and introduce tailored methodologies to achieve three objectives: (i) detecting and measuring confounding among a set of variables, (ii) separating observed and unobserved confounding effects, and (iii) understanding the relative strengths of confounding bias between different sets of variables. We present useful properties of a confounding measure and present measures that satisfy those properties. Empirical results support the theoretical analysis.
- Published
- 2024
49. An H{\alpha} view of galaxy build-up in the first 2 Gyr: luminosity functions at z~4-6.5 from NIRCam/grism spectroscopy
- Author
-
Covelo-Paz, Alba, Giovinazzo, Emma, Oesch, Pascal A., Meyer, Romain A., Weibel, Andrea, Brammer, Gabriel, Fudamoto, Yoshinobu, Kerutt, Josephine, Lin, Jamie, Matharu, Jasleen, Naidu, Rohan P., Velichko, Anna, Bollo, Victoria, Bouwens, Rychard, Chisholm, John, Illingworth, Garth D., Kramarenko, Ivan, Magee, Daniel, Maseda, Michael, Matthee, Jorryt, Nelson, Erica, Reddy, Naveen, Schaerer, Daniel, Stefanon, Mauro, and Xiao, Mengyuan
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
The H{\alpha} nebular emission line is an optimal tracer for recent star formation in galaxies. With the advent of JWST, this line has recently become observable at z>3 for the first time. We present a catalog of 1013 H{\alpha} emitters at 3.7
3 obtained based purely on spectroscopic data, robustly tracing galaxy star formation rates (SFRs) beyond the peak of the cosmic star formation history. We compare our results with theoretical predictions from three different simulations and find good agreement at z~4-6. The UV LFs of this spectroscopically-confirmed sample are in good agreement with pre-JWST measurements obtained with photometrically-selected objects. Finally, we derive SFR functions and integrate these to compute the evolution of the cosmic star-formation rate densities across z~4-6, finding values in good agreement with recent UV estimates from Lyman-break galaxies, which imply a continuous decrease in SFR density by a factor of 3x over z~4 to z~6. Our work shows the power of NIRCam grism observations to efficiently provide new tests for early galaxy formation models based on emission line statistics., Comment: 17 pages, 14 figures
- Published
- 2024
50. Evaluation of state-of-the-art ASR Models in Child-Adult Interactions
- Author
-
Ashvin, Aditya, Lahiri, Rimita, Kommineni, Aditya, Bishop, Somer, Lord, Catherine, Kadiri, Sudarsana Reddy, and Narayanan, Shrikanth
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Machine Learning ,Computer Science - Sound - Abstract
The ability to reliably transcribe child-adult conversations in a clinical setting is valuable for diagnosis and understanding of numerous developmental disorders such as Autism Spectrum Disorder. Recent advances in deep learning architectures and the availability of large-scale transcribed data have led to the development of speech foundation models that have shown dramatic improvements in ASR performance. However, the ability of these models to translate well to conversational child-adult interactions is understudied. In this work, we provide a comprehensive evaluation of ASR performance on a dataset containing child-adult interactions from autism diagnostic sessions, using Whisper, Wav2Vec2, HuBERT, and WavLM. We find that speech foundation models show a noticeable performance drop (15-20% absolute WER) for child speech compared to adult speech in the conversational setting. Then, we employ LoRA on the best-performing zero-shot model (whisper-large) to probe the effectiveness of fine-tuning in a low-resource setting, resulting in ~8% absolute WER improvement for child speech and ~13% absolute WER improvement for adult speech., Comment: 5 pages, 3 figures, 4 tables
- Published
- 2024