18,102 results on '"Vikas, P"'
Search Results
2. Biometric and fatty acid profile of the brine shrimp Artemia franciscana enriched with marine microalgal species belonging to prymnesiophytes and eustigmatophytes
- Author
Vikas, P A
- Published
- 2023
3. Verification and Validation of Autonomous Systems
- Author
Shetiya, Sneha Sudhir, Vyas, Vikas, and Renukuntla, Shreyas
- Subjects
Computer Science - Software Engineering; Computer Science - Artificial Intelligence
- Abstract
This paper describes how to proficiently prevent software defects in autonomous vehicles, discover and correct defects if they are encountered, and create a higher level of assurance in the software product development phase. It also describes how to ensure high assurance on software reliability.
- Published
- 2024
4. Controlling the degree of entanglement in downconversion by targeted birth zone activation
- Author
Bhat, Vikas S, Chatterjee, Rounak, Bajar, Kiran, and Mujumdar, Sushil
- Subjects
Quantum Physics; Physics - Optics
- Abstract
We explore the consequences of varying the pump beam waist that illuminates a nonlinear crystal, realizing spontaneous parametric down-conversion (SPDC). The coherence is transferred from the marginal one-photon wavefunction to the two-photon wavefunction where it manifests into entanglement in the form of spatial correlation. We interpret this as a consequence of the number of independent emitters, called the biphoton birth zones, targeted by the pump beam on the crystal. The birth zone number $N$ characterises the number of such birth zones that fit along a diameter of the region illuminated by the pump waist. To experimentally observe the duality between the one- and two-photon interference, we employ a double slit and analyse their visibilities $V_m$ and $V_\text{12}$ respectively. We demonstrate the conservation of the quantity $V_m^2+V_\text{12}^2$. Finally, we identify three regimes of entanglement of the down-converted photons based on $N$. We show that changing the pump waist lets us actively control the degree of entanglement letting us access these regimes. We provide implications of each regime, and mention experimental use cases thereof. Comment: 11 pages, 5 figures
- Published
- 2024
5. Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy
- Author
Dumitru, Razvan-Gabriel, Clotan, Paul-Ioan, Yadav, Vikas, Peteleaza, Darius, and Surdeanu, Mihai
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning; I.2.7; I.2.0
- Abstract
This paper introduces a novel model compression approach through dynamic layer-specific pruning in Large Language Models (LLMs), enhancing the traditional methodology established by SliceGPT. By transitioning from constant to dynamic slicing, our method leverages the newly proposed Layer Redundancy (LR) score, which assesses how much each layer changes its input by measuring the cosine similarity of the input to the output of the layer. We use this score to prune parts of individual layers based on redundancy in such a way that the average pruned percentage for all layers is a fixed value. We conducted extensive experiments using models like Llama3-8B and Mistral-7B on multiple datasets, evaluating different slicing bases and percentages to determine optimal configurations that balance efficiency and performance. Our findings show that our dynamic slicing approach not only maintains but, in many cases, enhances model performance compared to the baseline established by constant slicing methods. For instance, in several settings, we see performance improvements of up to 5% over the SliceGPT baseline. Additionally, a perplexity decrease by as much as 7% was observed across multiple benchmarks, validating the effectiveness of our method. The code, model weights, and datasets are open-sourced at https://github.com/RazvanDu/DynamicSlicing. Comment: Accepted at EMNLP Findings 2024
- Published
- 2024
6. Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages
- Author
Nguyen, Hoang, Mahajan, Khyati, Yadav, Vikas, Yu, Philip S., Hashemi, Masoud, and Maheshwary, Rishabh
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
- Abstract
Multilingual LLMs have achieved remarkable benchmark performance, but we find they continue to underperform on non-Latin script languages across contemporary LLM families. This discrepancy arises from the fact that LLMs are pretrained with orthographic scripts, which are dominated by Latin characters that obscure their shared phonology with non-Latin scripts. We propose leveraging phonemic transcriptions as complementary signals to induce script-invariant representations. Our study demonstrates that integrating phonemic signals improves performance across both non-Latin and Latin languages, with a particularly significant impact on closing the performance gap between the two. Through detailed experiments, we show that phonemic and orthographic scripts retrieve distinct examples for in-context learning (ICL). This motivates our proposed Mixed-ICL retrieval strategy, where further aggregation leads to our significant performance improvements for both Latin script languages (up to 12.6%) and non-Latin script languages (up to 15.1%) compared to randomized ICL retrieval.
- Published
- 2024
7. Diffusion Models as Cartoonists! The Curious Case of High Density Regions
- Author
Karczewski, Rafał, Heinonen, Markus, and Garg, Vikas
- Subjects
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
- Abstract
We investigate what kind of images lie in the high-density regions of diffusion models. We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-probability sampler that consistently generates images of higher likelihood than usual samplers. Our empirical findings reveal the existence of significantly higher likelihood samples that typical samplers do not produce, often manifesting as cartoon-like drawings or blurry images depending on the noise level. Curiously, these patterns emerge in datasets devoid of such examples. We also present a novel approach to track sample likelihoods in diffusion SDEs, which remarkably incurs no additional computational cost.
- Published
- 2024
8. On modeling fracture of soft polymers
- Author
Konale, Aditya and Srivastava, Vikas
- Subjects
Condensed Matter - Soft Condensed Matter
- Abstract
Soft polymers are ubiquitous materials found in nature and as engineering materials with properties varying from rate-independent to significantly rate-dependent depending on the crosslinking mechanisms. Current fracture toughness measures such as energy release rate are non-unique for rate-dependent soft materials for varying loading profiles and specimen geometries. Works on modeling fracture in rate-dependent soft polymers are limited to crack tip stress field analyses or crack tip driving force approaches for specific pre-cracked geometries. We have developed a model to predict damage initiation and growth in soft polymers based on a generalized multi-mechanism gradient damage framework. We propose and show that a critical value of stress work $\mathit{W}_{cr}$ can uniquely quantify the total energy dissipation per unit referential volume associated with the complete failure of a material point under a loading mode. $\mathit{W}_{cr}$ can be evaluated using homogeneous deformation experiments and without a constitutive model. $\mathit{W}_{cr}$ is demonstrated to be approximately constant with strain rate for two rate-dependent soft polymers and different loading modes for an elastomer. We propose the energetic contribution to $\mathit{W}_{cr}$ as a suitable damage initiation criterion. The proposed initiation criterion in the damage model enabled successful predictions of fracture in an important rate-stiffening soft polymer Polyborosiloxane in a variety of experiments involving different specimen geometries and loading conditions. The model also provides a consistent energy density estimate for fracture-associated microstructural processes in Polyborosiloxane. The broader applicability of the fracture model is shown by its ability to predict fracture in an elastomer (EPDM) and another viscous soft polymer (EPS25 vitrimer). Comment: A version of this work has been submitted for peer review
- Published
- 2024
9. Diffusion Twigs with Loop Guidance for Conditional Graph Generation
- Author
Mercatali, Giangiacomo, Verma, Yogesh, Freitas, Andre, and Garg, Vikas
- Subjects
Computer Science - Machine Learning
- Abstract
We introduce a novel score-based diffusion framework named Twigs that incorporates multiple co-evolving flows for enriching conditional generation tasks. Specifically, a central or trunk diffusion process is associated with a primary variable (e.g., graph structure), and additional offshoot or stem processes are dedicated to dependent variables (e.g., graph properties or labels). A new strategy, which we call loop guidance, effectively orchestrates the flow of information between the trunk and the stem processes during sampling. This approach allows us to uncover intricate interactions and dependencies, and unlock new generative capabilities. We provide extensive experiments to demonstrate strong performance gains of the proposed method over contemporary baselines in the context of conditional graph generation, underscoring the potential of Twigs in challenging generative tasks such as inverse molecular design and molecular optimization. Comment: NeurIPS 2024. Code is available at https://github.com/Aalto-QuML/Diffusion_twigs
- Published
- 2024
10. Novel Clinical-Grade Prostate Cancer Detection and Grading Model: Development and Prospective Validation Using Real World Data, with Performance Assessment on IHC Requested Cases
- Author
Nateghi, Ramin, Zhou, Ruoji, Saft, Madeline, Schnauss, Marina, Neill, Clayton, Alam, Ridwan, Handa, Nicole, Huang, Mitchell, Li, Eric V, Goldstein, Jeffery A, Schaeffer, Edward M, Nadim, Menatalla, Pourakpour, Fattaneh, Isaila, Bogdan, Felicelli, Christopher, Mehta, Vikas, Nezami, Behtash G, Ross, Ashley, Yang, Ximing, and Cooper, Lee AD
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Artificial intelligence may assist healthcare systems in meeting increasing demand for pathology services while maintaining diagnostic quality and reducing turnaround time and costs. We aimed to investigate the performance of an institutionally developed system for prostate cancer detection, grading, and workflow optimization and to contrast this with commercial alternatives. From August 2021 to March 2023, we scanned 21,396 slides from 1,147 patients with positive biopsies. We developed models for cancer detection, grading, and screening of equivocal cases for IHC ordering. We compared a task-specific model trained using the PANDA dataset of prostate cancer biopsies with one built using features extracted by the general-purpose histology foundation model UNI, and compared their performance in an unfiltered, prospectively collected dataset that reflects our patient population (1,737 slides, 95 patients). We evaluated the contributions of a bespoke model designed to improve sensitivity in detecting small cancer foci and scoring of broader patterns observed at lower resolution. We found high concordance between the developed systems and pathologist reference in detection (AUC 98.5, sensitivity 95.0, and specificity 97.8), ISUP grading (quadratic Cohen's kappa 0.869), and grade group 3 or higher (AUC 97.5, sensitivity 94.9, specificity 96.6), comparable to published data from commercial systems. Screening could reduce IHC ordering for equivocal cases by 44.5% with an overall error rate of 1.8% (1.4% false positive, 0.4% false negative rates). Institutions like academic medical centers that have high scanning volumes and report abstraction capabilities can develop accurate computational pathology models for internal use. These models have the potential to aid in quality control and to improve workflow in the pathology lab to help meet future challenges in prostate cancer diagnosis.
- Published
- 2024
11. Coherence Preserving Leakage Detection and Cooling in Alkaline Earth Atoms
- Author
Omanakuttan, Sivaprasad, Buchemmavari, Vikas, Martin, Michael J., and Deutsch, Ivan H
- Subjects
Quantum Physics; Physics - Atomic Physics
- Abstract
Optically trapped atoms in arrays of optical tweezers have emerged as a powerful platform for quantum information processing given the recent demonstrations of high-fidelity quantum logic gates and on-demand reconfigurable geometry. Both in gate operations and atomic transport, additional errors will occur due to leakage out of the computation space, atomic motional heating, or loss of an atom out of a trap completely. In this work, we address these error channels in a unified manner through laser fluorescence that can detect and cool the atom without disturbing the quantum information encoded therein. As only the electrons in the atom couple directly to the laser field, such quantum nondemolition (QND) processes are made possible by encoding quantum information in the nuclear spin of alkaline earth-like atoms and avoiding the effects of the hyperfine interaction which couples it to the electrons. By detuning a fluorescence laser off-resonantly from the $\mathrm{^1S_0} \rightarrow \mathrm{^1P_1}$ transition, far compared to the (small) hyperfine splitting, optical pumping between nuclear states falls off rapidly with detuning, scaling as $\sim 1/\Delta^4$. In contrast, Rayleigh scattering falls off as $\sim 1/\Delta^2$. We also consider a resonant leakage detection protocol off the $^1\mathrm{P}_1$ line. This is achieved by disabling the hyperfine coupling via a strong AC Stark effect and canceling the residual light shifts via dressing. The same scheme can be used to recool the atoms towards the vibrational ground state for the quantum information encoded in the ground state of alkaline earth atoms while preserving the coherence. These advances could significantly improve the prospect of neutral atoms for fault-tolerant quantum computation. Comment: Comments and suggestions are welcome!
- Published
- 2024
12. LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
- Author
Shen, Xiaoqian, Xiong, Yunyang, Zhao, Changsheng, Wu, Lemeng, Chen, Jun, Zhu, Chenchen, Liu, Zechun, Xiao, Fanyi, Varadarajan, Balakrishnan, Bordes, Florian, Liu, Zhuang, Xu, Hu, Kim, Hyunwoo J., Soran, Bilge, Krishnamoorthi, Raghuraman, Elhoseiny, Mohamed, and Chandra, Vikas
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Multimodal Large Language Models (MLLMs) have shown promising progress in understanding and analyzing video content. However, processing long videos remains a significant challenge constrained by the LLM's context size. To address this limitation, we propose LongVU, a spatiotemporal adaptive compression mechanism that reduces the number of video tokens while preserving visual details of long videos. Our idea is based on leveraging cross-modal query and inter-frame dependencies to adaptively reduce temporal and spatial redundancy in videos. Specifically, we leverage DINOv2 features to remove redundant frames that exhibit high similarity. Then we utilize text-guided cross-modal query for selective frame feature reduction. Further, we perform spatial token reduction across frames based on their temporal dependencies. Our adaptive compression strategy effectively processes a large number of frames with little visual information loss within a given context length. Our LongVU consistently surpasses existing methods across a variety of video understanding benchmarks, especially on hour-long video understanding tasks such as VideoMME and MLVU. Given a light-weight LLM, our LongVU also scales effectively into a smaller size with state-of-the-art video understanding performance. Comment: Project page: https://vision-cair.github.io/LongVU
- Published
- 2024
13. Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
- Author
Brokman, Jonathan, Hofman, Omer, Rachmil, Oren, Singh, Inderjeet, Pahuja, Vikas, Priya, Rathina Sabapathy Aishvariya, Giloni, Amit, Vainshtein, Roman, and Kojima, Hisashi
- Subjects
Computer Science - Cryptography and Security; Computer Science - Machine Learning
- Abstract
This report presents a comparative analysis of open-source vulnerability scanners for conversational large language models (LLMs). As LLMs become integral to various applications, they also present potential attack surfaces, exposed to security risks such as information leakage and jailbreak attacks. Our study evaluates prominent scanners - Garak, Giskard, PyRIT, and CyberSecEval - that adapt red-teaming practices to expose these vulnerabilities. We detail the distinctive features and practical use of these scanners, outline unifying principles of their design and perform quantitative evaluations to compare them. These evaluations uncover significant reliability issues in detecting successful attacks, highlighting a fundamental gap for future development. Additionally, we contribute a preliminary labelled dataset, which serves as an initial step to bridge this gap. Based on the above, we provide strategic recommendations to help organizations choose the most suitable scanner for their red-teaming needs, accounting for customizability, test suite comprehensiveness, and industry-specific use cases. Comment: 15 pages, 11 figures
- Published
- 2024
14. Hotel Booking Cancellation Prediction Using Applied Bayesian Models
- Author
Jishan, Md Asifuzzaman, Singh, Vikas, Ghosh, Ayan Kumar, Alam, Md Shahabub, Mahmud, Khan Raqib, and Paul, Bijan
- Subjects
Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract
This study applies Bayesian models to predict hotel booking cancellations, a key challenge affecting resource allocation, revenue, and customer satisfaction in the hospitality industry. Using a Kaggle dataset with 36,285 observations and 17 features, Bayesian Logistic Regression and Beta-Binomial models were implemented. The logistic model, applied to 12 features and 5,000 randomly selected observations, outperformed the Beta-Binomial model in predictive accuracy. Key predictors included the number of adults, children, stay duration, lead time, car parking space, room type, and special requests. Model evaluation using Leave-One-Out Cross-Validation (LOO-CV) confirmed strong alignment between observed and predicted outcomes, demonstrating the model's robustness. Special requests and parking availability were found to be the strongest predictors of cancellation. This Bayesian approach provides a valuable tool for improving booking management and operational efficiency in the hotel industry.
- Published
- 2024
15. Detectability of Supernova Remnants with the Southern Wide-field Gamma-ray Observatory
- Author
Scharrer, Nick, Spencer, Samuel T., Joshi, Vikas, and Mitchell, Alison M. W.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena
- Abstract
Supernova remnants (SNRs) remain prime candidates for hadronic particle acceleration within our galaxy, accounting for much of the Cosmic Ray flux. Next-generation instruments such as the Southern Wide-field Gamma-ray Observatory (SWGO) will be of crucial importance in identifying new candidate SNRs. SWGO will observe two-thirds of the gamma-ray sky, covering the energy range between a few hundred GeV and a PeV. In this work, we apply a model of SNR evolution to predict their gamma-ray spectra. Furthermore, we use our model in combination with the target SWGO sensitivity range to explore the SNR emission phase space and quantify detection prospects for SWGO. Finally, we validate our model for sources observed with current-generation instruments, fitting it using a Monte-Carlo Markov Chain technique to the observed gamma-ray emission from four SNRs. We anticipate that at least 8 SNRs will be detected by SWGO within 1 year. Comment: 24 pages, 16 figures. Submitted to JCAP
- Published
- 2024
16. Prior Information-Aided ADMM for Multi-User Detection in Codebook-Based Grant-Free NOMA: Dynamic Scenarios
- Author
Vikas, Vinjamoori, Deka, Kuntal, and Rajesh, A.
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
Code-domain non-orthogonal multiple access (CD-NOMA) systems offer key benefits such as high spectral efficiency, low latency, high reliability, and massive connectivity. NOMA's ability to handle overloading allows multiple devices to share a single resource element (RE) for data transmission. In CD-NOMA, different users are assigned distinct codewords, which are leveraged during multi-user detection (MUD). Codebook-based NOMA systems outperform spread-sequence (SS)-based NOMA due to the coding gains provided by the codebooks. Sparse code multiple access (SCMA) and dense code multiple access (DCMA) are two prominent examples of such systems. Additionally, NOMA is seen as a crucial technology for enabling grant-free access, especially in massive machine-type communications (mMTC). One of the main challenges in deploying grant-free NOMA systems is accurately detecting both user activity and transmitted data, particularly when user activity fluctuates dynamically across the transmission frame. This paper introduces codebook-based grant-free NOMA systems modeled using a block sparsity signal structure. The joint activity and data detection (JADD) problem in these systems is formulated as group LASSO and sparse group LASSO block compressive sensing (BCS) problems. To address these, a robust prior information-aided alternating direction method of multipliers (ADMM) algorithm is proposed. Extensive numerical experiments and theoretical analysis show the efficiency of the proposed algorithm, making it a suitable solution for mMTC networks.
- Published
- 2024
17. Signature of Vertical Mixing in Hydrogen-dominated Exoplanet Atmospheres
- Author
Soni, Vikas and Acharyya, Kinsuk
- Subjects
Astrophysics - Earth and Planetary Astrophysics
- Abstract
Vertical mixing is a crucial disequilibrium process in exoplanet atmospheres, significantly impacting chemical abundance and observed spectra. While current state-of-the-art observations have detected its signatures, the effect of vertical mixing on atmospheric spectra varies widely based on planetary parameters. In this study, we explore the influence of disequilibrium chemistry across a parameter space that includes eddy diffusion, surface gravity, internal and equilibrium temperature, and metallicity. We also assess the effectiveness of retrieval models in constraining the eddy diffusion coefficient. By running numerous 1D chemical kinetics models, we investigate the impact of vertical mixing on the transmission spectrum. We also built a custom fast-forward disequilibrium model, which includes vertical mixing using the quenching approximation and calculates the model abundance orders of magnitude faster than the chemical kinetics model. We coupled this forward model with an open source atmospheric retrieval code and used it on the JWST simulated output data of our chemical kinetics model and retrieved the eddy diffusion coefficient, internal temperature and atmospheric metallicity. We find that there is a narrow region in the parameter space in which vertical mixing has a large effect on the atmospheric transmission spectrum. In this region of the parameter space, the retrieval model can put high constraints on the transport strength and provide optimal exoplanets to study vertical mixing. Also, the NH3 abundance can be used to constrain the internal temperature for equilibrium temperature T_equi > 1400 K. Comment: 28 pages, 12 figures, 5 tables, accepted for publication in the Astrophysical Journal
- Published
- 2024
18. Agent-as-a-Judge: Evaluate Agents with Agents
- Author
Zhuge, Mingchen, Zhao, Changsheng, Ashley, Dylan, Wang, Wenyi, Khizbullin, Dmitrii, Xiong, Yunyang, Liu, Zechun, Chang, Ernie, Krishnamoorthi, Raghuraman, Tian, Yuandong, Shi, Yangyang, Chandra, Vikas, and Schmidhuber, Jürgen
- Subjects
Computer Science - Artificial Intelligence
- Abstract
Contemporary evaluation techniques are inadequate for agentic systems. These approaches either focus exclusively on final outcomes -- ignoring the step-by-step nature of agentic systems, or require excessive manual labour. To address this, we introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems. This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process. We apply the Agent-as-a-Judge to the task of code generation. To overcome issues with existing benchmarks and provide a proof-of-concept testbed for Agent-as-a-Judge, we present DevAI, a new benchmark of 55 realistic automated AI development tasks. It includes rich manual annotations, like a total of 365 hierarchical user requirements. We benchmark three of the popular agentic systems using Agent-as-a-Judge and find it dramatically outperforms LLM-as-a-Judge and is as reliable as our human evaluation baseline. Altogether, we believe that Agent-as-a-Judge marks a concrete step forward for modern agentic systems -- by providing rich and reliable reward signals necessary for dynamic and scalable self-improvement. Comment: The project can be found at https://github.com/metauto-ai/agent-as-a-judge. The dataset is released at https://huggingface.co/DEVAI-benchmark
- Published
- 2024
19. Study of $\beta^+$/EC-decay properties of $sd$ shell nuclei using nuclear shell model
- Author
Surender, Kumar, Vikas, and Srivastava, Praveen C.
- Subjects
Nuclear Theory
- Abstract
Our study employs the nuclear shell model to systematically compute the half-lives of $\beta$-decay for nuclei in the mass range of $A = 18-39$, encompassing the majority of $sd$ shell nuclei. This analysis utilizes the USDB and SDNN Hamiltonians. The theoretical outcomes contain calculations of various parameters such as $Q$-values, half-lives, excitation energy, $\log ft$ values, and branching ratios. We explore these results with axial-vector coupling constant for weak interactions, denoted as $g_A$ $(= 1.27)$, and $\kappa$ value $(= 6289)$. We perform calculations of Gamow-Teller matrix elements for 116 decay processes to calculate the quenching factor; we found a quenching factor of $q = 0.794\pm0.05$ for the USDB interaction and $q = 0.815\pm0.04$ for the SDNN interaction. We have also calculated superallowed transitions $0^+ \rightarrow 0^+$ for seven nuclei. Further, we have also included the electron capture phase space factor for the required nuclei to calculate the half-lives. This inclusion leads to a small contribution in results, particularly for nuclei where electron capture (EC) plays a significant role. The overall results are in agreement with the experimental data. Comment: 16 pages, 4 figures
- Published
- 2024
20. Unlocking Real-Time Fluorescence Lifetime Imaging: Multi-Pixel Parallelism for FPGA-Accelerated Processing
- Author
Erbas, Ismail, Amarnath, Aporva, Pandey, Vikas, Swaminathan, Karthik, Wang, Naigang, and Intes, Xavier
- Subjects
Physics - Optics; Computer Science - Artificial Intelligence; Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Machine Learning
- Abstract
Fluorescence lifetime imaging (FLI) is a widely used technique in the biomedical field for measuring the decay times of fluorescent molecules, providing insights into metabolic states, protein interactions, and ligand-receptor bindings. However, its broader application in fast biological processes, such as dynamic activity monitoring, and clinical use, such as in guided surgery, is limited by long data acquisition times and computationally demanding data processing. While deep learning has reduced post-processing times, time-resolved data acquisition remains a bottleneck for real-time applications. To address this, we propose a method to achieve real-time FLI using an FPGA-based hardware accelerator. Specifically, we implemented a GRU-based sequence-to-sequence (Seq2Seq) model on an FPGA board compatible with time-resolved cameras. The GRU model balances accurate processing with the resource constraints of FPGAs, which have limited DSP units and BRAM. The limited memory and computational resources on the FPGA require efficient scheduling of operations and memory allocation to deploy deep learning models for low-latency applications. We address these challenges by using STOMP, a queue-based discrete-event simulator that automates and optimizes task scheduling and memory management on hardware. By integrating a GRU-based Seq2Seq model and its compressed version, called Seq2SeqLite, generated through knowledge distillation, we were able to process multiple pixels in parallel, reducing latency compared to sequential processing. We explore various levels of parallelism to achieve an optimal balance between performance and resource utilization. Our results indicate that the proposed techniques achieved a 17.7x and 52.0x speedup over manual scheduling for the Seq2Seq model and the Seq2SeqLite model, respectively. Comment: 7 pages, 6 figures
- Published
- 2024
21. On Instruction-Finetuning Neural Machine Translation Models
- Author
Raunak, Vikas, Grundkiewicz, Roman, and Junczys-Dowmunt, Marcin
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
In this work, we introduce instruction finetuning for Neural Machine Translation (NMT) models, which distills instruction following capabilities from Large Language Models (LLMs) into orders-of-magnitude smaller NMT models. Our instruction-finetuning recipe for NMT models enables customization of translations for a limited but disparate set of translation-specific tasks. We show that NMT models are capable of following multiple instructions simultaneously and demonstrate capabilities of zero-shot composition of instructions. We also show that through instruction finetuning, traditionally disparate tasks such as formality-controlled machine translation, multi-domain adaptation as well as multi-modal translations can be tackled jointly by a single instruction finetuned NMT model, at a performance level comparable to LLMs such as GPT-3.5-Turbo. To the best of our knowledge, our work is among the first to demonstrate the instruction-following capabilities of traditional NMT models, which allows for faster, cheaper and more efficient serving of customized translations. Comment: WMT'24
- Published
- 2024
22. EgoQR: Efficient QR Code Reading in Egocentric Settings
- Author
Moslehpour, Mohsen, Lu, Yichao, Chuang, Pierce, Shenoy, Ashish, Chatterjee, Debojeet, Harpale, Abhay, Jayakumar, Srihari, Bhardwaj, Vikas, Nam, Seonghyeon, and Kumar, Anuj
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
QR codes have become ubiquitous in daily life, enabling rapid information exchange. With the increasing adoption of smart wearable devices, there is a need for efficient and frictionless QR code reading capabilities from egocentric points of view. However, adapting existing phone-based QR code readers to egocentric images poses significant challenges. Code reading from egocentric images brings unique challenges such as wide field of view, code distortion and lack of visual feedback, as compared to phones where users can adjust the position and framing. Furthermore, wearable devices impose constraints on resources like compute, power and memory. To address these challenges, we present EgoQR, a novel system for reading QR codes from egocentric images that is well suited for deployment on wearable devices. Our approach consists of two primary components: detection and decoding, designed to operate on high-resolution images on the device with minimal power consumption and added latency. The detection component efficiently locates potential QR codes within the image, while our enhanced decoding component extracts and interprets the encoded information. We incorporate innovative techniques to handle the specific challenges of egocentric imagery, such as varying perspectives, wider field of view, and motion blur. We evaluate our approach on a dataset of egocentric images, demonstrating a 34% improvement in reading the code compared to existing state-of-the-art QR code readers. Comment: Submitted to ICLR 2025
- Published
- 2024
23. Scaling Parameter-Constrained Language Models with Quality Data
- Author
-
Chang, Ernie, Paltenghi, Matteo, Li, Yang, Lin, Pin-Jie, Zhao, Changsheng, Huber, Patrick, Liu, Zechun, Rabatin, Rastislav, Shi, Yangyang, and Chandra, Vikas
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we extend the conventional understanding of scaling law by offering a microscopic view of data quality within the original formulation -- effective training tokens -- which we posit to be a critical determinant of performance for parameter-constrained language models. Specifically, we formulate the proposed term of effective training tokens to be a combination of two readily-computed indicators of text: (i) text diversity and (ii) syntheticity as measured by a teacher model. We pretrained over $200$ models of 25M to 1.5B parameters on a diverse set of sampled, synthetic data, and estimated the constants that relate text quality, model size, training tokens, and eight reasoning task accuracy scores. We demonstrated the estimated constants yield +0.83 Pearson correlation with true accuracies, and analyzed it in scenarios involving widely-used data techniques such as data sampling and synthesis which aim to improve data quality., Comment: Accepted to EMNLP 2024 Industry Track, 18 pages, 9 figures, 4 tables
- Published
- 2024
24. Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging
- Author
-
Erbas, Ismail, Pandey, Vikas, Amarnath, Aporva, Wang, Naigang, Swaminathan, Karthik, Radev, Stefan T., and Intes, Xavier
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Machine Learning ,Quantitative Biology - Quantitative Methods - Abstract
Fluorescence lifetime imaging (FLI) is an important technique for studying cellular environments and molecular interactions, but its real-time application is limited by slow data acquisition, which requires capturing large time-resolved images and complex post-processing using iterative fitting algorithms. Deep learning (DL) models enable real-time inference, but can be computationally demanding due to complex architectures and large matrix operations. This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware. Model compression is thus crucial for practical deployment for real-time inference generation. In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited for FLI time-series data processing, to enable deployment on resource-constrained FPGA boards. We perform an empirical evaluation of various compression techniques, including weight reduction, knowledge distillation (KD), post-training quantization (PTQ), and quantization-aware training (QAT), to reduce model size and computational load while preserving inference accuracy. Our compressed RNN model, Seq2SeqLite, achieves a balance between computational efficiency and prediction accuracy, particularly at 8-bit precision. By applying KD, the model parameter size was reduced by 98% while retaining performance, making it suitable for concurrent real-time FLI analysis on FPGA during data capture. This work represents a big step towards integrating hardware-accelerated real-time FLI analysis for fast biological processes., Comment: 8 pages, 2 figures
- Published
- 2024
25. ACE: Abstractions for Communicating Efficiently
- Author
-
Thomas, Jonathan D., Silvi, Andrea, Dubhashi, Devdatt, Garg, Vikas, and Johansson, Moa
- Subjects
Computer Science - Computation and Language - Abstract
A central but unresolved aspect of problem-solving in AI is the capability to introduce and use abstractions, something humans excel at. Work in cognitive science has demonstrated that humans tend towards higher levels of abstraction when engaged in collaborative task-oriented communication, enabling gradually shorter and more information-efficient utterances. Several computational methods have attempted to replicate this phenomenon, but all make unrealistic simplifying assumptions about how abstractions are introduced and learned. Our method, Abstractions for Communicating Efficiently (ACE), overcomes these limitations through a neuro-symbolic approach. On the symbolic side, we draw on work from library learning for proposing abstractions. We combine this with neural methods for communication and reinforcement learning, via a novel use of bandit algorithms for controlling the exploration and exploitation trade-off in introducing new abstractions. ACE exhibits similar tendencies to humans on a collaborative construction task from the cognitive science literature, where one agent (the architect) instructs the other (the builder) to reconstruct a scene of block-buildings. ACE results in the emergence of an efficient language as a by-product of collaborative communication. Beyond providing mechanistic insights into human communication, our work serves as a first step to providing conversational agents with the ability for human-like communicative abstractions., Comment: 9 pages, 9 figures
- Published
- 2024
26. Learning-Based Image Compression for Machines
- Author
-
Gupta, Kartik, Faria, Kimberley, and Mehta, Vikas
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
While learning-based compression techniques for images have outperformed traditional methods, they have not been widely adopted in machine learning pipelines. This is largely due to a lack of standardization and a lack of retention of the salient features needed for such tasks. Decompression of images has taken a back seat in recent years while the focus has shifted to an image's utility in performing machine-learning-based analysis on top of it. Thus the demand for compression pipelines that incorporate such features from images has become ever present. The methods outlined in the report build on recent work on learning-based image compression techniques to incorporate downstream tasks in them. We propose various methods of finetuning and enhancing different parts of a pretrained compression encoding pipeline and present the results of our investigation regarding the performance of vision tasks using compression-based pipelines.
- Published
- 2024
27. Chasing the Shadows: TTPs in Action to Attribute Advanced Persistent Threats
- Author
-
Rani, Nanda, Saha, Bikash, Maurya, Vikas, and Shukla, Sandeep Kumar
- Subjects
Computer Science - Cryptography and Security - Abstract
The current state of Advanced Persistent Threats (APT) attribution primarily relies on time-consuming manual processes. These include mapping incident artifacts onto threat attribution frameworks and employing expert reasoning to uncover the most likely responsible APT groups. This research aims to assist the threat analyst in the attribution process by presenting an attribution method named CAPTAIN (Comprehensive Advanced Persistent Threat AttrIbutioN). This novel APT attribution approach leverages the Tactics, Techniques, and Procedures (TTPs) employed by various APT groups in past attacks. CAPTAIN follows two significant development steps: baseline establishment and similarity measure for attack pattern matching. This method starts by maintaining a TTP database of APTs seen in past attacks as baseline behaviour of threat groups. The attribution process leverages the contextual information added by TTP sequences, which reflects the sequence of behaviours threat actors demonstrated during the attack on different kill-chain stages. Then, it compares the provided TTPs with established baseline to identify the most closely matching threat group. CAPTAIN introduces a novel similarity measure for APT group attack-pattern matching that calculates the similarity between TTP sequences. The proposed approach outperforms traditional similarity measures like Cosine, Euclidean, and Longest Common Subsequence (LCS) in performing attribution. Overall, CAPTAIN performs attribution with the precision of 61.36% (top-1) and 69.98% (top-2), surpassing the existing state-of-the-art attribution methods., Comment: Under Review
- Published
- 2024
28. Target-Aware Language Modeling via Granular Data Sampling
- Author
-
Chang, Ernie, Lin, Pin-Jie, Li, Yang, Zhao, Changsheng, Kim, Daeil, Rabatin, Rastislav, Liu, Zechun, Shi, Yangyang, and Chandra, Vikas
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Language model pretraining generally targets a broad range of use cases and incorporates data from diverse sources. However, there are instances where we desire a model that excels in specific areas without markedly compromising performance in other areas. A cost-effective and straightforward approach is sampling with low-dimensional data features, which allows selecting large-scale pretraining data for domain-specific use cases. In this work, we revisit importance sampling with n-gram features consisting of multi-granular tokens, which strikes a good balance between sentence compression and representation capabilities. We observed the sampled data to have a high correlation with the target downstream task performance while preserving its effectiveness on other tasks. This leads to the proposed data sampling paradigm where language models can be pretrained more efficiently on selected documents. On eight benchmarks we demonstrate that with ~1% of the data, pretrained models perform on par with the full RefinedWeb data and outperform randomly selected samples for model sizes ranging from 125M to 1.5B., Comment: Accepted to EMNLP 2024 Main Conference, 9 pages, 6 figures, 3 tables
- Published
- 2024
29. Robust Single-Photon Generation for Quantum Information Enabled by Stimulated Adiabatic Rapid Passage
- Author
-
Karli, Yusuf, Schwarz, René, Kappe, Florian, Vajner, Daniel A., Krämer, Ria G., Bracht, Thomas K., da Silva, Saimon F. Covre, Richter, Daniel, Nolte, Stefan, Rastelli, Armando, Reiter, Doris E., Weihs, Gregor, Heindel, Tobias, and Remesh, Vikas
- Subjects
Quantum Physics ,Condensed Matter - Mesoscale and Nanoscale Physics ,Physics - Optics - Abstract
The generation of single photons using solid-state quantum emitters is pivotal for advancing photonic quantum technologies, particularly in quantum communication. As the field continuously advances towards practical use cases and beyond shielded laboratory environments, specific demands are placed on the robustness of quantum light sources during operation. In this context, the robustness of the quantum light generation process against intrinsic and extrinsic effects is a major challenge. Here, we present a robust scheme for the coherent generation of indistinguishable single-photon states with very low photon number coherence (PNC) using a three-level system in a semiconductor quantum dot. Our novel approach combines the advantages of adiabatic rapid passage (ARP) and stimulated two-photon excitation (sTPE). We demonstrate robust quantum light generation while maintaining the prime quantum-optical quality of the emitted light state. Moreover, we highlight the immediate advantages for the implementation of various quantum cryptographic protocols., Comment: 12 pages, 6 figures
- Published
- 2024
30. Primordial Stochastic Gravitational Wave Backgrounds from a Sharp Feature in Three-field Inflation II: The Inflationary Era
- Author
-
Aragam, Vikas, Paban, Sonia, and Rosati, Robert
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics - Abstract
We study the contribution of large scalar perturbations sourced by a sharp feature during cosmic inflation to the stochastic gravitational wave background (SGWB), extending our previous work to include the SGWB sourced during the inflationary era. We focus in particular on three-field inflation, since the third dynamical field is the first not privileged by the perturbations' equations of motion and allows a more direct generalization to $N$-field inflation. For the first time, we study the three-field isocurvature perturbations sourced during the feature and include the effects of isocurvature masses. In addition to a two-field limit, we find that the third field's dynamics during the feature can source large isocurvature transients which then later decay, leaving an inflationary-era-sourced SGWB as their only observable signature. We find that the inflationary-era signal shape near the peak is largely independent of the number of dynamical fields and has a greatly enhanced amplitude sourced by the large isocurvature transient, suppressing the radiation-era contribution and opening a new window of detectable parameter space with small adiabatic enhancement. The largest enhancements we study could easily violate backreaction constraints, but much of parameter space remains under perturbative control. These SGWBs could be visible in LISA and other gravitational wave experiments, leaving an almost universal signature of sharp features during multi-field inflation, even when the sourcing isocurvature decays to unobservability shortly afterwards., Comment: 22 pages, 5 figures. Supplementary code available at https://github.com/rjrosati/3field-sharp-feature . v2: improved citations
- Published
- 2024
31. Advanced Gaze Analytics Dashboard
- Author
-
Jayawardena, Gavindya, Ashok, Vikas, and Jayarathna, Sampath
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Eye movements can provide informative cues to understand human visual scan/search behavior and cognitive load during varying tasks. Visualizations of real-time gaze measures during tasks provide an understanding of human behavior as the experiment is being conducted. Even though existing eye-tracking analysis tools provide calculation and visualization of eye-tracking data, none of them support real-time visualizations of advanced gaze measures, such as ambient or focal processing, or eye-tracked measures of cognitive load. In this paper, we present an eye movement analytics dashboard that enables visualizations of various gaze measures, fixations, saccades, cognitive load, ambient-focal attention, and gaze transitions analysis by extracting eye movements from participants utilizing common off-the-shelf eye trackers. We validate the proposed eye movement visualizations by using two publicly available eye-tracking datasets. We showcase that the proposed dashboard could be utilized to visualize advanced eye movement measures generated using multiple data sources.
- Published
- 2024
32. High-yield large-scale suspended graphene membranes over closed cavities for sensor applications
- Author
-
Lukas, Sebastian, Esteki, Ardeshir, Rademacher, Nico, Jangra, Vikas, Gross, Michael, Wang, Zhenxing, Ngo, Ha Duong, Bäuscher, Manuel, Mackowiak, Piotr, Höppner, Katrin, Wehenkel, Dominique, van Rijn, Richard, and Lemme, Max C.
- Subjects
Physics - Applied Physics - Abstract
Suspended membranes of monoatomic graphene exhibit great potential for applications in electronic and nanoelectromechanical devices. In this work, a "hot and dry" transfer process is demonstrated to address the fabrication and patterning challenges of large-area graphene membranes on top of closed, sealed cavities. Here, "hot" refers to the use of high temperature during transfer, promoting the adhesion. Additionally, "dry" refers to the absence of liquids when graphene and target substrate are brought into contact. The method leads to higher yields of intact suspended monolayer CVD graphene and artificially stacked double-layer CVD graphene membranes than previously reported. The yield evaluation is performed using neural-network-based object detection in SEM images, ascertaining high yields of intact membranes with large statistical accuracy. The suspended membranes are examined by Raman tomography and AFM. The method is verified by applying the suspended graphene devices as piezoresistive pressure sensors. Our technology advances the application of suspended graphene membranes and can be extended to other two-dimensional (2D) materials., Comment: 30 pages of manuscript plus 17 pages of Supporting Information
- Published
- 2024
33. Imagen 3
- Author
-
Imagen-Team-Google, Baldridge, Jason, Bauer, Jakob, Bhutani, Mukul, Brichtova, Nicole, Bunner, Andrew, Chan, Kelvin, Chen, Yichang, Dieleman, Sander, Du, Yuqing, Eaton-Rosen, Zach, Fei, Hongliang, de Freitas, Nando, Gao, Yilin, Gladchenko, Evgeny, Colmenarejo, Sergio Gómez, Guo, Mandy, Haig, Alex, Hawkins, Will, Hu, Hexiang, Huang, Huilian, Igwe, Tobenna Peter, Kaplanis, Christos, Khodadadeh, Siavash, Kim, Yelin, Konyushkova, Ksenia, Langner, Karol, Lau, Eric, Luo, Shixin, Mokrá, Soňa, Nandwani, Henna, Onoe, Yasumasa, Oord, Aäron van den, Parekh, Zarana, Pont-Tuset, Jordi, Qi, Hang, Qian, Rui, Ramachandran, Deepak, Rane, Poorva, Rashwan, Abdullah, Razavi, Ali, Riachi, Robert, Srinivasan, Hansa, Srinivasan, Srivatsan, Strudel, Robin, Uria, Benigno, Wang, Oliver, Wang, Su, Waters, Austin, Wolff, Chris, Wright, Auriel, Xiao, Zhisheng, Xiong, Hao, Xu, Keyang, van Zee, Marc, Zhang, Junlin, Zhang, Katie, Zhou, Wenlei, Zolna, Konrad, Aboubakar, Ola, Akbulut, Canfer, Akerlund, Oscar, Albuquerque, Isabela, Anderson, Nina, Andreetto, Marco, Aroyo, Lora, Bariach, Ben, Barker, David, Ben, Sherry, Berman, Dana, Biles, Courtney, Blok, Irina, Botadra, Pankil, Brennan, Jenny, Brown, Karla, Buckley, John, Bunel, Rudy, Bursztein, Elie, Butterfield, Christina, Caine, Ben, Carpenter, Viral, Casagrande, Norman, Chang, Ming-Wei, Chang, Solomon, Chaudhuri, Shamik, Chen, Tony, Choi, John, Churbanau, Dmitry, Clement, Nathan, Cohen, Matan, Cole, Forrester, Dektiarev, Mikhail, Du, Vincent, Dutta, Praneet, Eccles, Tom, Elue, Ndidi, Feden, Ashley, Fruchter, Shlomi, Garcia, Frankie, Garg, Roopal, Ge, Weina, Ghazy, Ahmed, Gipson, Bryant, Goodman, Andrew, Górny, Dawid, Gowal, Sven, Gupta, Khyatti, Halpern, Yoni, Han, Yena, Hao, Susan, Hayes, Jamie, Hertz, Amir, Hirst, Ed, Hou, Tingbo, Howard, Heidi, Ibrahim, Mohamed, Ike-Njoku, Dirichi, Iljazi, Joana, Ionescu, Vlad, Isaac, William, Jana, Reena, Jennings, Gemma, Jenson, Donovon, Jia, Xuhui, Jones, Kerry, Ju, Xiaoen, Kajic, Ivana, Ayan, Burcu Karagol, Kelly, Jacob, Kothawade, Suraj, Kouridi, Christina, Ktena, Ira, Kumakaw, Jolanda, Kurniawan, Dana, Lagun, Dmitry, Lavitas, Lily, Lee, Jason, Li, Tao, Liang, Marco, Li-Calis, Maggie, Liu, Yuchi, Alberca, Javier Lopez, Lu, Peggy, Lum, Kristian, Ma, Yukun, Malik, Chase, Mellor, John, Mosseri, Inbar, Murray, Tom, Nematzadeh, Aida, Nicholas, Paul, Oliveira, João Gabriel, Ortiz-Jimenez, Guillermo, Paganini, Michela, Paine, Tom Le, Paiss, Roni, Parrish, Alicia, Peckham, Anne, Peswani, Vikas, Petrovski, Igor, Pfaff, Tobias, Pirozhenko, Alex, Poplin, Ryan, Prabhu, Utsav, Qi, Yuan, Rahtz, Matthew, Rashtchian, Cyrus, Rastogi, Charvi, Raul, Amit, Rebuffi, Sylvestre-Alvise, Ricco, Susanna, Riedel, Felix, Robinson, Dirk, Rohatgi, Pankaj, Rosgen, Bill, Rumbley, Sarah, Ryu, Moonkyung, Salgado, Anthony, Singla, Sahil, Schroff, Florian, Schumann, Candice, Shah, Tanmay, Shillingford, Brendan, Shivakumar, Kaushik, Shtatnov, Dennis, Singer, Zach, Sluzhaev, Evgeny, Sokolov, Valerii, Sottiaux, Thibault, Stimberg, Florian, Stone, Brad, Stutz, David, Su, Yu-Chuan, Tabellion, Eric, Tang, Shuai, Tao, David, Thomas, Kurt, Thornton, Gregory, Toor, Andeep, Udrescu, Cristian, Upadhyay, Aayush, Vasconcelos, Cristina, Vasiloff, Alex, Voynov, Andrey, Walker, Amanda, Wang, Luyu, Wang, Miaosen, Wang, Simon, Wang, Stanley, Wang, Qifei, Wang, Yuxiao, Weisz, Ágoston, Wiles, Olivia, Wu, Chenxia, Xu, Xingyu Federico, Xue, Andrew, Yang, Jianbo, Yu, Luo, Yurtoglu, Mete, Zand, Ali, Zhang, Han, Zhang, Jiageng, Zhao, Catherine, Zhaxybay, Adilet, Zhou, Miao, Zhu, Shengqi, Zhu, Zhenkai, Bloxwich, Dawn, Bordbar, Mahyar, Cobo, Luis C., Collins, Eli, Dai, Shengyang, Doshi, Tulsee, Dragan, Anca, Eck, Douglas, Hassabis, Demis, Hsiao, Sissie, Hume, Tom, Kavukcuoglu, Koray, King, Helen, Krawczyk, Jack, Li, Yeqing, Meier-Hellstern, Kathy, Orban, Andras, Pinsky, Yury, Subramanya, Amar, Vinyals, Oriol, Yu, Ting, and Zwols, Yori
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.
- Published
- 2024
34. What Ails Generative Structure-based Drug Design: Too Little or Too Much Expressivity?
- Author
-
Karczewski, Rafał, Kaski, Samuel, Heinonen, Markus, and Garg, Vikas
- Subjects
Computer Science - Machine Learning ,Quantitative Biology - Biomolecules - Abstract
Several generative models with elaborate training and sampling procedures have been proposed recently to accelerate structure-based drug design (SBDD); however, perplexingly, their empirical performance turns out to be suboptimal. We seek to better understand this phenomenon from both theoretical and empirical perspectives. Since most of these models apply graph neural networks (GNNs), one may suspect that they inherit the representational limitations of GNNs. We analyze this aspect, establishing the first such results for protein-ligand complexes. A plausible counterview may attribute the underperformance of these models to their excessive parameterizations, inducing expressivity at the expense of generalization. We also investigate this possibility with a simple metric-aware approach that learns an economical surrogate for affinity to infer an unlabelled molecular graph and optimizes for labels conditioned on this graph and molecular properties. The resulting model achieves state-of-the-art results using 100x fewer trainable parameters and affords up to 1000x speedup. Collectively, our findings underscore the need to reassess and redirect the existing paradigm and efforts for SBDD., Comment: 25 pages, 11 figures
- Published
- 2024
35. Omobot: a low-cost mobile robot for autonomous search and fall detection
- Author
-
Ahamad, Shihab Uddin, Ataei, Masoud, Devabhaktuni, Vijay, and Dhiman, Vikas
- Subjects
Computer Science - Robotics - Abstract
Detecting falls among the elderly and alerting their community responders can save countless lives. We design and develop a low-cost mobile robot that periodically searches the house for the person being monitored and sends an email to a set of designated responders if a fall is detected. In this project, we make three novel design decisions and contributions. First, our custom-designed low-cost robot has advanced features like omnidirectional wheels, the ability to run deep learning models, and autonomous wireless charging. Second, we improve the accuracy of fall detection for the YOLOv8-Pose-nano object detection network by 6% and YOLOv8-Pose-large by 12%. We do so by transforming the images captured from the robot viewpoint (camera height 0.15m from the ground) to a typical human viewpoint (1.5m above the ground) using a principally computed Homography matrix. This improves network accuracy because the training dataset MS-COCO on which YOLOv8-Pose is trained is captured from a human-height viewpoint. Lastly, we improve the robot controller by learning a model that predicts the robot velocity from the input signal to the motor controller., Comment: Accepted to IEEE AIM-2024
- Published
- 2024
36. Achieving Human Level Competitive Robot Table Tennis
- Author
-
D'Ambrosio, David B., Abeyruwan, Saminda, Graesser, Laura, Iscen, Atil, Amor, Heni Ben, Bewley, Alex, Reed, Barney J., Reymann, Krista, Takayama, Leila, Tassa, Yuval, Choromanski, Krzysztof, Coumans, Erwin, Jain, Deepali, Jaitly, Navdeep, Jaques, Natasha, Kataoka, Satoshi, Kuang, Yuheng, Lazic, Nevena, Mahjourian, Reza, Moore, Sherry, Oslund, Kenneth, Shankar, Anish, Sindhwani, Vikas, Vanhoucke, Vincent, Vesom, Grace, Xu, Peng, and Sanketi, Pannag R.
- Subjects
Computer Science - Robotics - Abstract
Achieving human-level speed and performance on real world tasks is a north star for the robotics research community. This work takes a step towards that goal and presents the first learned robot agent that reaches amateur human-level performance in competitive table tennis. Table tennis is a physically demanding sport which requires human players to undergo years of training to achieve an advanced level of proficiency. In this paper, we contribute (1) a hierarchical and modular policy architecture consisting of (i) low level controllers with their detailed skill descriptors which model the agent's capabilities and help to bridge the sim-to-real gap and (ii) a high level controller that chooses the low level skills, (2) techniques for enabling zero-shot sim-to-real including an iterative approach to defining the task distribution that is grounded in the real-world and defines an automatic curriculum, and (3) real time adaptation to unseen opponents. Policy performance was assessed through 29 robot vs. human matches of which the robot won 45% (13/29). All humans were unseen players and their skill level varied from beginner to tournament level. Whilst the robot lost all matches vs. the most advanced players it won 100% matches vs. beginners and 55% matches vs. intermediate players, demonstrating solidly amateur human-level performance. Videos of the matches can be viewed at https://sites.google.com/view/competitive-robot-table-tennis, Comment: v2, 29 pages, 19 main paper, 10 references + appendix, adding an additional 9 references
- Published
- 2024
37. Employing Vector Field Techniques on the Analysis of Memristor Cellular Nonlinear Networks Cell Dynamics
- Author
-
Singh, Chandan, Ntinas, Vasileios, Prousalis, Dimitrios, Wang, Yongmin, Demirkol, Ahmet Samil, Messaris, Ioannis, Rana, Vikas, Menzel, Stephan, Ascoli, Alon, and Tetzlaff, Ronald
- Subjects
Computer Science - Emerging Technologies - Abstract
This paper introduces an innovative graphical analysis tool for investigating the dynamics of Memristor Cellular Nonlinear Networks (M-CNNs) featuring 2nd-order processing elements, known as M-CNN cells. In the era of specialized hardware catering to the demands of intelligent autonomous systems, the integration of memristors within Cellular Nonlinear Networks (CNNs) has emerged as a promising paradigm due to their exceptional characteristics. However, the standard Dynamic Route Map (DRM) analysis, applicable to 1st-order systems, fails to address the intricacies of 2nd-order M-CNN cell dynamics, and the 2nd-order DRM (DRM2) exhibits limitations in the graphical illustration of local dynamical properties of the M-CNN cells, e.g., the magnitude of the state derivative. To address this limitation, we propose a novel integration of the M-CNN cell vector field into the cell's phase portrait, enhancing the analysis efficacy and enabling efficient M-CNN cell design. A comprehensive exploration of M-CNN cell dynamics is presented, showcasing the utility of the proposed graphical tool for various scenarios, including bistable and monostable behavior, and demonstrating its superior ability to reveal subtle variations in cell behavior. Through this work, we offer a refined perspective on the analysis and design of M-CNNs, paving the way for advanced applications in edge computing and specialized hardware., Comment: Presented at the 18th IEEE International Workshop on Cellular Nanoscale Networks and their Applications (CNNA'23) and the 8th Memristor and Memristive Symposium
- Published
- 2024
38. Modeling the Plasma Composition of 67P/C-G at different Heliocentric Distances
- Author
-
Ahmed, Sana and Soni, Vikas
- Subjects
Astrophysics - Earth and Planetary Astrophysics ,Physics - Space Physics - Abstract
The Rosetta spacecraft accompanied the comet 67P/C-G for nearly 2 years, collecting valuable data on the neutral and ion composition of the coma. The Rosetta Plasma Consortium (RPC) provided continuous measurements of the in situ plasma density while ROSINA-COPS monitored the neutral composition. In this work, we aim to estimate the composition of the cometary ionosphere at different heliocentric distances of the comet. Lauter et al. (2020) derived the temporal evolution of the volatile sublimation rates for 50 separated time intervals on the orbit of 67P/C-G using the COPS and DFMS data. We use these sublimation rates as inputs in a multifluid chemical-hydrodynamical model for 36 of the time intervals for heliocentric distances < 3 au. We compare the total ion densities obtained from our models with the local plasma density measured by the RPC instruments. We find that at the location of the spacecraft, our modeled ion densities match with the in situ measured plasma density within factors of 1 - 3 for many of the time intervals. We obtain the cometocentric distance variation of the ions H2O+ and H3O+ and the ion groups created respectively by the ionization and protonation of neutral species. We see that H3O+ is dominant at the spacecraft location for nearly all the time intervals while ions created due to protonation are dominant at low cometocentric distances for the intervals near perihelion. We also discuss our ion densities in the context of their detection by DFMS., Comment: Accepted for publication in Icarus
- Published
- 2024
39. Gemma 2: Improving Open Language Models at a Practical Size
- Author
-
Gemma Team, Riviere, Morgane, Pathak, Shreya, Sessa, Pier Giuseppe, Hardin, Cassidy, Bhupatiraju, Surya, Hussenot, Léonard, Mesnard, Thomas, Shahriari, Bobak, Ramé, Alexandre, Ferret, Johan, Liu, Peter, Tafti, Pouya, Friesen, Abe, Casbon, Michelle, Ramos, Sabela, Kumar, Ravin, Lan, Charline Le, Jerome, Sammy, Tsitsulin, Anton, Vieillard, Nino, Stanczyk, Piotr, Girgin, Sertan, Momchev, Nikola, Hoffman, Matt, Thakoor, Shantanu, Grill, Jean-Bastien, Neyshabur, Behnam, Bachem, Olivier, Walton, Alanna, Severyn, Aliaksei, Parrish, Alicia, Ahmad, Aliya, Hutchison, Allen, Abdagic, Alvin, Carl, Amanda, Shen, Amy, Brock, Andy, Coenen, Andy, Laforge, Anthony, Paterson, Antonia, Bastian, Ben, Piot, Bilal, Wu, Bo, Royal, Brandon, Chen, Charlie, Kumar, Chintu, Perry, Chris, Welty, Chris, Choquette-Choo, Christopher A., Sinopalnikov, Danila, Weinberger, David, Vijaykumar, Dimple, Rogozińska, Dominika, Herbison, Dustin, Bandy, Elisa, Wang, Emma, Noland, Eric, Moreira, Erica, Senter, Evan, Eltyshev, Evgenii, Visin, Francesco, Rasskin, Gabriel, Wei, Gary, Cameron, Glenn, Martins, Gus, Hashemi, Hadi, Klimczak-Plucińska, Hanna, Batra, Harleen, Dhand, Harsh, Nardini, Ivan, Mein, Jacinda, Zhou, Jack, Svensson, James, Stanway, Jeff, Chan, Jetha, Zhou, Jin Peng, Carrasqueira, Joana, Iljazi, Joana, Becker, Jocelyn, Fernandez, Joe, van Amersfoort, Joost, Gordon, Josh, Lipschultz, Josh, Newlan, Josh, Ji, Ju-yeong, Mohamed, Kareem, Badola, Kartikeya, Black, Kat, Millican, Katie, McDonell, Keelin, Nguyen, Kelvin, Sodhia, Kiranbir, Greene, Kish, Sjoesund, Lars Lowe, Usui, Lauren, Sifre, Laurent, Heuermann, Lena, Lago, Leticia, McNealus, Lilly, Soares, Livio Baldini, Kilpatrick, Logan, Dixon, Lucas, Martins, Luciano, Reid, Machel, Singh, Manvinder, Iverson, Mark, Görner, Martin, Velloso, Mat, Wirth, Mateo, Davidow, Matt, Miller, Matt, Rahtz, Matthew, Watson, Matthew, Risdal, Meg, Kazemi, Mehran, Moynihan, Michael, Zhang, Ming, Kahng, Minsuk, Park, Minwoo, Rahman, Mofi, Khatwani, Mohit, Dao, Natalie, Bardoliwalla, Nenshad, Devanathan, Nesh, Dumai, Neta, Chauhan, Nilay, Wahltinez, Oscar, Botarda, Pankil, Barnes, Parker, Barham, Paul, Michel, Paul, Jin, Pengchong, Georgiev, Petko, Culliton, Phil, Kuppala, Pradeep, Comanescu, Ramona, Merhej, Ramona, Jana, Reena, Rokni, Reza Ardeshir, Agarwal, Rishabh, Mullins, Ryan, Saadat, Samaneh, Carthy, Sara Mc, Cogan, Sarah, Perrin, Sarah, Arnold, Sébastien M. R., Krause, Sebastian, Dai, Shengyang, Garg, Shruti, Sheth, Shruti, Ronstrom, Sue, Chan, Susan, Jordan, Timothy, Yu, Ting, Eccles, Tom, Hennigan, Tom, Kocisky, Tomas, Doshi, Tulsee, Jain, Vihan, Yadav, Vikas, Meshram, Vilobh, Dharmadhikari, Vishal, Barkley, Warren, Wei, Wei, Ye, Wenming, Han, Woohyun, Kwon, Woosuk, Xu, Xiang, Shen, Zhe, Gong, Zhitao, Wei, Zichuan, Cotruta, Victor, Kirk, Phoebe, Rao, Anand, Giang, Minh, Peran, Ludovic, Warkentin, Tris, Collins, Eli, Barral, Joelle, Ghahramani, Zoubin, Hadsell, Raia, Sculley, D., Banks, Jeanine, Dragan, Anca, Petrov, Slav, Vinyals, Oriol, Dean, Jeff, Hassabis, Demis, Kavukcuoglu, Koray, Farabet, Clement, Buchatskaya, Elena, Borgeaud, Sebastian, Fiedel, Noah, Joulin, Armand, Kenealy, Kathleen, Dadashi, Robert, and Andreev, Alek
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
- Published
- 2024
40. Enabling Uniform Computer Interaction Experience for Blind Users through Large Language Models
- Author
-
Kodandaram, Satwik Ram, Uckun, Utku, Bi, Xiaojun, Ramakrishnan, IV, and Ashok, Vikas
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
Blind individuals, who by necessity depend on screen readers to interact with computers, face considerable challenges in navigating the diverse and complex graphical user interfaces of different computer applications. The heterogeneity of various application interfaces often requires blind users to remember different keyboard combinations and navigation methods to use each application effectively. To alleviate this significant interaction burden imposed by heterogeneous application interfaces, we present Savant, a novel assistive technology powered by large language models (LLMs) that allows blind screen reader users to interact uniformly with any application interface through natural language. Novelly, Savant can automate a series of tedious screen reader actions on the control elements of the application when prompted by a natural language command from the user. These commands can be flexible in the sense that the user is not strictly required to specify the exact names of the control elements in the command. A user study evaluation of Savant with 11 blind participants demonstrated significant improvements in interaction efficiency and usability compared to current practices.
- Published
- 2024
41. Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models
- Author
-
Madhusudhan, Nishanth, Madhusudhan, Sathwik Tejaswi, Yadav, Vikas, and Hashemi, Masoud
- Subjects
Computer Science - Computation and Language
- Abstract
Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability, referring to an LLM's capability to withhold responses when uncertain or lacking a definitive answer, without compromising performance. Although previous studies have attempted to improve AA, they lack a standardised evaluation method and remain unsuitable for black-box models where token prediction probabilities are inaccessible. This makes comparative analysis challenging, especially for state-of-the-art closed-source commercial LLMs. This paper bridges this gap by introducing a black-box evaluation approach and a new dataset, Abstain-QA, crafted to rigorously assess AA across varied question types (answerable and unanswerable), domains (well-represented and under-represented), and task types (fact-centric and reasoning). We also propose a new confusion matrix, the "Answerable-Unanswerable Confusion Matrix" (AUCM), which serves as the basis for evaluating AA by offering a structured and precise approach for assessment. Finally, we explore the impact of three prompting strategies (Strict Prompting, Verbal Confidence Thresholding, and Chain-of-Thought (CoT)) on improving AA. Our results indicate that even powerful models like GPT-4 and Mixtral 8x22b encounter difficulties with abstention; however, strategic approaches such as Strict Prompting and CoT can enhance this capability. Comment: 8 pages (excluding limitations, references and appendix) and 5 figures
- Published
- 2024
42. A new approach to delegate signing rights to proxy signers using isogeny-based cryptography
- Author
-
Dey, Kunal, Kumar, Somnath, Srivastava, Vikas, Debnath, Sumit Kumar, and Das, Ashok Kumar
- Subjects
Computer Science - Cryptography and Security
- Abstract
E-governance is a two-way protocol through which one can use government services, share data and request information. It refers to the use of communication and information technologies to provide government services to the public in an efficient and fast manner. In addition, any document submitted to the e-Government system must be authenticated by a government officer using a digital signature scheme. In the context of digital signatures, the proxy signature is an important cryptographic primitive that allows the original signer to delegate signing authority to another signer (proxy signer). The proxy signature has a number of important applications in the e-government system. There is now a large number of proxy signature schemes. The security of most of them relies on the following hard problems: the discrete logarithm problem and the integer factorization problem. However, a large-scale quantum computer can solve them in polynomial time due to Shor's algorithm. As a consequence, there is a need for a quantum computer-resistant proxy signature to secure e-governance systems from quantum adversaries. In this work, we propose the first post-quantum isogeny-based proxy signature scheme CSI-PS (commutative supersingular isogeny proxy signature). Our construction is proven to be uf-cma secure under the hardness of the group action inverse problem (GAIP) based on isogeny.
- Published
- 2024
43. Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs
- Author
-
Chiang, Hao-Tien Lewis, Xu, Zhuo, Fu, Zipeng, Jacob, Mithun George, Zhang, Tingnan, Lee, Tsang-Wei Edward, Yu, Wenhao, Schenck, Connor, Rendleman, David, Shah, Dhruv, Xia, Fei, Hsu, Jasmine, Hoech, Jonathan, Florence, Pete, Kirmani, Sean, Singh, Sumeet, Sindhwani, Vikas, Parada, Carolina, Finn, Chelsea, Xu, Peng, Levine, Sergey, and Tan, Jie
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence
- Abstract
An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recorded demonstration video. Recent advances in Vision Language Models (VLMs) have shown a promising path in achieving this goal as they demonstrate capabilities in perceiving and reasoning about multimodal inputs. However, VLMs are typically trained to predict textual output and it is an open research question how to best utilize them in navigation. To solve MINT, we present Mobility VLA, a hierarchical Vision-Language-Action (VLA) navigation policy that combines the environment understanding and common sense reasoning power of long-context VLMs and a robust low-level navigation policy based on topological graphs. The high-level policy consists of a long-context VLM that takes the demonstration tour video and the multimodal user instruction as input to find the goal frame in the tour video. Next, a low-level policy uses the goal frame and an offline constructed topological graph to generate robot actions at every timestep. We evaluated Mobility VLA in an 836 m^2 real world environment and show that Mobility VLA has high end-to-end success rates on previously unsolved multimodal instructions such as "Where should I return this?" while holding a plastic bin. A video demonstrating Mobility VLA can be found here: https://youtu.be/-Tof__Q8_5s
- Published
- 2024
44. Towards Photon-Number-Encoded High-dimensional Entanglement from a Sequentially Excited Quantum Three-Level System
- Author
-
Vajner, Daniel A., Kewitz, Nils D., von Helversen, Martin, Wein, Stephen C., Karli, Yusuf, Kappe, Florian, Remesh, Vikas, da Silva, Saimon F. Covre, Rastelli, Armando, Weihs, Gregor, Anton-Solanas, Carlos, and Heindel, Tobias
- Subjects
Quantum Physics, Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Optics
- Abstract
The sequential resonant excitation of a 2-level quantum system results in the emission of a state of light showing time-entanglement encoded in the photon-number-basis - notions that can be extended to 3-level quantum systems as discussed in a recent proposal. Here, we report the experimental implementation of a sequential two-photon resonant excitation process of a solid-state 3-level system, constituted by the biexciton-, exciton-, and ground-state of a semiconductor quantum dot. The resulting light state exhibits entanglement in time and energy, encoded in the photon-number basis, which could be used in quantum information applications, e.g., dense information encoding or quantum communication protocols. Performing energy- and time-resolved correlation experiments in combination with extensive theoretical modelling, we are able to partially retrieve the entanglement structure of the generated state. Comment: 14 pages (including 5 figures, 56 citations)
- Published
- 2024
45. Color-map recommendation for MR relaxometry maps
- Author
-
Fuderer, Miha, Wichtmann, Barbara, Crameri, Fabio, deSouza, Nandita M., Baeßler, Bettina, Gulani, Vikas, Wang, Meiyun, Poot, Dirk, de Boer, Ruud, Cashmore, Matt, de Graaf, Wolter, Keenan, Kathryn E., Ma, Dan, Pirkl, Carolin, Sollmann, Nico, Weingärtner, Sebastian, Mandija, Stefano, and Golay, Xavier
- Subjects
Physics - Medical Physics
- Abstract
Purpose: To harmonize the use of color for MR relaxometry maps and therefore recommend the use of specific color-maps for representing T1 and T2 maps. Methods: Perceptually linearized color-maps were chosen to have similar color settings as those proposed by Griswold et al. in 2018. A Delphi process, polling the opinion of a panel of 81 experts, was used to generate consensus on the suitability of these maps. Results: Consensus was reached on the suitability of the logarithm-processed Lipari color-map for T1 and the logarithm-processed Navia color-map for T2. There was consensus on color bars being mandatory and on the use of a specific value indicating invalidity. There was no consensus on whether the ranges should be fixed per anatomy. Conclusion: The authors recommend the use of the logarithm-processed Lipari color-map for displaying quantitative T1 maps and R1 maps; likewise, the authors recommend the logarithm-processed Navia color-map for displaying T2, T2*, R2 and R2* maps. Comment: 22 pages; embedded are 5 figures and 5 tables; contact the first author for supplementary material. Submitted to Magnetic Resonance in Medicine
- Published
- 2024
46. High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
- Author
-
Lan, Gael Le, Shi, Bowen, Ni, Zhaoheng, Srinivasan, Sidd, Kumar, Anurag, Ellis, Brian, Kant, David, Nagaraja, Varun, Chang, Ernie, Hsu, Wei-Ning, Shi, Yangyang, and Chandra, Vikas
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
We introduce MelodyFlow, an efficient text-controllable high-fidelity music generation and editing model. It operates on continuous latent representations from a low frame rate 48 kHz stereo variational autoencoder codec. Based on a diffusion transformer architecture trained on a flow-matching objective, the model can edit diverse high-quality stereo samples of variable duration with simple text descriptions. We adapt the ReNoise latent inversion method to flow matching and compare it with the original implementation and naive denoising diffusion implicit model (DDIM) inversion on a variety of music editing prompts. Our results indicate that our latent inversion outperforms both ReNoise and DDIM for zero-shot test-time text-guided editing on several objective metrics. Subjective evaluations exhibit a substantial improvement over previous state of the art for music editing. Code and model weights will be made publicly available. Samples are available at https://melodyflow.github.io.
- Published
- 2024
47. In-Memory Mirroring: Cloning Without Reading
- Author
-
Singh, Simranjeet, Bende, Ankit, Jha, Chandan Kumar, Rana, Vikas, Drechsler, Rolf, Patkar, Sachin, and Merchant, Farhad
- Subjects
Computer Science - Emerging Technologies
- Abstract
In-memory computing (IMC) has gained significant attention recently as it attempts to reduce the impact of memory bottlenecks. Numerous schemes for digital IMC are presented in the literature, focusing on logic operations. Often, an application's description has data dependencies that must be resolved. Contemporary IMC architectures perform read followed by write operations for this purpose, which results in performance and energy penalties. To solve this fundamental problem, this paper presents in-memory mirroring (IMM). IMM eliminates the need for read and write-back steps, thus avoiding energy and performance penalties. Instead, we perform data movement within memory, involving row-wise and column-wise data transfers. Additionally, the IMM scheme enables parallel cloning of an entire row (word) with a complexity of $\mathcal{O}(1)$. Moreover, we analyze the energy consumption of the proposed technique using a resistive random-access memory crossbar and the experimentally validated JART VCM v1b model. The IMM increases energy efficiency and shows a 2$\times$ performance improvement compared to conventional data movement methods. Comment: Accepted in IFIP/IEEE VLSI-SoC 2024
- Published
- 2024
48. Geometric Static Modeling Framework for Piecewise-Continuous Curved-Link Multi Point-of-Contact Tensegrity Robots
- Author
-
Ervin, Lauren and Vikas, Vishesh
- Subjects
Computer Science - Robotics
- Abstract
Tensegrities synergistically combine tensile (cable) and rigid (link) elements to achieve structural integrity, making them lightweight, packable, and impact resistant. Consequently, they have high potential for locomotion in unstructured environments. This research presents geometric modeling of a Tensegrity eXploratory Robot (TeXploR) comprised of two semi-circular, curved links held together by 12 prestressed cables and actuated with an internal mass shifting along each link. This design allows for efficient rolling with stability (e.g., tip-over on an incline). However, the unique design poses static and dynamic modeling challenges given the discontinuous nature of the semi-circular, curved links, two changing points of contact with the surface plane, and instantaneous movement of the masses along the links. The robot is modeled using a geometric approach where the holonomic constraints confirm the experimentally observed four-state hybrid system, proving TeXploR rolls along one link while pivoting about the end of the other. It also identifies the quasi-static state transition boundaries that enable a continuous change in the robot states via internal mass shifting. This is the first time in the literature that a non-spherical two-point contact system is kinematically and geometrically modeled. Furthermore, the static solutions are closed-form and do not require numerical exploration of the solution. The MATLAB simulations are experimentally validated on a tetherless prototype with mean absolute error of 4.36°. Comment: This work is published on IEEE RA-L. Please refer to the published article below: https://ieeexplore.ieee.org/document/10734217 L. Ervin and V. Vikas, "Geometric Static Modeling Framework for Piecewise-Continuous Curved-Link Multi Point-of-Contact Tensegrity Robots," in IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11066-11073, Dec. 2024, doi: 10.1109/LRA.2024.3486199
- Published
- 2024
49. DADEE: Well-calibrated uncertainty quantification in neural networks for barriers-based robot safety
- Author
-
Ataei, Masoud and Dhiman, Vikas
- Subjects
Computer Science - Robotics, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
- Abstract
Uncertainty-aware controllers that guarantee safety are critical for safety-critical applications. Among such controllers, Control Barrier Functions (CBFs) based approaches are popular because they are fast, yet safe. However, most such works depend on Gaussian Processes (GPs) or MC-Dropout for learning and uncertainty estimation, and both approaches come with drawbacks: GPs are non-parametric methods that are slow, while MC-Dropout does not capture aleatoric uncertainty. On the other hand, modern Bayesian learning algorithms have shown promise in uncertainty quantification. The application of modern Bayesian learning methods to CBF-based controllers has not yet been studied. We aim to fill this gap by surveying uncertainty quantification algorithms and evaluating them on CBF-based safe controllers. We find that model variance-based algorithms (for example, Deep ensembles, MC-Dropout, etc.) and direct estimation-based algorithms (such as DEUP) have complementary strengths. Algorithms in the former category can only estimate uncertainty accurately out-of-domain, while those in the latter category can only do so in-domain. We combine the two approaches to obtain more accurate uncertainty estimates both in- and out-of-domain. As measured by the failure rate of a simulated robot, this results in a safer CBF-based robot controller.
- Published
- 2024
50. Modeling the Real World with High-Density Visual Particle Dynamics
- Author
-
Whitney, William F., Varley, Jacob, Jain, Deepali, Choromanski, Krzysztof, Singh, Sumeet, and Sindhwani, Vikas
- Subjects
Computer Science - Machine Learning, Computer Science - Robotics
- Abstract
We present High-Density Visual Particle Dynamics (HD-VPD), a learned world model that can emulate the physical dynamics of real scenes by processing massive latent point clouds containing 100K+ particles. To enable efficiency at this scale, we introduce a novel family of Point Cloud Transformers (PCTs) called Interlacers leveraging intertwined linear-attention Performer layers and graph-based neighbour attention layers. We demonstrate the capabilities of HD-VPD by modeling the dynamics of high degree-of-freedom bi-manual robots with two RGB-D cameras. Compared to the previous graph neural network approach, our Interlacer dynamics is twice as fast with the same prediction quality, and can achieve higher quality using 4x as many particles. We illustrate how HD-VPD can evaluate motion plan quality with robotic box pushing and can grasping tasks. See videos and particle dynamics rendered by HD-VPD at https://sites.google.com/view/hd-vpd.
- Published
- 2024