1,255 results on '"P Ameya"'
2. Enhancing Diabetic Retinopathy Detection with CNN-Based Models: A Comparative Study of UNET and Stacked UNET Architectures
- Author
- Uppina, Ameya, Krishnan, S Navaneetha, Teja, Talluri Krishna Sai, Iyer, Nikhil N, and R, Joe Dhanith P
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Diabetic Retinopathy (DR) is a severe complication of diabetes in which damaged or abnormal blood vessels can cause loss of vision. The need for mass screening of a large population of diabetic patients has generated interest in computer-aided, fully automatic diagnosis of DR. Within deep learning, convolutional neural networks (CNNs) have shown great promise in detecting DR by analyzing retinal images. However, several challenges have been faced in the application of deep learning in this domain: high-quality, annotated datasets are scarce, and variations in image quality and class imbalance pose significant hurdles to developing a dependable model. In this paper, we demonstrate the proficiency of two CNN-based models, UNET and Stacked UNET, utilizing the APTOS (Asia Pacific Tele-Ophthalmology Society) dataset. This system achieves an accuracy of 92.81% for the UNET and 93.32% for the stacked UNET architecture. The architecture classifies the images into five categories ranging from 0 to 4, where 0 is no DR and 4 is proliferative DR.
- Published
- 2024
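The class-imbalance hurdle mentioned in the abstract above is commonly countered with inverse-frequency class weights over the 0-4 APTOS grading scale. The sketch below illustrates that idea only; the toy labels and the mean-1 normalisation are assumptions for illustration, not details from the paper:

```python
from collections import Counter

# Toy DR severity labels on the APTOS 0-4 scale (0 = no DR, 4 = proliferative DR).
# These labels are illustrative, not real APTOS data.
labels = [0, 0, 0, 0, 2, 1, 0, 3, 0, 4, 2, 0]

def inverse_frequency_weights(labels, n_classes=5):
    """Weight each class by the inverse of its frequency, normalised so the
    weights of the classes that appear average to 1."""
    counts = Counter(labels)
    total = len(labels)
    raw = {c: (total / counts[c] if counts.get(c) else 0.0) for c in range(n_classes)}
    present = [w for w in raw.values() if w > 0]
    mean = sum(present) / len(present)
    return {c: w / mean for c, w in raw.items()}

weights = inverse_frequency_weights(labels)
```

Rare grades (here grade 4) receive the largest weight, so a weighted loss pays more attention to them during training.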
3. History-Matching of Imbibition Flow in Multiscale Fractured Porous Media Using Physics-Informed Neural Networks (PINNs)
- Author
- Abbasi, Jassem, Moseley, Ben, Kurotori, Takeshi, Jagtap, Ameya D., Kovscek, Anthony R., Hiorth, Aksel, and Andersen, Pål Østebø
- Subjects
- Computer Science - Computational Engineering, Finance, and Science
- Abstract
We propose a workflow based on physics-informed neural networks (PINNs) to model multiphase fluid flow in fractured porous media. After validating the workflow in forward and inverse modeling of a synthetic problem of flow in fractured porous media, we applied it to a real experimental dataset in which brine is injected at a constant pressure drop into a CO2 saturated naturally fractured shale core plug. The exact spatial positions of natural fractures and the dynamic in-situ distribution of fluids were imaged using a CT-scan setup. To model the targeted system, we followed a domain decomposition approach for matrix and fractures and a multi-network architecture for the separate calculation of water saturation and pressure. The flow equations in the matrix, fractures and interplay between them were solved during training. Prior to fully-coupled simulations, we proposed pre-training the model. This aided in a more efficient and successful training of the coupled system. Both for the synthetic and experimental inverse problems, we determined flow parameters within the matrix and the fractures. Multiple random initializations of network and system parameters were performed to assess the uncertainty and uniqueness of the results. The results confirmed the precision of the inverse calculated parameters in retrieving the main flow characteristics of the system. The consideration of multiscale matrix-fracture impacts is commonly overlooked in existing workflows. Accounting for them led to several orders of magnitude variations in the calculated flow properties compared to not accounting for them. To the best of our knowledge, the proposed PINNs-based workflow is the first to offer a reliable and computationally efficient solution for inverse modeling of multiphase flow in fractured porous media, achieved through history-matching noisy and multi-fidelity experimental measurements., Comment: 47 pages of paper, including 19 figures
- Published
- 2024
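The physics-informed loss at the heart of the workflow above can be sketched on a toy problem. The snippet uses a generic diffusion-type equation u_t = D * u_xx as a stand-in for the paper's multiphase flow equations, a small untrained network, and finite differences in place of the automatic differentiation a real PINN would use; all of these are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed (untrained) MLP u(x, t); the training loop is omitted.
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def u(x, t):
    h = np.tanh(np.stack([x, t], axis=-1) @ W1 + b1)
    return (h @ W2 + b2)[..., 0]

def pinn_residual(x, t, D=0.1, eps=1e-4):
    """Residual of u_t = D * u_xx at collocation points, with derivatives
    approximated by central differences for illustration."""
    u_t = (u(x, t + eps) - u(x, t - eps)) / (2 * eps)
    u_xx = (u(x + eps, t) - 2 * u(x, t) + u(x - eps, t)) / eps**2
    return u_t - D * u_xx

# Random collocation points; the physics loss is the mean squared residual,
# which training would drive towards zero.
x = rng.uniform(0, 1, 64)
t = rng.uniform(0, 1, 64)
physics_loss = float(np.mean(pinn_residual(x, t) ** 2))
```

In the paper's setting, separate networks and residuals per subdomain (matrix and fractures) would be summed into one coupled loss.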
4. JAMUN: Transferable Molecular Conformational Ensemble Generation with Walk-Jump Sampling
- Author
- Daigavane, Ameya, Vani, Bodhi P., Saremi, Saeed, Kleinhenz, Joseph, and Rackers, Joshua
- Subjects
- Physics - Biological Physics, Computer Science - Machine Learning, Quantitative Biology - Biomolecules
- Abstract
Conformational ensembles of protein structures are immensely important both for understanding protein function and for drug discovery in novel modalities such as cryptic pockets. Current techniques for sampling ensembles are computationally inefficient, or do not transfer to systems outside their training data. We present walk-Jump Accelerated Molecular ensembles with Universal Noise (JAMUN), a step towards the goal of efficiently sampling the Boltzmann distribution of arbitrary proteins. By extending Walk-Jump Sampling to point clouds, JAMUN enables ensemble generation at orders of magnitude faster rates than traditional molecular dynamics or state-of-the-art ML methods. Further, JAMUN is able to predict the stable basins of small peptides that were not seen during training.
- Published
- 2024
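Walk-Jump Sampling, which JAMUN extends to point clouds, can be illustrated in one dimension where the smoothed score is known in closed form. The Gaussian target, step size, and noise scale below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5  # smoothing noise scale

def noisy_score(y):
    # Analytic score of the smoothed density when the data are N(0, 1):
    # y ~ N(0, 1 + sigma^2), so grad log p_sigma(y) = -y / (1 + sigma^2).
    return -y / (1 + sigma**2)

# "Walk": Langevin dynamics in the smoothed (noisy) space, run as 1000
# independent chains from a deliberately too-wide start.
y = rng.normal(size=1000) * 2.0
step = 0.1
for _ in range(500):
    y = y + step * noisy_score(y) + np.sqrt(2 * step) * rng.normal(size=y.size)

# "Jump": a single denoising step back towards the data (Tweedie's formula).
x_hat = y + sigma**2 * noisy_score(y)
```

In JAMUN the analytic score is replaced by a learned denoising network over atomic point clouds; the walk/jump structure is the same.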
5. EquiJump: Protein Dynamics Simulation via SO(3)-Equivariant Stochastic Interpolants
- Author
- Costa, Allan dos Santos, Mitnikov, Ilan, Pellegrini, Franco, Daigavane, Ameya, Geiger, Mario, Cao, Zhonglin, Kreis, Karsten, Smidt, Tess, Kucukbenli, Emine, and Jacobson, Joseph
- Subjects
- Computer Science - Machine Learning, Physics - Chemical Physics, Quantitative Biology - Biomolecules
- Abstract
Mapping the conformational dynamics of proteins is crucial for elucidating their functional mechanisms. While Molecular Dynamics (MD) simulation enables detailed time evolution of protein motion, its computational toll hinders its use in practice. To address this challenge, multiple deep learning models for reproducing and accelerating MD have been proposed drawing on transport-based generative methods. However, existing work focuses on generation through transport of samples from prior distributions, that can often be distant from the data manifold. The recently proposed framework of stochastic interpolants, instead, enables transport between arbitrary distribution endpoints. Building upon this work, we introduce EquiJump, a transferable SO(3)-equivariant model that bridges all-atom protein dynamics simulation time steps directly. Our approach unifies diverse sampling methods and is benchmarked against existing models on trajectory data of fast folding proteins. EquiJump achieves state-of-the-art results on dynamics simulation with a transferable model on all of the fast folding proteins.
- Published
- 2024
6. Behavior Matters: An Alternative Perspective on Promoting Responsible Data Science
- Author
- Dong, Ziwei, Patil, Ameya, Shoda, Yuichi, Battle, Leilani, and Wall, Emily
- Subjects
- Computer Science - Computers and Society, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
- Abstract
Data science pipelines inform and influence many daily decisions, from what we buy to who we work for and even where we live. When designed incorrectly, these pipelines can easily propagate social inequity and harm. Traditional solutions are technical in nature; e.g., mitigating biased algorithms. In this vision paper, we introduce a novel lens for promoting responsible data science using theories of behavior change that emphasize not only technical solutions but also the behavioral responsibility of practitioners. By integrating behavior change theories from cognitive psychology with data science workflow knowledge and ethics guidelines, we present a new perspective on responsible data science. We present example data science interventions in machine learning and visual data analysis, contextualized in behavior change theories that could be implemented to interrupt and redirect potentially suboptimal or negligent practices while reinforcing ethically conscious behaviors. We conclude with a call to action to our community to explore this new research area of behavior change interventions for responsible data science., Comment: 23 pages, 4 figures, to be published in CSCW 2025
- Published
- 2024
7. Influence of the microstructure on the mechanical behavior of nanoporous materials under large strains
- Author
- Chandrasekaran, Rajesh, Itskov, Mikhail, and Rege, Ameya
- Subjects
- Condensed Matter - Materials Science, Physics - Computational Physics
- Abstract
Nanoporous materials are characterized by their complex porous morphology illustrated by the presence of a solid network and voids. The fraction of these voids is characterized by the porosity of the structure, which influences the bulk mechanical properties of the material. Most literature on the mechanics of porous materials has focused on the density-dependence of their elastic properties. In addition to porosity, other pore characteristics, namely pore-size and shape described by the pore-size distribution, and pore-wall size and shape, also influence the bulk response of these materials. In this work, the mechanical structure-property relation of nanoporous materials is studied under large deformations using a computational framework. The interdependent microstructural parameters are identified. After a successful correlation between the synthesis and microstructural parameters, the synthesis of porous materials can be guided and optimized by controlling these parameters., Comment: 22 pages, 14 figures
- Published
- 2024
8. MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection
- Author
- Nezakati, Niki, Reza, Md Kaykobad, Patil, Ameya, Solh, Mashhour, and Asif, M. Salman
- Subjects
- Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Multimodal learning seeks to combine data from multiple input sources to enhance the performance of different downstream tasks. In real-world scenarios, performance can degrade substantially if some input modalities are missing. Existing methods that can handle missing modalities involve custom training or adaptation steps for each input modality combination. These approaches are either tied to specific modalities or become computationally expensive as the number of input modalities increases. In this paper, we propose Masked Modality Projection (MMP), a method designed to train a single model that is robust to any missing modality scenario. We achieve this by randomly masking a subset of modalities during training and learning to project available input modalities to estimate the tokens for the masked modalities. This approach enables the model to effectively learn to leverage the information from the available modalities to compensate for the missing ones, enhancing missing modality robustness. We conduct a series of experiments with various baseline models and datasets to assess the effectiveness of this strategy. Experiments demonstrate that our approach improves robustness to different missing modality scenarios, outperforming existing methods designed for missing modalities or specific modality combinations.
- Published
- 2024
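The masking-and-projection idea described above can be sketched as follows. The modality names, token shapes, and the random (untrained) linear projections are illustrative assumptions; in MMP the projections would be learned during training:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
# Toy token sequences (4 tokens of width 8) for three hypothetical modalities.
modalities = {m: rng.normal(size=(4, dim)) for m in ["rgb", "depth", "audio"]}

# Hypothetical learned cross-modal projections; random here for illustration.
proj = {(a, b): rng.normal(size=(dim, dim)) * 0.1
        for a in modalities for b in modalities if a != b}

def mask_and_project(modalities, p_mask=0.5):
    """Randomly mask a subset of modalities (always keeping at least one)
    and estimate each masked modality's tokens by projecting and averaging
    the tokens of the available modalities."""
    names = list(modalities)
    masked = {m for m in names if rng.random() < p_mask}
    if len(masked) == len(names):
        masked.discard(names[0])
    available = [m for m in names if m not in masked]
    tokens = {m: modalities[m] for m in available}
    for m in masked:
        tokens[m] = np.mean([modalities[a] @ proj[(a, m)] for a in available], axis=0)
    return tokens, masked

tokens, masked = mask_and_project(modalities)
```

Because every training step sees a different random mask, a single model learns to cope with any missing-modality pattern at test time.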
9. Online identification of skidding modes with interactive multiple model estimation
- Author
- Salvi, Ameya, Ala, Pardha Sai Krishna, Smereka, Jonathon M., Brudnak, Mark, Gorsich, David, Schmid, Matthias, and Krovi, Venkat
- Subjects
- Computer Science - Robotics
- Abstract
Skid-steered wheel mobile robots (SSWMRs) operate in a variety of outdoor environments exhibiting motion behaviors dominated by the effects of complex wheel-ground interactions. Characterizing these interactions is crucial both from the immediate robot autonomy perspective (for motion prediction and control) as well as a long-term predictive maintenance and diagnostics perspective. An ideal solution entails capturing precise state measurements for decisions and controls, which is considerably difficult, especially in increasingly unstructured outdoor regimes of operations for these robots. In this milieu, a framework to identify pre-determined discrete modes of operation can considerably simplify the motion model identification process. To this end, we propose an interactive multiple model (IMM) based filtering framework to probabilistically identify predefined robot operation modes that could arise due to traversal in different terrains or loss of wheel traction.
- Published
- 2024
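The mode-probability recursion at the core of an interactive multiple model (IMM) filter can be sketched for two hypothetical skidding modes. The transition matrix and likelihood values are illustrative assumptions, and the per-mode Kalman filters are abstracted into scalar measurement likelihoods:

```python
import numpy as np

# Two hypothetical operation modes: "grip" (index 0) and "slip" (index 1).
transition = np.array([[0.95, 0.05],   # row i: P(next mode | current mode i)
                       [0.10, 0.90]])
mode_prob = np.array([0.5, 0.5])

def imm_mode_update(mode_prob, likelihoods):
    """One cycle of the IMM mode-probability recursion: mix with the Markov
    transition prior, then reweight by how well each mode's filter explains
    the current measurement."""
    predicted = transition.T @ mode_prob   # interaction / mixing step
    posterior = predicted * likelihoods    # measurement update
    return posterior / posterior.sum()

# A measurement much better explained by the "slip" mode's filter:
mode_prob = imm_mode_update(mode_prob, likelihoods=np.array([0.01, 0.4]))
```

In the full filter each mode also carries its own motion-model state estimate; here only the probabilistic mode identification is shown.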
10. Fractonic Coset Construction for Spontaneously Broken Translations
- Author
- Chavda, Ameya, Naegels, Daniel, and Staunton, John
- Subjects
- High Energy Physics - Theory, Condensed Matter - Other Condensed Matter
- Abstract
We study the homogeneous breaking of spatial translation symmetry concomitantly with the spontaneous breaking of other internal and spacetime symmetries, including dilations. We use the symmetry breaking pattern as the only input to derive, via the coset construction, general effective field theories for the symmetry-originated modes associated with Goldstone's theorem, namely the Nambu-Goldstone candidates. Through explicit computations, we show that integrating out the massive Nambu-Goldstone candidates explicitly, or imposing symmetric constraints, namely the inverse Higgs constraints, to express massive modes in terms of the massless ones, leads to physically distinct effective field theories. This sensitivity to the chosen method can be traced back to the homogeneous breaking of translations: the homogeneous aspect of the breaking induces a mixing between internal and spacetime symmetries at the level of the Lie algebra. This, in turn, leads to subtle discussions about the inverse Higgs constraints, in particular that they lead to a loss of generality in our specific examples. The derived general effective field theories also give rise to a broad class of theories exhibiting emergent enhanced shift symmetries, which constrain the mobility of the modes. The latter are referred to as fractonic modes., Comment: 53 pages, 1 figure, 2 supplemental Mathematica notebooks
- Published
- 2024
11. Stabilization of vertical motion of a vehicle on bumpy terrain using deep reinforcement learning
- Author
- Salvi, Ameya, Coleman, John, Buzhardt, Jake, Krovi, Venkat, and Tallapragada, Phanindra
- Subjects
- Computer Science - Robotics
- Abstract
Stabilizing vertical dynamics for on-road and off-road vehicles is an important research area that has mostly been studied from the point of view of ride comfort. The advent of autonomous vehicles now shifts the focus towards developing stabilizing techniques from the point of view of onboard proprioceptive and exteroceptive sensors, whose real-time measurements influence the performance of an autonomous vehicle. Current solutions to the problem of managing vertical oscillations usually limit themselves to the realm of active suspension systems, without much consideration of modulating the vehicle velocity, which plays an important role by virtue of the fact that the vertical and longitudinal dynamics of a ground vehicle are coupled. The task of stabilizing vertical oscillations for military ground vehicles becomes even more challenging due to the lack of structured environments, like city roads or highways, in off-road scenarios. Moreover, changes in structural parameters of the vehicle, such as mass (due to changes in vehicle loading) and suspension stiffness and damping values, can have a significant effect on the controller's performance. This motivates deep-learning-based control policies that can take into account an extremely large number of input features and approximate a near-optimal control action. In this work, these problems are addressed by training a deep reinforcement learning agent to minimize the vertical acceleration of a scaled vehicle travelling over bumps by controlling its velocity.
- Published
- 2024
12. Investigating early and late-time epochs in $ f(Q) $ gravity
- Author
- Kolhatkar, Ameya, Mishra, Sai Swagat, and Sahoo, P. K.
- Subjects
- General Relativity and Quantum Cosmology, High Energy Physics - Theory
- Abstract
In the following work, a new hybrid model of the form $ f(Q)=Q(1+a)+b\frac{Q_0^2}{Q} $ has been proposed and confronted with both early- and late-time constraints. We first use conditions from the era of Big Bang Nucleosynthesis (BBN) to constrain the models, which are further used to study the evolution of the Universe through the deceleration parameter. This methodology is employed for the hybrid model as well as for a simple model of the form $ \alpha_1 Q+\alpha_2 Q_0 $, which is found to reduce to $\Lambda$CDM. The error bar plot for the Cosmic Chronometer (CC) and Pantheon+SH0ES datasets, which includes the comparison with $\Lambda$CDM, has been studied for the constrained hybrid model. Additionally, we perform Markov Chain Monte Carlo (MCMC) sampling of the model against three datasets -- CC, Pantheon+SH0ES, and Baryon Acoustic Oscillations (BAO) -- to find the best-fit ranges of the free parameters. It is found that the constraint range of the model parameter ($a$) from the BBN study has a region of overlap with the ranges obtained from the MCMC analysis. Finally, we perform a statistical comparison between our model and the $\Lambda$CDM model using the AIC and BIC methods., Comment: EPJ C published version
- Published
- 2024
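As a sketch of why the simple model reduces to $\Lambda$CDM, one can use the flat-FLRW relation $Q = 6H^2$ together with the $f(Q)$ Friedmann equation in one common sign convention (conventions vary across the literature, so this is an illustrative derivation, not the paper's):

```latex
% Background Friedmann equation of f(Q) gravity, with f_Q = df/dQ:
\begin{align}
  6 f_Q H^2 - \tfrac{1}{2} f &= 8\pi G \rho,
  \qquad f(Q) = \alpha_1 Q + \alpha_2 Q_0 ,\ f_Q = \alpha_1 \\
  \Rightarrow\quad 3\alpha_1 H^2 - \tfrac{1}{2}\alpha_2 Q_0 &= 8\pi G \rho \\
  \Rightarrow\quad H^2 &= \frac{8\pi G}{3\alpha_1}\,\rho
    + \frac{\alpha_2 Q_0}{6\alpha_1} .
\end{align}
```

The constant term thus acts as an effective cosmological constant, $\Lambda_{\mathrm{eff}} = \alpha_2 Q_0 / (2\alpha_1)$, recovering the $\Lambda$CDM form of the expansion history.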
13. A Practitioner's Guide to Continual Multimodal Pretraining
- Author
- Roth, Karsten, Udandarao, Vishaal, Dziadzio, Sebastian, Prabhu, Ameya, Cherti, Mehdi, Vinyals, Oriol, Hénaff, Olivier, Albanie, Samuel, Bethge, Matthias, and Akata, Zeynep
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretraining mainly explores scenarios with either (1) infrequent, indiscriminate updates on large-scale new data, or (2) frequent, sample-level updates. However, practical model deployment often operates in the gap between these two limit cases, as real-world applications often demand adaptation to specific subdomains, tasks or concepts -- spread over the entire, varying life cycle of a model. In this work, we complement current perspectives on continual pretraining through a research test bed and provide comprehensive guidance for effective continual model updates in such scenarios. We first introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements, constructed over 63 datasets with diverse visual and semantic coverage. Using FoMo-in-Flux, we explore the complex landscape of practical continual pretraining through multiple perspectives: (1) a data-centric investigation of data mixtures and stream orderings that emulate real-world deployment situations, (2) a method-centric investigation ranging from simple fine-tuning and traditional continual learning strategies to parameter-efficient updates and model merging, (3) meta learning rate schedules and mechanistic design choices, and (4) the influence of model and compute scaling. Together, our insights provide a practitioner's guide to continual multimodal pretraining for real-world deployment. Our benchmark and code are available here: https://github.com/ExplainableML/fomo_in_flux., Comment: Technical Report. 52 pages
- Published
- 2024
14. Analysis of Plan-based Retrieval for Grounded Text Generation
- Author
- Godbole, Ameya, Monath, Nicholas, Kim, Seungyeon, Rawat, Ankit Singh, McCallum, Andrew, and Zaheer, Manzil
- Subjects
- Computer Science - Computation and Language, Computer Science - Information Retrieval
- Abstract
In text generation, hallucinations refer to the generation of seemingly coherent text that contradicts established knowledge. One compelling hypothesis is that hallucinations occur when a language model is given a generation task outside its parametric knowledge (due to rarity, recency, domain, etc.). A common strategy to address this limitation is to infuse the language models with retrieval mechanisms, providing the model with relevant knowledge for the task. In this paper, we leverage the planning capabilities of instruction-tuned LLMs and analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations. We empirically evaluate several variations of our proposed approach on long-form text generation tasks. By improving the coverage of relevant facts, plan-guided retrieval and generation can produce more informative responses while providing a higher rate of attribution to source documents.
- Published
- 2024
15. Data Contamination Report from the 2024 CONDA Shared Task
- Author
- Sainz, Oscar, García-Ferrero, Iker, Jacovi, Alon, Campos, Jon Ander, Elazar, Yanai, Agirre, Eneko, Goldberg, Yoav, Chen, Wei-Lin, Chim, Jenny, Choshen, Leshem, D'Amico-Wong, Luca, Dell, Melissa, Fan, Run-Ze, Golchin, Shahriar, Li, Yucheng, Liu, Pengfei, Pahwa, Bhavish, Prabhu, Ameya, Sharma, Suryansh, Silcock, Emily, Solonko, Kateryna, Stap, David, Surdeanu, Mihai, Tseng, Yu-Min, Udandarao, Vishaal, Wang, Zengzhi, Xu, Ruijie, and Yang, Jinglin
- Subjects
- Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora used to train large scale models, compromising evaluation results. The workshop fostered a shared task to collect evidence on data contamination in currently available datasets and models. The goal of the shared task and associated database is to assist the community in understanding the extent of the problem and to assist researchers in avoiding reporting evaluation results on known contaminated resources. The shared task provides a structured, centralized public database for the collection of contamination evidence, open to contributions from the community via GitHub pull requests. This first compilation paper is based on 566 reported entries over 91 contaminated sources from a total of 23 contributors. The details of the individual contamination events are available in the platform. The platform continues to be online, open to contributions from the community., Comment: https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Database
- Published
- 2024
16. CiteME: Can Language Models Accurately Cite Scientific Claims?
- Author
- Press, Ori, Hochlehnert, Andreas, Prabhu, Ameya, Udandarao, Vishaal, Press, Ofir, and Bethge, Matthias
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
- Abstract
Thousands of new scientific papers are published each month. Such information overload complicates researcher efforts to stay current with the state-of-the-art as well as to verify and correctly attribute claims. We pose the following research question: Given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper? We advance efforts to answer this question by building a benchmark that evaluates the abilities of LMs in citation attribution. Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper. Evaluation on CiteME reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%. We close this gap by introducing CiteAgent, an autonomous system built on the GPT-4o LM that can also search and read papers, which achieves an accuracy of 35.3% on CiteME. Overall, CiteME serves as a challenging testbed for open-ended claim attribution, driving the research community towards a future where any claim made by an LM can be automatically verified and discarded if found to be incorrect.
- Published
- 2024
17. DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction
- Author
- Pore, Ameya, Muradore, Riccardo, and Dall'Alba, Diego
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Reinforcement Learning (RL) algorithms can learn robotic control tasks from visual observations, but they often require a large amount of data, especially when the visual scene is complex and unstructured. In this paper, we explore how the agent's knowledge of its shape can improve the sample efficiency of visual RL methods. We propose a novel method, Disentangled Environment and Agent Representations (DEAR), that uses the segmentation mask of the agent as supervision to learn disentangled representations of the environment and the agent through feature separation constraints. Unlike previous approaches, DEAR does not require reconstruction of visual observations. These representations are then used as an auxiliary loss to the RL objective, encouraging the agent to focus on the relevant features of the environment. We evaluate DEAR on two challenging benchmarks: Distracting DeepMind control suite and Franka Kitchen manipulation tasks. Our findings demonstrate that DEAR surpasses state-of-the-art methods in sample efficiency, achieving comparable or superior performance with reduced parameters. Our results indicate that integrating agent knowledge into visual RL methods has the potential to enhance their learning efficiency and robustness., Comment: 6 pages, 7 figures, 2 tables. Accepted at 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)
- Published
- 2024
18. Nemotron-4 340B Technical Report
- Author
- Nvidia, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, and Zhu, Chen
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
- Published
- 2024
19. Small-scale and large-scale dynamos in global convection simulations of solar-like stars
- Author
- Warnecke, Jörn, Korpi-Lagg, Maarit J., Rheinhard, Matthias, Viviani, Mariangela, and Prabhu, Ameya
- Subjects
- Astrophysics - Solar and Stellar Astrophysics
- Abstract
It has been recently shown that a small-scale dynamo (SSD) instability could be possible in solar-like low-magnetic-Prandtl-number (Pm) plasmas. It has been proposed that the presence of an SSD can potentially have a significant impact on the dynamics of the large-scale dynamo (LSD) in stellar convection zones. Studying these two dynamos, SSD and LSD, together in a global magnetoconvection model requires high-resolution simulations and large amounts of computational resources. Starting from a well-studied global convective dynamo model that produces cyclic magnetic fields, we systematically increased the resolution and lowered the diffusivities to enter the regime of Reynolds numbers that allows for the excitation of an SSD on top of the LSD. We studied how the properties of convection, the generated differential rotation profiles, and the LSD solutions change in the presence of the SSD. We performed convective dynamo simulations in a spherical wedge with the Pencil Code. The resolution of the models was increased in four steps by a total factor of 16 to achieve maximal fluid and magnetic Reynolds numbers of over 500. We found that the differential rotation is strongly quenched by the presence of the LSD and SSD. Even though the small-scale magnetic field only mildly decreases with increasing Re, the large-scale field strength decreases significantly. We do not find that the SSD significantly quenches the convective flows, as claimed recently by other authors; in contrast, the convective flows first grow and then saturate with increasing Re. Furthermore, the angular momentum transport is highly affected by the presence of small-scale magnetic fields, which are mostly generated by the LSD. These fields not only change the Reynolds stresses but also generate dynamically important Maxwell stresses. The LSD evolution, in terms of its pattern and field distribution, is rather independent of the increase in Rm., Comment: 17 pages, 18 figures with an Appendix with 3 pages and 3 figures, submitted to A&A
- Published
- 2024
20. Opposing tumor-cell-intrinsic and -extrinsic roles of the IRF1 transcription factor in antitumor immunity
- Author
- Purbey, Prabhat K, Seo, Joowon, Paul, Manash K, Iwamoto, Keisuke S, Daly, Allison E, Feng, An-Chieh, Champhekar, Ameya S, Langerman, Justin, Campbell, Katie M, Schaue, Dörthe, McBride, William H, Dubinett, Steven M, Ribas, Antoni, Smale, Stephen T, and Scumpia, Philip O
- Subjects
- Biological Sciences, Immunotherapy, Genetics, Cancer, 2.1 Biological and endogenous factors, 1.1 Normal biological development and functioning, Inflammatory and immune system, Animals, Humans, Mice, B7-H1 Antigen, Cell Line, Tumor, Immunity, Interferon Regulatory Factor-1, Mice, Inbred C57BL, Neoplasms, STAT1 Transcription Factor, Male, Female, CP: Cancer, CP: Immunology, IRF1, PD-L1 regulation, TLR signaling, antitumor immunity, cytotoxic T lymphocytes, immune checkpoint blockade, immune evasion, interferon signaling, scRNA-seq, transcription, Biochemistry and Cell Biology, Medical Physiology, Biological sciences
- Abstract
Type I interferon (IFN-I) and IFN-γ foster antitumor immunity by facilitating T cell responses. Paradoxically, IFNs may promote T cell exhaustion by activating immune checkpoints. The downstream regulators of these disparate responses are incompletely understood. Here, we describe how interferon regulatory factor 1 (IRF1) orchestrates these opposing effects of IFNs. IRF1 expression in tumor cells blocks Toll-like receptor- and IFN-I-dependent host antitumor immunity by preventing interferon-stimulated gene (ISG) and effector programs in immune cells. In contrast, expression of IRF1 in the host is required for antitumor immunity. Mechanistically, IRF1 binds distinctly or together with STAT1 at promoters of immunosuppressive but not immunostimulatory ISGs in tumor cells. Overexpression of programmed cell death ligand 1 (PD-L1) in Irf1-/- tumors only partially restores tumor growth, suggesting multifactorial effects of IRF1 on antitumor immunity. Thus, we identify that IRF1 expression in tumor cells opposes host IFN-I- and IRF1-dependent antitumor immunity to facilitate immune escape and tumor growth.
- Published
- 2024
21. Biclustering a dataset using photonic quantum computing
- Author
-
Borle, Ajinkya and Bhave, Ameya
- Subjects
Quantum Physics ,Computer Science - Emerging Technologies ,Computer Science - Machine Learning ,Physics - Optics - Abstract
Biclustering is a problem in machine learning and data mining that seeks to group together rows and columns of a dataset according to certain criteria. In this work, we highlight the natural relation that quantum computing models like boson and Gaussian boson sampling (GBS) have to this problem. We first explore the use of boson sampling to identify biclusters based on matrix permanents. We then propose a heuristic that finds clusters in a dataset using Gaussian boson sampling by (i) converting the dataset into a bipartite graph and then (ii) running GBS to find the densest sub-graph(s) within the larger bipartite graph. Our simulations of the proposed heuristics show promising results, motivating future exploration in this area., Comment: 32 pages, 5 figures, 6 tables
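The graph-construction step (i) and the density objective that step (ii) targets can be sketched classically; the quantum sampling step is replaced here by an explicit density score, so this is only an illustration of the objective, not the GBS procedure, and the function names are hypothetical:

```python
import numpy as np

def dataset_to_bipartite(data, threshold=0.5):
    """Step (i) of the heuristic: treat a (rows x cols) data matrix as the
    biadjacency matrix of a bipartite graph, keeping entries above a threshold."""
    return (np.asarray(data) > threshold).astype(int)

def bicluster_density(biadj, rows, cols):
    """Edge density of the sub-bipartite-graph induced by the chosen rows and
    columns; a dense subgraph corresponds to a candidate bicluster."""
    sub = biadj[np.ix_(rows, cols)]
    return sub.sum() / sub.size

data = np.array([[0.9, 0.8, 0.1],
                 [0.7, 0.9, 0.2],
                 [0.1, 0.2, 0.9]])
biadj = dataset_to_bipartite(data)
print(bicluster_density(biadj, [0, 1], [0, 1]))  # dense block -> 1.0
```

In the paper's heuristic, GBS sampling replaces the explicit search over row/column subsets; the score above is what a dense sample should maximize.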
- Published
- 2024
22. Generation of mega-gauss axial and azimuthal magnetic fields in a solid plasma by ultrahigh intensity, circularly polarised femtosecond laser pulses
- Author
-
Choudhary, Anandam, Goswami, Laxman Prasad, Aparajit, C., Lad, Amit D., Parab, Ameya, Ved, Yash M., Das, Amita, and Kumar, G. Ravindra
- Subjects
Physics - Plasma Physics - Abstract
The interaction of intense linearly polarized femtosecond laser pulses with solids is known to generate azimuthal magnetic fields, while circularly polarized light has been shown to create axial fields. We demonstrate through experiments and particle-in-cell simulations that circularly polarized light can generate both axial and azimuthal fields of comparable magnitude in a plasma created in a solid. Angular distributions of the generated fast electrons at target front and rear show significant differences between the results for the two polarization states, with circular polarization enforcing more axial confinement. The measurement of the spatial distribution of both types of magnetic fields captures their turbulent evolution., Comment: 9 pages, 10 figures
- Published
- 2024
23. kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies
- Author
-
Gui, Zhongrui, Sun, Shuyang, Li, Runjia, Yuan, Jianhao, An, Zhaochong, Roth, Karsten, Prabhu, Ameya, and Torr, Philip
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Continual segmentation has not yet tackled the challenge of improving open-vocabulary segmentation models with training data for accurate segmentation across large, continually expanding vocabularies. We discover that traditional continual training results in severe catastrophic forgetting, failing to outperform a zero-shot segmentation baseline. We introduce a novel training-free strategy, kNN-CLIP, which augments the model with a database of instance embeddings for semantic and panoptic segmentation that achieves zero forgetting. We demonstrate that kNN-CLIP can adapt to continually growing vocabularies without the need for retraining or large memory costs. kNN-CLIP enables open-vocabulary segmentation methods to expand their vocabularies on any domain with a single pass through the data, while only storing compact embeddings. This approach minimizes both compute and memory costs. kNN-CLIP achieves state-of-the-art performance across large-vocabulary semantic and panoptic segmentation datasets. We hope kNN-CLIP represents a significant step forward in enabling more efficient and adaptable continual segmentation, paving the way for advances in real-world large-vocabulary continual segmentation methods.
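The retrieval idea behind a training-free, continually growing vocabulary can be sketched as a plain nearest-neighbor index over stored embeddings. This is a simplified stand-in, not the authors' kNN-CLIP code; the `KNNIndex` class and the toy 2-D embeddings are hypothetical:

```python
import numpy as np

class KNNIndex:
    """Toy retrieval database: store embeddings once, then predict labels for
    new embeddings without any retraining, so nothing is ever forgotten."""
    def __init__(self):
        self.embs, self.labels = [], []

    def add(self, emb, label):            # the vocabulary can grow at any time
        self.embs.append(np.asarray(emb, float))
        self.labels.append(label)

    def predict(self, query, k=1):
        db = np.stack(self.embs)
        db = db / np.linalg.norm(db, axis=1, keepdims=True)
        q = np.asarray(query, float)
        q = q / np.linalg.norm(q)
        sims = db @ q                     # cosine similarity to stored entries
        top = np.argsort(-sims)[:k]
        vals, counts = np.unique([self.labels[i] for i in top],
                                 return_counts=True)
        return vals[np.argmax(counts)]    # majority vote over k neighbors

index = KNNIndex()
index.add([1.0, 0.0], "cat")
index.add([0.0, 1.0], "dog")
print(index.predict([0.9, 0.1]))  # -> cat
```

Adding a class is a single `add` call, mirroring the single-pass vocabulary expansion the abstract describes.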
- Published
- 2024
24. Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry
- Author
-
Sinha, Shiven, Prabhu, Ameya, Kumaraguru, Ponnurangam, Bhat, Siddharth, and Bethge, Matthias
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computational Geometry ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, and some of them are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems by just using a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic method solves just four problems fewer than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist., Comment: Work in Progress. Released for wider feedback
- Published
- 2024
25. Petersson norms of Borcherds theta lifts to O(1, 8n+1) with applications to injectivity and sup-norm bounds
- Author
-
Marshall, Simon, Narita, Hiroaki, and Pitale, Ameya
- Subjects
Mathematics - Number Theory - Abstract
We give an explicit formula for the Petersson norms of theta lifts from Maass cusp forms of level one to cusp forms on orthogonal groups O(1,8n+1). Our formula explicitly determines archimedean local factors of the norms. As an application, we obtain the injectivity of the lifting of Maass forms and bounds on the sup-norm of cusp forms on these orthogonal groups in terms of their Laplace eigenvalues.
- Published
- 2024
26. No 'Zero-Shot' Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
- Author
-
Udandarao, Vishaal, Prabhu, Ameya, Ghosh, Adhiraj, Sharma, Yash, Torr, Philip H. S., Bibi, Adel, Albanie, Samuel, and Bethge, Matthias
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted during "zero-shot" evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found., Comment: Short version accepted at DPFM, ICLR'24; Full paper at NeurIPS'24
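The log-linear trend the abstract reports can be illustrated by regressing accuracy on log concept frequency; a straight line in that space means each fixed accuracy gain needs a multiplicative increase in data. The numbers below are invented solely to show the fit, not data from the paper:

```python
import numpy as np

# Hypothetical (concept frequency, zero-shot accuracy) pairs exhibiting a
# log-linear relationship: every 10x more pretraining examples buys the
# same additive accuracy gain.
freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
acc = np.array([0.22, 0.31, 0.40, 0.49, 0.58])

# Fit accuracy as a linear function of log10(frequency).
slope, intercept = np.polyfit(np.log10(freq), acc, deg=1)
print(round(slope, 3))  # accuracy gained per 10x more examples -> 0.09
```

Under such a trend, moving from 60% to 70% downstream accuracy would require roughly another order of magnitude of pretraining occurrences of the concept.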
- Published
- 2024
27. Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion
- Author
-
Huang, Zhiyu, Zhang, Zixu, Vaidya, Ameya, Chen, Yuxiao, Lv, Chen, and Fisac, Jaime Fernández
- Subjects
Computer Science - Robotics - Abstract
Generating realistic and controllable agent behaviors in traffic simulation is crucial for the development of autonomous vehicles. This problem is often formulated as imitation learning (IL) from real-world driving data by either directly predicting future trajectories or inferring cost functions with inverse optimal control. In this paper, we draw a conceptual connection between IL and diffusion-based generative modeling and introduce a novel framework Versatile Behavior Diffusion (VBD) to simulate interactive scenarios with multiple traffic participants. Our model not only generates scene-consistent multi-agent interactions but also enables scenario editing through multi-step guidance and refinement. Experimental evaluations show that VBD achieves state-of-the-art performance on the Waymo Sim Agents benchmark. In addition, we illustrate the versatility of our model by adapting it to various applications. VBD is capable of producing scenarios conditioned on priors, integrating with model-based optimization, sampling multi-modal scene-consistent scenarios by fusing marginal predictions, and generating safety-critical scenarios when combined with a game-theoretic solver.
- Published
- 2024
28. CSSTs: A Dynamic Data Structure for Partial Orders in Concurrent Execution Analysis
- Author
-
Tunç, Hünkar Can, Deshmukh, Ameya Prashant, Çirisci, Berk, Enea, Constantin, and Pavlogiannis, Andreas
- Subjects
Computer Science - Programming Languages - Abstract
Dynamic analyses are a standard approach to analyzing and testing concurrent programs. Such techniques observe program traces and analyze them to infer the presence or absence of bugs. At its core, each analysis maintains a partial order $P$ that represents order dependencies between events of the analyzed trace $\sigma$. Naturally, the scalability of the analysis largely depends on how efficiently it maintains $P$. The standard data structure for this task has thus far been vector clocks. These, however, are slow for analyses that follow a non-streaming style, costing $O(n)$ for inserting (and propagating) each new ordering in $P$, where $n$ is the size of $\sigma$, while they cannot handle the deletion of existing orderings. In this paper we develop collective sparse segment trees (CSSTs), a simple but elegant data structure for generically maintaining a partial order $P$. CSSTs thrive when the width $k$ of $P$ is much smaller than the size $n$ of its domain, allowing inserting, deleting, and querying for orderings in $P$ to run in $O(\log n)$ time. For a concurrent trace, $k$ is bounded by the number of its threads, and is normally orders of magnitude smaller than its size $n$, making CSSTs fitting for this setting. Our experimental results confirm that CSSTs are currently the best data structure for handling a range of dynamic analyses from existing literature.
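As a point of reference for the interface such a structure must provide, here is a naive baseline supporting insertion, deletion, and transitive ordering queries via graph search. CSSTs offer the same operations in $O(\log n)$ time when the width is small; this sketch makes no such guarantee and is purely illustrative (the class name and event names are hypothetical):

```python
from collections import defaultdict

class NaivePartialOrder:
    """Baseline for the interface a CSST-like structure must support:
    insert/delete orderings between trace events and query whether a is
    (transitively) ordered before b. Queries here cost a full graph search."""
    def __init__(self):
        self.succ = defaultdict(set)

    def insert(self, a, b):      # record the ordering a -> b
        self.succ[a].add(b)

    def delete(self, a, b):      # remove a previously inserted ordering
        self.succ[a].discard(b)

    def ordered(self, a, b):     # is b reachable from a (a before b)?
        stack, seen = [a], set()
        while stack:
            x = stack.pop()
            if x == b:
                return True
            if x in seen:
                continue
            seen.add(x)
            stack.extend(self.succ[x])
        return False

po = NaivePartialOrder()
po.insert("w1", "r1"); po.insert("r1", "w2")
print(po.ordered("w1", "w2"))  # True via transitivity
po.delete("r1", "w2")
print(po.ordered("w1", "w2"))  # False once the ordering is deleted
```

Note that vector clocks cannot express the `delete` operation at all, which is one of the gaps CSSTs close.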
- Published
- 2024
29. Granular Aluminum Parametric Amplifier for Low-Noise Measurements in Tesla Fields
- Author
-
Zapata, Nicolas, Takmakov, Ivan, Günzler, Simon, Nambisan, Ameya, Rieger, Dennis, Reisinger, Thomas, Wernsdorfer, Wolfgang, and Pop, Ioan M.
- Subjects
Quantum Physics - Abstract
Josephson junction parametric amplifiers have become essential tools for microwave quantum circuit readout with minimal added noise. Even after improving at an impressive rate in the last decade, they remain vulnerable to magnetic fields, which limits their use in many applications such as spin qubits, Andreev and molecular magnet devices, dark matter searches, etc. Kinetic inductance materials, such as granular aluminum (grAl), offer an alternative source of non-linearity with innate magnetic field resilience. We present a non-degenerate amplifier made of two coupled grAl resonators resilient to in-plane magnetic field up to 1 T. It offers 20 dB of gain close to the quantum limit of added noise, with a gain-bandwidth product of 28 MHz and -110 dBm input saturation power.
- Published
- 2024
30. Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism
- Author
-
Brahmachary, Shuvayan, Joshi, Subodh M., Panda, Aniruddha, Koneripalli, Kaushik, Sagotra, Arun Kumar, Patel, Harshil, Sharma, Ankush, Jagtap, Ameya D., and Kalyanaraman, Kaushic
- Subjects
Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, prompting interest in their application as black-box optimizers. This paper asserts that LLMs possess the capability for zero-shot optimization across diverse scenarios, including multi-objective and high-dimensional problems. We introduce a novel population-based method for numerical optimization using LLMs called Language-Model-Based Evolutionary Optimizer (LEO). Our hypothesis is supported through numerical examples, spanning benchmark and industrial engineering problems such as supersonic nozzle shape optimization, heat transfer, and windfarm layout optimization. We compare our method to several gradient-based and gradient-free optimization approaches. While LLMs yield comparable results to state-of-the-art methods, their imaginative nature and propensity to hallucinate demand careful handling. We provide practical guidelines for obtaining reliable answers from LLMs and discuss method limitations and potential research directions.
- Published
- 2024
31. Computational homogenization for aerogel-like polydisperse open-porous materials using neural network--based surrogate models on the microscale
- Author
-
Klawonn, Axel, Lanser, Martin, Mager, Lucas, and Rege, Ameya
- Subjects
Mathematics - Numerical Analysis ,68T07, 65N30, 74Q05 - Abstract
The morphology of nanostructured materials exhibiting a polydisperse porous space, such as aerogels, is very open porous and fine grained. Therefore, a simulation of the deformation of a large aerogel structure resolving the nanostructure would be extremely expensive. Thus, multi-scale or homogenization approaches have to be considered. Here, a computational scale bridging approach based on the FE$^2$ method is suggested, where the macroscopic scale is discretized using finite elements while the microstructure of the open-porous material is resolved as a network of Euler-Bernoulli beams. Here, the beam frame based RVEs (representative volume elements) have pores whose size distribution follows the measured values for a specific material. This is a well-known approach to model aerogel structures. For the computational homogenization, an approach to average the first Piola-Kirchhoff stresses in a beam frame by neglecting rotational moments is suggested. To further overcome the computationally most expensive part in the homogenization method, that is, solving the RVEs and averaging their stress fields, a surrogate model is introduced based on neural networks. The network's input is the localized deformation gradient on the macroscopic scale and its output is the averaged stress for the specific material. It is trained on data generated by the beam frame based approach. The efficiency and robustness of both homogenization approaches are shown numerically, and the approximation properties of the surrogate model are verified for different macroscopic problems and discretizations. Different (Quasi-)Newton solvers are considered on the macroscopic scale and compared with respect to their convergence properties.
- Published
- 2024
32. FET PET to differentiate between post-treatment changes and recurrence in high-grade gliomas: a single center multidisciplinary clinic controlled study
- Author
-
Puranik, Ameya D., Dev, Indraja D., Rangarajan, Venkatesh, Jain, Yash, Patra, Sukriti, Purandare, Nilendu C., Sahu, Arpita, Choudhary, Amitkumar, Bhattacharya, Kajari, Gupta, Tejpal, Chatterjee, Abhishek, Dasgupta, Archya, Moiyadi, Aliasgar, Shetty, Prakash, Singh, Vikas, Sridhar, Epari, Sahay, Ayushi, Shah, Aekta, Menon, Nandini, Ghosh, Suchismita, Choudhury, Sayak, Shah, Sneha, Agrawal, Archi, Lakshminarayanan, N., Kumar, Amit, and Gopalakrishna, Arjun
- Published
- 2024
- Full Text
- View/download PDF
33. NeuFENet: neural finite element solutions with theoretical bounds for parametric PDEs
- Author
-
Khara, Biswajit, Balu, Aditya, Joshi, Ameya, Sarkar, Soumik, Hegde, Chinmay, Krishnamurthy, Adarsh, and Ganapathysubramanian, Baskar
- Published
- 2024
- Full Text
- View/download PDF
34. Long-Term Functional Outcome of Primary Closure with Hyoglossus Release for Small to Medium Size Lower Gingivobuccal Sulcus Complex Defect: An Alternative Choice to Flaps
- Author
-
Pai, Prathamesh S., Shah, Dinesh, Gangiti, Kranthikumar, Pai, Ameya, and Shukla, Aishwariya
- Published
- 2024
- Full Text
- View/download PDF
35. BNP-Track: a framework for superresolved tracking
- Author
-
Sgouralis, Ioannis, Xu, Lance W. Q., Jalihal, Ameya P., Kilic, Zeliha, Walter, Nils G., and Pressé, Steve
- Published
- 2024
- Full Text
- View/download PDF
36. Factors Predicting Prognosis in Metastatic Grade 1 Gastro-entero-pancreatic Neuroendocrine Tumors
- Author
-
Pandrowala, Saneya A., Kapoor, Deeksha, Kunte, Aditya, Chopde, Amit, Puranik, Ameya, Dev, Indraja Devidas, Parghane, Rahul, Basu, Sandip, Ramaswamy, Anant, Ostwal, Vikas, Chaudhari, Vikram A., Bhandare, Manish S., and Shrikhande, Shailesh V.
- Published
- 2024
- Full Text
- View/download PDF
37. Fundamental Aspects of Dissolution of Lime into Steelmaking Slags
- Author
-
Kadrolkar, Ameya, Overbosch, Aart, Koopmans, Pieter, and Deo, Brahma
- Published
- 2024
- Full Text
- View/download PDF
38. Utility of Microvascular Reconstruction in Gastrointestinal Cancer Surgery During Complex Resections and Emergency Salvage
- Author
-
Jaiswal, Dushyant, Bhansali, Chirag, Shitole, Abhishek, Kumar, Vineet, Bindu, Ameya, Mantri, Mayur, Mathews, Saumya, and Shankhdhar, Vinay Kant
- Published
- 2024
- Full Text
- View/download PDF
39. Flow-Through Radial Artery Forearm Flap for Tongue Revascularization After Excision of Base of Tongue Malignancies
- Author
-
Bindu, Ameya, Kumar, Vineet, Kulkarni, Onkar S., Jaiswal, Dushyant, Mathews, Saumya, Mantri, Mayur, Mokhale, Kunal, and Shankhdhar, Vinay Kant
- Published
- 2024
- Full Text
- View/download PDF
40. Long-Term Outcomes of Recurrent Laryngeal Nerve Repair/Reconstruction in Oncological Settings
- Author
-
Mandavgane, Mayank, Kumar, Vineet, Mokhale, Kunal, Bindu, Ameya, Mantri, Mayur, Mathews, Saumya, Jaiswal, Dushyant, and Shankhdhar, Vinay Kant
- Published
- 2024
- Full Text
- View/download PDF
41. Anti-T-lymphocyte globulin (ATLG) compared to post-transplant cyclophosphamide as GvHD prophylaxis in ALL patients undergoing allogeneic stem cell transplantation
- Author
-
Steiner, Normann, Massoud, Radwan, Klyuchnikov, Evgeny, Gagelmann, Nico, Richter, Johanna, Niederwieser, Christian, Rathje, Kristin, Urbanowicz, Tatjana, Kunte, Ameya, Engelmann, Janik, Ihne, Christina, Lastovytska, Iryna, Lindhauer, Cecilia, Marquard, Franziska, Reichard, Mirjam, Ryzhkova, Alla, Sabauri, Rusudan, Schäfersküpper, Mathias, Seyedi, Niloufar, Kalogeropoulos, Georgios, Heidenreich, Silke, Rudolph, Ina, Zeck, Gaby, Janson, Dietlinde, Wolschke, Christine, Ayuk, Francis, and Kröger, Nicolaus
- Published
- 2024
- Full Text
- View/download PDF
42. Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
- Author
-
Prabhu, Ameya, Udandarao, Vishaal, Torr, Philip, Bethge, Matthias, Bibi, Adel, and Albanie, Samuel
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Standardized benchmarks drive progress in machine learning. However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies. In our work, we seek to mitigate this challenge by compiling ever-expanding large-scale benchmarks called Lifelong Benchmarks. As exemplars of our approach, we create Lifelong-CIFAR10 and Lifelong-ImageNet, containing (for now) 1.69M and 1.98M test samples, respectively. While reducing overfitting, lifelong benchmarks introduce a key challenge: the high cost of evaluating a growing number of models across an ever-expanding sample set. To address this challenge, we also introduce an efficient evaluation framework: Sort & Search (S&S), which reuses previously evaluated models by leveraging dynamic programming algorithms to selectively rank and sub-select test samples, enabling cost-effective lifelong benchmarking. Extensive empirical evaluations across 31,000 models demonstrate that S&S achieves highly-efficient approximate accuracy measurement, reducing compute cost from 180 GPU days to 5 GPU hours (1000x reduction) on a single A100 GPU, with low approximation error. As such, lifelong benchmarks offer a robust, practical solution to the "benchmark exhaustion" problem.
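A heavily simplified toy of the ranking intuition: if test samples are sorted from easiest to hardest (say, by how many past models solved them) and a new model is assumed to be correct exactly on a prefix of that ranking, its accuracy can be estimated with O(log n) evaluations by binary-searching the boundary. The actual Sort & Search algorithm handles the general, noisy case; this sketch only conveys the idea, and the function names are hypothetical:

```python
def approx_accuracy(model_correct, order):
    """Estimate accuracy of a model over 'order', a list of sample ids ranked
    easiest-to-hardest, assuming correctness on an (unknown) prefix. Binary
    search finds the easy/hard boundary with O(log n) model evaluations."""
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if model_correct(order[mid]):   # one model evaluation
            lo = mid + 1
        else:
            hi = mid
    return lo / len(order)              # fraction of samples deemed correct

# Hypothetical model that solves the 6 easiest of 10 ranked samples.
order = list(range(10))
model = lambda i: i < 6
print(approx_accuracy(model, order))  # -> 0.6
```

With 1.98M samples, the prefix assumption would cut per-model cost from millions of evaluations to about 21, which is the source of the reported compute savings.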
- Published
- 2024
43. A Curious Case of Remarkable Resilience to Gradient Attacks via Fully Convolutional and Differentiable Front End with a Skip Connection
- Author
-
Boytsov, Leonid, Joshi, Ameya, and Condessa, Filipe
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
We tested front-end enhanced neural models where a frozen classifier was prepended by a differentiable and fully convolutional model with a skip connection. By training them using a small learning rate for about one epoch, we obtained models that retained the accuracy of the backbone classifier while being unusually resistant to gradient attacks including APGD and FAB-T attacks from the AutoAttack package, which we attributed to gradient masking. The gradient masking phenomenon is not new, but the degree of masking was quite remarkable for fully differentiable models that did not have gradient-shattering components such as JPEG compression or components that are expected to cause diminishing gradients. Though black box attacks can be partially effective against gradient masking, they are easily defeated by combining models into randomized ensembles. We estimate that such ensembles achieve near-SOTA AutoAttack accuracy on CIFAR10, CIFAR100, and ImageNet despite having virtually zero accuracy under adaptive attacks. Adversarial training of the backbone classifier can further increase resistance of the front-end enhanced model to gradient attacks. On CIFAR10, the respective randomized ensemble achieved 90.8$\pm 2.5$% (99% CI) accuracy under AutoAttack while having only 18.2$\pm 3.6$% accuracy under the adaptive attack. We do not establish SOTA in adversarial robustness. Instead, we make methodological contributions and further support the thesis that adaptive attacks designed with the complete knowledge of model architecture are crucial in demonstrating model robustness and that even the so-called white-box gradient attacks can have limited applicability. Although gradient attacks can be complemented with black-box attacks such as the SQUARE attack or zero-order PGD, black-box attacks can be weak against randomized ensembles, e.g., when ensemble models mask gradients.
- Published
- 2024
44. Nemotron-4 15B Technical Report
- Author
-
Parmar, Jupinder, Prabhumoye, Shrimai, Jennings, Joseph, Patwary, Mostofa, Subramanian, Sandeep, Su, Dan, Zhu, Chen, Narayanan, Deepak, Jhunjhunwala, Aastha, Dattagupta, Ayush, Jawa, Vibhu, Liu, Jiwei, Mahabaleshwarkar, Ameya, Nitski, Osvald, Brundyn, Annika, Maki, James, Martinez, Miguel, You, Jiaxuan, Kamalu, John, LeGresley, Patrick, Fridman, Denys, Casper, Jared, Aithal, Ashwath, Kuchaiev, Oleksii, Shoeybi, Mohammad, Cohen, Jonathan, and Catanzaro, Bryan
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.
- Published
- 2024
45. Corrective Machine Unlearning
- Author
-
Goel, Shashwat, Prabhu, Ameya, Torr, Philip, Kumaraguru, Ponnurangam, and Sanyal, Amartya
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Machine Learning models increasingly face data integrity challenges due to the use of large-scale training datasets drawn from the Internet. We study what model developers can do if they detect that some data was manipulated or incorrect. Such manipulated data can cause adverse effects including vulnerability to backdoored samples, systemic biases, and reduced accuracy on certain input domains. Realistically, all manipulated training samples cannot be identified, and only a small, representative subset of the affected data can be flagged. We formalize Corrective Machine Unlearning as the problem of mitigating the impact of data affected by unknown manipulations on a trained model, only having identified a subset of the corrupted data. We demonstrate that the problem of corrective unlearning has significantly different requirements from traditional privacy-oriented unlearning. We find most existing unlearning methods, including retraining-from-scratch without the deletion set, require most of the manipulated data to be identified for effective corrective unlearning. However, one approach, Selective Synaptic Dampening, achieves limited success, unlearning adverse effects with just a small portion of the manipulated samples in our setting, which shows encouraging signs for future progress. We hope our work spurs research towards developing better methods for corrective unlearning and offers practitioners a new strategy to handle data integrity challenges arising from web-scale training. Code is available at https://github.com/drimpossible/corrective-unlearning-bench., Comment: Published in Transactions of Machine Learning Research (TMLR), 17 pages, 7 figures
- Published
- 2024
46. RanDumb: A Simple Approach that Questions the Efficacy of Continual Representation Learning
- Author
-
Prabhu, Ameya, Sinha, Shiven, Kumaraguru, Ponnurangam, Torr, Philip H. S., Sener, Ozan, and Dokania, Puneet K.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Continual learning has primarily focused on the issue of catastrophic forgetting and the associated stability-plasticity tradeoffs. However, little attention has been paid to the efficacy of continually learned representations, as representations are learned alongside classifiers throughout the learning process. Our primary contribution is empirically demonstrating that existing online continually trained deep networks produce inferior representations compared to a simple pre-defined random transform. Our approach embeds raw pixels using a fixed random transform, approximating an RBF-Kernel initialized before any data is seen. We then train a simple linear classifier on top without storing any exemplars, processing one sample at a time in an online continual learning setting. This method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all standard online continual learning benchmarks. Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios. Extending our investigation to popular exemplar-free scenarios with pretrained models, we find that training only a linear classifier on top of pretrained representations surpasses most continual fine-tuning and prompt-tuning strategies. Overall, our investigation challenges the prevailing assumptions about effective representation learning in online continual learning. Our code is available at https://github.com/drimpossible/RanDumb., Comment: Tech Report
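The recipe the abstract describes (a fixed random transform approximating an RBF kernel, followed by an online linear classifier with no stored exemplars) can be sketched with random Fourier features. The dimensions, learning rate, and toy data stream below are illustrative choices, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random transform approximating an RBF kernel (random Fourier
# features), initialized before any data is seen, in the spirit of RanDumb.
d_in, d_feat, gamma = 2, 256, 1.0
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d_in, d_feat))
b = rng.uniform(0, 2 * np.pi, d_feat)
embed = lambda x: np.sqrt(2.0 / d_feat) * np.cos(x @ W + b)

# Online linear classifier: one sample at a time, no exemplars stored.
n_classes, lr = 2, 0.1
weights = np.zeros((d_feat, n_classes))

def learn_one(x, y):
    z = embed(x)
    scores = z @ weights
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    grad = probs.copy()
    grad[y] -= 1.0                        # softmax cross-entropy gradient
    weights[:] -= lr * np.outer(z, grad)  # in-place SGD step

def predict(x):
    return int(np.argmax(embed(x) @ weights))

# Toy stream: class 0 near (0, 0), class 1 near (3, 3).
for _ in range(200):
    learn_one(rng.normal(0, 0.3, 2), 0)
    learn_one(rng.normal(3, 0.3, 2), 1)
print(predict(np.zeros(2)), predict(np.full(2, 3.0)))  # -> 0 1
```

The random transform is never updated, so there is nothing to forget in the representation; only the linear classifier adapts.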
- Published
- 2024
47. Weisfeiler-Leman at the margin: When more expressivity matters
- Author
-
Franks, Billy J., Morris, Christopher, Velingker, Ameya, and Geerts, Floris
- Subjects
Computer Science - Machine Learning ,Computer Science - Discrete Mathematics ,Computer Science - Neural and Evolutionary Computing ,Statistics - Machine Learning - Abstract
The Weisfeiler-Leman algorithm ($1$-WL) is a well-studied heuristic for the graph isomorphism problem. Recently, the algorithm has played a prominent role in understanding the expressive power of message-passing graph neural networks (MPNNs) and being effective as a graph kernel. Despite its success, $1$-WL faces challenges in distinguishing non-isomorphic graphs, leading to the development of more expressive MPNN and kernel architectures. However, the relationship between enhanced expressivity and improved generalization performance remains unclear. Here, we show that an architecture's expressivity offers limited insights into its generalization performance when viewed through graph isomorphism. Moreover, we focus on augmenting $1$-WL and MPNNs with subgraph information and employ classical margin theory to investigate the conditions under which an architecture's increased expressivity aligns with improved generalization performance. In addition, we show that gradient flow pushes the MPNN's weights toward the maximum margin solution. Further, we introduce variations of expressive $1$-WL-based kernel and MPNN architectures with provable generalization properties. Our empirical study confirms the validity of our theoretical findings., Comment: Accepted at ICML 2024. arXiv admin note: text overlap with arXiv:2301.11039
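The $1$-WL heuristic itself is short to implement: iteratively relabel each node by its own color plus the multiset of its neighbors' colors, then compare color histograms. The sketch below (a generic implementation, not code from the paper) also shows a classic failure case of the kind the abstract alludes to, where $1$-WL cannot distinguish two triangles from a hexagon:

```python
from collections import Counter

def wl_hash(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman (color refinement). Returns the final
    color histogram; unequal histograms prove the graphs non-isomorphic,
    while equal histograms mean 1-WL cannot tell them apart."""
    colors = {v: 0 for v in adj}                     # uniform initial coloring
    for _ in range(rounds):
        colors = {v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return Counter(colors.values())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}         # K3
path = {0: [1], 1: [0, 2], 2: [1]}                   # P3
print(wl_hash(triangle) == wl_hash(path))            # -> False (distinguished)

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(wl_hash(two_triangles) == wl_hash(hexagon))    # -> True (1-WL fails)
```

Both graphs in the second pair are 2-regular, so every node keeps the same color in every round; this is exactly the kind of limitation that motivates the more expressive architectures the abstract studies.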
- Published
- 2024
48. RiemannONets: Interpretable Neural Operators for Riemann Problems
- Author
-
Peyvan, Ahmad, Oommen, Vivek, Jagtap, Ameya D., and Karniadakis, George Em
- Subjects
Computer Science - Machine Learning ,Physics - Fluid Dynamics - Abstract
Developing the proper representations for simulating high-speed flows with strong shock waves, rarefactions, and contact discontinuities has been a long-standing question in numerical analysis. Herein, we employ neural operators to solve Riemann problems encountered in compressible flows for extreme pressure jumps (up to $10^{10}$ pressure ratio). In particular, we first consider the DeepONet that we train in a two-stage process, following the recent work of \cite{lee2023training}, wherein, in the first stage, a basis is extracted from the trunk net, orthonormalized, and subsequently used in the second stage to train the branch net. This simple modification of DeepONet has a profound effect on its accuracy, efficiency, and robustness and leads to very accurate solutions to Riemann problems compared to the vanilla version. It also enables us to interpret the results physically as the hierarchical data-driven produced basis reflects all the flow features that would otherwise be introduced using ad hoc feature expansion layers. We also compare the results with another neural operator based on the U-Net for low, intermediate, and very high-pressure ratios that are very accurate for Riemann problems, especially for large pressure ratios, due to their multiscale nature but computationally more expensive. Overall, our study demonstrates that simple neural network architectures, if properly pre-trained, can achieve very accurate solutions of Riemann problems for real-time forecasting. The source code, along with its corresponding data, can be found at the following URL: https://github.com/apey236/RiemannONet/tree/main
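The two-stage idea (extract a basis from the trunk net, orthonormalize it, then fit coefficients in a second stage) can be sketched with a QR factorization. The trunk columns and target function below are stand-ins, not outputs of a trained DeepONet, and the whole sketch is schematic rather than the RiemannONets code:

```python
import numpy as np

# Stage 1 (schematic): pretend these columns are a trained trunk net
# evaluated on a grid; QR orthonormalizes them into a data-driven basis.
x = np.linspace(0, 1, 100)
trunk = np.stack([x, x**2, np.sin(3 * x)], axis=1)   # stand-in trunk outputs
Q, R = np.linalg.qr(trunk)                           # orthonormal basis Q

# Stage 2 (schematic): the branch net would learn the coefficients; here we
# obtain them by projection for one target function to show the basis works.
target = 0.5 * x - 0.2 * x**2 + 0.1 * np.sin(3 * x)
coeffs = Q.T @ target                                # coefficients in the basis
recon = Q @ coeffs
print(np.allclose(recon, target))                    # -> True (target in span)
```

Because the basis is orthonormal, the second-stage fit reduces to a well-conditioned projection, which is one plausible reading of why the modification improves accuracy and robustness.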
- Published
- 2024
- Full Text
- View/download PDF
49. Big Bang Nucleosynthesis constraints on $f(T, \mathcal{T})$ gravity
- Author
-
Mishra, Sai Swagat, Kolhatkar, Ameya, and Sahoo, P. K.
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics - Abstract
Big Bang Nucleosynthesis provides us with an observational insight into the very early Universe. Since this mechanism of light element synthesis comes out of the standard model of particle cosmology which follows directly from General Relativity, it is expected that any modifications to GR will result in deviations in the predicted observable parameters which are mainly, the neutron-to-proton ratio and the baryon-to-photon ratio. We use the measured neutron-to-proton ratio and compare the theoretically obtained expressions to constrain two models in the framework of $ f(T,\mathcal{T}) $ gravity. The theoretically constrained models are then tested against observational data from the Hubble dataset and the $ \Lambda $CDM model to explain the accelerated expansion of the Universe., Comment: PLB published version
- Published
- 2023
- Full Text
- View/download PDF
50. Role of Microsurgery in Organ-Preserving Surgeries in Uro-oncology
- Author
-
Kaur, Navneet, Kumar, Vineet, Shankhdhar, Vinay Kant, Bindu, Ameya, Mantri, Mayur, Mathews, Saumya, Jaiswal, Dushyant, Arora, Amandeep, Prakash, Gagan, Pal, Mahendra, and Mokhale, Kunal
- Published
- 2024
- Full Text
- View/download PDF