24,278 results on '"Naseer, A"'
Search Results
2. Block Induced Signature Generative Adversarial Network (BISGAN): Signature Spoofing Using GANs and Their Evaluation
- Author
-
Amjad, Haadia, Goeller, Kilian, Seitz, Steffen, Knoll, Carsten, Bajwa, Naseer, Tetzlaff, Ronald, and Malik, Muhammad Imran
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Deep learning is actively used in biometrics to develop efficient identification and verification systems. Handwritten signatures are a common subset of biometric data used for authentication. Generative adversarial networks (GANs) learn from original and forged signatures to generate forged signatures. While most GAN techniques yield a strong signature verifier (the discriminator), more attention is needed on the quality of the forgeries produced by the generator model. This work focuses on creating a generator that produces forged samples that achieve benchmark success in spoofing signature verification systems. We use CycleGANs infused with Inception-style blocks with attention heads as the generator and a variation of the SigCNN model as the base discriminator. We train our model with a new technique that results in 80% to 100% success in signature spoofing. Additionally, we create a custom evaluation technique to act as a goodness measure of the generated forgeries. Our work advocates generator-focused GAN architectures for improving the quality of spoofing data, which aids a better understanding of biometric data generation and evaluation.
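The CycleGAN backbone named above is trained with the standard cycle-consistency objective (Zhu et al., 2017), which constrains the two generators $G$ and $F$ to be approximate inverses of each other; this is the textbook form of the loss, not necessarily the exact variant used by BISGAN:

$$\mathcal{L}_{cyc}(G,F)=\mathbb{E}_{x\sim p_{data}(x)}\big[\|F(G(x))-x\|_{1}\big]+\mathbb{E}_{y\sim p_{data}(y)}\big[\|G(F(y))-y\|_{1}\big]$$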
- Published
- 2024
3. Anisotropic Stellar Models with Tolman IV Spacetime in Non-minimally Coupled Theory
- Author
-
Sharif, M. and Naseer, Tayyab
- Subjects
General Relativity and Quantum Cosmology - Abstract
This article aims to investigate various anisotropic stellar models in the background of $f(\mathcal{R},\mathcal{T},\mathcal{Q})$ gravity, where $\mathcal{Q}=\mathcal{R}_{\varphi\vartheta}\mathcal{T}^{\varphi\vartheta}$. In this regard, we adopt two standard models, $\mathcal{R}+\zeta\mathcal{Q}$ and $\mathcal{R}+\zeta\mathcal{R}\mathcal{Q}$, where $\zeta$ symbolizes an arbitrary coupling parameter. We take a spherical interior geometry and find a solution to the modified gravitational field equations corresponding to each model by employing the Tolman IV spacetime. An additional constraint is needed to close the system of field equations, so the $\mathbb{MIT}$ bag model equation of state is chosen. The effects of the modified theory on the physical properties of six compact stars, namely PSR J1614-2230, SMC X-1, Cen X-3, PSR J1903+327, SAX J1808.4-3658 and 4U 1820-30, are analyzed by using their respective masses and radii. We also determine the values of the three unknowns involved in the Tolman IV solution as well as the bag constant for each star at the hypersurface. Furthermore, various characteristics of the resulting solutions are examined through graphical interpretation for $\zeta=\pm5$. Finally, we explore the stability of the compact objects through two different approaches. We conclude that our model-I produces physically acceptable structures corresponding to each star candidate for both values of $\zeta$, whereas model-II is stable only for $\zeta=5$., Comment: 39 pages, 10 figures
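The MIT bag model equation of state used to close the system is not restated in the abstract; in its standard form it relates the radial pressure $p_r$ to the energy density $\rho$ through the bag constant $\mathfrak{B}$:

$$p_r=\frac{1}{3}\left(\rho-4\mathfrak{B}\right)$$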
- Published
- 2024
- Full Text
- View/download PDF
4. AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment
- Author
-
Nawaz, Umair, Awais, Muhammad, Gani, Hanan, Naseer, Muzammal, Khan, Fahad, Khan, Salman, and Anwer, Rao Muhammad
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Capitalizing on vast amounts of image-text data, large-scale vision-language pre-training has demonstrated remarkable zero-shot capabilities and has been utilized in several applications. However, models trained on general everyday web-crawled data often exhibit sub-optimal performance for specialized domains, likely due to domain shift. Recent works have tackled this problem for some domains (e.g., healthcare) by constructing domain-specialized image-text data. However, constructing a dedicated large-scale image-text dataset for the sustainability-critical domain of agriculture and livestock remains an open research problem. Further, this domain demands fine-grained feature learning due to the subtle nature of the downstream tasks (e.g., nutrient deficiency detection, livestock breed classification). To address this, we present AgriCLIP, a vision-language foundational model dedicated to the domain of agriculture and livestock. First, we propose a large-scale dataset, named ALive, that leverages a customized prompt-generation strategy to overcome the scarcity of expert annotations. Our ALive dataset covers crops, livestock, and fishery, with around 600,000 image-text pairs. Second, we propose a training pipeline that integrates both contrastive and self-supervised learning to learn both global semantic and local fine-grained domain-specialized features. Experiments on a diverse set of 20 downstream tasks demonstrate the effectiveness of the AgriCLIP framework, achieving an absolute gain of 7.8\% in average zero-shot classification accuracy over standard CLIP adaptation via the domain-specialized ALive dataset. Our ALive dataset and code are accessible at \href{https://github.com/umair1221/AgriCLIP/tree/main}{Github}.
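The contrastive half of such a training pipeline typically follows the usual CLIP recipe: matching image-text pairs are pulled together and mismatched pairs pushed apart via a symmetric InfoNCE loss. A minimal sketch of that generic objective follows (the self-supervised losses AgriCLIP adds on top are not reproduced, and the function below is illustrative, not the authors' code):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    A generic CLIP-style objective; AgriCLIP additionally combines this
    with self-supervised losses, which are not shown here.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)          # match each image to its text
    loss_t2i = F.cross_entropy(logits.t(), targets)      # and each text to its image
    return 0.5 * (loss_i2t + loss_t2i)
```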
- Published
- 2024
5. Y-CA-Net: A Convolutional Attention Based Network for Volumetric Medical Image Segmentation
- Author
-
Sharif, Muhammad Hamza, Naseer, Muzammal, Yaqub, Mohammad, Xu, Min, and Guizani, Mohsen
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent attention-based volumetric segmentation (VS) methods, which focus on modeling long-range dependencies, have achieved remarkable performance in the medical domain. However, for voxel-wise prediction tasks, discriminative local features are a key component of VS model performance, and they are missing in attention-based VS methods. To resolve this issue, we deliberately incorporate a convolutional encoder branch alongside the transformer backbone to extract local and global features in parallel and aggregate them in a Cross Feature Mixer Module (CFMM) for better prediction of the segmentation mask. Consequently, we observe that the derived model, Y-CT-Net, achieves competitive performance on multiple medical segmentation tasks. For example, on multi-organ segmentation, Y-CT-Net achieves an 82.4% dice score, surpassing the well-tuned VS Transformer/CNN-like baselines UNETR/ResNet-3D by 2.9%/1.4%. Building on the success of Y-CT-Net, we extend this concept to hybrid attention models, deriving the Y-CH-Net model, which brings a 3% improvement in HD95 score on the same segmentation task. The effectiveness of both Y-CT-Net and Y-CH-Net verifies our hypothesis and motivates us to initiate the concept of Y-CA-Net, a versatile generic architecture built upon any two encoder backbones and a decoder, to fully exploit the complementary strengths of convolution and attention mechanisms. Based on experimental results, we argue Y-CA-Net is a key player in achieving superior results for volumetric segmentation.
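The Y-shaped idea is easy to state in code: two encoders run in parallel and their features are fused before decoding. The sketch below is a schematic stand-in under stated assumptions (the 1x1-convolution fusion and module names are assumptions; the actual CFMM is more elaborate):

```python
import torch
import torch.nn as nn

class YShapedSegSketch(nn.Module):
    """Schematic of the Y-shaped design: parallel local (CNN) and global
    (transformer) encoders whose features are mixed before decoding.
    The 1x1-conv fusion is an illustrative stand-in for the CFMM.
    """
    def __init__(self, conv_encoder: nn.Module, attn_encoder: nn.Module,
                 decoder: nn.Module, c_conv: int, c_attn: int):
        super().__init__()
        self.conv_encoder = conv_encoder    # local features (CNN branch)
        self.attn_encoder = attn_encoder    # global features (transformer branch)
        self.mixer = nn.Conv3d(c_conv + c_attn, c_conv, kernel_size=1)
        self.decoder = decoder

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_feat = self.conv_encoder(x)                # (B, c_conv, D, H, W)
        global_feat = self.attn_encoder(x)               # (B, c_attn, D, H, W)
        mixed = self.mixer(torch.cat([local_feat, global_feat], dim=1))
        return self.decoder(mixed)                       # segmentation logits
```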
- Published
- 2024
6. CDChat: A Large Multimodal Model for Remote Sensing Change Description
- Author
-
Noman, Mubashir, Ahsan, Noor, Naseer, Muzammal, Cholakkal, Hisham, Anwer, Rao Muhammad, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Large multimodal models (LMMs) have shown encouraging performance in the natural image domain using visual instruction tuning. However, these LMMs struggle to describe the content of remote sensing (RS) images for tasks such as image or region grounding, classification, etc. Recently, GeoChat made an effort to describe the contents of RS images. Although GeoChat achieves promising performance on various RS tasks, it struggles to describe the changes between bi-temporal RS images, which is a key RS task. This necessitates the development of an LMM that can describe the changes between bi-temporal RS images. However, there is a shortage of datasets that can be utilized to tune LMMs for this purpose. To address this, we introduce a change description instruction dataset that can be utilized to finetune an LMM and provide better change descriptions for RS images. Furthermore, we show that the LLaVA-1.5 model, with slight modifications, can be finetuned on the change description instruction dataset and achieve favorable performance.
- Published
- 2024
7. Distillation-free Scaling of Large SSMs for Images and Videos
- Author
-
Suleman, Hamid, Wasim, Syed Talal, Naseer, Muzammal, and Gall, Juergen
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
State-space models (SSMs), exemplified by S4, have introduced a novel context modeling method by integrating state-space techniques into deep learning. However, they struggle with global context modeling due to their data-independent matrices. The Mamba model addressed this with data-dependent variants via the S6 selective-scan algorithm, enhancing context modeling, especially for long sequences. However, Mamba-based architectures are difficult to scale with respect to the number of parameters, which is a major limitation for vision applications. This paper addresses the scalability issue of large SSMs for image classification and action recognition without requiring additional techniques like knowledge distillation. We analyze the distinct characteristics of Mamba-based and Attention-based models, proposing a Mamba-Attention interleaved architecture that enhances scalability, robustness, and performance. We demonstrate that the stable and efficient interleaved architecture resolves the scalability issue of Mamba-based architectures for images and videos and increases robustness to common artifacts like JPEG compression. Our thorough evaluation on the ImageNet-1K, Kinetics-400 and Something-Something-v2 benchmarks demonstrates that our approach improves the accuracy of state-of-the-art Mamba-based architectures by up to $+1.7$.
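A minimal sketch of the interleaving pattern argued for here: alternate state-space and self-attention blocks in one backbone. The `make_ssm_block` factory is a placeholder for any Mamba-style layer, and the strict 1:1 alternation is an assumption; the paper's actual ratio and block internals may differ:

```python
import torch.nn as nn

def build_interleaved_backbone(depth: int, dim: int, n_heads: int,
                               make_ssm_block) -> nn.Sequential:
    """Alternate SSM and self-attention blocks over a (B, L, dim) sequence.
    `make_ssm_block(dim)` stands in for any Mamba-style layer that maps
    (B, L, dim) -> (B, L, dim)."""
    blocks = []
    for i in range(depth):
        if i % 2 == 0:
            blocks.append(make_ssm_block(dim))           # state-space (Mamba-style) block
        else:
            blocks.append(nn.TransformerEncoderLayer(
                d_model=dim, nhead=n_heads, batch_first=True))
    return nn.Sequential(*blocks)
```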
- Published
- 2024
8. Multi-scale Cycle Tracking in Dynamic Planar Graphs
- Author
-
Rasheed, Farhan, Naseer, Abrar, Nilsson, Emma, Masood, Talha Bin, and Hotz, Ingrid
- Subjects
Computer Science - Graphics ,Computer Science - Computational Geometry ,Computer Science - Computer Vision and Pattern Recognition ,I.3.5 ,I.3.6 ,I.3.8 ,J.2 - Abstract
This paper presents a nested tracking framework for analyzing cycles in 2D force networks within granular materials. These materials are composed of interacting particles, whose interactions are described by a force network. Understanding the cycles within these networks at various scales and their evolution under external loads is crucial, as they significantly contribute to the mechanical and kinematic properties of the system. Our approach involves computing a cycle hierarchy by partitioning the 2D domain into segments bounded by cycles in the force network. We can adapt concepts from nested tracking graphs originally developed for merge trees by leveraging the duality between this partitioning and the cycles. We demonstrate the effectiveness of our method on two force networks derived from experiments with photoelastic disks., Comment: TopoInVis 2024, 11 pages
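The basic ingredient of the framework, extracting cycles from a 2D force network, can be illustrated with networkx; the multi-scale hierarchy and nested tracking over time are the paper's contribution and are not reproduced in this small sketch:

```python
import networkx as nx

def force_network_cycles(contacts, min_force=0.0):
    """Build a 2D force network from (particle_i, particle_j, force) contacts
    and return a basis of its independent cycles. This shows only the
    cycle-extraction step, not the paper's cycle hierarchy or tracking.
    """
    G = nx.Graph()
    for i, j, f in contacts:
        if f > min_force:                    # threshold out weak contacts
            G.add_edge(i, j, force=f)
    return nx.cycle_basis(G)                 # list of cycles (vertex lists)

# Example: a single square cycle of four particles
print(force_network_cycles([(0, 1, 1.0), (1, 2, 1.2), (2, 3, 0.9), (3, 0, 1.1)]))
```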
- Published
- 2024
9. Existence of Non-singular Stellar Solutions within the context of Electromagnetic Field: A Comparison between Minimal and Non-minimal Gravity Models
- Author
-
Naseer, Tayyab and Said, Jackson Levi
- Subjects
General Relativity and Quantum Cosmology - Abstract
In this paper, we explore the existence of various non-singular compact stellar solutions influenced by the Maxwell field within modified gravity based on matter-geometry coupling. We start the analysis by considering a static spherically symmetric spacetime associated with an isotropic matter distribution. We then determine the field equations corresponding to two specific functions of this modified theory. Along with these models, we also adopt different forms of the matter Lagrangian. These equations contain several unknowns, such as the metric potentials, charge and fluid parameters. Thus, the embedding class-one condition and a particular realistic equation of state are used to construct the corresponding solutions. The former condition provides the metric components possessing three constants, which we calculate through junction conditions. Further, four developed models are graphically analyzed under different parametric values. Finally, we find that all our developed solutions agree well with the physical requirements, offering valuable insights for future explorations of stellar compositions in this theory., Comment: 24 pages, 13 figures
- Published
- 2024
- Full Text
- View/download PDF
10. STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models
- Author
-
Srivatsan, Koushik, Shamshad, Fahad, Naseer, Muzammal, and Nandakumar, Karthik
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The rapid proliferation of large-scale text-to-image generation (T2IG) models has led to concerns about their potential misuse in generating harmful content. Though many methods have been proposed for erasing undesired concepts from T2IG models, they only provide a false sense of security, as recent works demonstrate that concept-erased models (CEMs) can be easily deceived to generate the erased concept through adversarial attacks. The problem of adversarially robust concept erasing without significant degradation to model utility (ability to generate benign concepts) remains an unresolved challenge, especially in the white-box setting where the adversary has access to the CEM. To address this gap, we propose an approach called STEREO that involves two distinct stages. The first stage searches thoroughly enough for strong and diverse adversarial prompts that can regenerate an erased concept from a CEM, by leveraging robust optimization principles from adversarial training. In the second robustly erase once stage, we introduce an anchor-concept-based compositional objective to robustly erase the target concept at one go, while attempting to minimize the degradation on model utility. By benchmarking the proposed STEREO approach against four state-of-the-art concept erasure methods under three adversarial attacks, we demonstrate its ability to achieve a better robustness vs. utility trade-off. Our code and models are available at https://github.com/koushiksrivats/robust-concept-erasing., Comment: Project Page: https://koushiksrivats.github.io/robust-concept-erasing/
- Published
- 2024
11. PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning
- Author
-
Hussein, Noor, Shamshad, Fahad, Naseer, Muzammal, and Nandakumar, Karthik
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Cryptography and Security - Abstract
Medical vision-language models (Med-VLMs) trained on large datasets of medical image-text pairs and later fine-tuned for specific tasks have emerged as a mainstream paradigm in medical image analysis. However, recent studies have highlighted the susceptibility of these Med-VLMs to adversarial attacks, raising concerns about their safety and robustness. Randomized smoothing is a well-known technique for turning any classifier into a model that is certifiably robust to adversarial perturbations. However, this approach requires retraining the Med-VLM-based classifier so that it classifies well under Gaussian noise, which is often infeasible in practice. In this paper, we propose a novel framework called PromptSmooth to achieve efficient certified robustness of Med-VLMs by leveraging the concept of prompt learning. Given any pre-trained Med-VLM, PromptSmooth adapts it to handle Gaussian noise by learning textual prompts in a zero-shot or few-shot manner, achieving a delicate balance between accuracy and robustness, while minimizing the computational overhead. Moreover, PromptSmooth requires only a single model to handle multiple noise levels, which substantially reduces the computational cost compared to traditional methods that rely on training a separate model for each noise level. Comprehensive experiments based on three Med-VLMs and across six downstream datasets of various imaging modalities demonstrate the efficacy of PromptSmooth. Our code and models are available at https://github.com/nhussein/promptsmooth., Comment: Accepted to MICCAI 2024
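Randomized smoothing, the certification mechanism PromptSmooth builds on, has a compact Monte-Carlo form: classify many Gaussian-noised copies of the input and take a majority vote (Cohen et al., 2019). The sketch below shows only this generic prediction step; PromptSmooth's actual contribution, adapting the Med-VLM to noise via learned prompts, is not shown:

```python
import torch

@torch.no_grad()
def smoothed_predict(model, x: torch.Tensor, sigma: float,
                     n_samples: int = 100, n_classes: int = 10) -> int:
    """Monte-Carlo prediction of a randomly smoothed classifier: classify
    Gaussian-noised copies of x and return the majority class. `model` is
    any base classifier mapping an input to class logits.
    """
    counts = torch.zeros(n_classes)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)     # N(0, sigma^2) perturbation
        counts[model(noisy).argmax(dim=-1)] += 1
    return int(counts.argmax())                     # majority vote = smoothed prediction
```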
- Published
- 2024
12. Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors
- Author
-
Shamshad, Fahad, Naseer, Muzammal, and Nandakumar, Karthik
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Deep learning-based face recognition (FR) systems pose significant privacy risks by tracking users without their consent. While adversarial attacks can protect privacy, they often produce visible artifacts that compromise user experience. To mitigate this issue, recent facial privacy protection approaches advocate embedding adversarial noise into natural-looking makeup styles. However, these methods require training on large-scale makeup datasets that are not always readily available. In addition, these approaches suffer from dataset bias. For instance, training on makeup data that predominantly contains female faces could compromise protection efficacy for male faces. To handle these issues, we propose a test-time optimization approach that solely optimizes an untrained neural network to transfer makeup style from a reference to a source image in an adversarial manner. We introduce two key modules: a correspondence module that aligns regions between reference and source images in latent space, and a decoder with conditional makeup layers. The untrained decoder, optimized via carefully designed structural and makeup consistency losses, generates a protected image that resembles the source but incorporates adversarial makeup to deceive FR models. As our approach does not rely on training with makeup face datasets, it avoids potential male/female dataset biases while providing effective protection. We further extend the proposed approach to videos by leveraging temporal correlations. Experiments on benchmark datasets demonstrate superior performance in face verification and identification tasks and effectiveness against commercial FR systems. Our code and models will be available at https://github.com/fahadshamshad/deep-facial-privacy-prior, Comment: Proceedings of ECCV Workshop on Explainable AI for Biometrics, 2024
- Published
- 2024
13. BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning
- Author
-
Hanif, Asif, Shamshad, Fahad, Awais, Muhammad, Naseer, Muzammal, Khan, Fahad Shahbaz, Nandakumar, Karthik, Khan, Salman, and Anwer, Rao Muhammad
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Medical foundation models are gaining prominence in the medical community for their ability to derive general representations from extensive collections of medical image-text pairs. Recent research indicates that these models are susceptible to backdoor attacks, which allow them to classify clean images accurately but fail when specific triggers are introduced. However, traditional backdoor attacks necessitate a considerable amount of additional data to maliciously pre-train a model. This requirement is often impractical in medical imaging applications due to the usual scarcity of data. Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into a medical foundation model (Med-FM) during the prompt learning phase. By incorporating learnable prompts within the text encoder and introducing an imperceptible learnable noise trigger in the input images, we exploit the full capabilities of Med-FMs. Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks, enabling the creation of an effective backdoor attack. Through extensive experiments with four medical foundation models, each pre-trained on different modalities and evaluated across six downstream datasets, we demonstrate the efficacy of our approach. BAPLe achieves a high backdoor success rate across all models and datasets, outperforming baseline backdoor attack methods. Our work highlights the vulnerability of Med-FMs to backdoor attacks and strives to promote the safe adoption of Med-FMs before their deployment in real-world applications. Code is available at https://asif-hanif.github.io/baple/., Comment: MICCAI 2024
- Published
- 2024
14. Decoupled Anisotropic Buchdahl's Relativistic Models in $f(\mathbb{R},\mathbb{T})$ Theory
- Author
-
Naseer, Tayyab and Sharif, M.
- Subjects
General Relativity and Quantum Cosmology - Abstract
This paper constructs three different anisotropic extensions of an existing isotropic solution to the modified field equations through gravitational decoupling in $f(\mathbb{R},\mathbb{T})$ theory. For this, we take a static sphere that is initially filled with isotropic fluid and then add a new gravitational source that produces anisotropy in the system. The field equations now correspond to the total matter configuration. We transform the radial metric component to split these equations into two sets characterizing their parent sources. The unknowns in the first set are determined by considering the Buchdahl isotropic solution. On the other hand, we employ different constraints related to the additional gravitational source to make the second system solvable. Further, the constant triplet in the Buchdahl solution is calculated by means of the matching criteria between the interior and exterior geometries at the spherical boundary. The mass and radius of the compact star LMC X-4 are used to analyze the physical relevancy of the developed models. We conclude that our resulting models II and III agree well with the acceptability conditions for the considered values of the parameters., Comment: 34 pages, 16 figures
- Published
- 2024
- Full Text
- View/download PDF
15. Two-Phase Segmentation Approach for Accurate Left Ventricle Segmentation in Cardiac MRI using Machine Learning
- Author
-
Tamoor, Maria, Ali, Abbas Raza, Philip, Philemon, Adil, Ruqqayia, Shahid, Rabia, and Naseer, Asma
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Accurate segmentation of the Left Ventricle (LV) holds substantial importance due to its implications in disease detection, regional analysis, and the development of complex models for cardiac surgical planning. Cardiac MRI (CMR) is the gold standard for diagnosing several cardiac diseases. The LV in CMR comprises three distinct sections: basal, mid-ventricle, and apical. This research focuses on the precise segmentation of the LV from CMR scans, combined with the capabilities of Machine Learning (ML). The central challenge in this research revolves around the absence of a single set of parameters applicable to all three types of LV slices: parameters optimized for basal slices often fall short when applied to mid-ventricular and apical slices, and vice versa. To handle this issue, a new method is proposed to enhance LV segmentation. The proposed method uses distinct sets of parameters for each type of slice, resulting in a two-phase segmentation approach. The initial phase categorizes images into three groups based on the type of LV slice, while the second phase segments the CMR images using parameters derived from the preceding phase. A publicly available dataset, the Automated Cardiac Diagnosis Challenge (ACDC), is used. 10-fold cross-validation achieves a mean score of 0.9228. Comprehensive testing indicates that the best parameter set for a particular type of slice does not perform adequately for the other slice types. All results show that the proposed approach fills a critical void in parameter standardization through a two-phase segmentation model for the LV, aiming not only to improve the accuracy of cardiac image analysis but also to contribute advancements to the field of LV segmentation.
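The two-phase logic reduces to a classify-then-dispatch pattern. The sketch below is purely illustrative: the slice classifier, the segmentation routine and the per-slice parameter values are hypothetical placeholders, not the authors' models or tuned values:

```python
def two_phase_lv_segmentation(image, slice_classifier, segment, params):
    """Phase 1: identify the LV slice type; Phase 2: segment with
    parameters tuned for that type. All callables and values here are
    placeholders for whatever models/parameters a pipeline provides.
    """
    slice_type = slice_classifier(image)          # 'basal' | 'mid' | 'apical'
    return segment(image, **params[slice_type])   # slice-specific parameters

params = {
    "basal":  {"threshold": 0.45, "kernel_size": 5},   # illustrative values only
    "mid":    {"threshold": 0.50, "kernel_size": 3},
    "apical": {"threshold": 0.60, "kernel_size": 3},
}
```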
- Published
- 2024
16. Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification
- Author
-
Alkhunaizi, Naif, Almalik, Faris, Al-Refai, Rouqaiah, Naseer, Muzammal, and Nandakumar, Karthik
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
With the advent of large pre-trained transformer models, fine-tuning these models for various downstream tasks is a critical problem. Paucity of training data, the existence of data silos, and stringent privacy constraints exacerbate this fine-tuning problem in the medical imaging domain, creating a strong need for algorithms that enable collaborative fine-tuning of pre-trained models. Moreover, the large size of these models necessitates the use of parameter-efficient fine-tuning (PEFT) to reduce the communication burden in federated learning. In this work, we systematically investigate various federated PEFT strategies for adapting a Vision Transformer (ViT) model (pre-trained on a large natural image dataset) for medical image classification. Apart from evaluating known PEFT techniques, we introduce new federated variants of PEFT algorithms such as visual prompt tuning (VPT), low-rank decomposition of visual prompts, stochastic block attention fine-tuning, and hybrid PEFT methods like low-rank adaptation (LoRA)+VPT. Moreover, we perform a thorough empirical analysis to identify the optimal PEFT method for the federated setting and understand the impact of data distribution on federated PEFT, especially for out-of-domain (OOD) and non-IID data. The key insight of this study is that while most federated PEFT methods work well for in-domain transfer, there is a substantial accuracy vs. efficiency trade-off when dealing with OOD and non-IID scenarios, which is commonly the case in medical imaging. Specifically, every order of magnitude reduction in fine-tuned/exchanged parameters can lead to a 4% drop in accuracy. Thus, the initial model choice is crucial for federated PEFT. It is preferable to use medical foundation models learned from in-domain medical image data (if available) rather than general vision models.
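Of the PEFT methods named above, LoRA has a particularly compact form: the pre-trained weight is frozen and only a low-rank update is trained, so federated clients exchange just the small factors. A generic single-layer sketch follows (not the paper's federated variant; hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adaptation of a frozen linear layer: only the rank-r factors
    A and B are trained (and, in federated PEFT, exchanged), giving the
    order-of-magnitude parameter reductions discussed above.
    """
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)            # freeze pre-trained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```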
- Published
- 2024
17. Data-Driven Abstractions via Binary-Tree Gaussian Processes for Formal Verification
- Author
-
Schön, Oliver, Naseer, Shammakh, Wooding, Ben, and Soudjani, Sadegh
- Subjects
Computer Science - Logic in Computer Science ,Computer Science - Formal Languages and Automata Theory ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Systems and Control - Abstract
To advance formal verification of stochastic systems against temporal logic requirements for handling unknown dynamics, researchers have been designing data-driven approaches inspired by breakthroughs in the underlying machine learning techniques. As one promising research direction, abstraction-based solutions based on Gaussian process (GP) regression have become popular for their ability to learn a representation of the latent system from data with a quantified error. Results obtained on this model are then translated to the true system via various methods. In a recent publication, GPs using a so-called binary-tree kernel have demonstrated a polynomial speedup w.r.t. the size of the data compared to their vanilla version, outcompeting all existing sparse GP approximations. Incidentally, the resulting binary-tree Gaussian process (BTGP) is characterized by its piecewise-constant posterior mean and covariance functions, naturally abstracting the input space into discrete partitions. In this paper, we leverage this natural abstraction of the BTGP for formal verification, eliminating the need for cumbersome abstraction and error quantification procedures. We show that the BTGP allows us to construct an interval Markov chain model of the unknown system with a speedup that is polynomial w.r.t. the size of the abstraction compared to alternative approaches. We provide a delocalized error quantification via a unified formula even when the true dynamics do not live in the function space of the BTGP. This allows us to compute upper and lower bounds on the probability of satisfying reachability specifications that are robust to both aleatoric and epistemic uncertainties., Comment: Published at IFAC conference on analysis and design of hybrid systems (ADHS) 2024
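Once an interval Markov chain abstraction is in hand, reachability bounds are typically computed by robust value iteration, resolving the transition intervals adversarially at every sweep. The sketch below shows that generic computation; it is not the paper's BTGP-specific construction:

```python
import numpy as np

def imc_reach_bound(lower, upper, goal, n_iter=100, maximize=True):
    """Bound on the probability of reaching `goal` (list of state indices)
    in an interval Markov chain with lower[i,j] <= P[i,j] <= upper[i,j].
    Each sweep resolves the intervals greedily: probability mass is pushed
    toward high-value (upper bound) or low-value (lower bound) successors.
    """
    n = lower.shape[0]
    V = np.zeros(n)
    V[goal] = 1.0
    for _ in range(n_iter):
        order = np.argsort(-V if maximize else V)       # best successors first
        V_new = V.copy()
        for i in range(n):
            if i in goal:
                continue
            p = lower[i].copy()                         # start at lower bounds
            budget = 1.0 - p.sum()                      # remaining mass to place
            for j in order:                             # greedily spend the slack
                add = min(upper[i, j] - p[j], budget)
                p[j] += add
                budget -= add
                if budget <= 0:
                    break
            V_new[i] = p @ V
        V = V_new
    return V                                            # per-state reachability bound
```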
- Published
- 2024
18. Study of Anisotropic Compact Stars in $f(\mathcal{R},\mathcal{T},\mathcal{R}_{\chi\xi}\mathcal{T}^{\chi\xi})$ Gravity
- Author
-
Sharif, M. and Naseer, T.
- Subjects
General Relativity and Quantum Cosmology - Abstract
This paper aims to examine the composition of various spherically symmetric star models coupled with an anisotropic configuration in $f(\mathcal{R},\mathcal{T},\mathcal{Q})$ gravity, where $\mathcal{Q}=\mathcal{R}_{\chi\xi}\mathcal{T}^{\chi\xi}$. We discuss the physical features of compact objects by employing the bag model equation of state and construct the modified field equations in terms of the Krori-Barua ansatz involving the unknowns ($A,B,C$). The observational data of 4U 1820-30, Vela X-1, SAX J1808.4-3658, RXJ 1856-37 and Her X-1 is used to calculate these unknowns and the bag constant $\mathfrak{B_c}$. Further, we observe the behavior of energy density, radial and tangential pressure as well as anisotropy through graphical interpretation for a viable model $\mathcal{R}+\varrho\mathcal{Q}$ of this gravity. For a particular value of the coupling constant $\varrho$, we study the behavior of mass, compactness, redshift and the energy bounds. The stability of the considered stars is also checked by using two criteria. We conclude that our developed structure in this gravity agrees well with all the physical requirements., Comment: 28 pages, 8 figures
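For reference, the Krori-Barua ansatz invoked here fixes the metric potentials as quadratic exponents, with the constant triplet $(A,B,C)$ then determined from the observational data at the boundary:

$$e^{\lambda(r)}=e^{Ar^{2}},\qquad e^{\nu(r)}=e^{Br^{2}+C}$$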
- Published
- 2024
- Full Text
- View/download PDF
19. VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
- Author
-
Bharadwaj, Rohit, Gani, Hanan, Naseer, Muzammal, Khan, Fahad Shahbaz, and Khan, Salman
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The recent developments in Large Multi-modal Video Models (Video-LMMs) have significantly enhanced our ability to interpret and analyze video data. Despite their impressive capabilities, current Video-LMMs have not been evaluated on anomaly detection tasks, which are critical to their deployment in practical scenarios, e.g., identifying deepfakes, manipulated video content, traffic accidents and crimes. In this paper, we introduce VANE-Bench, a benchmark designed to assess the proficiency of Video-LMMs in detecting and localizing anomalies and inconsistencies in videos. Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models, encompassing a variety of subtle anomalies and inconsistencies grouped into five categories: unnatural transformations, unnatural appearance, pass-through, disappearance and sudden appearance. Additionally, our benchmark features real-world samples from existing anomaly detection datasets, focusing on crime-related irregularities, atypical pedestrian behavior, and unusual events. The task is structured as a visual question-answering challenge to gauge the models' ability to accurately detect and localize the anomalies within the videos. We evaluate nine existing Video-LMMs, both open-source and closed-source, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies. In conclusion, our research offers significant insights into the current capabilities of Video-LMMs in the realm of anomaly detection, highlighting the importance of our work in evaluating and improving these models for real-world applications. Our code and data are available at https://hananshafi.github.io/vane-benchmark/, Comment: Data: https://huggingface.co/datasets/rohit901/VANE-Bench
- Published
- 2024
20. Towards Evaluating the Robustness of Visual State Space Models
- Author
-
Malik, Hashmat Shadab, Shamshad, Fahad, Naseer, Muzammal, Nandakumar, Karthik, Khan, Fahad Shahbaz, and Khan, Salman
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Vision State Space Models (VSSMs), a novel architecture that combines the strengths of recurrent neural networks and latent variable models, have demonstrated remarkable performance in visual perception tasks by efficiently capturing long-range dependencies and modeling complex visual dynamics. However, their robustness under natural and adversarial perturbations remains a critical concern. In this work, we present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios, including occlusions, image structure, common corruptions, and adversarial attacks, and compare their performance to well-established architectures such as transformers and Convolutional Neural Networks. Furthermore, we investigate the resilience of VSSMs to object-background compositional changes on sophisticated benchmarks designed to test model performance in complex visual scenes. We also assess their robustness on object detection and segmentation tasks using corrupted datasets that mimic real-world scenarios. To gain a deeper understanding of VSSMs' adversarial robustness, we conduct a frequency-based analysis of adversarial attacks, evaluating their performance against low-frequency and high-frequency perturbations. Our findings highlight the strengths and limitations of VSSMs in handling complex visual corruptions, offering valuable insights for future research. Our code and models will be available at https://github.com/HashmatShadab/MambaRobustness.
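The frequency-based analysis mentioned here presupposes a way to separate an image (or perturbation) into low- and high-frequency parts; a common recipe is an ideal circular mask in Fourier space, sketched below (the paper's exact filtering protocol is an assumption here):

```python
import numpy as np

def frequency_split(image: np.ndarray, radius: int):
    """Split a 2D image into low- and high-frequency components using an
    ideal circular mask in (shifted) Fourier space. A standard building
    block for frequency-based robustness analyses.
    """
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(f * (~mask))))
    return low, high    # image == low + high (up to numerical error)
```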
- Published
- 2024
21. On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
- Author
-
Malik, Hashmat Shadab, Saeed, Numan, Hanif, Asif, Naseer, Muzammal, Yaqub, Mohammad, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Volumetric medical segmentation models have achieved significant success on organ and tumor-based segmentation tasks in recent years. However, their vulnerability to adversarial attacks remains largely unexplored, raising serious concerns regarding the real-world deployment of tools employing such models in the healthcare sector. This underscores the importance of investigating the robustness of existing models. In this context, our work aims to empirically examine the adversarial robustness of current volumetric segmentation architectures, encompassing Convolutional, Transformer, and Mamba-based models. We extend this investigation across four volumetric segmentation datasets, evaluating robustness under both white-box and black-box adversarial attacks. Overall, we observe that while both pixel- and frequency-based attacks perform reasonably well in the \emph{white-box} setting, the latter performs significantly better under transfer-based black-box attacks. Across our experiments, we observe that transformer-based models show higher robustness than convolution-based models, with Mamba-based models being the most vulnerable. Additionally, we show that large-scale training of volumetric segmentation models improves the model's robustness against adversarial attacks. The code and robust models are available at https://github.com/HashmatShadab/Robustness-of-Volumetric-Medical-Segmentation-Models., Comment: Accepted at British Machine Vision Conference 2024
- Published
- 2024
22. Multi-Granularity Language-Guided Multi-Object Tracking
- Author
-
Li, Yuhao, Naseer, Muzammal, Cao, Jiale, Zhu, Yu, Sun, Jinqiu, Zhang, Yanning, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Most existing multi-object tracking methods typically learn visual tracking features by maximizing dissimilarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging, especially in cases of environmental interference such as occlusion, blur and domain variance. In this work, we argue that multi-modal language-driven features provide complementary information to classical visual features, thereby aiding in improving the robustness to such environmental interference. To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene- and instance-level) and combines it with standard visual features to obtain discriminative representations. To develop LG-MOT, we annotate existing MOT datasets with scene- and instance-level language descriptions. We then encode both instance- and scene-level language information into high-dimensional embeddings, which are utilized to guide the visual features during training. At inference, our LG-MOT uses the standard visual features without relying on annotated language descriptions. Extensive experiments on three benchmarks, MOT17, DanceTrack and SportsMOT, reveal the merits of the proposed contributions, leading to state-of-the-art performance. On the DanceTrack test set, our LG-MOT achieves an absolute gain of 2.2\% in terms of target object association (IDF1 score), compared to the baseline using only visual features. Further, our LG-MOT exhibits strong cross-domain generalizability. The dataset and code will be available at \url{https://github.com/WesLee88524/LG-MOT}.
- Published
- 2024
23. Anisotropic Durgapal-Fuloria Neutron Stars in $f(\mathcal{R},\mathrm{T}^{2})$ Gravity
- Author
-
Naseer, Tayyab, Sharif, M., Manzoor, Sana, and Fatima, Arooj
- Subjects
General Relativity and Quantum Cosmology - Abstract
The main purpose of this paper is to obtain physically stable stellar models coupled with an anisotropic matter distribution in the context of $f(\mathcal{R},\mathrm{T}^{2})$ theory. For this, we consider a static spherical geometry and formulate the modified field equations containing various unknowns such as matter determinants and metric potentials. We then obtain a unique solution to these equations by employing the Durgapal-Fuloria ansatz possessing a constant doublet. We also use matching criteria to calculate the values of these constants by considering the Schwarzschild exterior spacetime. Two different viable models of this modified theory are adopted to analyze the behavior of effective matter variables, anisotropy, energy conditions, compactness and redshift in the interiors of the Her X-1, PSR J0348+0432, LMC X-4, SMC X-1, Cen X-3, and SAX J1808.4-3658 star candidates. We also check the stability of these models by using three different physical tests. It is concluded that our considered stars satisfy all the physical requirements and are stable in this modified gravity for the considered parametric values., Comment: 28 pages, 10 figures
- Published
- 2024
- Full Text
- View/download PDF
24. Room Temperature Ferroelectricity and Electrically Tunable Berry Curvature Dipole in III-V Monolayers
- Author
-
Naseer, Ateeb, Priydarshi, Achintya, Ghosh, Pritam, Ahammed, Raihan, Chauhan, Yogesh Singh, Bhowmick, Somnath, and Agarwal, Amit
- Subjects
Condensed Matter - Materials Science - Abstract
Two-dimensional ferroelectric monolayers are promising candidates for compact memory devices and flexible electronics. Here, through first-principles calculations, we predict room-temperature ferroelectricity in AB-type monolayers comprising group III (A = Al, In, Ga) and group V (B = As, P, Sb) elements. We show that their spontaneous polarization, oriented out-of-plane, ranges from 9.48 to 13.96 pC/m, outperforming most known 2D ferroelectrics. We demonstrate an electric-field-tunable Berry curvature dipole and nonlinear Hall current in these monolayers. Additionally, we highlight their applicability in next-generation memory devices by forming efficient ferroelectric tunnel junctions, especially in InP, which supports high tunneling electroresistance. Our findings motivate further exploration of these monolayers for studying the interplay between Berry curvature and ferroelectricity and for integrating these ferroelectric monolayers in next-generation electronic devices.
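For context, the Berry curvature dipole the abstract refers to is conventionally defined (Sodemann and Fu, 2015) as the first moment of the Berry curvature over the occupied states; in two dimensions, with equilibrium occupation $f_{0}$ and Berry curvature $\Omega_{z}$,

$$D_{a}=\int\frac{d^{2}k}{(2\pi)^{2}}\,f_{0}\,\frac{\partial\Omega_{z}}{\partial k_{a}},$$

and it is the electric-field tunability of this quantity that drives the nonlinear Hall current.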
- Published
- 2024
- Full Text
- View/download PDF
25. Multi-modal Generation via Cross-Modal In-Context Learning
- Author
-
Kumar, Amandeep, Naseer, Muzammal, Narayan, Sanath, Anwer, Rao Muhammad, Khan, Salman, and Cholakkal, Hisham
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this work, we study the problem of generating novel images from complex multimodal prompt sequences. While existing methods achieve promising results for text-to-image generation, they often struggle to capture fine-grained details from lengthy prompts and to maintain contextual coherence within prompt sequences. Moreover, they often produce misaligned image generations for prompt sequences featuring multiple objects. To address this, we propose a Multi-modal Generation via Cross-Modal In-Context Learning (MGCC) method that generates novel images from complex multimodal prompt sequences by leveraging the combined capabilities of large language models (LLMs) and diffusion models. Our MGCC comprises a novel Cross-Modal Refinement module to explicitly learn cross-modal dependencies between text and image in the LLM embedding space, and a contextual object grounding module to generate object bounding boxes specifically targeting scenes with multiple objects. Our MGCC demonstrates a diverse range of multimodal capabilities, such as novel image generation, the facilitation of multimodal dialogue, and text generation. Experimental evaluations on two benchmark datasets demonstrate the effectiveness of our method. On the Visual Story Generation (VIST) dataset with multimodal inputs, our MGCC achieves a CLIP Similarity score of $0.652$ compared to SOTA GILL at $0.641$. Similarly, on Visual Dialogue Context (VisDial), which has lengthy dialogue sequences, our MGCC achieves an impressive CLIP score of $0.660$, largely outperforming the existing SOTA method's score of $0.645$. Code: https://github.com/VIROBO-15/MGCC, Comment: Technical Report
- Published
- 2024
26. How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
- Author
-
Khattak, Muhammad Uzair, Naeem, Muhammad Ferjad, Hassan, Jameel, Naseer, Muzammal, Tombari, Federico, Khan, Fahad Shahbaz, and Khan, Salman
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks. These models have the potential to be deployed in real-world applications such as robotics, AI assistants, medical surgery, and autonomous vehicles. The widespread adoption of Video-LMMs in our daily lives underscores the importance of ensuring and evaluating their robust performance in mirroring human-like reasoning and interaction capabilities in complex, real-world contexts. However, existing benchmarks for Video-LMMs primarily focus on general video comprehension abilities and neglect to assess both their reasoning capabilities over complex videos in real-world contexts and their robustness through the lens of user prompts as text queries. In this paper, we present the Complex Video Reasoning and Robustness Evaluation Suite (CVRR-ES), a novel benchmark that comprehensively assesses the performance of Video-LMMs across 11 diverse real-world video dimensions. We evaluate 9 recent models, including both open-source and closed-source variants, and find that most of the Video-LMMs, especially open-source ones, struggle with robustness and reasoning when dealing with complex videos. Based on our analysis, we develop a training-free Dual-Step Contextual Prompting (DSCP) technique to enhance the performance of existing Video-LMMs. Our findings provide valuable insights for building the next generation of human-centric AI systems with advanced robustness and reasoning capabilities. Our dataset and code are publicly available at: https://mbzuai-oryx.github.io/CVRR-Evaluation-Suite/., Comment: Technical report
- Published
- 2024
27. Study of Charged Cylindrical Collapse in $f(\mathcal{R},\mathcal{T},\mathcal{Q})$ Gravity
- Author
-
Sharif, M. and Naseer, Tayyab
- Subjects
General Relativity and Quantum Cosmology - Abstract
This paper investigates the effects of an electromagnetic field on gravitational collapse in $f(\mathcal{R},\mathcal{T},\mathcal{Q})$ theory, where $\mathcal{Q} = \mathcal{R}_{\varphi\vartheta} \mathcal{T}^{\varphi\vartheta}$. For this, we assume a dynamical cylindrically symmetric self-gravitating geometry coupled with a generalized anisotropic matter distribution as well as dissipation flux. We adopt the model $\mathcal{R}+\Phi\sqrt{\mathcal{T}}+\Psi\mathcal{Q}$ to formulate the corresponding dynamical and transport equations by employing the Misner-Sharp and M\"{u}ller-Israel-Stewart formalisms, where $\Phi$ and $\Psi$ are real-valued coupling constants. The influence of state variables, heat dissipation, charge and bulk viscosity on the collapsing phenomenon is then studied by establishing relations between these evolution equations. Moreover, the Weyl scalar and the modified field equations are expressed in terms of each other. We apply constraints to the considered modified model and the fluid configuration to obtain a conformally flat spacetime. Finally, we address different cases to check how the modified corrections and charge affect the collapse rate of the cylindrical matter source., Comment: 25 pages, no figure
- Published
- 2024
- Full Text
- View/download PDF
28. Challenges in Care and Service Provision for Older Adults with Intellectual Disabilities and Complex Age-Related Conditions in Ireland
- Author
-
Fintan Sheerin, Sandra Fleming, Peter May, Philip McCallion, Mary McCarron, Amara Naseer, Georgia Lalor, and Maureen D'Eath
- Abstract
Background: People with intellectual disabilities are living longer and are increasingly diverse, with health and care needs that are varied and complex. Without changes to funding, services have found it difficult to respond to needs and wishes. Method: In a descriptive mixed-methods design study, data were collected through questionnaires, focus groups and individual interviews from intellectual disability service managers, direct care staff, older people with intellectual disabilities and family members. Results: Continued reticence on the part of some community healthcare providers to treat people with intellectual disability was noted. Although some service innovations were noted, housing, staffing levels, staff mix and the timely provision of equipment were all reported to impact the ability of services to respond to changing needs. Current per-capita funding practices were reported to be unresponsive to growing age-related complexity and fundamentally unsustainable. Conclusions: The health inequalities experienced by people with intellectual disabilities are compounded as they age with complex age-related health needs. There is an urgent need for revision of the service model in Ireland and the instigation of flexible and responsive approaches to funding.
- Published
- 2024
- Full Text
- View/download PDF
29. Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels
- Author
-
Dharmasiri, Amaya, Naseer, Muzammal, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Large-scale 2D vision-language models, such as CLIP, can be aligned with a 3D encoder to learn generalizable (open-vocabulary) 3D vision models. However, current methods require supervised pre-training for such alignment, and the performance of the resulting 3D zero-shot models remains sub-optimal for real-world adaptation. In this work, we propose an optimization framework, Cross-MoST: Cross-Modal Self-Training, to improve the label-free classification performance of a zero-shot 3D vision model by simply leveraging unlabeled 3D data and their accompanying 2D views. We propose a student-teacher framework to simultaneously process 2D views and 3D point clouds and generate joint pseudo labels to train a classifier and guide cross-modal feature alignment. Thereby we demonstrate that 2D vision-language models such as CLIP can be used to complement 3D representation learning and improve classification performance without the need for expensive class annotations. Using synthetic and real-world 3D datasets, we further demonstrate that Cross-MoST enables efficient cross-modal knowledge exchange, with the image and point cloud modalities learning from each other's rich representations., Comment: To be published in Workshop for Learning 3D with Multi-View Supervision (3DMV) at CVPR 2024
- Published
- 2024
30. S-box Security Analysis of NIST Lightweight Cryptography Candidates: A Critical Empirical Study
- Author
-
Naseer, Mahnoor, Tariq, Sundas, Riaz, Naveed, Ahmed, Naveed, and Hussain, Mureed
- Subjects
Computer Science - Cryptography and Security - Abstract
In the resource-constrained world of the digital landscape, lightweight cryptography plays a critical role in safeguarding information and ensuring the security of various systems, devices, and communication channels. Its efficient and resource-friendly nature makes it the ideal solution for applications where computational power is limited. In response to the growing need for platform-specific implementations, NIST issued a call for standardization of Lightweight cryptography algorithms in 2018. Ascon emerged as the winner of this competition. NIST initially established general evaluation criteria for a standard lightweight scheme including security strength, mitigation against side-channel and fault-injection attacks, and implementation efficiency. To verify the security claims, evaluating the individual components used in any cryptographic algorithm is a crucial step. The quality of a substitution box (S-box) significantly impacts the overall security of a cryptographic primitive. This paper analyzes the S-boxes of six finalists in the NIST Lightweight Cryptography (LWC) standardization process. We evaluate them based on well-established cryptographic properties. Our analysis explores how these properties influence the S-boxes' resistance against known cryptanalytic attacks and potential implementation-specific vulnerabilities, thus reflecting on their compliance with NIST's security requirements.
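One of the classical S-box properties alluded to, resistance to differential cryptanalysis, is quantified by the largest entry of the difference distribution table (DDT). A small self-contained sketch follows (the paper evaluates several further properties such as nonlinearity and algebraic degree, which are not computed here):

```python
def differential_uniformity(sbox: list) -> int:
    """Largest entry of the difference distribution table (DDT) of an S-box,
    a standard measure of resistance to differential cryptanalysis
    (lower is better).
    """
    n = len(sbox)
    worst = 0
    for dx in range(1, n):                      # every non-zero input difference
        counts = [0] * n
        for x in range(n):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1 # output difference frequencies
        worst = max(worst, max(counts))
    return worst

# Example: the 4-bit S-box of the PRESENT cipher has differential uniformity 4
present = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]
print(differential_uniformity(present))   # -> 4
```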
- Published
- 2024
31. Study of Decoupled Anisotropic Solutions in $f(R,T,R_{\rho\eta}T^{\rho\eta})$ Theory
- Author
-
Naseer, Tayyab and Sharif, M.
- Subjects
General Relativity and Quantum Cosmology - Abstract
In this paper, we consider an isotropic solution and extend it to two different exact, well-behaved spherical anisotropic solutions through the minimal geometric deformation method in $f(R,T,R_{\rho\eta}T^{\rho\eta})$ gravity. We deform only the radial metric component, which separates the field equations into two sets corresponding to their original sources. The first set corresponds to a perfect matter distribution while the other exhibits the effects of the additional source, i.e., anisotropy. The isotropic system is resolved by assuming the metric potentials proposed by Krori-Barua, while the second set needs one constraint to be solved. The physical acceptability and consistency of the obtained solutions are analyzed through graphical analysis of the effective matter components and energy bounds. We also examine the mass, surface redshift and compactness of the resulting solutions. For particular values of the decoupling parameter, both of our solutions turn out to be viable and stable. We conclude that this curvature-matter coupling gravity provides more stable solutions corresponding to a self-gravitating geometry., Comment: 29 pages, 10 figures
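In the usual minimal geometric deformation convention (the notation below is the generic one from the gravitational-decoupling literature, not necessarily the paper's), only the radial metric function receives a deformation linear in the decoupling parameter $\beta$, which is what splits the field equations into the two sets described above:

$$e^{-\lambda(r)}\;\longrightarrow\;\mu(r)+\beta\,f^{*}(r),\qquad e^{\nu(r)}\;\text{undeformed},$$

so that setting $\beta=0$ recovers the seed (here Krori-Barua) isotropic solution.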
- Published
- 2024
- Full Text
- View/download PDF
32. Language Guided Domain Generalized Medical Image Segmentation
- Author
-
Kunhimon, Shahina, Naseer, Muzammal, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Single source domain generalization (SDG) holds promise for more reliable and consistent image segmentation across real-world clinical settings particularly in the medical domain, where data privacy and acquisition cost constraints often limit the availability of diverse datasets. Depending solely on visual features hampers the model's capacity to adapt effectively to various domains, primarily because of the presence of spurious correlations and domain-specific characteristics embedded within the image features. Incorporating text features alongside visual features is a potential solution to enhance the model's understanding of the data, as it goes beyond pixel-level information to provide valuable context. Textual cues describing the anatomical structures, their appearances, and variations across various imaging modalities can guide the model in domain adaptation, ultimately contributing to more robust and consistent segmentation. In this paper, we propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features to learn a more robust feature representation. We assess the effectiveness of our text-guided contrastive feature alignment technique in various scenarios, including cross-modality, cross-sequence, and cross-site settings for different segmentation tasks. Our approach achieves favorable performance against existing methods in literature. Our code and model weights are available at https://github.com/ShahinaKK/LG_SDG.git., Comment: Accepted at ISBI2024
- Published
- 2024
33. Composed Video Retrieval via Enriched Context and Discriminative Embeddings
- Author
-
Thawakar, Omkar, Naseer, Muzammal, Anwer, Rao Muhammad, Khan, Salman, Felsberg, Michael, Shah, Mubarak, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases. Existing works predominantly rely on visual queries combined with modification text to distinguish relevant videos. However, such a strategy struggles to fully preserve the rich query-specific context in retrieved target videos and only represents the target video using visual embedding. We introduce a novel CoVR framework that leverages detailed language descriptions to explicitly encode query-specific contextual information and learns discriminative embeddings of vision only, text only and vision-text for better alignment to accurately retrieve matched target videos. Our proposed framework can be flexibly employed for both composed video (CoVR) and image (CoIR) retrieval tasks. Experiments on three datasets show that our approach obtains state-of-the-art performance for both CovR and zero-shot CoIR tasks, achieving gains as high as around 7% in terms of recall@K=1 score. Our code, models, detailed language descriptions for WebViD-CoVR dataset are available at \url{https://github.com/OmkarThawakar/composed-video-retrieval}, Comment: CVPR-2024
- Published
- 2024
34. VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
- Author
-
Mahmood, Ahmad, Vayani, Ashmal, Naseer, Muzammal, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent studies have demonstrated the effectiveness of Large Language Models (LLMs) as reasoning modules that can deconstruct complex tasks into more manageable sub-tasks, particularly when applied to visual reasoning tasks for images. In contrast, this paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of LLMs. Ours is a novel approach to extend the utility of LLMs in the context of video tasks, leveraging their capacity to generalize from minimal input and output demonstrations within a contextual framework. By presenting LLMs with pairs of instructions and their corresponding high-level programs, we harness their contextual learning capabilities to generate executable visual programs for video understanding. To enhance the programs' accuracy and robustness, we implement two important strategies. First, we employ a feedback-generation approach, powered by GPT-3.5, to rectify errors in programs utilizing unsupported functions. Second, taking motivation from recent works on self-refinement of LLM outputs, we introduce an iterative procedure for improving the quality of in-context examples by aligning the initial outputs to the outputs that would have been generated had the LLM not been bound by the structure of the in-context examples. Our results on several video-specific tasks, including visual QA, video anticipation, pose estimation and multi-video QA, illustrate the efficacy of these enhancements in improving the performance of visual programming approaches for video tasks.
- Published
- 2024
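The generate-and-refine loop referenced above, as a hypothetical skeleton only; `llm`, the prompts, and `execute_program` are placeholders for illustration, not the authors' implementation:

```python
def vurf_generate(llm, instruction, in_context_examples, max_rounds=3):
    """Generate a visual program from in-context examples, then repair it
    using execution feedback (hypothetical names throughout)."""
    prompt = "\n".join(in_context_examples) + f"\nInstruction: {instruction}\nProgram:"
    program = llm(prompt)
    for _ in range(max_rounds):
        ok, error = execute_program(program)   # run against the video toolkit
        if ok:
            return program
        # feedback generation: ask the LLM to repair unsupported calls
        program = llm(f"Fix this program.\nProgram:\n{program}\nError:\n{error}")
    return program

def execute_program(program):
    """Placeholder executor: returns (success, error_message)."""
    try:
        exec(program, {})                      # a real system would expose video ops here
        return True, ""
    except Exception as e:
        return False, str(e)
```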
35. Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning
- Author
-
Watawana, Hasindri, Ranasinghe, Kanchana, Mahmood, Tariq, Naseer, Muzammal, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Self-supervised representation learning has been highly promising for histopathology image analysis, with numerous approaches leveraging the patient-slide-patch hierarchy to learn better representations. In this paper, we explore how combining domain-specific natural language information with such hierarchical visual representations can benefit rich representation learning for medical image tasks. Building on automated language description generation for features visible in histopathology images, we present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS), for histopathology images. We explore contrastive objectives and granular text alignment based on language descriptions at multiple hierarchy levels to inject language-modality information into the visual representations (a sketch of such a per-level objective follows this entry). Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, the OpenSRH and TCGA datasets. Our framework also provides better interpretability through its language-aligned representation space. Code is available at https://github.com/Hasindri/HLSS., Comment: 13 pages and 5 figures
- Published
- 2024
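One plausible shape for a hierarchy-aware text alignment term, as referenced above; the level names, batching, and loss weighting are illustrative assumptions rather than the HLSS code:

```python
import torch
import torch.nn.functional as F

def hierarchical_text_alignment(vis, txt, tau=0.07):
    """vis/txt: dicts of (B, D) embeddings per hierarchy level, e.g.
    {'patch': ..., 'slide': ..., 'patient': ...}; one contrastive term per level."""
    total = 0.0
    for level in vis:
        q = F.normalize(vis[level], dim=-1)
        t = F.normalize(txt[level], dim=-1)
        logits = q @ t.t() / tau
        targets = torch.arange(q.size(0), device=q.device)
        total = total + F.cross_entropy(logits, targets)
    return total
```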
36. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
- Author
-
Gemini Team, Georgiev, Petko, Lei, Ving Ian, Burnell, Ryan, Bai, Libin, Gulati, Anmol, Tanzer, Garrett, Vincent, Damien, Pan, Zhufeng, Wang, Shibo, Mariooryad, Soroosh, Ding, Yifan, Geng, Xinyang, Alcober, Fred, Frostig, Roy, Omernick, Mark, Walker, Lexi, Paduraru, Cosmin, Sorokin, Christina, Tacchetti, Andrea, Gaffney, Colin, Daruki, Samira, Sercinoglu, Olcan, Gleicher, Zach, Love, Juliette, Voigtlaender, Paul, Jain, Rohan, Surita, Gabriela, Mohamed, Kareem, Blevins, Rory, Ahn, Junwhan, Zhu, Tao, Kawintiranon, Kornraphop, Firat, Orhan, Gu, Yiming, Zhang, Yujing, Rahtz, Matthew, Faruqui, Manaal, Clay, Natalie, Gilmer, Justin, Co-Reyes, JD, Penchev, Ivo, Zhu, Rui, Morioka, Nobuyuki, Hui, Kevin, Haridasan, Krishna, Campos, Victor, Mahdieh, Mahdis, Guo, Mandy, Hassan, Samer, Kilgour, Kevin, Vezer, Arpi, Cheng, Heng-Tze, de Liedekerke, Raoul, Goyal, Siddharth, Barham, Paul, Strouse, DJ, Noury, Seb, Adler, Jonas, Sundararajan, Mukund, Vikram, Sharad, Lepikhin, Dmitry, Paganini, Michela, Garcia, Xavier, Yang, Fan, Valter, Dasha, Trebacz, Maja, Vodrahalli, Kiran, Asawaroengchai, Chulayuth, Ring, Roman, Kalb, Norbert, Soares, Livio Baldini, Brahma, Siddhartha, Steiner, David, Yu, Tianhe, Mentzer, Fabian, He, Antoine, Gonzalez, Lucas, Xu, Bibo, Kaufman, Raphael Lopez, Shafey, Laurent El, Oh, Junhyuk, Hennigan, Tom, Driessche, George van den, Odoom, Seth, Lucic, Mario, Roelofs, Becca, Lall, Sid, Marathe, Amit, Chan, Betty, Ontanon, Santiago, He, Luheng, Teplyashin, Denis, Lai, Jonathan, Crone, Phil, Damoc, Bogdan, Ho, Lewis, Riedel, Sebastian, Lenc, Karel, Yeh, Chih-Kuan, Chowdhery, Aakanksha, Xu, Yang, Kazemi, Mehran, Amid, Ehsan, Petrushkina, Anastasia, Swersky, Kevin, Khodaei, Ali, Chen, Gowoon, Larkin, Chris, Pinto, Mario, Yan, Geng, Badia, Adria Puigdomenech, Patil, Piyush, Hansen, Steven, Orr, Dave, Arnold, Sebastien M. 
R., Grimstad, Jordan, Dai, Andrew, Douglas, Sholto, Sinha, Rishika, Yadav, Vikas, Chen, Xi, Gribovskaya, Elena, Austin, Jacob, Zhao, Jeffrey, Patel, Kaushal, Komarek, Paul, Austin, Sophia, Borgeaud, Sebastian, Friso, Linda, Goyal, Abhimanyu, Caine, Ben, Cao, Kris, Chung, Da-Woon, Lamm, Matthew, Barth-Maron, Gabe, Kagohara, Thais, Olszewska, Kate, Chen, Mia, Shivakumar, Kaushik, Agarwal, Rishabh, Godhia, Harshal, Rajwar, Ravi, Snaider, Javier, Dotiwalla, Xerxes, Liu, Yuan, Barua, Aditya, Ungureanu, Victor, Zhang, Yuan, Batsaikhan, Bat-Orgil, Wirth, Mateo, Qin, James, Danihelka, Ivo, Doshi, Tulsee, Chadwick, Martin, Chen, Jilin, Jain, Sanil, Le, Quoc, Kar, Arjun, Gurumurthy, Madhu, Li, Cheng, Sang, Ruoxin, Liu, Fangyu, Lamprou, Lampros, Munoz, Rich, Lintz, Nathan, Mehta, Harsh, Howard, Heidi, Reynolds, Malcolm, Aroyo, Lora, Wang, Quan, Blanco, Lorenzo, Cassirer, Albin, Griffith, Jordan, Das, Dipanjan, Lee, Stephan, Sygnowski, Jakub, Fisher, Zach, Besley, James, Powell, Richard, Ahmed, Zafarali, Paulus, Dominik, Reitter, David, Borsos, Zalan, Joshi, Rishabh, Pope, Aedan, Hand, Steven, Selo, Vittorio, Jain, Vihan, Sethi, Nikhil, Goel, Megha, Makino, Takaki, May, Rhys, Yang, Zhen, Schalkwyk, Johan, Butterfield, Christina, Hauth, Anja, Goldin, Alex, Hawkins, Will, Senter, Evan, Brin, Sergey, Woodman, Oliver, Ritter, Marvin, Noland, Eric, Giang, Minh, Bolina, Vijay, Lee, Lisa, Blyth, Tim, Mackinnon, Ian, Reid, Machel, Sarvana, Obaid, Silver, David, Chen, Alexander, Wang, Lily, Maggiore, Loren, Chang, Oscar, Attaluri, Nithya, Thornton, Gregory, Chiu, Chung-Cheng, Bunyan, Oskar, Levine, Nir, Chung, Timothy, Eltyshev, Evgenii, Si, Xiance, Lillicrap, Timothy, Brady, Demetra, Aggarwal, Vaibhav, Wu, Boxi, Xu, Yuanzhong, McIlroy, Ross, Badola, Kartikeya, Sandhu, Paramjit, Moreira, Erica, Stokowiec, Wojciech, Hemsley, Ross, Li, Dong, Tudor, Alex, Shyam, Pranav, Rahimtoroghi, Elahe, Haykal, Salem, Sprechmann, Pablo, Zhou, Xiang, Mincu, Diana, Li, Yujia, Addanki, Ravi, Krishna, Kalpesh, Wu, Xiao, Frechette, Alexandre, Eyal, Matan, Dafoe, Allan, Lacey, Dave, Whang, Jay, Avrahami, Thi, Zhang, Ye, Taropa, Emanuel, Lin, Hanzhao, Toyama, Daniel, Rutherford, Eliza, Sano, Motoki, Choe, HyunJeong, Tomala, Alex, Safranek-Shrader, Chalence, Kassner, Nora, Pajarskas, Mantas, Harvey, Matt, Sechrist, Sean, Fortunato, Meire, Lyu, Christina, Elsayed, Gamaleldin, Kuang, Chenkai, Lottes, James, Chu, Eric, Jia, Chao, Chen, Chih-Wei, Humphreys, Peter, Baumli, Kate, Tao, Connie, Samuel, Rajkumar, Santos, Cicero Nogueira dos, Andreassen, Anders, Rakićević, Nemanja, Grewe, Dominik, Kumar, Aviral, Winkler, Stephanie, Caton, Jonathan, Brock, Andrew, Dalmia, Sid, Sheahan, Hannah, Barr, Iain, Miao, Yingjie, Natsev, Paul, Devlin, Jacob, Behbahani, Feryal, Prost, Flavien, Sun, Yanhua, Myaskovsky, Artiom, Pillai, Thanumalayan Sankaranarayana, Hurt, Dan, Lazaridou, Angeliki, Xiong, Xi, Zheng, Ce, Pardo, Fabio, Li, Xiaowei, Horgan, Dan, Stanton, Joe, Ambar, Moran, Xia, Fei, Lince, Alejandro, Wang, Mingqiu, Mustafa, Basil, Webson, Albert, Lee, Hyo, Anil, Rohan, Wicke, Martin, Dozat, Timothy, Sinha, Abhishek, Piqueras, Enrique, Dabir, Elahe, Upadhyay, Shyam, Boral, Anudhyan, Hendricks, Lisa Anne, Fry, Corey, Djolonga, Josip, Su, Yi, Walker, Jake, Labanowski, Jane, Huang, Ronny, Misra, Vedant, Chen, Jeremy, Skerry-Ryan, RJ, Singh, Avi, Rijhwani, Shruti, Yu, Dian, Castro-Ros, Alex, Changpinyo, Beer, Datta, Romina, Bagri, Sumit, Hrafnkelsson, Arnar Mar, Maggioni, Marcello, Zheng, Daniel, Sulsky, Yury, Hou, Shaobo, Paine, Tom Le, Yang, 
Antoine, Riesa, Jason, Rogozinska, Dominika, Marcus, Dror, Badawy, Dalia El, Zhang, Qiao, Wang, Luyu, Miller, Helen, Greer, Jeremy, Sjos, Lars Lowe, Nova, Azade, Zen, Heiga, Chaabouni, Rahma, Rosca, Mihaela, Jiang, Jiepu, Chen, Charlie, Liu, Ruibo, Sainath, Tara, Krikun, Maxim, Polozov, Alex, Lespiau, Jean-Baptiste, Newlan, Josh, Cankara, Zeyncep, Kwak, Soo, Xu, Yunhan, Chen, Phil, Coenen, Andy, Meyer, Clemens, Tsihlas, Katerina, Ma, Ada, Gottweis, Juraj, Xing, Jinwei, Gu, Chenjie, Miao, Jin, Frank, Christian, Cankara, Zeynep, Ganapathy, Sanjay, Dasgupta, Ishita, Hughes-Fitt, Steph, Chen, Heng, Reid, David, Rong, Keran, Fan, Hongmin, van Amersfoort, Joost, Zhuang, Vincent, Cohen, Aaron, Gu, Shixiang Shane, Mohananey, Anhad, Ilic, Anastasija, Tobin, Taylor, Wieting, John, Bortsova, Anna, Thacker, Phoebe, Wang, Emma, Caveness, Emily, Chiu, Justin, Sezener, Eren, Kaskasoli, Alex, Baker, Steven, Millican, Katie, Elhawaty, Mohamed, Aisopos, Kostas, Lebsack, Carl, Byrd, Nathan, Dai, Hanjun, Jia, Wenhao, Wiethoff, Matthew, Davoodi, Elnaz, Weston, Albert, Yagati, Lakshman, Ahuja, Arun, Gao, Isabel, Pundak, Golan, Zhang, Susan, Azzam, Michael, Sim, Khe Chai, Caelles, Sergi, Keeling, James, Sharma, Abhanshu, Swing, Andy, Li, YaGuang, Liu, Chenxi, Bostock, Carrie Grimes, Bansal, Yamini, Nado, Zachary, Anand, Ankesh, Lipschultz, Josh, Karmarkar, Abhijit, Proleev, Lev, Ittycheriah, Abe, Yeganeh, Soheil Hassas, Polovets, George, Faust, Aleksandra, Sun, Jiao, Rrustemi, Alban, Li, Pen, Shivanna, Rakesh, Liu, Jeremiah, Welty, Chris, Lebron, Federico, Baddepudi, Anirudh, Krause, Sebastian, Parisotto, Emilio, Soricut, Radu, Xu, Zheng, Bloxwich, Dawn, Johnson, Melvin, Neyshabur, Behnam, Mao-Jones, Justin, Wang, Renshen, Ramasesh, Vinay, Abbas, Zaheer, Guez, Arthur, Segal, Constant, Nguyen, Duc Dung, Svensson, James, Hou, Le, York, Sarah, Milan, Kieran, Bridgers, Sophie, Gworek, Wiktor, Tagliasacchi, Marco, Lee-Thorp, James, Chang, Michael, Guseynov, Alexey, Hartman, Ale Jakse, Kwong, Michael, Zhao, Ruizhe, Kashem, Sheleem, Cole, Elizabeth, Miech, Antoine, Tanburn, Richard, Phuong, Mary, Pavetic, Filip, Cevey, Sebastien, Comanescu, Ramona, Ives, Richard, Yang, Sherry, Du, Cosmo, Li, Bo, Zhang, Zizhao, Iinuma, Mariko, Hu, Clara Huiyi, Roy, Aurko, Bijwadia, Shaan, Zhu, Zhenkai, Martins, Danilo, Saputro, Rachel, Gergely, Anita, Zheng, Steven, Jia, Dawei, Antonoglou, Ioannis, Sadovsky, Adam, Gu, Shane, Bi, Yingying, Andreev, Alek, Samangooei, Sina, Khan, Mina, Kocisky, Tomas, Filos, Angelos, Kumar, Chintu, Bishop, Colton, Yu, Adams, Hodkinson, Sarah, Mittal, Sid, Shah, Premal, Moufarek, Alexandre, Cheng, Yong, Bloniarz, Adam, Lee, Jaehoon, Pejman, Pedram, Michel, Paul, Spencer, Stephen, Feinberg, Vladimir, Xiong, Xuehan, Savinov, Nikolay, Smith, Charlotte, Shakeri, Siamak, Tran, Dustin, Chesus, Mary, Bohnet, Bernd, Tucker, George, von Glehn, Tamara, Muir, Carrie, Mao, Yiran, Kazawa, Hideto, Slone, Ambrose, Soparkar, Kedar, Shrivastava, Disha, Cobon-Kerr, James, Sharman, Michael, Pavagadhi, Jay, Araya, Carlos, Misiunas, Karolis, Ghelani, Nimesh, Laskin, Michael, Barker, David, Li, Qiujia, Briukhov, Anton, Houlsby, Neil, Glaese, Mia, Lakshminarayanan, Balaji, Schucher, Nathan, Tang, Yunhao, Collins, Eli, Lim, Hyeontaek, Feng, Fangxiaoyu, Recasens, Adria, Lai, Guangda, Magni, Alberto, De Cao, Nicola, Siddhant, Aditya, Ashwood, Zoe, Orbay, Jordi, Dehghani, Mostafa, Brennan, Jenny, He, Yifan, Xu, Kelvin, Gao, Yang, Saroufim, Carl, Molloy, James, Wu, Xinyi, Arnold, Seb, Chang, Solomon, Schrittwieser, Julian, 
Buchatskaya, Elena, Radpour, Soroush, Polacek, Martin, Giordano, Skye, Bapna, Ankur, Tokumine, Simon, Hellendoorn, Vincent, Sottiaux, Thibault, Cogan, Sarah, Severyn, Aliaksei, Saleh, Mohammad, Thakoor, Shantanu, Shefey, Laurent, Qiao, Siyuan, Gaba, Meenu, Chang, Shuo-yiin, Swanson, Craig, Zhang, Biao, Lee, Benjamin, Rubenstein, Paul Kishan, Song, Gan, Kwiatkowski, Tom, Koop, Anna, Kannan, Ajay, Kao, David, Schuh, Parker, Stjerngren, Axel, Ghiasi, Golnaz, Gibson, Gena, Vilnis, Luke, Yuan, Ye, Ferreira, Felipe Tiengo, Kamath, Aishwarya, Klimenko, Ted, Franko, Ken, Xiao, Kefan, Bhattacharya, Indro, Patel, Miteyan, Wang, Rui, Morris, Alex, Strudel, Robin, Sharma, Vivek, Choy, Peter, Hashemi, Sayed Hadi, Landon, Jessica, Finkelstein, Mara, Jhakra, Priya, Frye, Justin, Barnes, Megan, Mauger, Matthew, Daun, Dennis, Baatarsukh, Khuslen, Tung, Matthew, Farhan, Wael, Michalewski, Henryk, Viola, Fabio, Quitry, Felix de Chaumont, Lan, Charline Le, Hudson, Tom, Wang, Qingze, Fischer, Felix, Zheng, Ivy, White, Elspeth, Dragan, Anca, Alayrac, Jean-baptiste, Ni, Eric, Pritzel, Alexander, Iwanicki, Adam, Isard, Michael, Bulanova, Anna, Zilka, Lukas, Dyer, Ethan, Sachan, Devendra, Srinivasan, Srivatsan, Muckenhirn, Hannah, Cai, Honglong, Mandhane, Amol, Tariq, Mukarram, Rae, Jack W., Wang, Gary, Ayoub, Kareem, FitzGerald, Nicholas, Zhao, Yao, Han, Woohyun, Alberti, Chris, Garrette, Dan, Krishnakumar, Kashyap, Gimenez, Mai, Levskaya, Anselm, Sohn, Daniel, Matak, Josip, Iturrate, Inaki, Chang, Michael B., Xiang, Jackie, Cao, Yuan, Ranka, Nishant, Brown, Geoff, Hutter, Adrian, Mirrokni, Vahab, Chen, Nanxin, Yao, Kaisheng, Egyed, Zoltan, Galilee, Francois, Liechty, Tyler, Kallakuri, Praveen, Palmer, Evan, Ghemawat, Sanjay, Liu, Jasmine, Tao, David, Thornton, Chloe, Green, Tim, Jasarevic, Mimi, Lin, Sharon, Cotruta, Victor, Tan, Yi-Xuan, Fiedel, Noah, Yu, Hongkun, Chi, Ed, Neitz, Alexander, Heitkaemper, Jens, Sinha, Anu, Zhou, Denny, Sun, Yi, Kaed, Charbel, Hulse, Brice, Mishra, Swaroop, Georgaki, Maria, Kudugunta, Sneha, Farabet, Clement, Shafran, Izhak, Vlasic, Daniel, Tsitsulin, Anton, Ananthanarayanan, Rajagopal, Carin, Alen, Su, Guolong, Sun, Pei, V, Shashank, Carvajal, Gabriel, Broder, Josef, Comsa, Iulia, Repina, Alena, Wong, William, Chen, Warren Weilun, Hawkins, Peter, Filonov, Egor, Loher, Lucia, Hirnschall, Christoph, Wang, Weiyi, Ye, Jingchen, Burns, Andrea, Cate, Hardie, Wright, Diana Gage, Piccinini, Federico, Zhang, Lei, Lin, Chu-Cheng, Gog, Ionel, Kulizhskaya, Yana, Sreevatsa, Ashwin, Song, Shuang, Cobo, Luis C., Iyer, Anand, Tekur, Chetan, Garrido, Guillermo, Xiao, Zhuyun, Kemp, Rupert, Zheng, Huaixiu Steven, Li, Hui, Agarwal, Ananth, Ngani, Christel, Goshvadi, Kati, Santamaria-Fernandez, Rebeca, Fica, Wojciech, Chen, Xinyun, Gorgolewski, Chris, Sun, Sean, Garg, Roopal, Ye, Xinyu, Eslami, S. M. 
Ali, Hua, Nan, Simon, Jon, Joshi, Pratik, Kim, Yelin, Tenney, Ian, Potluri, Sahitya, Thiet, Lam Nguyen, Yuan, Quan, Luisier, Florian, Chronopoulou, Alexandra, Scellato, Salvatore, Srinivasan, Praveen, Chen, Minmin, Koverkathu, Vinod, Dalibard, Valentin, Xu, Yaming, Saeta, Brennan, Anderson, Keith, Sellam, Thibault, Fernando, Nick, Huot, Fantine, Jung, Junehyuk, Varadarajan, Mani, Quinn, Michael, Raul, Amit, Le, Maigo, Habalov, Ruslan, Clark, Jon, Jalan, Komal, Bullard, Kalesha, Singhal, Achintya, Luong, Thang, Wang, Boyu, Rajayogam, Sujeevan, Eisenschlos, Julian, Jia, Johnson, Finchelstein, Daniel, Yakubovich, Alex, Balle, Daniel, Fink, Michael, Agarwal, Sameer, Li, Jing, Dvijotham, Dj, Pal, Shalini, Kang, Kai, Konzelmann, Jaclyn, Beattie, Jennifer, Dousse, Olivier, Wu, Diane, Crocker, Remi, Elkind, Chen, Jonnalagadda, Siddhartha Reddy, Lee, Jong, Holtmann-Rice, Dan, Kallarackal, Krystal, Liu, Rosanne, Vnukov, Denis, Vats, Neera, Invernizzi, Luca, Jafari, Mohsen, Zhou, Huanjie, Taylor, Lilly, Prendki, Jennifer, Wu, Marcus, Eccles, Tom, Liu, Tianqi, Kopparapu, Kavya, Beaufays, Francoise, Angermueller, Christof, Marzoca, Andreea, Sarcar, Shourya, Dib, Hilal, Stanway, Jeff, Perbet, Frank, Trdin, Nejc, Sterneck, Rachel, Khorlin, Andrey, Li, Dinghua, Wu, Xihui, Goenka, Sonam, Madras, David, Goldshtein, Sasha, Gierke, Willi, Zhou, Tong, Liu, Yaxin, Liang, Yannie, White, Anais, Li, Yunjie, Singh, Shreya, Bahargam, Sanaz, Epstein, Mark, Basu, Sujoy, Lao, Li, Ozturel, Adnan, Crous, Carl, Zhai, Alex, Lu, Han, Tung, Zora, Gaur, Neeraj, Walton, Alanna, Dixon, Lucas, Zhang, Ming, Globerson, Amir, Uy, Grant, Bolt, Andrew, Wiles, Olivia, Nasr, Milad, Shumailov, Ilia, Selvi, Marco, Piccinno, Francesco, Aguilar, Ricardo, McCarthy, Sara, Khalman, Misha, Shukla, Mrinal, Galic, Vlado, Carpenter, John, Villela, Kevin, Zhang, Haibin, Richardson, Harry, Martens, James, Bosnjak, Matko, Belle, Shreyas Rammohan, Seibert, Jeff, Alnahlawi, Mahmoud, McWilliams, Brian, Singh, Sankalp, Louis, Annie, Ding, Wen, Popovici, Dan, Simicich, Lenin, Knight, Laura, Mehta, Pulkit, Gupta, Nishesh, Shi, Chongyang, Fatehi, Saaber, Mitrovic, Jovana, Grills, Alex, Pagadora, Joseph, Petrova, Dessie, Eisenbud, Danielle, Zhang, Zhishuai, Yates, Damion, Mittal, Bhavishya, Tripuraneni, Nilesh, Assael, Yannis, Brovelli, Thomas, Jain, Prateek, Velimirovic, Mihajlo, Akbulut, Canfer, Mu, Jiaqi, Macherey, Wolfgang, Kumar, Ravin, Xu, Jun, Qureshi, Haroon, Comanici, Gheorghe, Wiesner, Jeremy, Gong, Zhitao, Ruddock, Anton, Bauer, Matthias, Felt, Nick, GP, Anirudh, Arnab, Anurag, Zelle, Dustin, Rothfuss, Jonas, Rosgen, Bill, Shenoy, Ashish, Seybold, Bryan, Li, Xinjian, Mudigonda, Jayaram, Erdogan, Goker, Xia, Jiawei, Simsa, Jiri, Michi, Andrea, Yao, Yi, Yew, Christopher, Kan, Steven, Caswell, Isaac, Radebaugh, Carey, Elisseeff, Andre, Valenzuela, Pedro, McKinney, Kay, Paterson, Kim, Cui, Albert, Latorre-Chimoto, Eri, Kim, Solomon, Zeng, William, Durden, Ken, Ponnapalli, Priya, Sosea, Tiberiu, Choquette-Choo, Christopher A., Manyika, James, Robenek, Brona, Vashisht, Harsha, Pereira, Sebastien, Lam, Hoi, Velic, Marko, Owusu-Afriyie, Denese, Lee, Katherine, Bolukbasi, Tolga, Parrish, Alicia, Lu, Shawn, Park, Jane, Venkatraman, Balaji, Talbert, Alice, Rosique, Lambert, Cheng, Yuchung, Sozanschi, Andrei, Paszke, Adam, Kumar, Praveen, Austin, Jessica, Li, Lu, Salama, Khalid, Kim, Wooyeol, Dukkipati, Nandita, Baryshnikov, Anthony, Kaplanis, Christos, Sheng, XiangHai, Chervonyi, Yuri, Unlu, Caglar, Casas, Diego de Las, Askham, Harry, Tunyasuvunakool, 
Kathryn, Gimeno, Felix, Poder, Siim, Kwak, Chester, Miecnikowski, Matt, Dimitriev, Alek, Parisi, Aaron, Liu, Dangyi, Tsai, Tomy, Shevlane, Toby, Kouridi, Christina, Garmon, Drew, Goedeckemeyer, Adrian, Brown, Adam R., Vijayakumar, Anitha, Elqursh, Ali, Jazayeri, Sadegh, Huang, Jin, Carthy, Sara Mc, Hoover, Jay, Kim, Lucy, Kumar, Sandeep, Chen, Wei, Biles, Courtney, Bingham, Garrett, Rosen, Evan, Wang, Lisa, Tan, Qijun, Engel, David, Pongetti, Francesco, de Cesare, Dario, Hwang, Dongseong, Yu, Lily, Pullman, Jennifer, Narayanan, Srini, Levin, Kyle, Gopal, Siddharth, Li, Megan, Aharoni, Asaf, Trinh, Trieu, Lo, Jessica, Casagrande, Norman, Vij, Roopali, Matthey, Loic, Ramadhana, Bramandia, Matthews, Austin, Carey, CJ, Johnson, Matthew, Goranova, Kremena, Shah, Rohin, Ashraf, Shereen, Dasgupta, Kingshuk, Larsen, Rasmus, Wang, Yicheng, Vuyyuru, Manish Reddy, Jiang, Chong, Ijazi, Joana, Osawa, Kazuki, Smith, Celine, Boppana, Ramya Sree, Bilal, Taylan, Koizumi, Yuma, Xu, Ying, Altun, Yasemin, Shabat, Nir, Bariach, Ben, Korchemniy, Alex, Choo, Kiam, Ronneberger, Olaf, Iwuanyanwu, Chimezie, Zhao, Shubin, Soergel, David, Hsieh, Cho-Jui, Cai, Irene, Iqbal, Shariq, Sundermeyer, Martin, Chen, Zhe, Bursztein, Elie, Malaviya, Chaitanya, Biadsy, Fadi, Shroff, Prakash, Dhillon, Inderjit, Latkar, Tejasi, Dyer, Chris, Forbes, Hannah, Nicosia, Massimo, Nikolaev, Vitaly, Greene, Somer, Georgiev, Marin, Wang, Pidong, Martin, Nina, Sedghi, Hanie, Zhang, John, Banzal, Praseem, Fritz, Doug, Rao, Vikram, Wang, Xuezhi, Zhang, Jiageng, Patraucean, Viorica, Du, Dayou, Mordatch, Igor, Jurin, Ivan, Liu, Lewis, Dubey, Ayush, Mohan, Abhi, Nowakowski, Janek, Ion, Vlad-Doru, Wei, Nan, Tojo, Reiko, Raad, Maria Abi, Hudson, Drew A., Keshava, Vaishakh, Agrawal, Shubham, Ramirez, Kevin, Wu, Zhichun, Nguyen, Hoang, Liu, Ji, Sewak, Madhavi, Petrini, Bryce, Choi, DongHyun, Philips, Ivan, Wang, Ziyue, Bica, Ioana, Garg, Ankush, Wilkiewicz, Jarek, Agrawal, Priyanka, Guo, Danhao, Xue, Emily, Shaik, Naseer, Leach, Andrew, Khan, Sadh MNM, Wiesinger, Julia, Jerome, Sammy, Chakladar, Abhishek, Wang, Alek Wenjiao, Ornduff, Tina, Abu, Folake, Ghaffarkhah, Alireza, Wainwright, Marcus, Cortes, Mario, Liu, Frederick, Maynez, Joshua, Terzis, Andreas, Samangouei, Pouya, Mansour, Riham, Kępa, Tomasz, Aubet, François-Xavier, Algymr, Anton, Banica, Dan, Weisz, Agoston, Orban, Andras, Senges, Alexandre, Andrejczuk, Ewa, Geller, Mark, Santo, Niccolo Dal, Anklin, Valentin, Merey, Majd Al, Baeuml, Martin, Strohman, Trevor, Bai, Junwen, Petrov, Slav, Wu, Yonghui, Hassabis, Demis, Kavukcuoglu, Koray, Dean, Jeffrey, and Vinyals, Oriol
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; and (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state of the art in long-document QA, long-video QA, and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a level similar to a person who learned from the same content.
- Published
- 2024
37. Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
- Author
-
Noman, Mubashir, Naseer, Muzammal, Cholakkal, Hisham, Anwar, Rao Muhammad, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks by pre-training on large amounts of unlabelled data. Such pre-training techniques have also been explored recently in the remote sensing domain, where large amounts of unlabelled data are available. Unlike standard natural image datasets, remote sensing data is acquired from various sensor technologies and exhibits a diverse range of scale variations as well as modalities. Existing satellite image pre-training methods either ignore the scale information present in remote sensing imagery or restrict themselves to a single type of data modality. In this paper, we revisit transformer pre-training and leverage multi-scale information across multiple modalities. Our proposed approach, named SatMAE++, performs multi-scale pre-training and utilizes convolution-based upsampling blocks to reconstruct the image at higher scales, making it extensible to additional scales (a toy version of this multi-scale decoder follows this entry). Compared to existing works, the proposed SatMAE++ with multi-scale pre-training is equally effective for both optical and multi-spectral imagery. Extensive experiments on six datasets reveal the merits of the proposed contributions, leading to state-of-the-art performance on all of them. SatMAE++ achieves a mean average precision (mAP) gain of 2.5\% for the multi-label classification task on the BigEarthNet dataset. Our code and pre-trained models are available at \url{https://github.com/techmn/satmae_pp}., Comment: Accepted at CVPR 2024
- Published
- 2024
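A toy stand-in for the convolution-based multi-scale reconstruction idea referenced above; layer sizes, losses, and the two-scale setup are assumptions for illustration, not the official SatMAE++ code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDecoder(nn.Module):
    """Decode a latent feature map to a base-resolution image, then upsample
    with conv blocks and supervise reconstruction at every scale."""
    def __init__(self, dim, channels=3, scales=2):
        super().__init__()
        self.base = nn.Conv2d(dim, channels, 1)
        self.ups = nn.ModuleList(
            nn.Sequential(nn.Upsample(scale_factor=2, mode="bilinear"),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(scales))

    def forward(self, feat2d, targets):
        """feat2d: (B, dim, H, W) decoded features; targets: list of ground-truth
        images at increasing resolutions. Returns the summed multi-scale L2 loss."""
        img = self.base(feat2d)
        loss = F.mse_loss(img, targets[0])
        for up, tgt in zip(self.ups, targets[1:]):
            img = up(img)                      # double the resolution, refine with conv
            loss = loss + F.mse_loss(img, tgt)
        return loss
```

Adding another scale only requires appending one more upsampling block and target, which is the extensibility the abstract points to.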
38. ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes
- Author
-
Malik, Hashmat Shadab, Huzaifa, Muhammad, Naseer, Muzammal, Khan, Salman, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Given the large-scale multi-modal training of recent vision-based models and their generalization capabilities, understanding the extent of their robustness is critical for their real-world deployment. In this work, we evaluate the resilience of current vision-based models against diverse object-to-background context variations. The majority of robustness evaluation methods have introduced synthetic datasets to induce changes to object characteristics (viewpoints, scale, color) or utilized image transformation techniques (adversarial changes, common corruptions) on real images to simulate distribution shifts. Recent works have explored leveraging large language models and diffusion models to generate changes in the background. However, these methods either offer little control over the changes to be made or distort the object semantics, making them unsuitable for the task. Our method, in contrast, induces diverse object-to-background changes while preserving the original semantics and appearance of the object. To achieve this goal, we harness the generative capabilities of text-to-image, image-to-text, and image-to-segment models to automatically generate a broad spectrum of object-to-background changes (an illustrative inpainting-based recipe follows this entry). We induce both natural and adversarial background changes by either modifying the textual prompts or optimizing the latents and textual embeddings of text-to-image models. We produce various versions of standard vision datasets (ImageNet, COCO), incorporating either diverse and realistic backgrounds or color, texture, and adversarial changes in the background. We conduct extensive experiments to analyze the robustness of vision-based models against object-to-background context variations across diverse tasks. Code: https://github.com/Muhammad-Huzaifaa/ObjectCompose.
- Published
- 2024
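An illustrative recipe for the natural background edits referenced above, keeping the object pixels fixed and regenerating only the background via diffusion inpainting; this is a minimal sketch under the assumption of an off-the-shelf inpainting checkpoint and an object mask from any segmentation model, not the authors' pipeline:

```python
# Assumes the Hugging Face diffusers library and a Stable Diffusion
# inpainting checkpoint (model id is an assumption, swap in any inpainter).
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image, ImageOps

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting")

def change_background(image: Image.Image, object_mask: Image.Image, prompt: str):
    """object_mask: white on the object. Inverting it tells the pipeline to
    inpaint everything except the object, so object semantics are preserved."""
    background_mask = ImageOps.invert(object_mask.convert("L"))
    return pipe(prompt=prompt, image=image, mask_image=background_mask).images[0]
```

Adversarial variants would instead optimize the latents or text embeddings against a target model, as the abstract describes.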
39. Location of the zeros of quaternionic polynomials using matrix tools
- Author
-
Rather, N. A., Wani, Naseer Ahmad, and Dar, Ishfaq
- Subjects
Mathematics - Complex Variables ,30A10, 30C10, 30C15 - Abstract
Using a variety of matrix techniques, the problem of locating the left eigenvalues of quaternion companion matrices is investigated in this paper. In a recent paper, Dar et al. [6] proved that the zeros of a quaternionic polynomial and the left eigenvalues of the corresponding companion matrix coincide (the standard companion-matrix form is sketched after this entry). In view of this, we use newly developed matrix techniques to prove various results concerning the location of the zeros of regular polynomials of a quaternionic variable with quaternionic coefficients, including an extension of a classical result of A. L. Cauchy.
- Published
- 2024
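For orientation, in one common convention the companion matrix of the monic polynomial $p(q)=q^n+a_{n-1}q^{n-1}+\cdots+a_1q+a_0$ is

\[
C_p=\begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 1 & \cdots & 0\\
\vdots & & & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 1\\
-a_0 & -a_1 & -a_2 & \cdots & -a_{n-1}
\end{pmatrix},
\]

and the cited result says the zeros of $p$ are exactly the left eigenvalues $\lambda$ satisfying $C_p x=\lambda x$ for some $x\neq 0$. Note that for quaternionic matrices, left and right eigenvalues are genuinely different notions because multiplication does not commute; the exact normalization used in [6] may differ from the sketch above.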
40. Broadband spectral and temporal study of Ton 599 during the brightest January 2023 flare
- Author
-
Manzoor, Aaqib, Shah, Zahir, Sahayanathan, Sunder, Iqbal, Naseer, and Dar, Athar A.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
In this work, we provide a detailed analysis of the broadband temporal and spectral properties of the blazar Ton\,599 using observations from the \emph{Fermi}-LAT and \emph{Swift}-XRT/UVOT telescopes during its brightest $\gamma$-ray flaring period. The one-day-binned $\gamma$-ray light curve exhibits multiple substructures with asymmetric and symmetric profiles. Notably, the $\gamma$-ray light curve reaches a maximum flux of $\rm 3.63 \times 10^{-6}\, ph \,cm^{-2}\,s^{-1}$ on MJD\,59954.50, the highest flux ever observed from this source. The correlation between the $\gamma$-ray flux and the $\gamma$-ray spectral indices suggests a moderate harder-when-brighter trend. Taking the $\gamma$-ray light curve as the reference, a strong correlation is observed with the X-ray, optical, and UV bands. Additionally, the $\gamma$-ray and optical/UV emissions exhibit higher variability than the X-rays. To understand the parameter variation during the active state of the source, we conducted statistical broadband spectral modelling of the source in 10 flux intervals of equal duration. A one-zone leptonic model involving synchrotron, synchrotron-self-Compton, and external-Compton processes successfully reproduces the broadband SED in each of these flux intervals. We observe that the flux variation during the active state is mainly associated with variations in the magnetic field and the particle spectral indices., Comment: 9 pages, 5 figures
- Published
- 2024
- Full Text
- View/download PDF
41. Effects of $f(\mathcal{R},\mathcal{T},\mathcal{R}_{\gamma\upsilon}\mathcal{T}^{\gamma\upsilon})$ Gravity on Anisotropic Charged Compact Structures
- Author
-
Sharif, M. and Naseer, T.
- Subjects
General Relativity and Quantum Cosmology - Abstract
This paper focuses on the analysis of static spherically symmetric anisotropic solutions in the presence of an electromagnetic field through the gravitational decoupling approach in $f(\mathcal{R},\mathcal{T},\mathcal{R}_{\gamma\upsilon}\mathcal{T}^{\gamma\upsilon})$ gravity. We apply a geometric deformation only to the radial metric function and obtain two sets of field equations (the deformation is written out schematically after this entry). The first set deals with the isotropic fluid, while the second set yields the influence of the anisotropic source. We consider the modified Krori-Barua charged isotropic solution for a spherical self-gravitating star to deal with the isotropic system. The second set of field equations is solved by imposing two different constraints. We then investigate the physical acceptability of the obtained solutions through graphical analysis of the effective physical variables and energy conditions. We also analyze the effects of charge on different parameters (i.e., mass, compactness, and redshift) of the resulting solutions. It is found that both of our solutions are viable as well as stable for specific values of the decoupling parameter $\varphi$ and the charge. We conclude that a self-gravitating star shows more stable behavior in this gravity., Comment: 30 pages, 8 figures
- Published
- 2024
- Full Text
- View/download PDF
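Schematically, for a static spherically symmetric metric $ds^2=-e^{\nu(r)}dt^2+e^{\lambda(r)}dr^2+r^2d\Omega^2$, the minimal geometric deformation used in such decoupling schemes acts only on the radial component,

\[
e^{-\lambda(r)} \;\longrightarrow\; \mu(r)+\varphi\, f^*(r),
\]

so that the terms of order $\varphi^0$ reproduce the field equations of the (charged) isotropic seed solution $\mu(r)$, while the terms proportional to $\varphi$ govern the anisotropic source through the deformation function $f^*(r)$. The notation here is illustrative of the general technique; the paper's modified-gravity field equations carry additional correction terms beyond this sketch.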
42. MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation
- Author
-
Gani, Hanan, Naseer, Muzammal, Khan, Fahad, and Khan, Salman
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Volumetric medical segmentation is a critical component of 3D medical image analysis that delineates different semantic regions. Deep neural networks have significantly improved volumetric medical segmentation, but they generally require large-scale annotated data to achieve good performance, which can be expensive and prohibitive to obtain. To address this limitation, existing works typically perform transfer learning or design dedicated pretraining-finetuning stages to learn representative features. However, the mismatch between the source and target domains can make it challenging to learn an optimal representation for volumetric data, while multi-stage training demands more compute as well as careful selection of stage-specific design choices. In contrast, we propose a universal training framework called MedContext that is architecture-agnostic and can be incorporated into any existing training framework for 3D medical segmentation. Our approach effectively learns self-supervised contextual cues jointly with the supervised voxel segmentation task, without requiring large-scale annotated volumetric medical data or dedicated pretraining-finetuning stages. The proposed approach induces contextual knowledge in the network by learning to reconstruct the missing organ, or parts of an organ, in the output segmentation space (a sketch of such a joint objective follows this entry). The effectiveness of MedContext is validated across multiple 3D medical datasets and four state-of-the-art model architectures. Our approach demonstrates consistent gains in segmentation performance across datasets and architectures, even in few-shot data scenarios. Our code and pretrained models are available at https://github.com/hananshafi/MedContext, Comment: Accepted at MICCAI 2024
- Published
- 2024
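One hypothetical form of the joint objective referenced above: a supervised voxel segmentation loss plus a self-supervised term that reconstructs, inside masked regions, the prediction the model makes on the full input. The masking function, loss choice, and weighting are illustrative assumptions, not the official MedContext code:

```python
import torch
import torch.nn.functional as F

def medcontext_step(model, volume, seg_target, mask_fn, alpha=0.5):
    """volume: (B, 1, D, H, W) input scan; seg_target: (B, D, H, W) voxel labels;
    mask_fn: hides organ parts, returning (masked_volume, mask) with mask
    broadcastable to the logits."""
    logits = model(volume)                        # (B, C, D, H, W)
    sup_loss = F.cross_entropy(logits, seg_target)

    masked_volume, mask = mask_fn(volume)         # occlude parts of the input
    masked_logits = model(masked_volume)
    # reconstruct the full-input prediction inside the masked regions only,
    # i.e. contextual reconstruction in the output segmentation space
    ssl_loss = F.mse_loss(masked_logits * mask, logits.detach() * mask)
    return sup_loss + alpha * ssl_loss
```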
43. Dynamics of bubble migration in a square channel flow of a viscoelastic fluid
- Author
-
Naseer, Hafiz Usman, Izbassarov, Daulet, Ahmed, Zaheer, and Muradoglu, Metin
- Subjects
Physics - Fluid Dynamics - Abstract
Cross-stream migration of a deformable bubble is investigated computationally in a pressure-driven channel flow of a viscoelastic fluid via interface-resolved simulations. The flow equations are solved fully coupled with the Giesekus model equations using the front-tracking method (the constitutive law is written out after this entry), and extensive simulations are performed for a wide range of flow parameters to reveal the effects of bubble deformability, fluid elasticity, shear-thinning, and fluid inertia on the bubble migration dynamics. The migration rate of a bubble is found to be much higher than that of a solid particle under similar flow conditions, mainly due to the free-slip condition on its surface. It is observed that the direction of bubble migration can be altered by varying the shear-thinning of the ambient fluid: with strong shear-thinning, the bubble migrates towards the wall, while it migrates towards the center of the channel in a purely elastic fluid without shear-thinning. An onset of elastic flow instability is observed beyond a critical Weissenberg number, which in turn causes a path instability even for a nearly spherical bubble. An inertial path instability is also observed once bubble deformation exceeds a critical value. Shear-thinning is found to suppress the path instability in a viscoelastic fluid with a high polymer concentration, whereas it reverses its role and promotes path instability in a dilute polymer solution. It is found that bubble migration towards the wall induces a secondary flow with a velocity about an order of magnitude higher than that induced by a solid particle under similar flow conditions.
- Published
- 2024
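In a common notation, the Giesekus constitutive equation for the polymeric stress $\boldsymbol{\tau}$ referenced above reads

\[
\boldsymbol{\tau} + \lambda\,\overset{\nabla}{\boldsymbol{\tau}} + \frac{\alpha\lambda}{\eta_p}\,\boldsymbol{\tau}\cdot\boldsymbol{\tau} = \eta_p\left(\nabla\mathbf{u}+\nabla\mathbf{u}^{\mathsf{T}}\right),
\]

where $\lambda$ is the relaxation time, $\eta_p$ the polymeric viscosity, $\overset{\nabla}{(\cdot)}$ the upper-convected derivative, and the mobility parameter $\alpha$ controls the shear-thinning the abstract refers to: $\alpha=0$ recovers the non-shear-thinning Oldroyd-B fluid, while larger $\alpha$ strengthens shear-thinning.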
44. GAN-driven Electromagnetic Imaging of 2-D Dielectric Scatterers
- Author
-
Naseer, Ehtasham, Sandhu, Ali Imran, Siddique, Muhammad Adnan, Ahmed, Waqas W., Farhat, Mohamed, and Wu, Ying
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computational Engineering, Finance, and Science ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Signal Processing - Abstract
Inverse scattering problems are inherently challenging, given that they are ill-posed and nonlinear. This paper presents a powerful deep learning-based approach that relies on generative adversarial networks to accurately and efficiently reconstruct randomly shaped two-dimensional dielectric objects from the amplitudes of multi-frequency scattered electric fields. An adversarial autoencoder (AAE) is trained to generate the scatterer's geometry from a lower-dimensional latent representation constrained to adhere to a Gaussian distribution. A cohesive inverse neural network (INN) framework is set up, comprising a sequence of appropriately designed dense layers, the already-trained generator, and a separately trained forward neural network (a schematic of this cascade follows this entry). The images reconstructed at the output of the inverse network are validated against outputs from the forward neural network, addressing the non-uniqueness challenge inherent to electromagnetic (EM) imaging problems. The trained INN demonstrates enhanced robustness, evidenced by a mean binary cross-entropy (BCE) loss of $0.13$ and a structural similarity index (SSI) of $0.90$. The study not only demonstrates a significant reduction in computational load, but also marks a substantial improvement over traditional objective-function-based methods. It contributes to both machine learning and EM imaging by offering a real-time quantitative imaging approach. Experiments with simulated data, for both training and testing, yield promising results and may open new avenues for radio-frequency inverse imaging.
- Published
- 2024
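A schematic of the described cascade, with layer sizes and names as assumptions for illustration: dense layers map scattered-field amplitudes to the AAE's Gaussian latent space, and the frozen pre-trained generator decodes the latent code into a scatterer image.

```python
import torch
import torch.nn as nn

class InverseImagingNet(nn.Module):
    """Dense encoder into the AAE latent space, followed by the frozen
    pre-trained generator (decoder) that renders the scatterer geometry."""
    def __init__(self, n_measurements, latent_dim, generator):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_measurements, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.generator = generator                 # trained AAE decoder
        for p in self.generator.parameters():      # keep it frozen during INN training
            p.requires_grad = False

    def forward(self, fields):
        """fields: (B, n_measurements) scattered-field amplitudes."""
        return self.generator(self.encoder(fields))

# Validation idea from the abstract: feed the reconstruction through the
# separately trained forward network and compare its predicted fields with
# the measurements, flagging inconsistent (non-unique) solutions.
```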
45. Time resolved spectroscopy of a GRS 1915+105 flare during its unusual low state using AstroSat
- Author
-
Boked, Sajad, Maqbool, Bari, Jithesh, V., Misra, Ranjeev, Iqbal, Naseer, and Bhulla, Yashpal
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
Since its discovery in 1992, GRS 1915+105 has been among the brightest sources in the X-ray sky. However, in early 2018 it dimmed significantly and has stayed in this faint state ever since. We report on AstroSat and NuSTAR observations of GRS 1915+105 in its unusual low/hard state during 2019 May. We performed time-resolved spectroscopy of the X-ray flares observed in this state and found that the spectra can be fitted well using highly ionised absorption models. We further show that the spectra can also be fitted using a highly relativistic, reflection-dominated model in which, for the lamp-post geometry, the X-ray emitting source is always very close to the central black hole. For both interpretations, the flare can be attributed to a change in the intrinsic flux rather than a dramatic variation in the absorption or geometry. These reflection-dominated spectra are very similar to those reported for Active Galactic Nuclei in their low-flux states., Comment: 12 pages, 11 figures, Accepted for publication in MNRAS
- Published
- 2024
- Full Text
- View/download PDF
46. Influence of $f(\mathcal{R},\mathcal{T},\mathcal{Q})$ Gravity on Cylindrical Collapse
- Author
-
Sharif, M. and Naseer, Tayyab
- Subjects
General Relativity and Quantum Cosmology - Abstract
This article examines the dynamics of gravitational collapse in $f(\mathcal{R},\mathcal{T},\mathcal{Q})$ gravity, where $\mathcal{Q}=\mathcal{R}_{\mathrm{ab}}\mathcal{T}^{\mathrm{ab}}$. We consider a self-gravitating anisotropic cylindrical geometry whose interior is filled with a dissipative matter configuration and match it with an exterior cylindrically symmetric spacetime at the hypersurface through junction conditions. We employ the Misner-Sharp and M\"{u}ller-Israel-Stewart formalisms to derive the dynamical as well as transport equations corresponding to the model $\mathcal{R}+\Phi\sqrt{\mathcal{T}}+\Psi\mathcal{Q}$, where $\Phi$ and $\Psi$ are arbitrary coupling constants. We then establish relations between these equations through which the impact of the effective matter variables, heat dissipation, and bulk viscosity on the collapse rate is studied. Further, we express the Weyl scalar in terms of the effective matter sector. We also obtain conformal flatness by applying certain restrictions on the considered model and taking the dust configuration into account. Finally, we investigate various cases to check whether the modified corrections increase or decrease the collapse rate., Comment: 23 pages, no figure
- Published
- 2024
- Full Text
- View/download PDF
47. Compositional analysis of dark colored particulates homogeneously emitted with combustion gases (dark plumes) from brick making kilns situated in the area of Khyber Pakhtunkhwa, Pakistan
- Author
-
Hassan, Iatizaz, Khan, Naseer Ahmed, Syed, Naveed ul Hasan, Memon, Najma, Habib, Muddasar, and Barki, Khalid Mehmood
- Published
- 2024
48. In vivo evaluation of a Nano-enabled therapeutic vitreous substitute for the precise delivery of triamcinolone to the posterior segment of the eye
- Author
-
Naik, Kruti, du Toit, Lisa Claire, Ally, Naseer, and Choonara, Yahya Essop
- Published
- 2024
- Full Text
- View/download PDF
49. Characterization of heavy metal-associated bacteria from petroleum-contaminated soil and their resistogram and antibiogram analysis
- Author
-
Basit, Abdul, Andleeb, Saiqa, Liaqat, Iram, Ashraf, Nasra, Ali, Shaukat, Naseer, Anum, Nazir, Aisha, and Kiyani, Fahad
- Published
- 2024
- Full Text
- View/download PDF
50. Bioconversion of fruit peels to levan by solid state fermentation and statistical optimization by response surface methodology
- Author
-
Saeed, Shagufta, Shahid, Mahnoor, Naseer, Rahat, Ghazanfar, Misbah, and Irfan, Muhammad
- Published
- 2024
- Full Text
- View/download PDF