12,169 results for "Samaras A"
Search Results
2. Instance-Aware Generalized Referring Expression Segmentation
- Author
- Nguyen, E-Ro, Le, Hieu, Samaras, Dimitris, and Ryoo, Michael
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Recent works on Generalized Referring Expression Segmentation (GRES) struggle with handling complex expressions referring to multiple distinct objects. This is because these methods typically employ an end-to-end foreground-background segmentation and lack a mechanism to explicitly differentiate and associate different object instances to the text query. To this end, we propose InstAlign, a method that incorporates object-level reasoning into the segmentation process. Our model leverages both text and image inputs to extract a set of object-level tokens that capture both the semantic information in the input prompt and the objects within the image. By modeling the text-object alignment via instance-level supervision, each token uniquely represents an object segment in the image, while also aligning with relevant semantic information from the text. Extensive experiments on the gRefCOCO and Ref-ZOM benchmarks demonstrate that our method significantly advances state-of-the-art performance, setting a new standard for precise and flexible GRES., Comment: 12 pages, 7 figures
- Published
- 2024
3. Direct and Explicit 3D Generation from a Single Image
- Author
- Wu, Haoyu, Karumuri, Meher Gitika, Zou, Chuhang, Bang, Seungbae, Li, Yuelong, Samaras, Dimitris, and Hadap, Sunil
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images along with 3D Gaussian features using a repurposed Stable Diffusion model. We introduce a depth branch into U-Net for efficient and high quality multi-view, cross-domain generation and incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency. By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation that can be either rendered via Gaussian splatting or extracted to high-quality meshes, thereby leveraging additional novel view synthesis loss to further improve our performance. Extensive experiments demonstrate that our method surpasses existing baselines in geometry and texture quality while achieving significantly faster generation time., Comment: 3DV 2025, Project page: https://hao-yu-wu.github.io/gen3d/
- Published
- 2024
4. Fast constrained sampling in pre-trained diffusion models
- Author
- Graikos, Alexandros, Jojic, Nebojsa, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Diffusion models have dominated the field of large, generative image models, with the prime examples of Stable Diffusion and DALL-E 3 being widely adopted. These models have been trained to perform text-conditioned generation on vast numbers of image-caption pairs and as a byproduct, have acquired general knowledge about natural image statistics. However, when confronted with the task of constrained sampling, e.g. generating the right half of an image conditioned on the known left half, applying these models is a delicate and slow process, with previously proposed algorithms relying on expensive iterative operations that are usually orders of magnitude slower than text-based inference. This is counter-intuitive, as image-conditioned generation should rely less on the difficult-to-learn semantic knowledge that links captions and imagery, and should instead be achievable by lower-level correlations among image pixels. In practice, inverse models are trained or tuned separately for each inverse problem, e.g. by providing parts of images during training as an additional condition, to allow their application in realistic settings. However, we argue that this is not necessary and propose an algorithm for fast-constrained sampling in large pre-trained diffusion models (Stable Diffusion) that requires no expensive backpropagation operations through the model and produces results comparable even to the state-of-the-art \emph{tuned} models. Our method is based on a novel optimization perspective to sampling under constraints and employs a numerical approximation to the expensive gradients, previously computed using backpropagation, incurring significant speed-ups.
- Published
- 2024
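The core idea in the abstract above, replacing backpropagated constraint gradients with a cheap surrogate, can be pictured with a toy DDIM-style update. This is a minimal sketch under a Jacobian-as-identity simplification, not necessarily the paper's exact estimator; `denoiser` and all names are placeholders:

```python
import torch

def constrained_ddim_step(denoiser, x_t, t, alpha_bar_t, alpha_bar_prev,
                          y_known, mask, guidance_scale=1.0):
    """One DDIM-style denoising update with cheap constraint guidance.
    Instead of backpropagating ||mask * (x0_hat - y_known)||^2 through the
    denoiser, the Jacobian of x0_hat w.r.t. x_t is treated as identity, so
    the residual itself stands in for the gradient. (An illustrative
    simplification, not necessarily the paper's exact estimator;
    alpha_bar_* are cumulative noise-schedule values as 0-dim tensors.)"""
    eps = denoiser(x_t, t)                                        # predicted noise
    x0_hat = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    x0_hat = x0_hat - guidance_scale * mask * (x0_hat - y_known)  # steer to constraint
    # deterministic DDIM transition to the previous timestep
    return alpha_bar_prev.sqrt() * x0_hat + (1 - alpha_bar_prev).sqrt() * eps
```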
5. TopoDiffusionNet: A Topology-aware Diffusion Model
- Author
- Gupta, Saumya, Samaras, Dimitris, and Chen, Chao
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Diffusion models excel at creating visually impressive images but often struggle to generate images with a specified topology. The Betti number, which represents the number of structures in an image, is a fundamental measure in topology. Yet, diffusion models fail to satisfy even this basic constraint. This limitation restricts their utility in applications requiring exact control, like robotics and environmental modeling. To address this, we propose TopoDiffusionNet (TDN), a novel approach that enforces diffusion models to maintain the desired topology. We leverage tools from topological data analysis, particularly persistent homology, to extract the topological structures within an image. We then design a topology-based objective function to guide the denoising process, preserving intended structures while suppressing noisy ones. Our experiments across four datasets demonstrate significant improvements in topological accuracy. TDN is the first to integrate topology with diffusion models, opening new avenues of research in this area., Comment: 20 pages, 11 figures, 7 tables
- Published
- 2024
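For intuition on the topological constraint in the entry above: the first two Betti numbers of a binary image can be checked with standard tools. A minimal sketch using SciPy's connected-component labeling (a generic check, not the paper's persistent-homology pipeline):

```python
import numpy as np
from scipy import ndimage

def betti_0(binary_image: np.ndarray) -> int:
    """Number of connected foreground components (0-th Betti number)."""
    _, num_components = ndimage.label(binary_image)
    return num_components

def betti_1(binary_image: np.ndarray) -> int:
    """Number of holes in 2D: background components minus the single
    outer background (assumes the foreground avoids the image border)."""
    _, num_bg = ndimage.label(~binary_image.astype(bool))
    return num_bg - 1

# Toy check: an image with two filled disks has betti_0 == 2.
yy, xx = np.ogrid[:64, :64]
img = ((yy - 16) ** 2 + (xx - 16) ** 2 < 64) | ((yy - 48) ** 2 + (xx - 48) ** 2 < 64)
assert betti_0(img) == 2 and betti_1(img) == 0
```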
6. Constraints on the Hubble and matter density parameters with and without modelling the CMB anisotropies
- Author
- Banik, Indranil and Samaras, Nick
- Subjects
- Astrophysics - Cosmology and Nongalactic Astrophysics, Astrophysics - Astrophysics of Galaxies, Astrophysics - Solar and Stellar Astrophysics
- Abstract
We consider constraints on the Hubble parameter $H_0$ and the matter density parameter $\Omega_{\mathrm{M}}$ from: (i) the age of the Universe based on old stars and stellar populations in the Galactic disc and halo (Cimatti & Moresco 2023); (ii) the turnover scale in the matter power spectrum, which tells us the cosmological horizon at the epoch of matter-radiation equality (Philcox et al. 2022); and (iii) the shape of the expansion history from supernovae (SNe) and baryon acoustic oscillations (BAOs) with no absolute calibration of either, a technique known as uncalibrated cosmic standards (UCS; Lin, Chen, & Mack 2021). A narrow region is consistent with all three constraints just outside their $1\sigma$ uncertainties. Although this region is defined by techniques unrelated to the physics of recombination and the sound horizon then, the standard $Planck$ fit to the CMB anisotropies falls precisely in this region. This concordance argues against early-time explanations for the anomalously high local estimate of $H_0$ (the 'Hubble tension'), which can only be reconciled with the age constraint at an implausibly low $\Omega_{\mathrm{M}}$. We suggest instead that outflow from the local KBC supervoid (Keenan, Barger, & Cowie 2013) inflates redshifts in the nearby universe and thus the apparent local $H_0$. Given the difficulties with solutions in the early universe, we argue that the most promising alternative to a local void is a modification to the expansion history at late times, perhaps due to a changing dark energy density., Comment: 5 pages, 1 figure, no tables. Submitted to The Open Journal of Astrophysics
- Published
- 2024
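For reference, the coupling the abstract above describes between the stellar age constraint, $H_0$, and $\Omega_{\mathrm{M}}$ follows from the standard flat $\Lambda$CDM expressions (textbook formulas, not equations reproduced from the paper):

```latex
H(z) = H_0\sqrt{\Omega_{\mathrm{M}}(1+z)^3 + 1 - \Omega_{\mathrm{M}}},
\qquad
t_0 = \int_0^{\infty} \frac{\mathrm{d}z}{(1+z)\,H(z)} .
```

At fixed $\Omega_{\mathrm{M}}$ the age scales as $t_0 \propto 1/H_0$, so a higher local $H_0$ can only reproduce the measured stellar ages if $\Omega_{\mathrm{M}}$ drops, which is the implausibility the authors invoke against early-time solutions.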
7. Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation
- Author
- Xu, Jingyi, Le, Hieu, Shu, Zhixin, Wang, Yang, Tsai, Yi-Hsuan, and Samaras, Dimitris
- Subjects
- Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Human emotional expression is inherently dynamic, complex, and fluid, characterized by smooth transitions in intensity throughout verbal communication. However, the modeling of such intensity fluctuations has been largely overlooked by previous audio-driven talking-head generation methods, which often results in static emotional outputs. In this paper, we explore how emotion intensity fluctuates during speech, proposing a method for capturing and generating these subtle shifts for talking-head generation. Specifically, we develop a talking-head framework that is capable of generating a variety of emotions with precise control over intensity levels. This is achieved by learning a continuous emotion latent space, where emotion types are encoded within latent orientations and emotion intensity is reflected in latent norms. In addition, to capture the dynamic intensity fluctuations, we adopt an audio-to-intensity predictor by considering the speaking tone that reflects the intensity. The training signals for this predictor are obtained through our emotion-agnostic intensity pseudo-labeling method without the need of frame-wise intensity labeling. Extensive experiments and analyses validate the effectiveness of our proposed method in accurately capturing and reproducing emotion intensity fluctuations in talking-head generation, thereby significantly enhancing the expressiveness and realism of the generated outputs.
- Published
- 2024
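The latent construction described in the entry above (orientation encodes emotion type, norm encodes intensity) is easy to picture in code. A minimal PyTorch sketch; names and dimensions are illustrative, and the real model learns this space rather than composing it by hand:

```python
import torch
import torch.nn.functional as F

def emotion_latent(emotion_direction: torch.Tensor,
                   intensity: torch.Tensor) -> torch.Tensor:
    """Compose an emotion latent: the unit direction encodes the emotion
    type, the norm encodes its intensity. (Hand-built illustration; the
    actual model learns this space.)"""
    unit = F.normalize(emotion_direction, dim=-1)   # orientation = emotion type
    return intensity.unsqueeze(-1) * unit           # norm = intensity

# e.g. one 'happy' direction ramped from subtle to strong across 4 frames
happy = torch.randn(128)
levels = torch.tensor([0.2, 0.5, 0.8, 1.0])             # per-frame intensity
latents = emotion_latent(happy.expand(4, -1), levels)   # (4, 128)
```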
8. TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans
- Author
- Chatziagapi, Aggelina, Chaudhuri, Bindita, Kumar, Amit, Ranjan, Rakesh, Samaras, Dimitris, and Sarafianos, Nikolaos
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We introduce a novel framework that learns a dynamic neural radiance field (NeRF) for full-body talking humans from monocular videos. Prior work represents only the body pose or the face. However, humans communicate with their full body, combining body pose, hand gestures, as well as facial expressions. In this work, we propose TalkinNeRF, a unified NeRF-based network that represents the holistic 4D human motion. Given a monocular video of a subject, we learn corresponding modules for the body, face, and hands, that are combined together to generate the final result. To capture complex finger articulation, we learn an additional deformation field for the hands. Our multi-identity representation enables simultaneous training for multiple subjects, as well as robust animation under completely unseen poses. It can also generalize to novel identities, given only a short video as input. We demonstrate state-of-the-art performance for animating full-body talking humans, with fine-grained hand articulation and facial expressions., Comment: Accepted by ECCVW 2024. Project page: https://aggelinacha.github.io/TalkinNeRF/
- Published
- 2024
9. JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation
- Author
- Chakkera, Sai Tanmay Reddy, Chatziagapi, Aggelina, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We introduce a novel method for joint expression and audio-guided talking face generation. Recent approaches either struggle to preserve the speaker identity or fail to produce faithful facial expressions. To address these challenges, we propose a NeRF-based network. Since we train our network on monocular videos without any ground truth, it is essential to learn disentangled representations for audio and expression. We first learn audio features in a self-supervised manner, given utterances from multiple subjects. By incorporating a contrastive learning technique, we ensure that the learned audio features are aligned to the lip motion and disentangled from the muscle motion of the rest of the face. We then devise a transformer-based architecture that learns expression features, capturing long-range facial expressions and disentangling them from the speech-specific mouth movements. Through quantitative and qualitative evaluation, we demonstrate that our method can synthesize high-fidelity talking face videos, achieving state-of-the-art facial expression transfer along with lip synchronization to unseen audio., Comment: Accepted by BMVC 2024. Project Page: https://starc52.github.io/publications/2024-07-19-JEAN
- Published
- 2024
10. Shadow Removal Refinement via Material-Consistent Shadow Edges
- Author
- Hu, Shilin, Le, Hieu, Athar, ShahRukh, Das, Sagnik, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Shadow boundaries can be confused with material boundaries as both exhibit sharp changes in luminance or contrast within a scene. However, shadows do not modify the intrinsic color or texture of surfaces. Therefore, on both sides of shadow edges traversing regions with the same material, the original color and textures should be the same if the shadow is removed properly. These shadow/shadow-free pairs are very useful but hard-to-collect supervision signals. The crucial contribution of this paper is to learn how to identify those shadow edges that traverse material-consistent regions and how to use them as self-supervision for shadow removal refinement during test time. To achieve this, we fine-tune SAM, an image segmentation foundation model, to produce a shadow-invariant segmentation and then extract material-consistent shadow edges by comparing the SAM segmentation with the shadow mask. Utilizing these shadow edges, we introduce color and texture-consistency losses to enhance the shadow removal process. We demonstrate the effectiveness of our method in improving shadow removal results on more challenging, in-the-wild images, outperforming the state-of-the-art shadow removal methods. Additionally, we propose a new metric and an annotated dataset for evaluating the performance of shadow removal methods without the need for paired shadow/shadow-free data.
- Published
- 2024
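A rough sketch of the edge-selection step this abstract describes: keep only those shadow-boundary pixels whose immediate neighborhood falls within a single segment of the shadow-invariant segmentation, i.e. the material matches on both sides. NumPy/SciPy inputs assumed; this illustrates the idea, it is not the authors' implementation:

```python
import numpy as np
from scipy import ndimage

def material_consistent_shadow_edges(shadow_mask: np.ndarray,
                                     segmentation: np.ndarray) -> np.ndarray:
    """Keep shadow-boundary pixels whose 3x3 neighborhood shows the same
    segment label(s) on the lit and shadowed sides, i.e. the edge crosses
    one material. `segmentation` is an integer label map (e.g. from a
    shadow-invariant segmenter)."""
    sm = shadow_mask.astype(bool)
    boundary = sm & ~ndimage.binary_erosion(sm)        # pixels on the shadow edge
    keep = np.zeros_like(boundary)
    for y, x in zip(*np.nonzero(boundary)):
        patch = segmentation[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
        shade = sm[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
        inside, outside = patch[shade], patch[~shade]
        if inside.size and outside.size and set(inside.ravel()) == set(outside.ravel()):
            keep[y, x] = True                          # same material on both sides
    return keep
```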
11. Look Hear: Gaze Prediction for Speech-directed Human Attention
- Author
- Mondal, Sounak, Ahn, Seoyoung, Yang, Zhibo, Balasubramanian, Niranjan, Samaras, Dimitris, Zelinsky, Gregory, and Hoai, Minh
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
For computer systems to effectively interact with humans using spoken language, they need to understand how the words being generated affect the users' moment-by-moment attention. Our study focuses on the incremental prediction of attention as a person is seeing an image and hearing a referring expression defining the object in the scene that should be fixated by gaze. To predict the gaze scanpaths in this incremental object referral task, we developed the Attention in Referral Transformer model or ART, which predicts the human fixations spurred by each word in a referring expression. ART uses a multimodal transformer encoder to jointly learn gaze behavior and its underlying grounding tasks, and an autoregressive transformer decoder to predict, for each word, a variable number of fixations based on fixation history. To train ART, we created RefCOCO-Gaze, a large-scale dataset of 19,738 human gaze scanpaths, corresponding to 2,094 unique image-expression pairs, from 220 participants performing our referral task. In our quantitative and qualitative analyses, ART not only outperforms existing methods in scanpath prediction, but also appears to capture several human attention patterns, such as waiting, scanning, and verification., Comment: Accepted for ECCV 2024
- Published
- 2024
12. Assessing Sample Quality via the Latent Space of Generative Models
- Author
- Xu, Jingyi, Le, Hieu, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Advances in generative models increase the need for sample quality assessment. To do so, previous methods rely on a pre-trained feature extractor to embed the generated samples and real samples into a common space for comparison. However, different feature extractors might lead to inconsistent assessment outcomes. Moreover, these methods are not applicable for domains where a robust, universal feature extractor does not yet exist, such as medical images or 3D assets. In this paper, we propose to directly examine the latent space of the trained generative model to infer generated sample quality. This is feasible because the quality of a generated sample directly relates to the amount of training data resembling it, and we can infer this information by examining the density of the latent space. Accordingly, we use a latent density score function to quantify sample quality. We show that the proposed score correlates highly with the sample quality for various generative models including VAEs, GANs and Latent Diffusion Models. Compared with previous quality assessment methods, our method has the following advantages: 1) pre-generation quality estimation with reduced computational cost, 2) generalizability to various domains and modalities, and 3) applicability to latent-based image editing and generation methods. Extensive experiments demonstrate that our proposed methods can benefit downstream tasks such as few-shot image classification and latent face image editing. Code is available at https://github.com/cvlab-stonybrook/LS-sample-quality., Comment: Accepted paper - ECCV 2024
- Published
- 2024
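A minimal stand-in for the latent density score described above, using a kernel density estimate over latents (scikit-learn's `KernelDensity`; the Gaussian kernel and the bandwidth are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def latent_density_scores(train_latents: np.ndarray,
                          sample_latents: np.ndarray,
                          bandwidth: float = 0.5) -> np.ndarray:
    """Score samples by the estimated log-density of the latent space
    around them: higher density means more nearby training-like latents,
    hence (per the paper's premise) likely higher sample quality."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(train_latents)
    return kde.score_samples(sample_latents)        # per-sample log-density

# Pre-generation use: rank candidate latents before decoding any of them.
rng = np.random.default_rng(0)
train_z = rng.normal(size=(5000, 64))               # latents of training-like data
cand_z = rng.normal(size=(1000, 64))                # candidate latents to rank
top10 = cand_z[np.argsort(latent_density_scores(train_z, cand_z))[::-1][:10]]
```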
13. $\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
- Author
- Le, Minh-Quan, Graikos, Alexandros, Yellapragada, Srikar, Gupta, Rajarsi, Saltz, Joel, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Synthesizing high-resolution images from intricate, domain-specific information remains a significant challenge in generative modeling, particularly for applications in large-image domains such as digital histopathology and remote sensing. Existing methods face critical limitations: conditional diffusion models in pixel or latent space cannot exceed the resolution on which they were trained without losing fidelity, and computational demands increase significantly for larger image sizes. Patch-based methods offer computational efficiency but fail to capture long-range spatial relationships due to their overreliance on local information. In this paper, we introduce a novel conditional diffusion model in infinite dimensions, $\infty$-Brush for controllable large image synthesis. We propose a cross-attention neural operator to enable conditioning in function space. Our model overcomes the constraints of traditional finite-dimensional diffusion models and patch-based methods, offering scalability and superior capability in preserving global image structures while maintaining fine details. To our best knowledge, $\infty$-Brush is the first conditional diffusion model in function space, that can controllably synthesize images at arbitrary resolutions of up to $4096\times4096$ pixels. The code is available at https://github.com/cvlab-stonybrook/infinity-brush., Comment: Accepted to ECCV 2024. Project page: https://histodiffusion.github.io
- Published
- 2024
14. Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation
- Author
- Howlader, Prantik, Le, Hieu, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Semi-supervised semantic segmentation methods leverage unlabeled data by pseudo-labeling them. Thus the success of these methods hinges on the reliability of the pseudo-labels. Existing methods mostly choose high-confidence pixels in an effort to avoid erroneous pseudo-labels. However, high confidence does not guarantee correct pseudo-labels, especially in the initial training iterations. In this paper, we propose a novel approach to reliably learn from pseudo-labels. First, we unify the predictions from a trained object detector and a semantic segmentation model to identify reliable pseudo-label pixels. Second, we assign different learning weights to pseudo-labeled pixels to avoid noisy training signals. To determine these weights, we first use the reliable pseudo-label pixels identified from the first step and labeled pixels to construct a prototype for each class. Then, the per-pixel weight is the structural similarity between the pixel and the prototype measured via rank-statistics similarity. This metric is robust to noise, making it better suited for comparing features from unlabeled images, particularly in the initial training phases where wrong pseudo labels are prone to occur. We show that our method can be easily integrated into four semi-supervised semantic segmentation frameworks, and improves them on both the Cityscapes and Pascal VOC datasets., Comment: to be published in ECCV24
- Published
- 2024
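The weighting scheme above is straightforward to sketch. For brevity this uses cosine similarity to the class prototypes where the paper uses a noise-robust rank-statistics similarity; shapes and names are illustrative:

```python
import torch
import torch.nn.functional as F

def pseudo_label_weights(features: torch.Tensor,       # (N, C, H, W) pixel features
                         pseudo_labels: torch.Tensor,  # (N, H, W) class indices
                         prototypes: torch.Tensor      # (K, C) per-class prototypes
                         ) -> torch.Tensor:
    """Per-pixel learning weights from similarity to class prototypes
    (the prototypes being built from reliable pseudo-labeled and labeled
    pixels). Cosine similarity stands in for the paper's rank-statistics
    similarity."""
    n, c, h, w = features.shape
    feats = F.normalize(features.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    protos = F.normalize(prototypes, dim=1)
    sims = feats @ protos.t()                               # (N*H*W, K)
    picked = sims.gather(1, pseudo_labels.reshape(-1, 1).long())
    return picked.squeeze(1).clamp(min=0).reshape(n, h, w)  # weight per pseudo-label
```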
15. iASiS: Towards Heterogeneous Big Data Analysis for Personalized Medicine
- Author
- Krithara, Anastasia, Aisopos, Fotis, Rentoumi, Vassiliki, Nentidis, Anastasios, Bougatiotis, Konstantinos, Vidal, Maria-Esther, Menasalvas, Ernestina, Rodriguez-Gonzalez, Alejandro, Samaras, Eleftherios G., Garrard, Peter, Torrente, Maria, Pulla, Mariano Provencio, Dimakopoulos, Nikos, Mauricio, Rui, De Argila, Jordi Rambla, Tartaglia, Gian Gaetano, and Paliouras, George
- Subjects
- Computer Science - Artificial Intelligence, Computer Science - Databases
- Abstract
The vision of the iASiS project is to turn the wave of big biomedical data heading our way into actionable knowledge for decision makers. This is achieved by integrating data from disparate sources, including genomics, electronic health records and bibliography, and applying advanced analytics methods to discover useful patterns. The goal is to turn large amounts of available data into actionable information for authorities planning public health activities and policies. The integration and analysis of these heterogeneous sources of information will enable the best decisions to be made, allowing for diagnosis and treatment to be personalised to each individual. The project offers a common representation schema for the heterogeneous data sources. The iASiS infrastructure is able to convert clinical notes into usable data, combine them with genomic data, related bibliography, image data and more, and create a global knowledge base. This facilitates the use of intelligent methods in order to discover useful patterns across different resources. Using semantic integration of data gives the opportunity to generate information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors and create more confidence in sharing data, thus providing more insights and opportunities. Data resources for two different disease categories are explored within the iASiS use cases, dementia and lung cancer., Comment: 6 pages, 2 figures, accepted at 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)
- Published
- 2024
16. MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition
- Author
- Chatziagapi, Aggelina, Chrysos, Grigorios G., and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We introduce MIGS (Multi-Identity Gaussian Splatting), a novel method that learns a single neural representation for multiple identities, using only monocular videos. Recent 3D Gaussian Splatting (3DGS) approaches for human avatars require per-identity optimization. However, learning a multi-identity representation presents advantages in robustly animating humans under arbitrary poses. We propose to construct a high-order tensor that combines all the learnable 3DGS parameters for all the training identities. By assuming a low-rank structure and factorizing the tensor, we model the complex rigid and non-rigid deformations of multiple subjects in a unified network, significantly reducing the total number of parameters. Our proposed approach leverages information from all the training identities and enables robust animation under challenging unseen poses, outperforming existing approaches. It can also be extended to learn unseen identities., Comment: Accepted by ECCV 2024. Project page: https://aggelinacha.github.io/MIGS/
- Published
- 2024
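The low-rank construction described above can be sketched as a rank-R CP factorization of the (identities × Gaussians × parameters) tensor; a minimal PyTorch sketch with illustrative shapes:

```python
import torch

class LowRankParamTensor(torch.nn.Module):
    """Rank-R CP factorization of an (identities x Gaussians x parameters)
    tensor: three small factor matrices replace the dense tensor, and any
    one identity's parameters are reconstructed on demand. Shapes and
    names are illustrative, not the paper's."""
    def __init__(self, n_ids: int, n_gaussians: int, n_params: int, rank: int):
        super().__init__()
        self.A = torch.nn.Parameter(0.1 * torch.randn(n_ids, rank))
        self.B = torch.nn.Parameter(0.1 * torch.randn(n_gaussians, rank))
        self.C = torch.nn.Parameter(0.1 * torch.randn(n_params, rank))

    def forward(self, identity: int) -> torch.Tensor:
        # slice of the CP tensor for one identity: (n_gaussians, n_params)
        return torch.einsum("r,gr,pr->gp", self.A[identity], self.B, self.C)

# 10 identities x 100k Gaussians x 59 params at rank 32: three small
# factors instead of a ~59M-entry dense tensor.
model = LowRankParamTensor(10, 100_000, 59, rank=32)
params_id3 = model(3)    # (100000, 59)
```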
17. Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
- Author
- Howlader, Prantik, Das, Srijan, Le, Hieu, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Incorporating pixel contextual information is critical for accurate segmentation. In this paper, we show that an effective way to incorporate contextual information is through a patch-based classifier. This patch classifier is trained to identify classes present within an image region, which facilitates the elimination of distractors and enhances the classification of small object segments. Specifically, we introduce the Multi-scale Patch-based Multi-label Classifier (MPMC), a novel plug-in module designed for existing semi-supervised segmentation (SSS) frameworks. MPMC offers patch-level supervision, enabling the discrimination of pixel regions of different classes within a patch. Furthermore, MPMC learns an adaptive pseudo-label weight, using patch-level classification to alleviate the impact of the teacher's noisy pseudo-label supervision on the student. This lightweight module can be integrated into any SSS framework, significantly enhancing their performance. We demonstrate the efficacy of our proposed MPMC by integrating it into four SSS methodologies and improving them across two natural image datasets and one medical segmentation dataset, notably improving the segmentation results of the baselines across all three datasets., Comment: to be published in ECCV24
- Published
- 2024
18. Predicting Visual Attention in Graphic Design Documents
- Author
- Chakraborty, Souradeep, Wei, Zijun, Kelton, Conor, Ahn, Seoyoung, Balasubramanian, Aruna, Zelinsky, Gregory J., and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present a model for predicting visual attention during the free viewing of graphic design documents. While existing works on this topic have aimed at predicting static saliency of graphic designs, our work is the first attempt to predict both spatial attention and dynamic temporal order in which the document regions are fixated by gaze using a deep learning based model. We propose a two-stage model for predicting dynamic attention on such documents, with webpages being our primary choice of document design for demonstration. In the first stage, we predict the saliency maps for each of the document components (e.g. logos, banners, texts, etc. for webpages) conditioned on the type of document layout. These component saliency maps are then jointly used to predict the overall document saliency. In the second stage, we use these layout-specific component saliency maps as the state representation for an inverse reinforcement learning model of fixation scanpath prediction during document viewing. To test our model, we collected a new dataset consisting of eye movements from 41 people freely viewing 450 webpages (the largest dataset of its kind). Experimental results show that our model outperforms existing models in both saliency and scanpath prediction for webpages, and also generalizes very well to other graphic design documents such as comics, posters, mobile UIs, etc. and natural images.
- Published
- 2024
19. Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields
- Author
- Yang, Yixiong, Hu, Shilin, Wu, Haoyu, Baldrich, Ramon, Samaras, Dimitris, and Vanrell, Maria
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
The task of extracting intrinsic components, such as reflectance and shading, from neural radiance fields is of growing interest. However, current methods largely focus on synthetic scenes and isolated objects, overlooking the complexities of real scenes with backgrounds. To address this gap, our research introduces a method that combines relighting with intrinsic decomposition. By leveraging light variations in scenes to generate pseudo labels, our method provides guidance for intrinsic decomposition without requiring ground truth data. Our method, grounded in physical constraints, ensures robustness across diverse scene types and reduces the reliance on pre-trained models or hand-crafted priors. We validate our method on both synthetic and real-world datasets, achieving convincing results. Furthermore, the applicability of our method to image editing tasks demonstrates promising outcomes., Comment: Accepted by CVPR 2024 Workshop Neural Rendering Intelligence(NRI)
- Published
- 2024
20. Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
- Author
- Miao, Qiaomu, Graikos, Alexandros, Zhang, Jingwei, Mondal, Sounak, Hoai, Minh, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Training gaze following models requires a large number of images with gaze target coordinates annotated by human annotators, which is a laborious and inherently ambiguous process. We propose the first semi-supervised method for gaze following by introducing two novel priors to the task. We obtain the first prior using a large pretrained Visual Question Answering (VQA) model, where we compute Grad-CAM heatmaps by `prompting' the VQA model with a gaze following question. These heatmaps can be noisy and not suited for use in training. The need to refine these noisy annotations leads us to incorporate a second prior. We utilize a diffusion model trained on limited human annotations and modify the reverse sampling process to refine the Grad-CAM heatmaps. By tuning the diffusion process we achieve a trade-off between the human annotation prior and the VQA heatmap prior, which retains the useful VQA prior information while exhibiting similar properties to the training data distribution. Our method outperforms simple pseudo-annotation generation baselines on the GazeFollow image dataset. More importantly, our pseudo-annotation strategy, applied to a widely used supervised gaze following model (VAT), reduces the annotation need by 50%. Our method also performs the best on the VideoAttentionTarget dataset., Comment: Accepted to ECCV 2024
- Published
- 2024
21. Toward ultra-efficient high fidelity predictions of wind turbine wakes: Augmenting the accuracy of engineering models via LES-trained machine learning
- Author
- Santoni, Christian, Zhang, Dichang, Zhang, Zexia, Samaras, Dimitris, Sotiropoulos, Fotis, and Khosronejad, Ali
- Subjects
- Physics - Fluid Dynamics
- Abstract
This study proposes a novel machine learning (ML) methodology for the efficient and cost-effective prediction of high-fidelity three-dimensional velocity fields in the wake of utility-scale turbines. The model consists of an auto-encoder convolutional neural network with U-Net skipped connections, fine-tuned using high-fidelity data from large-eddy simulations (LES). The trained model takes the low-fidelity velocity field cost-effectively generated from the analytical engineering wake model as input and produces the high-fidelity velocity fields. The accuracy of the proposed ML model is demonstrated in a utility-scale wind farm for which datasets of wake flow fields were previously generated using LES under various wind speeds, wind directions, and yaw angles. Comparing the ML model results with those of LES, the ML model was shown to reduce the error in the prediction from 20% obtained from the GCH model to less than 5%. In addition, the ML model captured the non-symmetric wake deflection observed for opposing yaw angles for wake steering cases, demonstrating a greater accuracy than the GCH model. The computational cost of the ML model is on par with that of the analytical wake model while generating numerical outcomes nearly as accurate as those of the high-fidelity LES.
- Published
- 2024
22. MI-NeRF: Learning a Single Face NeRF from Multiple Identities
- Author
- Chatziagapi, Aggelina, Chrysos, Grigorios G., and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this work, we introduce a method that learns a single dynamic neural radiance field (NeRF) from monocular talking face videos of multiple identities. NeRFs have shown remarkable results in modeling the 4D dynamics and appearance of human faces. However, they require per-identity optimization. Although recent approaches have proposed techniques to reduce the training and rendering time, increasing the number of identities can be expensive. We introduce MI-NeRF (multi-identity NeRF), a single unified network that models complex non-rigid facial motion for multiple identities, using only monocular videos of arbitrary length. The core premise in our method is to learn the non-linear interactions between identity and non-identity specific information with a multiplicative module. By training on multiple videos simultaneously, MI-NeRF not only reduces the total training time compared to standard single-identity NeRFs, but also demonstrates robustness in synthesizing novel expressions for any input identity. We present results for both facial expression transfer and talking face video synthesis. Our method can be further personalized for a target identity given only a short video., Comment: Project page: https://aggelinacha.github.io/MI-NeRF/
- Published
- 2024
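A minimal sketch of a multiplicative module in the spirit of the abstract above: project the identity and non-identity codes and combine them with an elementwise product plus a linear term. The exact parameterization and dimensions are assumptions for illustration:

```python
import torch

class MultiplicativeModule(torch.nn.Module):
    """Combine an identity code and a non-identity (e.g. expression) code
    with a multiplicative interaction: elementwise product of projections
    plus a linear term. An illustrative parameterization of the idea the
    abstract names, with made-up dimensions."""
    def __init__(self, id_dim: int, expr_dim: int, out_dim: int):
        super().__init__()
        self.U = torch.nn.Linear(id_dim, out_dim)
        self.V = torch.nn.Linear(expr_dim, out_dim)
        self.W = torch.nn.Linear(id_dim + expr_dim, out_dim)

    def forward(self, z_id: torch.Tensor, z_expr: torch.Tensor) -> torch.Tensor:
        interaction = self.U(z_id) * self.V(z_expr)   # id/expression cross-terms
        return interaction + self.W(torch.cat([z_id, z_expr], dim=-1))

fused = MultiplicativeModule(64, 32, 128)(torch.randn(4, 64), torch.randn(4, 32))
```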
23. Decoding the visual attention of pathologists to reveal their level of expertise
- Author
- Chakraborty, Souradeep, Perez, Dana, Friedman, Paul, Sheuka, Natallia, Friedman, Constantin, Yaskiv, Oksana, Gupta, Rajarsi, Zelinsky, Gregory J., Saltz, Joel H., and Samaras, Dimitris
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present a method for classifying the expertise of a pathologist based on how they allocated their attention during a cancer reading. We engage this decoding task by developing a novel method for predicting the attention of pathologists as they read whole-slide images (WSIs) of prostate and make cancer grade classifications. Our ground-truth measure of a pathologist's attention is the x, y and z (magnification) movement of their viewport as they navigated through WSIs during readings, and to date we have the attention behavior of 43 pathologists reading 123 WSIs. These data revealed that specialists have higher agreement in both their attention and cancer grades compared to general pathologists and residents, suggesting that sufficient information may exist in their attention behavior to classify their expertise level. To attempt this, we trained a transformer-based model to predict the visual attention heatmaps of resident, general, and specialist (GU) pathologists during Gleason grading. Based solely on a pathologist's attention during a reading, our model was able to predict their level of expertise with 75.3%, 56.1%, and 77.2% accuracy, respectively, better than chance and baseline models. Our model therefore enables a pathologist's expertise level to be easily and objectively evaluated, important for pathology training and competency assessment. Tools developed from our model could also be used to help pathology trainees learn how to read WSIs like an expert.
- Published
- 2024
24. Self-supervised co-salient object detection via feature correspondence at multiple scales
- Author
- Chakraborty, Souradeep and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Our paper introduces a novel two-stage self-supervised approach for detecting co-occurring salient objects (CoSOD) in image groups without requiring segmentation annotations. Unlike existing unsupervised methods that rely solely on patch-level information (e.g. clustering patch descriptors) or on computation heavy off-the-shelf components for CoSOD, our lightweight model leverages feature correspondences at both patch and region levels, significantly improving prediction performance. In the first stage, we train a self-supervised network that detects co-salient regions by computing local patch-level feature correspondences across images. We obtain the segmentation predictions using confidence-based adaptive thresholding. In the next stage, we refine these intermediate segmentations by eliminating the detected regions (within each image) whose averaged feature representations are dissimilar to the foreground feature representation averaged across all the cross-attention maps (from the previous stage). Extensive experiments on three CoSOD benchmark datasets show that our self-supervised model outperforms the corresponding state-of-the-art models by a huge margin (e.g. on the CoCA dataset, our model has a 13.7% F-measure gain over the SOTA unsupervised CoSOD model). Notably, our self-supervised model also outperforms several recent fully supervised CoSOD models on the three test datasets (e.g., on the CoCA dataset, our model has a 4.6% F-measure gain over a recent supervised CoSOD model)., Comment: Accepted to ECCV 2024
- Published
- 2024
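The confidence-based adaptive thresholding mentioned above amounts to binarizing each prediction relative to its own peak confidence rather than a fixed global cut-off; a short NumPy sketch with an illustrative ratio:

```python
import numpy as np

def adaptive_threshold(saliency: np.ndarray, ratio: float = 0.6) -> np.ndarray:
    """Binarize a co-saliency map at a fraction of its own peak value, so
    the cut-off adapts to each image's confidence level. `ratio` is an
    illustrative hyperparameter."""
    return saliency >= ratio * saliency.max()

# A confident map keeps a tight region; a diffuse map keeps a larger one,
# since the cut-off scales with each image's own maximum.
mask = adaptive_threshold(np.random.rand(256, 256))
```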
25. Evaluating the Correlation Between Stimulus Frequency Otoacoustic Emission Group Delays and Tuning Sharpness in a Cochlear Model
- Author
- Xia, Yiwei, Samaras, George, and Meaud, Julien
- Published
- 2024
26. Current and Emerging Approaches for Primary Prevention of Coronary Artery Disease Using Cardiac Computed Tomography
- Author
- Kampaktsis, Polydoros N., Hennecken, Carolyn, Shetty, Mrinali, McLaughlin, Laura, Rampidis, Georgios, Samaras, Athanasios, Avgerinos, Dimitrios, Spilias, Nikolaos, Kuno, Toshiki, Briasoulis, Alexandros, and Einstein, Andrew J.
- Published
- 2024
27. Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos
- Author
- Rivero, Alfredo, Athar, ShahRukh, Shu, Zhixin, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Creating controllable 3D human portraits from casual smartphone videos is highly desirable due to their immense value in AR/VR applications. The recent development of 3D Gaussian Splatting (3DGS) has shown improvements in rendering quality and training efficiency. However, it still remains a challenge to accurately model and disentangle head movements and facial expressions from a single-view capture to achieve high-quality renderings. In this paper, we introduce Rig3DGS to address this challenge. We represent the entire scene, including the dynamic subject, using a set of 3D Gaussians in a canonical space. Using a set of control signals, such as head pose and expressions, we transform them to the 3D space with learned deformations to generate the desired rendering. Our key innovation is a carefully designed deformation method which is guided by a learnable prior derived from a 3D morphable model. This approach is highly efficient in training and effective in controlling facial expressions, head positions, and view synthesis across various captures. We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments. The project page can be found at http://shahrukhathar.github.io/2024/02/05/Rig3DGS.html
- Published
- 2024
28. Keratin 17 modulates the immune topography of pancreatic cancer.
- Author
- Delgado-Coka, Lyanne, Horowitz, Michael, Torrente-Goncalves, Mariana, Roa-Peña, Lucia, Leiton, Cindy, Hasan, Mahmudul, Babu, Sruthi, Fassler, Danielle, Oentoro, Jaymie, Bai, Ji-Dong, Petricoin, Emanuel, Matrisian, Lynn, Blais, Edik, Marchenko, Natalia, Allard, Felicia, Jiang, Wei, Larson, Brent, Chen, Chao, Abousamra, Shahira, Samaras, Dimitris, Kurc, Tahsin, Saltz, Joel, Escobar-Hoyos, Luisa, Shroyer, Kenneth, and Hendifar, Andrew
- Subjects
- Cancer biomarker, Cancer immunology, Digital pathology, Immune microenvironment, Keratin 17, Multiplexed immunohistochemistry, Pancreatic ductal adenocarcinoma, Humans, Keratin-17, Pancreatic Neoplasms, Tumor Microenvironment, Female, Carcinoma, Pancreatic Ductal, Male, CD8-Positive T-Lymphocytes, Macrophages, Middle Aged, Aged, Receptors, Cell Surface, Antigens, Differentiation, Myelomonocytic, Antigens, CD
- Abstract
BACKGROUND: The immune microenvironment impacts tumor growth, invasion, metastasis, and patient survival and may provide opportunities for therapeutic intervention in pancreatic ductal adenocarcinoma (PDAC). Although never studied as a potential modulator of the immune response in most cancers, Keratin 17 (K17), a biomarker of the most aggressive (basal) molecular subtype of PDAC, is intimately involved in the histogenesis of the immune response in psoriasis, basal cell carcinoma, and cervical squamous cell carcinoma. Thus, we hypothesized that K17 expression could also impact the immune cell response in PDAC, and that uncovering this relationship could provide insight to guide the development of immunotherapeutic opportunities to extend patient survival. METHODS: Multiplex immunohistochemistry (mIHC) and automated image analysis based on novel computational imaging technology were used to decipher the abundance and spatial distribution of T cells, macrophages, and tumor cells, relative to K17 expression in 235 PDACs. RESULTS: K17 expression had profound effects on the exclusion of intratumoral CD8+ T cells and was also associated with decreased numbers of peritumoral CD8+ T cells, CD16+ macrophages, and CD163+ macrophages (p
- Published
- 2024
29. SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology
- Author
- Kapse, Saarthak, Pati, Pushpak, Das, Srijan, Zhang, Jingwei, Chen, Chao, Vakalopoulou, Maria, Saltz, Joel, Samaras, Dimitris, Gupta, Rajarsi R., and Prasanna, Prateek
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the very outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. In addition, we thoroughly benchmark the local and global-interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely, user-friendliness and faithfulness.
- Published
- 2023
30. Learned representation-guided diffusion models for large-image generation
- Author
- Graikos, Alexandros, Yellapragada, Srikar, Le, Minh-Quan, Kapse, Saarthak, Prasanna, Prateek, Saltz, Joel, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, it is impractical to procure the painstaking patch-level annotation effort required in specialized domains like histopathology and satellite imagery; it is often performed by domain experts and involves hundreds of millions of patches. Modern-day self-supervised learning (SSL) representations encode rich semantic and visual information. In this paper, we posit that such representations are expressive enough to act as proxies to fine-grained human labels. We introduce a novel approach that trains diffusion models conditioned on embeddings from SSL. Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. In addition, we construct larger images by assembling spatially consistent patches inferred from SSL embeddings, preserving long-range dependencies. Augmenting real data by generating variations of real images improves downstream classifier accuracy for patch-level and larger, image-scale classification tasks. Our models are effective even on datasets not encountered during training, demonstrating their robustness and generalizability. Generating images from learned embeddings is agnostic to the source of the embeddings. The SSL embeddings used to generate a large image can either be extracted from a reference image, or sampled from an auxiliary model conditioned on any related modality (e.g. class labels, text, genomic data). As proof of concept, we introduce the text-to-large image synthesis paradigm where we successfully synthesize large pathology and satellite images out of text descriptions.
- Published
- 2023
31. $\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
- Author
- Le, Minh-Quan, Graikos, Alexandros, Yellapragada, Srikar, Gupta, Rajarsi, Saltz, Joel, Samaras, Dimitris, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
32. Assessing Sample Quality via the Latent Space of Generative Models
- Author
- Xu, Jingyi, Le, Hieu, Samaras, Dimitris, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
33. Industry Academic Partnerships Leading to Better Preparation of Graduates for Employment
- Author
- Wood, Clare, Matsenjwa, Bongekile J., Hada, Jun Dongol, Rolland, Samuel, Samaras, Vasilios, Jones, Jason W., Bunting, Gavin, Hobson, Ian, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Kandakatla, Rohit, editor, Kulkarni, Sushma, editor, and Auer, Michael E., editor
- Published
- 2025
34. Self-supervised Co-salient Object Detection via Feature Correspondences at Multiple Scales
- Author
- Chakraborty, Souradeep, Samaras, Dimitris, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
35. Diffusion-Refined VQA Annotations for Semi-supervised Gaze Following
- Author
- Miao, Qiaomu, Graikos, Alexandros, Zhang, Jingwei, Mondal, Sounak, Hoai, Minh, Samaras, Dimitris, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
36. Look Hear: Gaze Prediction for Speech-Directed Human Attention
- Author
- Mondal, Sounak, Ahn, Seoyoung, Yang, Zhibo, Balasubramanian, Niranjan, Samaras, Dimitris, Zelinsky, Gregory, Hoai, Minh, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
37. MORCIC: Model Order Reduction Techniques for Electromagnetic Models of Integrated Circuits
- Author
- Garyfallou, Dimitrios, Stefanou, Athanasios, Giamouzis, Christos, Antoniadis, Moschos, Chararas, Georgios, Chatzis, Konstantinos, Samaras, Dimitris, Themeli, Rafaela, Michailidis, Anastasios, Gogolou, Vasiliki, Zachos, Nikos, Evmorfopoulos, Nestor, Noulis, Thomas, Pavlidis, Vasilis F., Hatzopoulos, Alkiviadis, Chatzineofytou, Elpida, and Moisiadis, Yiannis
- Subjects
- Mathematics - Numerical Analysis, Computer Science - Hardware Architecture, Computer Science - Computational Engineering, Finance, and Science
- Abstract
Model order reduction (MOR) is crucial for the design process of integrated circuits. Specifically, the vast amount of passive RLCk elements in electromagnetic models extracted from physical layouts exacerbates the extraction time, the storage requirements, and, most critically, the post-layout simulation time of the analyzed circuits. The MORCIC project aims to overcome this problem by proposing new MOR techniques that perform better than commercial tools. Experimental evaluation on several analog and mixed-signal circuits with millions of elements indicates that the proposed methods lead to x5.5 smaller ROMs while maintaining similar accuracy compared to golden ROMs provided by ANSYS RaptorX., Comment: arXiv admin note: substantial text overlap with arXiv:2311.08478
- Published
- 2023
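For readers unfamiliar with MOR, the projection idea underlying such reducers fits in a few lines: build an orthonormal basis and project the state-space matrices onto it. This toy uses a crude SVD-of-Krylov basis; production RLCk reducers like those targeted by MORCIC use moment-matching or balanced-truncation variants:

```python
import numpy as np

def reduce_system(A: np.ndarray, B: np.ndarray, C: np.ndarray, r: int):
    """Project a linear state-space model (A, B, C) onto an r-dimensional
    orthonormal basis. The basis here comes from an SVD of a short block
    Krylov sequence, a simple stand-in for the bases used in practice."""
    blocks, K = [B], B
    for _ in range(7):                     # block Krylov sequence {B, AB, ...}
        K = A @ K
        blocks.append(K)
    U, _, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    V = U[:, :r]                           # n x r orthonormal projection basis
    return V.T @ A @ V, V.T @ B, C @ V, V  # reduced (Ar, Br, Cr) and the basis

# Toy usage: reduce a stable 200-state, 2-input, 3-output system to 10 states.
rng = np.random.default_rng(1)
A = -np.eye(200) + 0.01 * rng.normal(size=(200, 200))
B = rng.normal(size=(200, 2))
C = rng.normal(size=(3, 200))
Ar, Br, Cr, V = reduce_system(A, B, C, r=10)
```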
38. Unsupervised and semi-supervised co-salient object detection via segmentation frequency statistics
- Author
- Chakraborty, Souradeep, Naha, Shujon, Bastan, Muhammet, C, Amit Kumar K, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this paper, we address the detection of co-occurring salient objects (CoSOD) in an image group using frequency statistics in an unsupervised manner, which further enable us to develop a semi-supervised method. While previous works have mostly focused on fully supervised CoSOD, less attention has been allocated to detecting co-salient objects when limited segmentation annotations are available for training. Our simple yet effective unsupervised method US-CoSOD combines the object co-occurrence frequency statistics of unsupervised single-image semantic segmentations with salient foreground detections using self-supervised feature learning. For the first time, we show that a large unlabeled dataset e.g. ImageNet-1k can be effectively leveraged to significantly improve unsupervised CoSOD performance. Our unsupervised model is a great pre-training initialization for our semi-supervised model SS-CoSOD, especially when very limited labeled data is available for training. To avoid propagating erroneous signals from predictions on unlabeled data, we propose a confidence estimation module to guide our semi-supervised training. Extensive experiments on three CoSOD benchmark datasets show that both of our unsupervised and semi-supervised models outperform the corresponding state-of-the-art models by a significant margin (e.g., on the Cosal2015 dataset, our US-CoSOD model has an 8.8% F-measure gain over a SOTA unsupervised co-segmentation model and our SS-CoSOD model has an 11.81% F-measure gain over a SOTA semi-supervised CoSOD model)., Comment: Accepted at IEEE WACV 2024
- Published
- 2023
39. ChiMera: Learning with noisy labels by contrasting mixed-up augmentations
- Author
- Liu, Zixuan, Zhang, Xin, He, Junjun, Fu, Dan, Samaras, Dimitris, Tan, Robby, Wang, Xiao, and Wang, Sheng
- Subjects
- Computer Science - Computational Engineering, Finance, and Science
- Abstract
Learning with noisy labels has been studied to address incorrect label annotations in real-world applications. In this paper, we present ChiMera, a two-stage learning-from-noisy-labels framework based on semi-supervised learning, developed based on a novel contrastive learning technique MixCLR. The key idea of MixCLR is to learn and refine the representations of mixed augmentations from two different images to better resist label noise. ChiMera jointly learns the representations of the original data distribution and mixed-up data distribution via MixCLR, introducing many additional augmented samples to fill in the gap between different classes. This results in a more smoothed representation space learned by contrastive learning with better alignment and a more robust decision boundary. By exploiting MixCLR, ChiMera also improves the label diffusion process in the semi-supervised noise recovery stage and further boosts its ability to diffuse correct label information. We evaluated ChiMera on seven real-world datasets and obtained state-of-the-art performance on both symmetric noise and asymmetric noise. Our method opens up new avenues for using contrastive learning on learning with noisy labels and we envision MixCLR to be broadly applicable to other applications.
- Published
- 2023
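A sketch of the "contrasting mixed-up augmentations" idea as the abstract describes it (not the authors' exact loss): mix two augmented batches, then pull each mixed embedding toward both of its sources with mixup-weighted InfoNCE terms. `encoder` is a placeholder network:

```python
import torch
import torch.nn.functional as F

def mixclr_style_loss(encoder, x1, x2, alpha: float = 1.0, tau: float = 0.5):
    """Contrast a mixed-up augmentation against both of its sources."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x1 + (1 - lam) * x2                 # mixed augmentation
    z1, z2, zm = (F.normalize(encoder(v), dim=1) for v in (x1, x2, x_mix))

    def info_nce(anchor, positive):
        # matching batch items on the diagonal are positives; the rest
        # of the batch serves as negatives
        logits = anchor @ positive.t() / tau
        targets = torch.arange(anchor.size(0), device=anchor.device)
        return F.cross_entropy(logits, targets)

    # the mixed view should agree with each source in proportion to lam
    return lam * info_nce(zm, z1) + (1 - lam) * info_nce(zm, z2)
```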
40. Zero-Shot Object Counting with Language-Vision Models
- Author
- Xu, Jingyi, Le, Hieu, and Samaras, Dimitris
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Class-agnostic object counting aims to count object instances of an arbitrary class at test time. It is challenging but also enables many potential applications. Current methods require human-annotated exemplars as inputs which are often unavailable for novel categories, especially for autonomous systems. Thus, we propose zero-shot object counting (ZSC), a new setting where only the class name is available during test time. This obviates the need for human annotators and enables automated operation. To perform ZSC, we propose finding a few object crops from the input image and use them as counting exemplars. The goal is to identify patches containing the objects of interest while also being visually representative for all instances in the image. To do this, we first construct class prototypes using large language-vision models, including CLIP and Stable Diffusion, to select the patches containing the target objects. Furthermore, we propose a ranking model that estimates the counting error of each patch to select the most suitable exemplars for counting. Experimental results on a recent class-agnostic counting dataset, FSC-147, validate the effectiveness of our method., Comment: Extended version of CVPR23 arXiv:2303.02001 . Currently under review at T-PAMI
- Published
- 2023
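The patch-selection step above can be approximated with off-the-shelf CLIP: score candidate crops against the class name and keep the top matches as counting exemplars. A sketch of just this selection step (the paper's ranking model and the counting network are separate components); crops are PIL images:

```python
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def select_exemplars(crops, class_name: str, k: int = 3):
    """Rank candidate crops (PIL images) by CLIP similarity to the class
    name and return the k best as counting exemplars."""
    text = clip.tokenize([f"a photo of a {class_name}"]).to(device)
    images = torch.stack([preprocess(c) for c in crops]).to(device)
    with torch.no_grad():
        img_f = model.encode_image(images)
        txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    scores = (img_f @ txt_f.t()).squeeze(1)          # cosine similarity per crop
    return [crops[i] for i in scores.topk(min(k, len(crops))).indices.tolist()]
```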
41. The many tensions with dark-matter based models and implications on the nature of the Universe
- Author
- Kroupa, Pavel, Gjergo, Eda, Asencio, Elena, Haslbauer, Moritz, Pflamm-Altenburg, Jan, Wittenburg, Nils, Samaras, Nick, Thies, Ingo, and Oehm, Wolfgang
- Subjects
- Astrophysics - Cosmology and Nongalactic Astrophysics, General Relativity and Quantum Cosmology
- Abstract
(Abridged) Fundamental tensions between observations and dark-matter based cosmological models have emerged. This updated review has two purposes: to explore new tensions that have arisen in recent years, compounding the unresolved tensions from previous studies, and to use the shortcomings of the current theory to guide the development of a successful model. Tensions arise in view of the profusion of thin disk galaxies, the pronounced symmetrical structure of the Local Group of Galaxies, the common occurrence of planes of satellite systems, the El Gordo and Bullet galaxy clusters, significant matter inhomogeneities on scales much larger than 100 Mpc, and the observed rapid formation of galaxies and super-massive black holes at redshifts larger than 7. Given the nature of the tensions, the real Universe needs to be described by a model in which gravitation is effectively stronger than Einsteinian/Newtonian gravitation at accelerations below Milgrom's acceleration scale. The promising nuHDM model, anchored on Milgromian dynamics but keeping the standard expansion history with dark energy, solves many of the above tensions. However galaxy formation appears to occur too late in this model, model galaxy clusters reach too large masses, and the mass function of model galaxy clusters is too flat and thus top-heavy in comparison to the observed mass function. Classes of models that reassess inflation, dark energy and the role of the CMB should be explored., Comment: 58 pages, 9 figures, 291 references, based on invited presentation and to appear in the proceedings of Corfu2022: Workshop on Tensions in Cosmology, Corfu Sept. 7-12., 2022 (organisers: E. Saridakis, S. Basilakos, S. Capozziello, E. Di Valentino, O. Mena, S. Pan, J. Levi Said); replaced version contains updated citations
- Published
- 2023
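For readers unfamiliar with Milgrom's acceleration scale mentioned in entry 41, the standard MOND relation (general background, not specific to this review) states precisely how gravitation becomes effectively stronger at low accelerations:

```latex
% Milgrom's relation: the true acceleration a satisfies a * mu(a/a_0) = a_N,
% with a_N the Newtonian acceleration and mu an interpolating function
% (mu(x) -> 1 for x >> 1, mu(x) -> x for x << 1).
\[
  a\,\mu\!\left(\frac{a}{a_0}\right) = a_N,
  \qquad
  a \to \sqrt{a_N\,a_0} \;\; \text{for } a \ll a_0,
  \qquad
  a_0 \simeq 1.2 \times 10^{-10}\ \mathrm{m\,s^{-2}}.
\]
```

In the deep-MOND limit the effective acceleration exceeds the Newtonian one, which is what yields flat rotation curves without particle dark matter.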
42. Controllable Dynamic Appearance for Neural 3D Portraits
- Author
-
Athar, ShahRukh, Shu, Zhixin, Xu, Zexiang, Luan, Fujun, Bi, Sai, Sunkavalli, Kalyan, and Samaras, Dimitris
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advances in Neural Radiance Fields (NeRFs) have made it possible to reconstruct and reanimate dynamic portrait scenes with control over head-pose, facial expressions and viewing direction. However, training such models assumes photometric consistency over the deformed region: e.g., the face must be evenly lit as it deforms with changing head-pose and facial expression. Such photometric consistency across frames of a video is hard to maintain, even in studio environments, thus making the created reanimatable neural portraits prone to artifacts during reanimation. In this work, we propose CoDyNeRF, a system that enables the creation of fully controllable 3D portraits in real-world capture conditions. CoDyNeRF learns to approximate illumination-dependent effects via a dynamic appearance model in the canonical space, conditioned on predicted surface normals and on the facial-expression and head-pose deformations. The surface normal prediction is guided by 3DMM normals, which act as a coarse prior for the normals of the human head, where direct prediction is hard due to the rigid and non-rigid deformations induced by head-pose and facial expression changes. Using only a smartphone-captured short video of a subject for training, we demonstrate the effectiveness of our method on free view synthesis of a portrait scene with explicit head-pose and expression controls, and realistic lighting effects. The project page can be found here: http://shahrukhathar.github.io/2023/08/22/CoDyNeRF.html
- Published
- 2023
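A minimal torch sketch of the conditioning idea in entry 42: a per-point appearance head that takes canonical features together with predicted normals and the expression/head-pose codes. Layer sizes, code dimensions, and the sigmoid RGB output are assumptions, not the paper's exact architecture.

```python
# Sketch of a dynamic appearance head conditioned on predicted surface
# normals plus expression/head-pose codes, following CoDyNeRF's
# high-level design. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class DynamicAppearanceHead(nn.Module):
    def __init__(self, feat_dim=256, normal_dim=3, expr_dim=50, pose_dim=6):
        super().__init__()
        in_dim = feat_dim + normal_dim + expr_dim + pose_dim
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 3),  # per-point RGB
        )

    def forward(self, point_feat, normals, expression, head_pose):
        # Broadcast the global expression/pose codes to every sampled point.
        n = point_feat.shape[0]
        cond = torch.cat([expression, head_pose], dim=-1).expand(n, -1)
        x = torch.cat([point_feat, normals, cond], dim=-1)
        return torch.sigmoid(self.mlp(x))  # RGB in [0, 1]

# Usage with dummy tensors (1024 sampled points):
head = DynamicAppearanceHead()
rgb = head(torch.randn(1024, 256), torch.randn(1024, 3),
           torch.randn(1, 50), torch.randn(1, 6))
```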
43. Attention De-sparsification Matters: Inducing Diversity in Digital Pathology Representation Learning
- Author
-
Kapse, Saarthak, Das, Srijan, Zhang, Jingwei, Gupta, Rajarsi R., Saltz, Joel, Samaras, Dimitris, and Prasanna, Prateek
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We propose DiRL, a Diversity-inducing Representation Learning technique for histopathology imaging. Self-supervised learning (SSL) techniques, such as contrastive and non-contrastive approaches, have been shown to learn rich and effective representations of digitized tissue samples with limited pathologist supervision. Our analysis of vanilla SSL-pretrained models' attention distribution reveals an insightful observation: sparsity in attention, i.e., models tend to localize most of their attention to a few prominent patterns in the image. Although attention sparsity can be beneficial in natural images, where the prominent patterns are often the object of interest itself, it can be sub-optimal in digital pathology; this is because, unlike natural images, digital pathology scans are not object-centric, but rather a complex phenotype of various spatially intermixed biological components. Inadequate diversification of attention in these complex images could result in crucial information loss. To address this, we leverage cell segmentation to densely extract multiple histopathology-specific representations, and then propose a prior-guided dense pretext task for SSL, designed to match the multiple corresponding representations between the views. Through this, the model learns to attend to various components more closely and evenly, thus inducing adequate diversification in attention for capturing context-rich representations. Through quantitative and qualitative analysis on multiple tasks across cancer types, we demonstrate the efficacy of our method and observe that the attention is more globally distributed.
- Published
- 2023
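The attention-sparsity observation in entry 43 can be quantified by the normalized entropy of each head's attention distribution; below is a minimal sketch under an assumed (batch, heads, queries, keys) attention layout.

```python
# Quantify attention sparsity as per-head entropy over the key axis:
# low entropy = attention collapsed onto a few patches (sparse),
# high entropy = attention spread over many patches (diverse).
import torch

def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (batch, heads, queries, keys), rows already softmax-normalized."""
    eps = 1e-12
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (batch, heads, queries)
    # Normalize by log(num_keys) so 1.0 means maximally spread attention.
    return ent.mean(dim=-1) / torch.log(torch.tensor(float(attn.shape[-1])))

# Dummy ViT-style attention: 12 heads over 197 tokens.
attn = torch.softmax(torch.randn(2, 12, 197, 197), dim=-1)
print(attention_entropy(attn).shape)  # (2, 12): one score per head
```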
44. PathLDM: Text conditioned Latent Diffusion Model for Histopathology
- Author
-
Yellapragada, Srikar, Graikos, Alexandros, Prasanna, Prateek, Kurc, Tahsin, Saltz, Joel, and Samaras, Dimitris
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
To achieve high-quality results, diffusion models must be trained on large datasets. This can be especially prohibitive for models in specialized domains, such as computational pathology. Conditioning on labeled data is known to help in data-efficient model training. Therefore, histopathology reports, which are rich in valuable clinical information, are an ideal choice as guidance for a histopathology generative model. In this paper, we introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Leveraging the rich contextual information provided by pathology text reports, our approach fuses image and textual data to enhance the generation process. By utilizing GPT's capabilities to distill and summarize complex text reports, we establish an effective conditioning mechanism. Through strategic conditioning and necessary architectural enhancements, we achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor, which achieved an FID of 30.1., Comment: WACV 2024 publication
- Published
- 2023
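A sketch of the two-stage conditioning idea from entry 44: distill the report to a short text, then use it as the prompt of a latent diffusion pipeline. The checkpoint name is a placeholder, the pipeline class is a generic stand-in for the paper's custom LDM, and the summarizer is a stand-in for the GPT-based distillation the abstract describes.

```python
# Sketch of report-conditioned generation: distill a pathology report
# to a short summary, then use it as the prompt of a latent diffusion
# pipeline. "my-org/pathology-ldm" is a hypothetical checkpoint name;
# PathLDM's released weights and summarization prompt may differ.
import torch
from diffusers import StableDiffusionPipeline

def summarize_report(report: str, max_words: int = 40) -> str:
    # Placeholder for the GPT-based summarizer described in the paper;
    # here we crudely truncate so the example stays self-contained.
    return " ".join(report.split()[:max_words])

pipe = StableDiffusionPipeline.from_pretrained(
    "my-org/pathology-ldm", torch_dtype=torch.float16
).to("cuda")

report = ("Invasive ductal carcinoma, grade 2, with prominent "
          "tumor infiltrating lymphocytes ...")
image = pipe(prompt=summarize_report(report)).images[0]
image.save("synthetic_patch.png")
```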
45. The impact of steatotic liver disease on coronary artery disease through changes in the plasma lipidome
- Author
-
Björnson, Elias, Samaras, Dimitrios, Levin, Malin, Bäckhed, Fredrik, Bergström, Göran, and Gummesson, Anders
- Subjects
Mediation analysis ,Fatty acid ,Lipidomics ,Sphingolipids ,Glycerophospholipids ,LDL ,Medicine ,Science - Abstract
Steatotic liver disease has been shown to associate with cardiovascular disease independently of other risk factors. Lipoproteins have been shown to mediate some of this relationship, but there remains unexplained variance. Here we investigate the plasma lipidomic changes associated with liver steatosis and the mediating effect of these lipids on coronary artery disease (CAD). In a population of 2579 Swedish participants aged 50 to 65 years, lipids were measured by mass spectrometry, liver fat was measured using computed tomography (CT), and CAD status was defined as the presence of coronary artery calcification (CAC score > 0). Lipids associated with liver steatosis and CAD were identified, and their mediating effects between the two conditions were investigated. Out of 458 lipids, 284 were found to associate with liver steatosis, and 19 of them were found to also associate with CAD. Two fatty acids, docosatrienoate (22:3n6) and 2-hydroxyarachidate, presented the highest mediating effect between steatotic liver disease and CAD. Other mediators were also identified among sphingolipids and glycerophospholipids, although their mediating effects were attenuated when adjusting for circulating lipoproteins. Further research should investigate the role of docosatrienoate (22:3n6) and 2-hydroxyarachidate as mediators between steatotic liver disease and CAD alongside known risk factors.
- Published
- 2024
- Full Text
- View/download PDF
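The mediation analysis in entry 45 follows the classic exposure-mediator-outcome logic (liver fat -> lipid -> CAD). The sketch below shows the product-of-coefficients approach on simulated data; variable names, model forms, and the absence of covariates are illustrative assumptions, not the study's actual specification.

```python
# Product-of-coefficients mediation sketch: liver fat -> lipid -> CAD.
# a = effect of exposure on mediator; b = effect of mediator on outcome
# (adjusted for exposure); indirect effect ~ a * b. Illustrative only;
# the study's models, covariates, and inference differ.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2579
liver_fat = rng.normal(size=n)
lipid = 0.5 * liver_fat + rng.normal(size=n)  # mediator
cad = rng.binomial(1, 1 / (1 + np.exp(-(0.3 * lipid + 0.1 * liver_fat))))

# Mediator model: lipid ~ liver_fat
med = sm.OLS(lipid, sm.add_constant(liver_fat)).fit()
a = med.params[1]

# Outcome model: CAD ~ lipid + liver_fat (logistic)
X = sm.add_constant(np.column_stack([lipid, liver_fat]))
out = sm.GLM(cad, X, family=sm.families.Binomial()).fit()
b = out.params[1]

print(f"indirect (mediated) effect a*b = {a * b:.3f}")
```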
46. Learning from Pseudo-labeled Segmentation for Multi-Class Object Counting
- Author
-
Xu, Jingyi, Le, Hieu, and Samaras, Dimitris
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Class-agnostic counting (CAC) has numerous potential applications across various domains. The goal is to count objects of an arbitrary category during testing, based on only a few annotated exemplars. In this paper, we point out that the task of counting objects of interest when there are multiple object classes in the image (namely, multi-class object counting) is particularly challenging for current object counting models. They often greedily count every object regardless of the exemplars. To address this issue, we propose localizing the area containing the objects of interest via an exemplar-based segmentation model before counting them. The key challenge here is the lack of segmentation supervision to train this model. To this end, we propose a method to obtain pseudo segmentation masks using only box exemplars and dot annotations. We show that the segmentation model trained on these pseudo-labeled masks can effectively localize objects of interest for an arbitrary multi-class image based on the exemplars. To evaluate the performance of different methods on multi-class counting, we introduce two new benchmarks, a synthetic multi-class dataset and a new test set of real images in which objects from multiple classes are present. Our proposed method shows a significant advantage over the previous CAC methods on these two benchmarks.
- Published
- 2023
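The segment-then-count idea in entry 46 reduces at inference time to gating the predicted density map with the exemplar-conditioned mask before summation; a minimal sketch with assumed shapes:

```python
# Segment-then-count sketch: restrict a class-agnostic density map to
# the region selected by an exemplar-based segmentation mask, so only
# objects matching the exemplars contribute to the count. Shapes and
# the thresholding are assumptions for illustration.
import torch

def masked_count(density: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """density: (H, W) predicted per-pixel object density;
    mask: (H, W) in {0, 1} from the exemplar-conditioned segmenter."""
    return (density * mask).sum()

density = torch.rand(384, 384) * 0.01        # dummy density map
mask = (torch.rand(384, 384) > 0.7).float()  # dummy exemplar mask
print(f"estimated count: {masked_count(density, mask).item():.1f}")
```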
47. SAM-Path: A Segment Anything Model for Semantic Segmentation in Digital Pathology
- Author
-
Zhang, Jingwei, Ma, Ke, Kapse, Saarthak, Saltz, Joel, Vakalopoulou, Maria, Prasanna, Prateek, and Samaras, Dimitris
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Semantic segmentation of pathological entities has crucial clinical value in computational pathology workflows. Foundation models, such as the Segment Anything Model (SAM), have recently been proposed for universal use in segmentation tasks. SAM shows remarkable promise in instance segmentation on natural images. However, the applicability of SAM to computational pathology tasks is limited due to the following factors: (1) the lack of comprehensive pathology datasets in SAM's training and (2) a design that is not inherently optimized for semantic segmentation tasks. In this work, we adapt SAM for semantic segmentation by introducing trainable class prompts, followed by further enhancement through the incorporation of a pathology encoder, specifically a pathology foundation model. Our framework, SAM-Path, enhances SAM's ability to conduct semantic segmentation in digital pathology without human input prompts. Through experiments on two public pathology datasets, the BCSS and CRAG datasets, we demonstrate that fine-tuning with trainable class prompts outperforms vanilla SAM with manual prompts and post-processing by 27.52% in Dice score and 71.63% in IoU. On these two datasets, the added pathology foundation model further yields a relative improvement of 5.07% to 5.12% in Dice score and 4.50% to 8.48% in IoU., Comment: Submitted to MedAGI 2023
- Published
- 2023
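A schematic of the trainable class prompts from entry 47: one learnable embedding per class replaces SAM's manual point/box prompts, and the decoder is queried once per class. The decoder below is a toy stand-in for SAM's mask decoder, and all dimensions are assumptions.

```python
# Schematic of trainable class prompts: one learnable token per class
# takes the place of SAM's manual point/box prompts, so the decoder can
# be fine-tuned for semantic segmentation without human input.
import torch
import torch.nn as nn

class ClassPromptBank(nn.Module):
    def __init__(self, num_classes: int, prompt_dim: int = 256):
        super().__init__()
        # One trainable prompt embedding per semantic class.
        self.prompts = nn.Parameter(torch.randn(num_classes, prompt_dim) * 0.02)

    def forward(self, image_embedding, mask_decoder):
        # Query the (frozen or fine-tuned) decoder once per class prompt.
        masks = [mask_decoder(image_embedding, p.unsqueeze(0))
                 for p in self.prompts]
        return torch.stack(masks, dim=1)  # (B, num_classes, H, W)

# Toy decoder standing in for SAM's mask decoder: a similarity map
# between the image embedding and the prompt vector.
decoder = lambda img_emb, prompt: torch.einsum("bchw,pc->bhw", img_emb, prompt)
bank = ClassPromptBank(num_classes=3)
out = bank(torch.randn(2, 256, 64, 64), decoder)
print(out.shape)  # torch.Size([2, 3, 64, 64])
```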
48. Potential of the Julia programming language for high energy physics computing
- Author
-
Eschle, J., Gal, T., Giordano, M., Gras, P., Hegner, B., Heinrich, L., Acosta, U. Hernandez, Kluth, S., Ling, J., Mato, P., Mikhasenko, M., Briceño, A. Moreno, Pivarski, J., Samaras-Tsakiris, K., Schulz, O., Stewart, G. A., Strube, J., and Vassilev, V.
- Subjects
High Energy Physics - Phenomenology ,Computer Science - Programming Languages ,High Energy Physics - Experiment ,Physics - Computational Physics ,J.2 - Abstract
Research in high energy physics (HEP) requires huge amounts of computing and storage, putting strong constraints on code speed and resource usage. To meet these requirements, a compiled high-performance language is typically used, while for physicists, who focus on the application when developing code, research productivity argues for a high-level programming language. A popular approach consists of combining Python, used for the high-level interface, and C++, used for the computing-intensive part of the code. A more convenient and efficient approach would be to use a single language that provides both high-level programming and high performance. The Julia programming language, developed at MIT especially to allow the use of a single language in research activities, has followed this path. In this paper, the applicability of the Julia language to HEP research is explored, covering the different aspects that are important for HEP code development: runtime performance, handling of large projects, interfacing with legacy code, distributed computing, training, and ease of programming. The study shows that the HEP community would benefit from a large-scale adoption of this programming language. The HEP-specific foundation libraries that would need to be consolidated are identified., Comment: 32 pages, 5 figures, 4 tables
- Published
- 2023
- Full Text
- View/download PDF
49. Conditional Generation from Unconditional Diffusion Models using Denoiser Representations
- Author
-
Graikos, Alexandros, Yellapragada, Srikar, and Samaras, Dimitris
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Denoising diffusion models have gained popularity as a generative modeling technique for producing high-quality and diverse images. Applying these models to downstream tasks requires conditioning, which can take the form of text, class labels, or other forms of guidance. However, providing conditioning information to these models can be challenging, particularly when annotations are scarce or imprecise. In this paper, we propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network. We demonstrate the effectiveness of our approach on various conditional generation tasks, including attribute-conditioned generation and mask-conditioned generation. Additionally, we show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%. Our approach provides a powerful and flexible way to adapt diffusion models to new conditions and generate high-quality augmented data for various conditional generation tasks.
- Published
- 2023
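A schematic of the core mechanism in entry 49: capture an intermediate denoiser activation with a forward hook and train a small head on it to supply conditioning signal. The tapped block and head design are assumptions, and a dummy layer stands in for the U-Net so the sketch runs end to end; the paper's guidance procedure is more involved.

```python
# Schematic of reusing denoiser internals for conditioning: grab a
# mid-level activation with a forward hook and feed it to a small
# attribute head. Which layer to tap and the head design are
# assumptions made for illustration.
import torch
import torch.nn as nn

captured = {}

def hook(module, inputs, output):
    captured["feat"] = output  # save the denoiser representation

class AttributeHead(nn.Module):
    def __init__(self, channels: int, num_attrs: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_attrs)

    def forward(self, feat):
        return self.fc(self.pool(feat).flatten(1))

# Dummy stand-in for a U-Net mid block so the sketch is self-contained;
# with a real denoiser you would register the hook on its mid block.
block = nn.Conv2d(4, 512, 3, padding=1)
handle = block.register_forward_hook(hook)
_ = block(torch.randn(2, 4, 32, 32))

head = AttributeHead(channels=512, num_attrs=40)
print(head(captured["feat"]).shape)  # torch.Size([2, 40])
handle.remove()
```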