Author: "A, Cucchiara" / Publication Type: Electronic Resources - Searchworks@Jio Institute Digital Library Search Results

Your search for '"A, Cucchiara"' returned 491 results in total.

1. Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

Author: Baldrati, Alberto, Morelli, Davide, Cornia, Marcella, Bertini, Marco, and Cucchiara, Rita
Abstract: Fashion illustration is a crucial medium for designers to convey their creative vision and transform design concepts into tangible representations that showcase the interplay between clothing and the human body. In the context of fashion design, computer vision techniques have the potential to enhance and streamline the design process. Departing from prior research primarily focused on virtual try-on, this paper tackles the task of multimodal-conditioned fashion image editing. Our approach aims to generate human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures. To address this problem, we propose extending latent diffusion models to incorporate these multiple modalities and modifying the structure of the denoising network, taking multimodal prompts as input. To condition the proposed architecture on fabric textures, we employ textual inversion techniques and let diverse cross-attention layers of the denoising network attend to textual and texture information, thus incorporating different granularity conditioning details. Given the lack of datasets for the task, we extend two existing fashion datasets, Dress Code and VITON-HD, with multimodal annotations. Experimental evaluations demonstrate the effectiveness of our proposed approach in terms of realism and coherence concerning the provided multimodal inputs.
Published: 2024

2. Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images

Author: Cartella, Giuseppe, Cuculo, Vittorio, Cornia, Marcella, and Cucchiara, Rita
Abstract: Creating high-quality and realistic images is now possible thanks to the impressive advancements in image generation. A description in natural language of your desired output is all you need to obtain breathtaking results. However, as the use of generative models grows, so do concerns about the propagation of malicious content and misinformation. Consequently, the research community is actively working on the development of novel fake detection techniques, primarily focusing on low-level features and possible fingerprints left by generative models during the image generation process. In a different vein, our work investigates whether human semantic knowledge could be incorporated into fake image detection frameworks. To achieve this, we collect a novel dataset of partially manipulated images using diffusion models and conduct an eye-tracking experiment to record the eye movements of different observers while viewing real and fake stimuli. A preliminary statistical analysis is conducted to explore the distinctive patterns in how humans perceive genuine and altered images. Statistical findings reveal that, when perceiving counterfeit samples, humans tend to focus on more confined regions of the image, in contrast to the more dispersed observational pattern observed when viewing genuine images. Our dataset is publicly available at: https://github.com/aimagelab/unveiling-the-truth.
Comment: Accepted to IEEE Signal Processing Letters 2024
Published: 2024

3. Mapping High-level Semantic Regions in Indoor Environments without Object Recognition

Author: Bigazzi, Roberto, Baraldi, Lorenzo, Kousik, Shreyas, Cucchiara, Rita, and Pavone, Marco
Abstract: Robots require a semantic understanding of their surroundings to operate in an efficient and explainable way in human environments. In the literature, there has been an extensive focus on object labeling and exhaustive scene graph generation; less effort has been focused on the task of purely identifying and mapping large semantic regions. The present work proposes a method for semantic region mapping via embodied navigation in indoor environments, generating a high-level representation of the knowledge of the agent. To enable region identification, the method uses a vision-to-language model to provide scene information for mapping. By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location. This mapping procedure is paired with a trained navigation policy to enable autonomous map generation. The proposed method significantly outperforms a variety of baselines, including an object-based system and a pretrained scene classifier, in experiments in a photorealistic simulator.
Comment: Accepted by IEEE International Conference on Robotics and Automation (ICRA 2024)
Published: 2024
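
A minimal sketch of how per-location label distributions like those described in the abstract above could be accumulated in a global grid map. It assumes each observation supplies a probability vector over region labels (for example from a vision-to-language model) and a list of global-map cells that the observation projects onto; the function names and the simple averaging rule are hypothetical illustrations, not the authors' method.

```python
import numpy as np

def update_region_map(acc, cells, label_probs):
    """Add one observation's label distribution to every global-map cell it covers.

    acc: (H, W, L) accumulated label scores, modified in place and returned.
    cells: iterable of (row, col) indices in the global frame (assumed given).
    label_probs: (L,) probabilities over region labels for this observation.
    """
    for r, c in cells:
        acc[r, c] += label_probs
    return acc

def region_distribution(acc):
    """Normalise accumulated scores into a per-cell distribution over labels."""
    totals = acc.sum(axis=-1, keepdims=True)
    num_labels = acc.shape[-1]
    safe = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    # Cells that were never observed fall back to a uniform distribution.
    return np.where(totals > 0, acc / safe, 1.0 / num_labels)
```

Averaging keeps every cell a valid distribution; a real system might instead weight observations by distance or confidence.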

4. Trends, Applications, and Challenges in Human Attention Modelling

Author: Cartella, Giuseppe, Cornia, Marcella, Cuculo, Vittorio, D'Amelio, Alessandro, Zanca, Dario, Boccignone, Giuseppe, and Cucchiara, Rita
Abstract: Human attention modelling has proven, in recent years, to be particularly useful not only for understanding the cognitive processes underlying visual exploration, but also for providing support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling. This survey offers a reasoned overview of recent efforts to integrate human attention mechanisms into contemporary deep learning models and discusses future research directions and challenges. For a comprehensive overview of ongoing research, refer to our dedicated repository available at https://github.com/aimagelab/awesome-human-visual-attention.
Comment: Accepted at IJCAI 2024 Survey Track
Published: 2024

5. The Revolution of Multimodal Large Language Models: A Survey

Author: Caffagni, Davide, Cocchi, Federico, Barsellotti, Luca, Moratelli, Nicholas, Sarto, Sara, Baraldi, Lorenzo, Cornia, Marcella, and Cucchiara, Rita
Abstract: Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts are being devoted to the development of Multimodal Large Language Models (MLLMs). These models can seamlessly integrate visual and textual modalities, while providing a dialogue-based interface and instruction-following capabilities. In this paper, we provide a comprehensive review of recent visual-based MLLMs, analyzing their architectural choices, multimodal alignment strategies, and training techniques. We also conduct a detailed analysis of these models across a wide range of tasks, including visual grounding, image generation and editing, visual understanding, and domain-specific applications. Additionally, we compile and describe training datasets and evaluation benchmarks, conducting comparisons among existing models in terms of performance and computational requirements. Overall, this survey offers a comprehensive overview of the current state of the art, laying the groundwork for future MLLMs.
Comment: ACL 2024 (Findings)
Published: 2024

6. VATr++: Choose Your Words Wisely for Handwritten Text Generation

Author: Vanherle, Bram, Pippi, Vittorio, Cascianelli, Silvia, Michiels, Nick, Van Reeth, Frank, and Cucchiara, Rita
Abstract: Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect - the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This study delves deeper into a cutting-edge Styled-HTG approach, proposing strategies for input preparation and training regularization that allow the model to achieve better performance and generalize better. These aspects are validated through extensive analysis on several different settings and datasets. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research - the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.
Published: 2024

7. Key-Graph Transformer for Image Restoration

Author: Ren, Bin, Li, Yawei, Liang, Jingyun, Ranjan, Rakesh, Liu, Mengyuan, Cucchiara, Rita, Van Gool, Luc, and Sebe, Nicu
Abstract: While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In response to these challenges, we introduce the Key-Graph Transformer (KGT) in this paper. Specifically, KGT views patch features as graph nodes. The proposed Key-Graph Constructor efficiently forms a sparse yet representative Key-Graph by selectively connecting essential nodes instead of all the nodes. Then the proposed Key-Graph Attention is conducted under the guidance of the Key-Graph only among selected nodes with linear computational complexity within each window. Extensive experiments across 6 IR tasks confirm the proposed KGT's state-of-the-art performance, showcasing advancements both quantitatively and qualitatively.
Comment: 9 pages, 6 figures
Published: 2024
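
The idea of attending only among selected nodes, as described in the abstract above, can be illustrated with a generic top-k sparse attention sketch in NumPy: each query (a patch node) attends only to its `top_k` most similar keys. This is a simplification under stated assumptions, not the KGT implementation; in particular it computes all pairwise scores to choose neighbours, so it shows the sparsity pattern rather than the linear-complexity claim. The name `topk_sparse_attention` is hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Attention in which each query attends only to its top_k most similar keys.

    q: (n, d) queries, k: (n, d) keys, v: (n, d_v) values; requires 1 <= top_k <= n.
    Returns an (n, d_v) array.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, n) scaled dot products
    # Indices of the top_k highest-scoring keys per query (order not guaranteed).
    idx = np.argpartition(-scores, top_k - 1, axis=-1)[:, :top_k]
    selected = np.take_along_axis(scores, idx, axis=-1)  # (n, top_k)
    # Softmax over the selected keys only; every other key gets zero weight.
    selected = selected - selected.max(axis=-1, keepdims=True)
    weights = np.exp(selected)
    weights /= weights.sum(axis=-1, keepdims=True)
    # v[idx] has shape (n, top_k, d_v): the values of each query's selected keys.
    return np.einsum("nk,nkd->nd", weights, v[idx])
```

With `top_k = n` this reduces to ordinary softmax attention; smaller values discard weakly related patches.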

8. DistFormer: Enhancing Local and Global Features for Monocular Per-Object Distance Estimation

Author: Panariello, Aniello, Mancusi, Gianluca, Ali, Fedy Haj, Porrello, Angelo, Calderara, Simone, and Cucchiara, Rita
Abstract: Accurate per-object distance estimation is crucial in safety-critical applications such as autonomous driving, surveillance, and robotics. Existing approaches rely on two scales: local information (i.e., the bounding box proportions) or global information, which encodes the semantics of the scene as well as the spatial relations with neighboring objects. However, these approaches may struggle with long-range objects and in the presence of strong occlusions or unusual visual patterns. In this respect, our work aims to strengthen both local and global cues. Our architecture -- named DistFormer -- builds upon three major components acting jointly: i) a robust context encoder extracting fine-grained per-object representations; ii) a masked encoder-decoder module exploiting self-supervision to promote the learning of useful per-object features; iii) a global refinement module that aggregates object representations and computes a joint, spatially-consistent estimation. To evaluate the effectiveness of DistFormer, we conduct experiments on the standard KITTI dataset and the large-scale NuScenes and MOTSynth datasets. Such datasets cover various indoor/outdoor environments, changing weather conditions, appearances, and camera viewpoints. Our comprehensive analysis shows that DistFormer outperforms existing methods. Moreover, we further delve into its generalization capabilities, showing its regularization benefits in zero-shot synth-to-real transfer.
Published: 2024

9. Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes

Author: Benaglia, Riccardo, Porrello, Angelo, Buzzega, Pietro, Calderara, Simone, and Cucchiara, Rita
Abstract: Trajectory forecasting is crucial for video surveillance analytics, as it enables the anticipation of future movements for a set of agents, e.g. basketball players engaged in intricate interactions with long-term intentions. Deep generative models offer a natural learning approach for trajectory forecasting, yet they encounter difficulties in achieving an optimal balance between sampling fidelity and diversity. We address this challenge by leveraging Vector Quantized Variational Autoencoders (VQ-VAEs), which utilize a discrete latent space to tackle the issue of posterior collapse. Specifically, we introduce an instance-based codebook that allows tailored latent representations for each example. In a nutshell, the rows of the codebook are dynamically adjusted to reflect contextual information (i.e., past motion patterns extracted from the observed trajectories). In this way, the discretization process gains flexibility, leading to improved reconstructions. Notably, instance-level dynamics are injected into the codebook through low-rank updates, which restrict the customization of the codebook to a lower dimension space. The resulting discrete space serves as the basis of the subsequent step, which regards the training of a diffusion-based predictive model. We show that such a two-fold framework, augmented with instance-level discretization, leads to accurate and diverse forecasts, yielding state-of-the-art performance on three established benchmarks.
Comment: 15 pages, 3 figures, 5 tables
Published: 2024
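
One way to picture an instance-based codebook with low-rank updates, as outlined in the abstract above, is a shared codebook plus a rank-r correction computed from per-instance context, followed by standard nearest-neighbour vector quantization. This NumPy sketch is an assumption-based illustration: the projection `A_proj`, the shared factor `B`, and both function names are hypothetical, and the paper's actual parameterization is not given in the abstract.

```python
import numpy as np

def adapt_codebook(base, context, A_proj, B):
    """Return base + U @ B, where U is a rank-r factor computed from context.

    base: (K, D) shared codebook; context: (C,) instance features
    (e.g. an encoding of past motion); A_proj: (C, K * r) hypothetical
    projection; B: (r, D) hypothetical shared right factor.
    """
    K, _ = base.shape
    r = B.shape[0]
    U = (context @ A_proj).reshape(K, r)  # instance-specific left factor
    return base + U @ B                   # the update has rank at most r

def quantize(z, codebook):
    """Map each row of z (N, D) to its nearest codebook row (squared L2)."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]
```

Restricting the correction to rank r keeps the number of instance-specific degrees of freedom small compared with rewriting the whole K x D codebook.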

10. Sharing Key Semantics in Transformer Makes Efficient Image Restoration

Author: Ren, Bin, Li, Yawei, Liang, Jingyun, Ranjan, Rakesh, Liu, Mengyuan, Cucchiara, Rita, Van Gool, Luc, Yang, Ming-Hsuan, and Sebe, Nicu
Abstract: Image Restoration (IR), a classic low-level vision task, has witnessed significant advancements through deep models that effectively model global information. Notably, the emergence of Vision Transformers (ViTs) has further propelled these advancements. When computing attention, the self-attention mechanism, a cornerstone of ViTs, tends to encompass all global cues, even those from semantically unrelated objects or regions. This inclusivity introduces computational inefficiencies, particularly noticeable with high input resolution, as it requires processing irrelevant information. Additionally, for IR, it is commonly noted that small segments of a degraded image, particularly those closely aligned semantically, provide particularly relevant information to aid in the restoration process, as they contribute essential contextual cues crucial for accurate reconstruction. To address these challenges, in this paper we propose boosting IR performance by sharing key semantics via a Transformer for IR (i.e., SemanIR). Specifically, SemanIR initially constructs a sparse yet comprehensive key-semantic dictionary within each transformer stage by establishing essential semantic connections for every degraded patch. Subsequently, this dictionary is shared across all subsequent transformer blocks within the same stage. This strategy optimizes attention calculation within each block by focusing exclusively on semantically related components stored in the key-semantic dictionary. As a result, attention calculation achieves linear computational complexity within each window. Extensive experiments across 6 IR tasks confirm the proposed SemanIR's state-of-the-art performance, showcasing advancements both quantitatively and qualitatively.
Comment: 9 pages
Published: 2024

11. A Second-Order perspective on Compositionality and Incremental Learning

Author: Porrello, Angelo, Bonicelli, Lorenzo, Buzzega, Pietro, Millunzi, Monica, Calderara, Simone, and Cucchiara, Rita
Abstract: The fine-tuning of deep pre-trained models has recently revealed compositional properties. This enables the arbitrary composition of multiple specialized modules into a single, multi-task model. However, identifying the conditions that promote compositionality remains an open issue, with recent efforts concentrating mainly on linearized networks. We conduct a theoretical study that attempts to demystify compositionality in standard non-linear networks through the second-order Taylor approximation of the loss function. The proposed formulation highlights the importance of staying within the pre-training basin for achieving composable modules. Moreover, it provides the basis for two dual incremental training algorithms: one takes the perspective of multiple models trained individually, while the other aims to optimize the composed model as a whole. We probe their application in incremental classification tasks and highlight some valuable skills. In fact, the pool of incrementally learned modules not only supports the creation of an effective multi-task model but also enables unlearning and specialization in specific tasks.
Published: 2024
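
For reference, the generic second-order Taylor expansion of a loss around pre-trained weights theta_0, with a composed model written as theta_0 plus a sum of T task-specific fine-tuning displacements tau_i, is shown below. The notation (theta_0, tau_i, H, T) is ours and the paper's exact formulation is not reproduced here. The approximation is only trustworthy while theta stays close to theta_0, which is one way to read the abstract's emphasis on staying within the pre-training basin.

```latex
\mathcal{L}(\theta) \approx \mathcal{L}(\theta_0)
  + \nabla \mathcal{L}(\theta_0)^{\top} (\theta - \theta_0)
  + \tfrac{1}{2} (\theta - \theta_0)^{\top} \mathbf{H}(\theta_0) \, (\theta - \theta_0),
\qquad
\theta = \theta_0 + \sum_{i=1}^{T} \tau_i
```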

12. Towards Retrieval-Augmented Architectures for Image Captioning

Author: Sarto, Sara, Cornia, Marcella, Baraldi, Lorenzo, Nicolosi, Alessandro, and Cucchiara, Rita
Abstract: The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images. In recent years, researchers have leveraged deep learning-based models and made advances in the extraction of visual features and the design of multimodal connections to tackle this task. This work presents a novel approach towards developing image captioning models that utilize an external kNN memory to improve the generation process. Specifically, we propose two model variants that incorporate a knowledge retriever component that is based on visual similarities, a differentiable encoder to represent input images, and a kNN-augmented language model to predict tokens based on contextual cues and text retrieved from the external memory. We experimentally validate our approach on COCO and nocaps datasets and demonstrate that incorporating an explicit external memory can significantly enhance the quality of captions, especially with a larger retrieval corpus. This work provides valuable insights into retrieval-augmented captioning models and opens up new avenues for improving image captioning at a larger scale.
Comment: ACM Transactions on Multimedia Computing, Communications and Applications (2024)
Published: 2024
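
A retriever "based on visual similarities", as mentioned in the abstract above, can be sketched as cosine-similarity k-nearest-neighbour search over a memory of (visual embedding, caption) pairs. This is a minimal NumPy illustration under assumed inputs; `knn_retrieve` and the toy memory are hypothetical, and the paper's encoder, index, and language-model integration are not shown.

```python
import numpy as np

def knn_retrieve(query, memory_keys, memory_texts, k):
    """Return the k memory texts whose visual keys are most cosine-similar to query.

    query: (D,) image embedding; memory_keys: (M, D) embeddings of memory items;
    memory_texts: list of M captions aligned with memory_keys.
    """
    q = query / np.linalg.norm(query)
    keys = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    sims = keys @ q                # cosine similarity for each memory item
    top = np.argsort(-sims)[:k]    # indices in order of decreasing similarity
    return [memory_texts[i] for i in top], sims[top]
```

The retrieved captions would then serve as extra textual context for the caption generator; at larger corpus sizes an approximate nearest-neighbour index would typically replace the exhaustive search.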

13. AIGeN: An Adversarial Approach for Instruction Generation in VLN

Author: Rawal, Niyati, Bigazzi, Roberto, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: In the last few years, the research interest in Vision-and-Language Navigation (VLN) has grown significantly. VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal. Recent work in the literature focuses on different ways to augment the available datasets of instructions for improving navigation performance by exploiting synthetic training data. In this work, we propose AIGeN, a novel architecture inspired by Generative Adversarial Networks (GANs) that produces meaningful and well-formed synthetic instructions to improve navigation agents' performance. The model is composed of a Transformer decoder (GPT-2) and a Transformer encoder (BERT). During the training phase, the decoder generates sentences for a sequence of images describing the agent's path to a particular point while the encoder discriminates between real and fake instructions. Experimentally, we evaluate the quality of the generated instructions and perform extensive ablation studies. Additionally, we generate synthetic instructions for 217K trajectories using AIGeN on Habitat-Matterport 3D Dataset (HM3D) and show an improvement in the performance of an off-the-shelf VLN method. The validation analysis of our proposal is conducted on REVERIE and R2R and highlights the promising aspects of our proposal, achieving state-of-the-art performance.
Comment: Accepted to 7th Multimodal Learning and Applications Workshop (MULA 2024) at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
Published: 2024

14. Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation

Author: Barsellotti, Luca, Amoroso, Roberto, Cornia, Marcella, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: Open-vocabulary semantic segmentation aims at segmenting arbitrary categories expressed in textual form. Previous works have trained over large amounts of image-caption pairs to enforce pixel-level multimodal alignments. However, captions provide global information about the semantics of a given image but lack direct localization of individual concepts. Further, training on large-scale datasets inevitably brings significant computational costs. In this paper, we propose FreeDA, a training-free diffusion-augmented method for open-vocabulary semantic segmentation, which leverages the ability of diffusion models to visually localize generated concepts and local-global similarities to match class-agnostic regions with semantic classes. Our approach involves an offline stage in which textual-visual reference embeddings are collected, starting from a large set of captions and leveraging visual and semantic contexts. At test time, these are queried to support the visual matching process, which is carried out by jointly considering class-agnostic regions and global semantic similarities. Extensive analyses demonstrate that FreeDA achieves state-of-the-art performance on five datasets, surpassing previous methods by more than 7.0 average points in terms of mIoU and without requiring any training.
Comment: CVPR 2024. Project page: https://aimagelab.github.io/freeda
Published: 2024

15. Binarizing Documents by Leveraging both Space and Frequency

Author: Quattrini, Fabio, Pippi, Vittorio, Cascianelli, Silvia, and Cucchiara, Rita
Abstract: Document Image Binarization is a well-known problem in Document Analysis and Computer Vision, although it is far from being solved. One of the main challenges of this task is that documents generally exhibit degradations and acquisition artifacts that can greatly vary throughout the page. Nonetheless, even when dealing with a local patch of the document, taking into account the overall appearance of a wide portion of the page can ease the prediction by enriching it with semantic information on the ink and background conditions. In this respect, approaches able to model both local and global information have been proven suitable for this task. In particular, recent applications of Vision Transformer (ViT)-based models, able to model short and long-range dependencies via the attention mechanism, have demonstrated their superiority over standard Convolution-based models, which instead struggle to model global dependencies. In this work, we propose an alternative solution based on the recently introduced Fast Fourier Convolutions, which overcomes the limitation of standard convolutions in modeling global information while requiring fewer parameters than ViTs. We validate the effectiveness of our approach via extensive experimental analysis considering different types of degradations.
Comment: Accepted at ICDAR2024
Published: 2024
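
The reason Fast Fourier Convolutions capture global information, as the abstract above notes, is that their spectral branch mixes channels in the frequency domain, where every coefficient depends on every pixel. The sketch below follows the Fourier-unit pattern commonly described for FFC (real 2-D FFT, a 1x1 channel mixing of stacked real and imaginary parts, optional ReLU, inverse FFT). It is a simplified NumPy illustration, not the binarization model from the paper: batch normalization is omitted, and the weight matrix `w`, the `relu` flag, and the function name are hypothetical.

```python
import numpy as np

def spectral_transform(x, w, relu=True):
    """Global mixing: FFT over space, per-frequency channel mixing, inverse FFT.

    x: (C, H, W) real feature map.
    w: (2C, 2C) hypothetical 1x1-conv weights acting on stacked real/imag parts.
    Returns a real (C, H, W) array.
    """
    C, H, W = x.shape
    f = np.fft.rfft2(x, axes=(-2, -1))                   # (C, H, W//2 + 1), complex
    stacked = np.concatenate([f.real, f.imag], axis=0)   # (2C, H, W//2 + 1), real
    mixed = np.einsum("oc,chw->ohw", w, stacked)         # same 1x1 mixing at every frequency
    if relu:
        mixed = np.maximum(mixed, 0.0)                   # non-linearity in the frequency domain
    f_new = mixed[:C] + 1j * mixed[C:]                   # back to complex coefficients
    return np.fft.irfft2(f_new, s=(H, W), axes=(-2, -1))
```

The mixing matrix has only (2C)^2 entries regardless of image size, which is consistent with the abstract's point that global context can be obtained with fewer parameters than attention-based models.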

16. Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

Author: Caffagni, Davide, Cocchi, Federico, Moratelli, Nicholas, Sarto, Sara, Cornia, Marcella, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: Multimodal LLMs are the natural evolution of LLMs, and enlarge their capabilities so as to work beyond the pure textual modality. As research is being carried out to design novel architectures and vision-and-language adapters, in this paper we concentrate on endowing such models with the capability of answering questions that require external knowledge. Our approach, termed Wiki-LLaVA, aims at integrating an external knowledge source of multimodal documents, which is accessed through a hierarchical retrieval pipeline. Relevant passages, using this approach, are retrieved from the external knowledge source and employed as additional context for the LLM, augmenting the effectiveness and precision of generated dialogues. We conduct extensive experiments on datasets tailored for visual question answering with external data and demonstrate the appropriateness of our approach.
Comment: CVPR 2024 Workshop on What is Next in Multimodal Foundation Models
Published: 2024

17. Input Perturbation Reduces Exposure Bias in Diffusion Models

Author: Ning, Mang, Sangineto, Enver, Porrello, Angelo, Calderara, Simone, and Cucchiara, Rita
Abstract: Denoising Diffusion Probabilistic Models have shown an impressive generation quality although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, consisting of perturbing the ground truth samples to simulate the inference time prediction errors. We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64×64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time. The code is available at https://github.com/forever208/DDPM-IP.
Published: 2023
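
The training regularization described in the abstract above (perturbing the ground truth to simulate inference-time errors) can be sketched for a single training example in NumPy. In this sketch the extra Gaussian perturbation `gamma * xi` is added only to the noise used to build the network input x_t, while the regression target remains the unperturbed noise `eps`. The function name and the choice to leave `gamma` as a free parameter are assumptions of this illustration; consult the linked repository for the authors' actual code and settings.

```python
import numpy as np

def ddpm_ip_training_pair(x0, alpha_bar_t, gamma, rng):
    """Build (noisy input, regression target) for one input-perturbed DDPM step.

    Standard DDPM: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, target eps.
    Input perturbation: x_t is formed with (eps + gamma * xi) instead,
    but the network is still trained to predict the unperturbed eps.
    """
    eps = rng.standard_normal(x0.shape)  # noise the network must predict
    xi = rng.standard_normal(x0.shape)   # extra perturbation mimicking inference errors
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * (eps + gamma * xi)
    return x_t, eps
```

Setting `gamma = 0` recovers the standard DDPM training pair, so the change amounts to a single extra noise term in the training loop.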

18. Input Perturbation Reduces Exposure Bias in Diffusion Models

Author: Sub Social and Affective Computing, Social and Affective Computing, Ning, Mang, Sangineto, Enver, Porrello, Angelo, Calderara, Simone, and Cucchiara, Rita
Published: 2023

19. One Transformer for All Time Series: Representing and Training with Time-Dependent Heterogeneous Tabular Data

Author: Luetto, Simone, Garuti, Fabrizio, Sangineto, Enver, Forni, Lorenzo, and Cucchiara, Rita
Abstract: There is growing interest in applying Deep Learning techniques to tabular data, in order to replicate the success of other Artificial Intelligence areas in this structured domain. Of particular interest is the case in which tabular data have a time dependence, such as financial transactions. However, the heterogeneity of the tabular values, in which categorical elements are mixed with numerical items, makes this adaptation difficult. In this paper, we propose a Transformer architecture to represent heterogeneous time-dependent tabular data, in which numerical features are represented using a set of frequency functions and the whole network is uniformly trained with a unique loss function.
Comment: 9 pages, 2 figures, 7 tables
Published: 2023
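
Representing a numerical feature "using a set of frequency functions", as in the abstract above, is commonly done with sine and cosine features at several frequencies. The abstract does not specify the functions, so this NumPy sketch assumes sin/cos pairs at geometrically spaced frequencies (pi, 2 pi, 4 pi, ...); `frequency_encode` is a hypothetical name.

```python
import numpy as np

def frequency_encode(values, num_freqs):
    """Encode scalars with sin/cos features at geometrically spaced frequencies.

    values: (N,) numerical feature values (ideally pre-normalised).
    Returns an (N, 2 * num_freqs) array: all sines first, then all cosines.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # pi, 2*pi, 4*pi, ...
    angles = values[:, None] * freqs[None, :]        # (N, num_freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
```

Such an encoding gives numerical values a fixed-size vector form that a Transformer can process alongside embeddings of categorical fields.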

20. Input Perturbation Reduces Exposure Bias in Diffusion Models

Author: Ning, Mang, Sangineto, Enver, Porrello, Angelo, Calderara, Simone, and Cucchiara, Rita
Abstract: Denoising Diffusion Probabilistic Models have shown an impressive generation quality, although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing, since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, consisting of perturbing the ground truth samples to simulate the inference time prediction errors. We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64×64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time. The code is publicly available at https://github.com/forever208/DDPM-IP
Comment: Accepted by ICML 2023
Published: 2023

21. Embodied Agents for Efficient Exploration and Smart Scene Description

Author: Bigazzi, Roberto, Cornia, Marcella, Cascianelli, Silvia, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step towards this objective, in this work, we tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment while portraying interesting scenes with natural language descriptions. To this end, we propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning on images generated through agent-environment interaction. Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions. Further, such descriptions offer user-understandable insights into the robot's representation of the environment by highlighting the prominent objects and the correlation between them as encountered during the exploration. To quantitatively assess the performance of the proposed approach, we also devise a specific score that takes into account both exploration and description skills. The experiments carried out on both photorealistic simulated environments and real-world ones demonstrate that our approach can effectively describe the robot's point of view during exploration, improving the human-friendly interpretability of its observations.
Comment: Accepted by IEEE International Conference on Robotics and Automation (ICRA 2023)
Published: 2023

22. LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

Author: Morelli, Davide, Baldrati, Alberto, Cartella, Giuseppe, Cornia, Marcella, Bertini, Marco, and Cucchiara, Rita
Abstract: The rapidly evolving fields of e-commerce and metaverse continue to seek innovative approaches to enhance the consumer experience. At the same time, recent advancements in the development of diffusion models have enabled generative networks to create remarkably realistic images. In this context, image-based virtual try-on, which consists of generating a novel image of a target model wearing a given in-shop garment, has yet to capitalize on the potential of these powerful generative solutions. This work introduces LaDI-VTON, the first Latent Diffusion textual Inversion-enhanced model for the Virtual Try-ON task. The proposed architecture relies on a latent diffusion model extended with a novel additional autoencoder module that exploits learnable skip connections to enhance the generation process while preserving the model's characteristics. To effectively maintain the texture and details of the in-shop garment, we propose a textual inversion component that can map the visual features of the garment to the CLIP token embedding space and thus generate a set of pseudo-word token embeddings capable of conditioning the generation process. Experimental results on Dress Code and VITON-HD datasets demonstrate that our approach outperforms the competitors by a consistent margin, achieving a significant milestone for the task. Source code and trained models are publicly available at: https://github.com/miccunifi/ladi-vton.
Comment: ACM Multimedia 2023
Published: 2023

23. How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Author: Pippi, Vittorio, Cascianelli, Silvia, Kermorvant, Christopher, and Cucchiara, Rita
Abstract: Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets. Nonetheless, those models struggle to obtain the same performance when applied to manuscripts with peculiar characteristics, such as language, paper support, ink, and author handwriting. This issue is very relevant for valuable but small collections of documents preserved in historical archives, for which obtaining sufficient annotated training data is costly or, in some cases, infeasible. To overcome this challenge, a possible solution is to pretrain HTR models on large datasets and then fine-tune them on small single-author collections. In this paper, we take into account large, real benchmark datasets and synthetic ones obtained with a styled Handwritten Text Generation model. Through extensive experimental analysis, also considering the number of fine-tuning lines, we give a quantitative indication of the most relevant characteristics of such data for obtaining an HTR model able to effectively transcribe manuscripts in small collections with as few as five real fine-tuning lines.
Comment: Accepted at ICDAR 2023
Published: 2023

24. Evaluating Synthetic Pre-Training for Handwriting Processing Tasks

Author: Pippi, Vittorio, Cascianelli, Silvia, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: In this work, we explore massive pre-training on synthetic word images for enhancing performance on four benchmark downstream handwriting analysis tasks. To this end, we build a large synthetic dataset of word images rendered in several handwriting fonts, which offers a complete supervision signal. We use it to train a simple convolutional neural network (ConvNet) with a fully supervised objective. The vector representations of the images obtained from the pre-trained ConvNet can then be considered as encodings of the handwriting style. We exploit such representations for Writer Retrieval, Writer Identification, Writer Verification, and Writer Classification and demonstrate that our pre-training strategy allows extracting rich representations of the writers' style that enable the aforementioned tasks with results competitive with task-specific State-of-the-Art approaches.
Published: 2023

25. Multi-Class Explainable Unlearning for Image Classification via Weight Filtering

Author: Poppi, Samuele, Sarto, Sara, Cornia, Marcella, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: Machine Unlearning has recently emerged as a paradigm for selectively removing the impact of training datapoints from a network. While existing approaches have focused on unlearning either a small subset of the training data or a single class, in this paper we take a different path and devise a framework that can unlearn all classes of an image classification network in a single untraining round. Our proposed technique learns to modulate the inner components of an image classification network through memory matrices so that, after training, the same network can selectively exhibit an unlearning behavior over any of the classes. By discovering weights that are specific to each of the classes, our approach also recovers a representation of the classes which is explainable by design. We test the proposed framework, which we name Weight Filtering network (WF-Net), on small-scale and medium-scale image classification datasets, with both CNN and Transformer-based backbones. Our work provides interesting insights into the development of explainable solutions for unlearning and could be easily extended to other vision tasks.
Published: 2023

26. Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

Author: Baldrati, Alberto, Morelli, Davide, Cartella, Giuseppe, Cornia, Marcella, Bertini, Marco, and Cucchiara, Rita
Abstract: Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Unlike previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations are publicly available at: https://github.com/aimagelab/multimodal-garment-designer.
Comment: ICCV 2023
Published: 2023

27. Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images

Author: Amoroso, Roberto, Morelli, Davide, Cornia, Marcella, Baraldi, Lorenzo, Del Bimbo, Alberto, and Cucchiara, Rita
Abstract: Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language. While these models have numerous benefits across various sectors, they have also raised concerns about the potential misuse of fake images and placed new pressure on fake image detection. In this work, we pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models. Firstly, we conduct a comprehensive analysis of the performance of contrastive and classification-based visual features, respectively extracted from CLIP-based models and ResNet or ViT-based architectures trained on image classification datasets. Our results demonstrate that fake images share common low-level cues, which render them easily recognizable. Further, we devise a multimodal setting wherein fake images are synthesized by different textual captions, which are used as seeds for a generator. Under this setting, we quantify the performance of fake detection strategies and introduce a contrastive-based disentangling method that lets us analyze the role of the semantics of textual descriptions and low-level perceptual cues. Finally, we release a new dataset, called COCOFake, containing about 1.2M images generated from the original COCO image-caption pairs using two recent text-to-image diffusion models, namely Stable Diffusion v1.4 and v2.0.
Comment: ACM Transactions on Multimedia Computing, Communications and Applications (2024)
Published: 2023

28. Handwritten Text Generation from Visual Archetypes

Author: Pippi, Vittorio, Cascianelli, Silvia, and Cucchiara, Rita
Abstract: Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially in the case of unseen styles and new words, and even more when these latter contain characters that are rarely encountered during training. While emulating a writer's style has been recently addressed by generative models, the generalization towards rare characters has been disregarded. In this work, we devise a Transformer-based model for Few-Shot styled handwritten text generation and focus on obtaining a robust and informative representation of both the text and the style. In particular, we propose a novel representation of the textual content as a sequence of dense vectors obtained from images of symbols written as standard GNU Unifont glyphs, which can be considered their visual archetypes. This strategy is more suitable for generating characters that, despite having been seen rarely during training, possibly share visual details with the frequently observed ones. As for the style, we obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset. Quantitative and qualitative results demonstrate the effectiveness of our proposal in generating words in unseen styles and with rare characters more faithfully than existing approaches relying on independent one-hot encodings of the characters.
Comment: Accepted at CVPR 2023
Published: 2023

29. Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

Author: Sarto, Sara, Barraco, Manuele, Cornia, Marcella, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: The CLIP model has been recently proven to be very effective for a variety of cross-modal tasks, including the evaluation of captions generated from vision-and-language architectures. In this paper, we propose a new recipe for a contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S), that in a novel way unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data. Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos, outperforming existing reference-based metrics like CIDEr and SPICE and reference-free metrics like CLIP-Score. Finally, we test the system-level correlation of the proposed metric when considering popular image captioning approaches, and assess the impact of employing different cross-modal features. Our source code and trained models are publicly available at: https://github.com/aimagelab/pacscore.
Comment: CVPR 2023 (highlight paper)
Published: 2023

30. Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

Author: Poppi, Samuele, Poppi, Tobia, Cocchi, Federico, Cornia, Marcella, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: Large-scale vision-and-language models, such as CLIP, are typically trained on web-scale data, which can introduce inappropriate content and lead to the development of unsafe and biased behavior. This, in turn, hampers their applicability in sensitive and trustworthy contexts and could raise significant concerns in their adoption. Our research introduces a novel approach to enhancing the safety of vision-and-language models by diminishing their sensitivity to NSFW (not safe for work) inputs. In particular, our methodology seeks to sever "toxic" linguistic and visual concepts, unlearning the linkage between unsafe linguistic or visual items and unsafe regions of the embedding space. We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and a text-to-image generator. We conduct extensive experiments on the resulting embedding space for cross-modal retrieval, text-to-image, and image-to-text generation, where we show that our model can be effectively employed with pre-trained generative models. Our source code and trained models are available at: https://github.com/aimagelab/safe-clip.
Published: 2023

31. HWD: A Novel Evaluation Score for Styled Handwritten Text Generation

Author: Pippi, Vittorio, Quattrini, Fabio, Cascianelli, Silvia, and Cucchiara, Rita
Abstract: Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis, aiming to generate text images with the handwriting of given reference images. In recent years, there has been significant progress in the development of deep learning models for tackling this task. Being able to measure the performance of HTG models via a meaningful and representative criterion is key for fostering the development of this research topic. However, despite the current adoption of scores for natural image generation evaluation, assessing the quality of generated handwriting remains challenging. In light of this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In particular, it works in the feature space of a network specifically trained to extract handwriting style features from the variable-length input images and exploits a perceptual distance to compare the subtle geometric features of handwriting. Through extensive experimental evaluation on different word-level and line-level datasets of handwritten text images, we demonstrate the suitability of the proposed HWD as a score for Styled HTG. The pretrained model used as backbone will be released to ease the adoption of the score, aiming to provide a valuable tool for evaluating HTG models and thus contributing to advancing this important research area.
Comment: Accepted at BMVC 2023
Published: 2023

32. Model order reduction by convex displacement interpolation

Author: Cucchiara, Simona, Iollo, Angelo, Taddei, Tommaso, and Telib, Haysam
Abstract: We present a nonlinear interpolation technique for parametric fields that exploits optimal transportation of coherent structures of the solution to achieve accurate performance. The approach generalizes the nonlinear interpolation procedure introduced in [Iollo, Taddei, J. Comput. Phys., 2022] to multi-dimensional parameter domains and to datasets of several snapshots. Given a library of high-fidelity simulations, we rely on a scalar testing function and on a point set registration method to identify coherent structures of the solution field in the form of sorted point clouds. Given a new parameter value, we exploit a regression method to predict the new point cloud; then, we resort to a boundary-aware registration technique to define bijective mappings that deform the new point cloud into the point clouds of the neighboring elements of the dataset, while preserving the boundary of the domain; finally, we define the estimate as a weighted combination of modes obtained by composing the neighboring snapshots with the previously-built mappings. We present several numerical examples for compressible and incompressible, viscous and inviscid flows to demonstrate the accuracy of the method. Furthermore, we employ the nonlinear interpolation procedure to augment the dataset of simulations for linear-subspace projection-based model reduction: our data augmentation procedure is designed to reduce offline costs -- which are dominated by snapshot generation -- of model reduction techniques for nonlinear advection-dominated problems.
Published: 2023

33. OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Author: Cartella, Giuseppe, Baldrati, Alberto, Morelli, Davide, Cornia, Marcella, Bertini, Marco, and Cucchiara, Rita
Abstract: The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either defined poorly generalizable supervised learning approaches or adopted more reusable CLIP-based techniques that, however, were trained on closed-source data. In this work, we propose OpenFashionCLIP, a vision-and-language contrastive learning method that only adopts open-source fashion data stemming from diverse domains, and characterized by varying degrees of specificity. Our approach is extensively validated across several tasks and benchmarks, and experimental results highlight a significant out-of-domain generalization capability and consistent improvements over state-of-the-art methods both in terms of accuracy and recall. Source code and trained models are publicly available at: https://github.com/aimagelab/open-fashion-clip.
Comment: International Conference on Image Analysis and Processing (ICIAP) 2023
Published: 2023

34. With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

Author: Barraco, Manuele, Sarto, Sara, Cornia, Marcella, Baraldi, Lorenzo, and Cucchiara, Rita
Abstract: Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics in an image and translating it into linguistically coherent descriptions. Although successful, the attention operator only considers a weighted summation of projections of the current input sample, therefore ignoring the relevant semantic information which can come from the joint observation of other samples. In this paper, we devise a network which can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors which are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed baselines and state-of-the-art approaches, and by investigating the role of each of the proposed components. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training. Source code and trained models are available at: https://github.com/aimagelab/PMA-Net.
Comment: ICCV 2023
Published: 2023

35. TrackFlow: Multi-Object Tracking with Normalizing Flows

Author: Mancusi, Gianluca, Panariello, Aniello, Porrello, Angelo, Fabbri, Matteo, Calderara, Simone, and Cucchiara, Rita
Abstract: The field of multi-object tracking has recently seen a renewed interest in the good old schema of tracking-by-detection, as its simplicity and strong priors spare it from the complex design and painful babysitting of tracking-by-attention approaches. In view of this, we aim at extending tracking-by-detection to multi-modal settings, where a comprehensive cost has to be computed from heterogeneous information, e.g., 2D motion cues, visual appearance, and pose estimates. More precisely, we follow a case study where a rough estimate of 3D information is also available and must be merged with other traditional metrics (e.g., the IoU). To achieve that, recent approaches resort to either simple rules or complex heuristics to balance the contribution of each cost. However, i) they require careful tuning of tailored hyperparameters on a hold-out set, and ii) they assume these costs to be independent, which does not hold in reality. We address these issues by building upon an elegant probabilistic formulation, which considers the cost of a candidate association as the negative log-likelihood yielded by a deep density estimator, trained to model the conditional joint probability distribution of correct associations. Our experiments, conducted on both simulated and real benchmarks, show that our approach consistently enhances the performance of several tracking-by-detection algorithms.
Comment: Accepted at ICCV 2023
Published: 2023

36. Volumetric Fast Fourier Convolution for Detecting Ink on the Carbonized Herculaneum Papyri

Author: Quattrini, Fabio, Pippi, Vittorio, Cascianelli, Silvia, and Cucchiara, Rita
Abstract: Recent advancements in Digital Document Restoration (DDR) have led to significant breakthroughs in analyzing highly damaged written artifacts. Among those, there has been an increasing interest in applying Artificial Intelligence techniques for virtually unwrapping and automatically detecting ink on the Herculaneum papyri collection. This collection consists of carbonized scrolls and fragments of documents, which have been digitized via X-ray tomography to allow the development of ad-hoc deep learning-based DDR solutions. In this work, we propose a modification of the Fast Fourier Convolution operator for volumetric data and apply it in a segmentation architecture for ink detection on the challenging Herculaneum papyri, demonstrating its suitability via in-depth experimental analysis. To encourage the research on this task and the application of the proposed operator to other tasks involving volumetric data, we will release our implementation (https://github.com/aimagelab/vffc).
Comment: Accepted at the 4th ICCV Workshop on e-Heritage (in conjunction with ICCV 2023)
Published: 2023

37. CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components

Author: Di Nucci, Davide, Simoni, Alessandro, Tomei, Matteo, Ciuffreda, Luca, Vezzani, Roberto, and Cucchiara, Rita
Abstract: Neural Radiance Fields (NeRFs) have gained widespread recognition as a highly effective technique for representing 3D reconstructions of objects and scenes derived from sets of images. Despite their efficiency, NeRF models can pose challenges in certain scenarios such as vehicle inspection, where the lack of sufficient data or the presence of challenging elements (e.g., reflections) strongly impact the accuracy of the reconstruction. To this end, we introduce CarPatch, a novel synthetic benchmark of vehicles. In addition to a set of images annotated with their intrinsic and extrinsic camera parameters, the corresponding depth maps and semantic segmentation masks have been generated for each view. Global and part-based metrics have been defined and used to evaluate, compare, and better characterize some state-of-the-art techniques. The dataset is publicly released at https://aimagelab.ing.unimore.it/go/carpatch and can be used as an evaluation guide and as a baseline for future work on this challenging topic.
Comment: Accepted at ICIAP 2023
Published: 2023

38. Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

Author: Betti, Federico, Staiano, Jacopo, Baraldi, Lorenzo, Cucchiara, Rita, and Sebe, Nicu
Abstract: Research in Image Generation has recently made significant progress, particularly boosted by the introduction of Vision-Language models which are able to produce high-quality visual content based on textual inputs. Despite ongoing advancements in terms of generation quality and realism, no methodical frameworks have been defined yet to quantitatively measure the quality of the generated content and the adherence to the prompted requests: so far, only human-based evaluations have been adopted for quality satisfaction and for comparing different generative methods. We introduce a novel automated method for Visual Concept Evaluation (ViCE), i.e. to assess consistency between a generated/edited image and the corresponding prompt/instructions, with a process inspired by the human cognitive behaviour. ViCE combines the strengths of Large Language Models (LLMs) and Visual Question Answering (VQA) into a unified pipeline, aiming to replicate the human cognitive process in quality assessment. This method outlines visual concepts, formulates image-specific verification questions, utilizes the Q&A system to investigate the image, and scores the combined outcome. Although this brave new hypothesis of mimicking humans in the image evaluation process is in its preliminary assessment stage, results are promising and open the door to a new form of automatic evaluation which could have significant impact as the image generation or the image target editing tasks become more and more sophisticated.
Comment: Accepted as oral at ACM Multimedia 2023 (Brave New Ideas track)
Published: 2023

39. Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training

Author: Baraldi, Lorenzo, Amoroso, Roberto, Cornia, Marcella, Pilzer, Andrea, and Cucchiara, Rita
Abstract: The use of self-supervised pre-training has emerged as a promising approach to enhance the performance of visual tasks such as image classification. In this context, recent approaches have employed the Masked Image Modeling paradigm, which pre-trains a backbone by reconstructing visual tokens associated with randomly masked image patches. This masking approach, however, introduces noise into the input data during pre-training, leading to discrepancies that can impair performance during the fine-tuning phase. Furthermore, input masking neglects the dependencies between corrupted patches, increasing the inconsistencies observed in downstream fine-tuning tasks. To overcome these issues, we propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT), that employs autoregressive and permuted predictions to capture intra-patch dependencies. In addition, MaPeT employs auxiliary positional information to reduce the disparity between the pre-training and fine-tuning phases. In our experiments, we employ a fair setting to ensure reliable and meaningful comparisons and conduct investigations on multiple visual tokenizers, including our proposed $k$-CLIP which directly employs discretized CLIP features. Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting. Source code and trained models are publicly available at: https://github.com/aimagelab/MaPeT.
Published: 2023

40. Preliminary results of an ongoing prospective clinical trial on the use of 68Ga-PSMA and 68Ga-DOTA-RM2 PET/MRI in staging of high-risk prostate cancer patients

Author: Mapelli, P, Ghezzo, S, Samanes Gajate, A, Preza, E, Brembilla, G, Cucchiara, V, Ahmed, N, Bezzi, C, Presotto, L, Bettinardi, V, Savi, A, Magnani, P, Menichini, R, Coliva, A, Neri, I, Di Gaeta, E, Gianolli, L, Freschi, M, Briganti, A, De Cobelli, F, Scifo, P, and Picchio, M
Abstract: The aim of the present study is to investigate the synergistic role of 68Ga-PSMA PET/MRI and 68Ga-DOTA-RM2 PET/MRI in prostate cancer (PCa) staging. We present pilot data on twenty-two patients with biopsy-proven PCa who underwent 68Ga-PSMA PET/MRI for staging purposes, with 19/22 also undergoing 68Ga-DOTA-RM2 PET/MRI. TNM classification based on image findings was performed and quantitative imaging parameters were collected for each scan. Furthermore, twelve patients underwent radical prostatectomy, and the resulting histological data were used as the gold standard to validate intraprostatic findings. A Dice score between regions of interest manually segmented on the primary tumour on 68Ga-PSMA PET, 68Ga-DOTA-RM2 PET and T2 MRI was computed. All imaging modalities detected the primary PCa in 18/19 patients, with 68Ga-DOTA-RM2 PET not detecting any lesion in 1/19 patients. In the remaining patients, 68Ga-PSMA and MRI were concordant. Seven patients presented seminal vesicle involvement on MRI, with two of these also detected by 68Ga-PSMA and 68Ga-DOTA-RM2 PET being negative. Regarding extraprostatic disease, 68Ga-PSMA PET, 68Ga-DOTA-RM2 PET and MRI were positive in seven, four and five patients at lymph-nodal level, respectively, and at bone level in three, zero and one patients, respectively. These preliminary results suggest the potential complementary role of 68Ga-PSMA PET, 68Ga-DOTA-RM2 PET and MRI in PCa characterization during the staging phase.
Published: 2021

41. Disability After Minor Stroke and Transient Ischemic Attack in the POINT Trial

Author: Cucchiara, Brett, Elm, Jordan, Easton, J Donald, Coutts, Shelagh B, Willey, Joshua Z, Biros, Michelle H, Ross, Michael A, and Johnston, S Claiborne
Abstract: Background and Purpose: While combination aspirin and clopidogrel reduces recurrent stroke compared with aspirin alone in patients with transient ischemic attack (TIA) or minor stroke, the effect on disability is uncertain. Methods: The POINT trial (Platelet-Oriented Inhibition in New TIA and Minor Ischemic Stroke) randomized patients with TIA or minor stroke (National Institutes of Health Stroke Scale score ≤3) within 12 hours of onset to dual antiplatelet therapy (DAPT) with aspirin plus clopidogrel versus aspirin alone. The primary outcome measure was a composite of stroke, myocardial infarction, or vascular death. We performed a post hoc exploratory analysis to examine the effect of treatment on overall disability (defined as modified Rankin Scale score >1) at 90 days, as well as disability ascribed by the local investigator to index or recurrent stroke. We also evaluated predictors of disability. Results: At 90 days, 188 of 1964 (9.6%) patients enrolled with TIA and 471 of 2586 (18.2%) of those enrolled with stroke were disabled. Overall disability was similar between patients assigned DAPT versus aspirin alone (14.7% versus 14.3%; odds ratio, 0.97 [95% CI, 0.82-1.14]; P=0.69). However, there were numerically fewer patients with disability in conjunction with a primary outcome event in the DAPT arm (3.0% versus 4.0%; odds ratio, 0.73 [95% CI, 0.53-1.01]; P=0.06) and significantly fewer patients in the DAPT arm with disability attributed by the investigators to either the index event or recurrent stroke (5.9% versus 7.4%; odds ratio, 0.78 [95% CI, 0.62-0.99]; P=0.04). Notably, disability attributed to the index event accounted for the majority of this difference (4.5% versus 6.0%; odds ratio, 0.74 [95% CI, 0.57-0.96]; P=0.02).
In multivariate analysis, age, subsequent ischemic stroke, serious adverse events, and major bleeding were significantly associated with disability in TIA; for those with stroke, female sex, hypertension, or diabetes mellitus, Nation
Published: 2020

42. Probing the extragalactic fast transient sky at minute time-scales with DECam

Author: Andreoni, I., Cooke, J., Webb, S., Rest, A., Pritchard, T., Caleb, M., Chang, S.-W., Farah, W., Lien, A., Möller, A., Ravasio, M. E., Abbott, T. M. C., Bhandari, S., Cucchiara, A., Flynn, C., Jankowski, F., Keane, E. F., Moriya, T. J., Onken, C. A., Parthasarathy, A., Price, D. C., Petroff, E., Ryder, S., Vohl, D., and Wolf, C.
Abstract: Searches for optical transients are usually performed with a cadence of days to weeks, optimized for supernova discovery. The optical fast transient sky is still largely unexplored, with only a few surveys to date having placed meaningful constraints on the detection of extragalactic transients evolving at sub-hour time-scales. Here, we present the results of deep searches for dim, minute-time-scale extragalactic fast transients using the Dark Energy Camera, a core facility of our all-wavelength and all-messenger Deeper, Wider, Faster programme. We used continuous 20 s exposures to systematically probe time-scales down to 1.17 min at magnitude limits g > 23 (AB), detecting hundreds of transient and variable sources. Nine candidates passed our strict criteria on duration and non-stellarity, all of which could be classified as flare stars based on deep multiband imaging. Searches for fast radio burst and gamma-ray counterparts during simultaneous multifacility observations yielded no counterparts to the optical transients. Also, no long-term variability was detected with pre-imaging and follow-up observations using the SkyMapper optical telescope. We place upper limits for minute-time-scale fast optical transient rates for a range of depths and time-scales. Finally, we demonstrate that optical g-band light-curve behaviour alone cannot discriminate between confirmed extragalactic fast transients such as prompt GRB flashes and Galactic stellar flares.
Published: 2020

43. A blast from the infant Universe: the very high-z GRB 210905A

Author: Rossi, A., Frederiks, D. D., Kann, D. A., De Pasquale, M., Pian, E., D'Avanzo, P., Izzo, L., Lamb, G., Malesani, D. B., Melandri, A., Guelbenzu, A. Nicuesa, Schulze, S., Strausbaugh, R., Amati, L., Campana, S., Cucchiara, A., Ghirlanda, G., Della Valle, M., Klose, S., Salvaterra, R., Starling, R., Stratta, G., Tanvir, N. R., Tsvetkova, A. E., Vergani, S. D., D'Ai, A., Burgarella, D., Covino, S., D'Elia, V., Postigo, A. de Ugarte, Fausey, H., Fynbo, J. P. U., Frontera, F., Guidorzi, C., Heintz, K. E., Masetti, N., Maiorano, E., Mundell, C. G., Oates, S. R., Page, M. J., Palazzi, E., Palmerio, J., Pugliese, G., Rau, A., Saccardi, A., Sbarufatti, B., Svinkin, D. S., Tagliaferri, G., van der Horst, A. J., Watson, D., Ulanov, M. V., Wiersema, K., Xu, D., and Zhang, J.
Abstract: We present the discovery of the very energetic GRB 210905A at the high redshift z = 6.312 and its luminous X-ray and optical afterglow. We obtained photometric and spectroscopic follow-up in the optical and near-infrared (NIR), covering both the prompt and afterglow emission from a few minutes up to 7.5 Ms after burst. With an isotropic gamma-ray energy of E_iso = 1.27 × 10^54 erg, GRB 210905A lies in the top ~7% of GRBs in terms of energy released. Its afterglow is among the most luminous ever observed and, in particular, it is one of the most luminous in the optical at t < 0.5 d in the rest frame. The afterglow starts with a shallow evolution that can be explained by energy injection, and is followed by a steeper decay, while the spectral energy distribution is in agreement with slow cooling in a constant-density environment within the standard fireball theory. A jet break at 39 ± 21 d has been observed in the X-ray light curve; however, it is hidden in the H band, potentially due to a constant contribution from an unknown component, most likely a foreground intervening galaxy and/or the host galaxy. We derived a half-opening angle of 7.9 ± 1.6 degrees, the highest ever measured for a z ≳ 6 burst but within the range covered by closer events. The resulting collimation-corrected gamma-ray energy of 10^52 erg is also among the highest ever measured. The moderately large half-opening angle argues against recent claims of an inverse dependence of the half-opening angle on the redshift. The total jet energy is likely too large for a standard magnetar, and suggests that the central engine of this burst was a newly formed black hole.
Despite the outstanding energetics and luminosity of both GRB 210905A and its afterglow, we demonstrate that they are consistent within 2σ with those of less distant bursts, indicating that the powering mechanisms and progenitors do not evolve significantly with redshift. Comment: 18 pages, 12 figures, 5 tables, submitted to Astronomy & Astrophysics
Published: 2022

44. Optimizing Cadences with Realistic Light-curve Filtering for Serendipitous Kilonova Discovery with Vera Rubin Observatory

Author: Andreoni, Igor, Coughlin, Michael W., Almualla, Mouza, Bellm, Eric C., Bianco, Federica B., Bulla, Mattia, Cucchiara, Antonino, Dietrich, Tim, Goobar, Ariel, Kool, Erik C., Li, Xiaolong, Ragosta, Fabio, Sagués Carracedo, Ana, and Singer, Leo P.
Abstract: Current and future optical and near-infrared wide-field surveys have the potential to find kilonovae, the optical and infrared counterparts to neutron star mergers, independently of gravitational-wave or high-energy gamma-ray burst triggers. The ability to discover fast and faint transients such as kilonovae largely depends on the area observed, the depth of those observations, the number of revisits per field in a given time frame, and the filters adopted by the survey; it also depends on the ability to perform rapid follow-up observations to confirm the nature of the transients. In this work, we assess kilonova detectability in existing simulations of the Legacy Survey of Space and Time strategy for the Vera C. Rubin Wide Fast Deep survey, with focus on comparing rolling to baseline cadences. Although currently available cadences can enable the detection of >300 kilonovae out to ∼1400 Mpc over the 10 year survey, we can expect only 3–32 kilonovae similar to GW170817 to be recognizable as fast-evolving transients. We also explore the detectability of kilonovae over the plausible parameter space, focusing on viewing angle and ejecta masses. We find that observations in redder izy bands are crucial for identification of nearby (within 300 Mpc) kilonovae that could be spectroscopically classified more easily than more distant sources. Rubin's potential for serendipitous kilonova discovery could be increased by gain of efficiency with the employment of individual 30 s exposures (as opposed to 2 × 15 s snap pairs), with the addition of red-band observations coupled with same-night observations in g or r bands, and possibly with further development of a new rolling-cadence strategy.
Published: 2022
Full Text: View/download PDF

45. Dissemination of the European Association of Urology Guidelines Through Social Media: Strategy, Results, and Future Developments

Author: Pradere, B., Esperto, F., Oort, I.M. van, Bhatt, N.R., Czarniecki, S.W., Gurp, M. van, Bloemberg, J., Darraugh, J., Garcia-Rojo, E., Cucchiara, V., Teoh, J.Y., N'Dow, J., Giannarini, G., and Ribal, M.J.
Abstract: Item does not contain fulltext. Over the past decade, social media (SoMe) platforms have been embraced by the medical community across all specialties. This engagement creates a valuable opportunity for scientific organizations to use the broad reach, accessibility, functionality, and informal environment of SoMe to raise awareness, reinforce trust with stakeholders, and disseminate scientific information. In this field, the European Association of Urology (EAU) Guidelines Office has been a pioneer and has constantly set out to disseminate the recommendations established annually by its guidelines panels. Here we describe the dissemination strategy used by the EAU Guidelines Office and the results obtained in the past few years. The EAU Guidelines Office proposes various types of content to disseminate on the different SoMe platforms. An ad hoc dissemination committee adapts attractive content for different target audiences to fit the specific requirements of the platforms on which it is published. Over the past 5 yr, the dissemination committee has been able to constantly improve the engagement of different audiences, especially using Twitter, Facebook, and, more recently, Instagram. It has been shown that use of a multifaceted strategy to improve dissemination of the guidelines, such as campaigns for awareness days, is successful. PATIENT SUMMARY: We describe the strategy used by the European Association of Urology Guidelines Office to disseminate recommendations from the association's guidelines to different target audiences via social media and we summarize the main results.
Published: 2022

46. Short GRB Host Galaxies. I. Photometric and Spectroscopic Catalogs, Host Associations, and Galactocentric Offsets

Author: Fong, W.-f., Nugent, A.E., Dong, Y., Berger, E., Paterson, K., Chornock, R., Levan, A.J., Blanchard, P., Alexander, K.D., Andrews, J., Cobb, B.E., Cucchiara, A., Fox, D., Fryer, C.L., Gordon, A.C., Kilpatrick, C.D., Lunnan, R., Margutti, R., Miller, A., Milne, P., Nicholl, M., Perley, D., Rastinejad, J., Escorial, A.R., Schroeder, G., Smith, N., Tanvir, N., and Terreran, G.
Abstract: Contains fulltext: 284948.pdf (Publisher’s version) (Open Access)
Published: 2022

49. SeeFar: Vehicle Speed Estimation and Flow Analysis from a Moving UAV

Author: Ning, Mang, Ma, Xiaoliang, Lu, Yao, Calderara, Simone, and Cucchiara, Rita
Abstract: Visual perception from drones has recently been widely investigated for Intelligent Traffic Monitoring Systems (ITMS). In this paper, we introduce SeeFar to achieve vehicle speed estimation and traffic flow analysis based on YOLOv5 and DeepSORT from a moving drone. SeeFar differs from previous works in three key ways: the speed estimation and flow analysis components are integrated into a unified framework; our method of predicting car speed has the fewest constraints while maintaining high accuracy; and our flow analyzer is direction-aware and outlier-aware. Specifically, we design the speed estimator using only the camera imaging geometry, where the transformation between world space and image space is completed by the variable Ground Sampling Distance. Moreover, previous papers do not evaluate their speed estimators at scale because of the difficulty of obtaining ground truth; we therefore propose a simple yet efficient approach to estimate the true speeds of vehicles via the prior size of the road signs. We evaluate SeeFar on our ten videos that contain 929 vehicle samples. Experiments on these sequences demonstrate the effectiveness of SeeFar by achieving 98.0% accuracy of speed estimation and 99.1% accuracy of traffic volume prediction, respectively. QC 20221111. Part of proceedings: ISBN 978-3-031-06433-3; 978-3-031-06432-6
Published: 2022
Full Text: View/download PDF

50. Definition and Impact on Oncologic Outcomes of Persistently Elevated Prostate-specific Antigen After Salvage Lymph Node Dissection for Node-only Recurrent Prostate Cancer After Radical Prostatectomy: Clinical Implications for Multimodal Therapy

Author: Bravi, Carlo A., Droghetti, Matteo, Fossati, Nicola, Gandaglia, Giorgio, Suardi, Nazareno, Mazzone, Elio, Cucchiara, Vito, Scuderi, Simone, Barletta, Francesco, Schiavina, Riccardo, Osmonov, Daniar, Juenemann, Klaus-Peter, Boeri, Luca, Karnes, R. Jeffrey, Kretschmer, Alexander, Buchner, Alexander, Stief, Christian, Hiester, Andreas, Nini, Alessandro, Albers, Peter, Devos, Gaetan, Joniau, Steven, Van Poppel, Hendrik, Grubmueller, Bernhard, Shariat, Shahrokh F., Heidenreich, Axel, Pfister, David, Tilki, Derya, Graefen, Markus, Gill, Inderbir S., Mottrie, Alexandre, Karakiewicz, Pierre I., Montorsi, Francesco, and Briganti, Alberto
Abstract: Background: The optimal definition and prognostic significance of persistently elevated prostate-specific antigen (PSA) after salvage lymph node dissection (sLND) for node-only recurrent prostate cancer (PCa) remain unknown. Objective: To assess the definition and clinical implications of persistently elevated PSA after sLND for node-only recurrent PCa after radical prostatectomy. Design, setting, and participants: The study included 579 patients treated with sLND at 11 high-volume centers between 2000 and 2016. Outcome measurements and statistical analysis: We assessed the linear relationship between the first PSA after sLND and death from PCa. Different definitions of PSA persistence were included in a multivariable model predicting cancer-specific mortality (CSM) after surgery to identify the best cutoff value. We investigated the association between PSA persistence and oncologic outcomes using multivariable regression models. Moreover, the effect of early androgen deprivation therapy (ADT) after sLND was tested according to PSA persistence status and estimated risk of CSM. Results and limitations: We found an inverse relationship between the first PSA after sLND and the probability of cancer-specific survival. PSA persistence defined as first postoperative PSA ≥ 0.3 ng/ml provided the best discrimination accuracy (C index 0.757). According to this cutoff, 331 patients (57%) experienced PSA persistence. The median follow-up for survivors was 48 mo (interquartile range 27-74). After adjusting for confounders, men with persistently elevated PSA had higher risk of clinical recurrence (hazard ratio [HR] 1.61), overall mortality (HR 2.20), and CSM (HR 2.59; all p < 0.001) after sLND. Early ADT administration after sLND improved survival only for patients with PSA persistence after surgery (HR 0.49; p = 0.024).
Similarly, when PSA persistence status was included in multivariable models accounting for pathologic features, early ADT use after sLND was beneficial only fo
Published: 2022
