Author: "Adeli, Ehsan" - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Adeli, Ehsan" returned 801 results

1. Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models

Author: Wang, Yanchen, Turnbull, Adam, Xiang, Tiange, Xu, Yunlong, Zhou, Sa, Masoud, Adnan, Azizi, Shekoofeh, Lin, Feng Vankee, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural decoding, the process of understanding how brain activity corresponds to different stimuli, has been a primary objective in cognitive sciences. Over the past three decades, advancements in functional Magnetic Resonance Imaging and machine learning have greatly improved our ability to map visual stimuli to brain activity, especially in the visual cortex. Concurrently, research has expanded into decoding more complex processes like language and memory across the whole brain, utilizing techniques to handle greater variability and improve signal accuracy. We argue that "seeing" involves more than just mapping visual stimuli onto the visual cortex; it engages the entire brain, as various emotions and cognitive states can emerge from observing different scenes. In this paper, we develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps while individuals are exposed to visual stimuli. We utilize large-scale fMRI encoders and image generative models pre-trained on large public datasets, which are then fine-tuned through image-fMRI contrastive learning. Our models can therefore decode visual experience across the entire cerebral cortex, surpassing the traditional confines of the visual cortex. We first compare our method with state-of-the-art approaches to decoding visual processing and show improved predictive semantic accuracy by 43%. A network ablation analysis suggests that beyond the visual cortex, the default mode network contributes most to decoding stimuli, in line with the proposed role of this network in sense-making and semantic processing. Additionally, we implemented zero-shot imagination decoding on an extra validation dataset, achieving a p-value of 0.0206 for mapping the reconstructed images and ground-truth text stimuli, which substantiates the model's capability to capture semantic meanings across various scenarios.
Published: 2024

2. SOE: SO(3)-Equivariant 3D MRI Encoding

Author: He, Shizhe, Paschali, Magdalini, Ouyang, Jiahong, Masood, Adnan, Chaudhari, Akshay, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Representation learning has become increasingly important, especially as powerful models have shifted towards learning latent representations before fine-tuning for downstream tasks. This approach is particularly valuable in leveraging the structural information within brain anatomy. However, a common limitation of recent models developed for MRIs is their tendency to ignore or remove geometric information, such as translation and rotation, thereby creating invariance with respect to geometric operations. We contend that incorporating knowledge about these geometric transformations into the model can significantly enhance its ability to learn more detailed anatomical information within brain structures. As a result, we propose a novel method for encoding 3D MRIs that enforces equivariance with respect to all rotations in 3D space, in other words, SO(3)-equivariance (SOE). By explicitly modeling this geometric equivariance in the representation space, we ensure that any rotational operation applied to the input image space is also reflected in the embedding representation space. This approach requires moving beyond traditional representation learning methods, as we need a representation vector space that allows for the application of the same SO(3) operation in that space. To facilitate this, we leverage the concept of vector neurons. The representation space formed by our method captures the brain's structural and anatomical information more effectively. We evaluate SOE pretrained on the structural MRIs of two public data sets with respect to the downstream task of predicting age and diagnosing Alzheimer's Disease from T1-weighted brain scans of the ADNI data set. We demonstrate that our approach not only outperforms other methods but is also robust against various degrees of rotation along different axes. The code is available at https://github.com/shizhehe/SOE-representation-learning.
Published: 2024
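
The SO(3)-equivariance property described in the abstract above can be illustrated with a small numerical check. This is only a minimal sketch in the spirit of vector neurons, not the authors' SOE implementation: it assumes features are stored as a stack of 3D vectors and that the layer is a plain linear map over channels. The names `random_rotation` and `vn_linear` are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:      # reflect one axis if we landed on det = -1
        q[:, 0] = -q[:, 0]
    return q


def vn_linear(weight, features):
    """Vector-neuron style linear layer (illustrative assumption).

    weight:   (C_out, C_in) mixes channels only.
    features: (C_in, 3), one 3D vector per channel.
    """
    return weight @ features


rng = np.random.default_rng(0)
R = random_rotation(rng)
W = rng.normal(size=(8, 4))   # hypothetical layer weights
V = rng.normal(size=(4, 3))   # hypothetical vector-valued features

# Rotating the input and then encoding should match encoding and then rotating.
rotate_then_encode = vn_linear(W, V @ R.T)
encode_then_rotate = vn_linear(W, V) @ R.T
equivariance_gap = np.max(np.abs(rotate_then_encode - encode_then_rotate))
print("max equivariance gap:", equivariance_gap)
```

Because the weights act only on the channel axis and the rotation acts only on the 3D axis, the two operations commute, so the gap is zero up to floating-point error. An invariant encoder would instead discard the rotation entirely.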

3. GAMMA-PD: Graph-based Analysis of Multi-Modal Motor Impairment Assessments in Parkinson's Disease

Author: Nerrise, Favour, Heiman, Alice Louise, and Adeli, Ehsan
Subjects: Quantitative Biology - Quantitative Methods, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing, Quantitative Biology - Neurons and Cognition
Abstract: The rapid advancement of medical technology has led to an exponential increase in multi-modal medical data, including imaging, genomics, and electronic health records (EHRs). Graph neural networks (GNNs) have been widely used to represent this data due to their prominent performance in capturing pairwise relationships. However, the heterogeneity and complexity of multi-modal medical data still pose significant challenges for standard GNNs, which struggle with learning higher-order, non-pairwise relationships. This paper proposes GAMMA-PD (Graph-based Analysis of Multi-modal Motor Impairment Assessments in Parkinson's Disease), a novel heterogeneous hypergraph fusion framework for multi-modal clinical data analysis. GAMMA-PD integrates imaging and non-imaging data into a "hypernetwork" (patient population graph) by preserving higher-order information and similarity between patient profiles and symptom subtypes. We also design a feature-based attention-weighted mechanism to interpret feature-level contributions towards downstream decision tasks. We evaluate our approach with clinical data from the Parkinson's Progression Markers Initiative (PPMI) and a private dataset. We demonstrate gains in predicting motor impairment symptoms in Parkinson's disease. Our end-to-end framework also learns associations between subsets of patient characteristics to generate clinically relevant explanations for disease and symptom profiles. The source code is available at https://github.com/favour-nerrise/GAMMA-PD.
Comment: Accepted by the 6th Workshop on GRaphs in biomedicAl Image anaLysis (GRAIL) at the 27th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024). 12 pages, 3 figures, 2 tables, Source Code: https://github.com/favour-nerrise/GAMMA-PD
Published: 2024

4. SpaRG: Sparsely Reconstructed Graphs for Generalizable fMRI Analysis

Author: González, Camila, Miraoui, Yanis, Fan, Yiran, Adeli, Ehsan, and Pohl, Kilian M.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Deep learning can help uncover patterns in resting-state functional Magnetic Resonance Imaging (rs-fMRI) associated with psychiatric disorders and personal traits. Yet the problem of interpreting deep learning findings is rarely more evident than in fMRI analyses, as the data is sensitive to scanning effects and inherently difficult to visualize. We propose a simple approach to mitigate these challenges grounded on sparsification and self-supervision. Instead of extracting post-hoc feature attributions to uncover functional connections that are important to the target task, we identify a small subset of highly informative connections during training and occlude the rest. To this end, we jointly train a (1) sparse input mask, (2) variational autoencoder (VAE), and (3) downstream classifier in an end-to-end fashion. While we need a portion of labeled samples to train the classifier, we optimize the sparse mask and VAE with unlabeled data from additional acquisition sites, retaining only the input features that generalize well. We evaluate our method - Sparsely Reconstructed Graphs (SpaRG) - on the public ABIDE dataset for the task of sex classification, training with labeled cases from 18 sites and adapting the model to two additional out-of-distribution sites with a portion of unlabeled samples. For a relatively coarse parcellation (64 regions), SpaRG utilizes only 1% of the original connections while improving the classification accuracy across domains. Our code can be found at github.com/yanismiraoui/SpaRG.
Published: 2024

5. Brain-Cognition Fingerprinting via Graph-GCCA with Contrastive Learning

Author: Wang, Yixin, Peng, Wei, Zhang, Yu, Adeli, Ehsan, Zhao, Qingyu, and Pohl, Kilian M.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Many longitudinal neuroimaging studies aim to improve the understanding of brain aging and diseases by studying the dynamic interactions between brain function and cognition. Doing so requires accurate encoding of their multidimensional relationship while accounting for individual variability over time. For this purpose, we propose an unsupervised learning model called Contrastive Learning-based Graph Generalized Canonical Correlation Analysis (CoGraCa) that encodes their relationship via Graph Attention Networks and generalized Canonical Correlation Analysis. To create brain-cognition fingerprints reflecting the unique neural and cognitive phenotype of each person, the model also relies on individualized and multimodal contrastive learning. We apply CoGraCa to a longitudinal dataset of healthy individuals consisting of resting-state functional MRI and cognitive measures acquired at multiple visits for each participant. The generated fingerprints effectively capture significant individual differences and outperform current single-modal and CCA-based multimodal models in identifying sex and age. More importantly, our encoding provides interpretable interactions between those two modalities.
Published: 2024

6. Physics-Informed Latent Diffusion for Multimodal Brain MRI Synthesis

Author: Lüpke, Sven, Yeganeh, Yousef, Adeli, Ehsan, Navab, Nassir, and Farshad, Azade
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advances in generative models for medical imaging have shown promise in representing multiple modalities. However, the variability in modality availability across datasets limits the general applicability of the synthetic data they produce. To address this, we present a novel physics-informed generative model capable of synthesizing a variable number of brain MRI modalities, including those not present in the original dataset. Our approach utilizes latent diffusion models and a two-step generative process: first, unobserved physical tissue property maps are synthesized using a latent diffusion model, and then these maps are combined with a physical signal model to generate the final MRI scan. Our experiments demonstrate the efficacy of this approach in generating unseen MR contrasts and preserving physical plausibility. Furthermore, we validate the distributions of generated tissue properties by comparing them to those measured in real brain tissue.
Comment: 5th International Workshop on Multiscale Multimodal Medical Imaging (MICCAI 2024), Project page: https://sven-luepke.github.io/phy-ldm-mri/
Published: 2024
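
The second step of the two-step process described above, turning tissue property maps into an image through a signal model, can be sketched as follows. The abstract does not say which signal model the authors use, so this example assumes the textbook spin-echo equation S = PD · (1 − exp(−TR/T1)) · exp(−TE/T2). The function name `spin_echo_signal`, the tissue values, and the sequence timings are all illustrative assumptions, not measurements or settings from the paper.

```python
import numpy as np


def spin_echo_signal(pd, t1, t2, tr, te):
    """Idealized spin-echo signal from tissue property maps (assumed model).

    pd, t1, t2: arrays of proton density and relaxation times (ms).
    tr, te:     repetition and echo time of the simulated sequence (ms).
    """
    return pd * (1.0 - np.exp(-tr / t1)) * np.exp(-te / t2)


# Hypothetical two-voxel "maps": index 0 is white-matter-like, index 1 is CSF-like.
pd_map = np.array([0.7, 1.0])
t1_map = np.array([800.0, 4000.0])
t2_map = np.array([80.0, 2000.0])

# The same property maps yield different contrasts by changing only TR and TE.
t1_weighted = spin_echo_signal(pd_map, t1_map, t2_map, tr=500.0, te=15.0)
t2_weighted = spin_echo_signal(pd_map, t1_map, t2_map, tr=4000.0, te=100.0)
print("T1-weighted:", t1_weighted)
print("T2-weighted:", t2_weighted)
```

With these illustrative values, the white-matter-like voxel is brighter in the short-TR, short-TE image and the CSF-like voxel is brighter in the long-TR, long-TE image. This is why generating property maps first allows contrasts absent from the training data to be produced afterwards.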

7. Latent 3D Brain MRI Counterfactual

Author: Peng, Wei, Xia, Tian, Ribeiro, Fabio De Sousa, Bosschieter, Tomas, Adeli, Ehsan, Zhao, Qingyu, Glocker, Ben, and Pohl, Kilian M.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The number of samples in structural brain MRI studies is often too small to properly train deep learning models. Generative models show promise in addressing this issue by effectively learning the data distribution and generating high-fidelity MRI. However, they struggle to produce diverse, high-quality data outside the distribution defined by the training data. One way to address the issue is using causal models developed for 3D volume counterfactuals. However, accurately modeling causality in high-dimensional spaces is a challenge, so these models generally generate 3D brain MRIs of lower quality. To address these challenges, we propose a two-stage method that constructs a Structural Causal Model (SCM) within the latent space. In the first stage, we employ a VQ-VAE to learn a compact embedding of the MRI volume. Subsequently, we integrate our causal model into this latent space and execute a three-step counterfactual procedure using a closed-form Generalized Linear Model (GLM). Our experiments conducted on real-world high-resolution MRI data (1mm) demonstrate that our method can generate high-quality 3D MRI counterfactuals.
Published: 2024

8. OccFusion: Rendering Occluded Humans with Generative Diffusion Priors

Author: Sun, Adam, Xiang, Tiange, Delp, Scott, Fei-Fei, Li, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Most existing human rendering methods require every part of the human to be fully visible throughout the input video. However, this assumption does not hold in real-life settings where obstructions are common, resulting in only partial visibility of the human. Considering this, we present OccFusion, an approach that utilizes efficient 3D Gaussian splatting supervised by pretrained 2D diffusion models for efficient and high-fidelity human rendering. We propose a pipeline consisting of three stages. In the Initialization stage, complete human masks are generated from partial visibility masks. In the Optimization stage, 3D human Gaussians are optimized with additional supervision by Score-Distillation Sampling (SDS) to create a complete geometry of the human. Finally, in the Refinement stage, in-context inpainting is designed to further improve rendering quality on the less observed human body parts. We evaluate OccFusion on ZJU-MoCap and challenging OcMotion sequences and find that it achieves state-of-the-art performance in the rendering of occluded humans.
Published: 2024

9. Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)

Author: Durante, Zane, Harries, Robathan, Vendrow, Edward, Luo, Zelun, Kyuragi, Yuta, Kozuka, Kazuki, Fei-Fei, Li, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Understanding Activities of Daily Living (ADLs) is a crucial step for different applications including assistive robots, smart homes, and healthcare. However, to date, few benchmarks and methods have focused on complex ADLs, especially those involving multi-person interactions in home environments. In this paper, we propose a new dataset and benchmark, InteractADL, for understanding complex ADLs that involve interaction between humans (and objects). Furthermore, complex ADLs occurring in home environments comprise a challenging long-tailed distribution due to the rarity of multi-person interactions, and pose fine-grained visual recognition tasks due to the presence of semantically and visually similar classes. To address these issues, we propose a novel method for fine-grained few-shot video classification called Name Tuning that enables greater semantic separability by learning optimal class name vectors. We show that Name Tuning can be combined with existing prompt tuning strategies to learn the entire input text (rather than only learning the prompt or class names) and demonstrate improved performance for few-shot classification on InteractADL and 4 other fine-grained visual classification benchmarks. For transparency and reproducibility, we release our code at https://github.com/zanedurante/vlm_benchmark.
Published: 2024

10. A Gentle Approach to Multi-Sensor Fusion Data Using Linear Kalman Filter

Author: Veysi, Parsa, Adeli, Mohsen, Naziri, Nayerosadat Peirov, and Adeli, Ehsan
Subjects: Computer Science - Computers and Society, Computer Science - Computational Engineering, Finance, and Science
Abstract: This research paper delves into the Linear Kalman Filter (LKF), highlighting its importance in merging data from multiple sensors. The Kalman Filter is known for its recursive solution to the linear filtering problem in discrete data, making it ideal for estimating states in dynamic systems by reducing noise in measurements and processes. Our focus is on linear dynamic systems due to the LKF's assumptions about system dynamics, measurement noise, and initial conditions. We thoroughly explain the principles, assumptions, and mechanisms of the LKF, emphasizing its practical application in multi-sensor data fusion. This fusion is essential for integrating diverse sensory inputs, thereby improving the accuracy and reliability of state estimations. To illustrate the LKF's real-world applicability and versatility, the paper presents two physical examples where the LKF significantly enhances precision and stability in dynamic systems. These examples not only demonstrate the theoretical concepts but also provide practical insights into implementing the LKF in multi-sensor data fusion scenarios. Our discussion underscores the LKF's crucial role in fields such as robotics, navigation, and signal processing. By combining an in-depth exploration of the LKF's theoretical foundations with practical examples, this paper aims to provide a comprehensive and accessible understanding of multi-sensor data fusion. Our goal is to contribute to the growing body of knowledge in this important area of research, promoting further innovations and advancements in data fusion technologies and encouraging their wider adoption across various scientific and industrial fields.
Published: 2024
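
The multi-sensor fusion idea summarized in the abstract above can be sketched with a few lines of linear algebra. This is a minimal illustration of the standard linear Kalman filter predict and update equations, not code from the paper. The scenario (one static scalar state observed by two sensors with different noise variances) and all numeric values are assumptions chosen for clarity.

```python
import numpy as np


def kf_predict(x, P, F, Q):
    """Propagate the state estimate and its covariance one step forward."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def kf_update(x, P, z, H, R):
    """Correct the prediction with a (possibly stacked multi-sensor) measurement z."""
    y = z - H @ x                         # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new


# Hypothetical setup: a constant scalar quantity measured by two sensors.
F = np.array([[1.0]])            # static dynamics
Q = np.array([[0.0]])            # no process noise
H = np.array([[1.0], [1.0]])     # both sensors observe the state directly
R = np.diag([4.0, 1.0])          # sensor 1 is noisier than sensor 2

x = np.array([0.0])              # vague prior mean
P = np.array([[1e6]])            # very large prior variance

x, P = kf_predict(x, P, F, Q)
x, P = kf_update(x, P, np.array([10.0, 12.0]), H, R)
print("fused estimate:", x[0], "variance:", P[0, 0])
```

Stacking both sensors into one measurement vector lets the gain weight each reading by its reliability, so the fused estimate sits closer to the more precise sensor and its variance falls below that of either sensor alone.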

11. Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation

Author: Hwa, Jensen, Zhao, Qingyu, Lahiri, Aditya, Masood, Adnan, Salimi, Babak, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Conditional independence (CI) constraints are critical for defining and evaluating fairness in machine learning, as well as for learning unconfounded or causal representations. Traditional methods for ensuring fairness either blindly learn invariant features with respect to a protected variable (e.g., race when classifying sex from face images) or enforce CI relative to the protected attribute only on the model output (e.g., the sex label). Neither of these methods is effective in enforcing CI in high-dimensional feature spaces. In this paper, we focus on a nascent approach characterizing the CI constraint in terms of two Jensen-Shannon divergence terms, and we extend it to high-dimensional feature spaces using a novel dynamic sampling strategy. In doing so, we introduce a new training paradigm that can be applied to any encoder architecture. We are able to enforce conditional independence of the diffusion autoencoder latent representation with respect to any protected attribute under the equalized odds constraint and show that this approach enables causal image generation with controllable latent spaces. Our experimental results demonstrate that our approach can achieve high accuracy on downstream tasks while upholding equality of odds.
Comment: To appear at the 2024 IEEE CVPR Workshop on Fair, Data-Efficient, and Trusted Computer Vision
Published: 2024

12. Towards Robust 3D Pose Transfer with Adversarial Learning

Author: Chen, Haoyu, Tang, Hao, Adeli, Ehsan, and Zhao, Guoying
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D pose transfer, which aims to transfer a desired pose to a target mesh, is one of the most challenging 3D generation tasks. Previous attempts rely on well-defined parametric human models or skeletal joints as driving pose sources. However, obtaining those clean pose sources requires cumbersome pre-processing pipelines, which hinders real-time applications. This work is driven by the intuition that the robustness of the model can be enhanced by introducing adversarial samples into the training, leading to a model that is more robust to noisy inputs and that can even be extended to directly handle real-world data such as raw point clouds/scans without intermediate processing. Furthermore, we propose a novel 3D pose Masked Autoencoder (3D-PoseMAE), a customized MAE that effectively learns 3D extrinsic presentations (i.e., pose). 3D-PoseMAE facilitates learning from the aspect of extrinsic attributes by simultaneously generating adversarial samples that perturb the model and learning the arbitrary raw noisy poses via a multi-scale masking strategy. Both qualitative and quantitative studies show that the meshes transferred by our network are of much better quality. Besides, we demonstrate the strong generalizability of our method on various poses, different domains, and even raw scans. Experimental results also show that the intermediate adversarial samples generated during training can successfully attack existing pose transfer models.
Comment: CVPR 2024
Published: 2024

13. A health-equity framework for tailoring digital non-pharmacological interventions in aging

Author: Turnbull, Adam, Odden, Michelle C., Gould, Christine E., Adeli, Ehsan, Kaplan, Robert M., and Lin, Feng Vankee
Published: 2024

14. Profiles of brain topology for dual-functional stability in old age

Author: Zhou, Sa, Anthony, Mia, Adeli, Ehsan, and Lin, F. Vankee
Published: 2024

15. Data-driven discovery of movement-linked heterogeneity in neurodegenerative diseases

Author: Endo, Mark, Nerrise, Favour, Zhao, Qingyu, Sullivan, Edith V., Fei-Fei, Li, Henderson, Victor W., Pohl, Kilian M., Poston, Kathleen L., and Adeli, Ehsan
Published: 2024

16. Neurocognitive Latent Space Regularization for Multi-label Diagnosis from MRI

Author: Manasseh-Lewis, Jocasta, Godoy, Felipe, Peng, Wei, Paul, Robert, Adeli, Ehsan, and Pohl, Kilian
Editors: Rekik, Islem, Adeli, Ehsan, Park, Sang Hyun, and Cintas, Celia; Series Editor: Goos, Gerhard; Founding Editor: Hartmanis, Juris; Editorial Board Members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Published: 2025

17. An Interactive Agent Foundation Model

Author: Durante, Zane, Sarkar, Bidipta, Gong, Ran, Taori, Rohan, Noda, Yusuke, Tang, Paul, Adeli, Ehsan, Lakshmikanth, Shrinidhi Kowshika, Schulman, Kevin, Milstein, Arnold, Terzopoulos, Demetri, Famoti, Ade, Kuno, Noboru, Llorens, Ashley, Vo, Hoi, Ikeuchi, Katsu, Fei-Fei, Li, Gao, Jianfeng, Wake, Naoki, and Huang, Qiuyuan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics sequences, gameplay data, large-scale video datasets, and textual information for effective multimodal and multi-task learning. Our approach provides a promising avenue for developing generalist, action-taking, multimodal systems.
Published: 2024

18. Wild2Avatar: Rendering Humans Behind Occlusions

Author: Xiang, Tiange, Sun, Adam, Delp, Scott, Kozuka, Kazuki, Fei-Fei, Li, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Rendering the visual appearance of moving humans from occluded monocular videos is a challenging task. Most existing research renders 3D humans under ideal conditions, requiring a clear and unobstructed scene. Those methods cannot be used to render humans in real-world scenes where obstacles may block the camera's view and lead to partial occlusions. In this work, we present Wild2Avatar, a neural rendering approach catered for occluded in-the-wild monocular videos. We propose occlusion-aware scene parameterization for decoupling the scene into three parts - occlusion, human, and background. Additionally, extensive objective functions are designed to help enforce the decoupling of the human from both the occlusion and the background and to ensure the completeness of the human model. We verify the effectiveness of our approach with experiments on in-the-wild videos.
Published: 2023

19. Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection

Author: Kim, Soopil, An, Sion, Chikontwe, Philip, Kang, Myeongkyun, Adeli, Ehsan, Pohl, Kilian M., and Park, Sang Hyun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Logical anomalies (LA) refer to data violating underlying logical constraints, e.g., the quantity, arrangement, or composition of components within an image. Accurately detecting such anomalies requires models to reason about various component types through segmentation. However, curation of pixel-level annotations for semantic segmentation is both time-consuming and expensive. Although there are some prior few-shot or unsupervised co-part segmentation algorithms, they often fail on images of industrial objects. These images have components with similar textures and shapes, which makes precise differentiation challenging. In this study, we introduce a novel component segmentation model for LA detection that leverages a few labeled samples and unlabeled images sharing logical constraints. To ensure consistent segmentation across unlabeled images, we employ a histogram matching loss in conjunction with an entropy loss. As segmentation predictions play a crucial role, we propose to enhance both local and global sample validity detection by capturing key aspects from visual semantics via three memory banks: class histograms, component composition embeddings and patch-level representations. For effective LA detection, we propose an adaptive scaling strategy to standardize anomaly scores from different memory banks in inference. Extensive experiments on the public benchmark MVTec LOCO AD reveal our method achieves 98.1% AUROC in LA detection vs. 89.6% from competing methods.
Comment: Accepted at AAAI 2024
Published: 2023

20. PRISM: Progressive Restoration for Scene Graph-based Image Manipulation

Author: Jahoda, Pavel, Farshad, Azade, Yeganeh, Yousef, Adeli, Ehsan, and Navab, Nassir
Subjects: Computer Science - Machine Learning
Abstract: Scene graphs have emerged as accurate descriptive priors for image generation and manipulation tasks. However, their complexity and the diversity of the shapes and relations of objects in the data make it challenging to incorporate them into models and generate high-quality results. To address these challenges, we propose PRISM, a novel progressive multi-head image manipulation approach to improve the accuracy and quality of the manipulated regions in the scene. Our image manipulation framework is trained using an end-to-end denoising masked reconstruction proxy task, where the masked regions are progressively unmasked from the outer regions to the inner part. We take advantage of the outer part of the masked area, as it has a direct correlation with the context of the scene. Moreover, our multi-head architecture simultaneously generates detailed object-specific regions in addition to the entire image to produce higher-quality images. Our model outperforms the state-of-the-art methods in the semantic image manipulation task on the CLEVR and Visual Genome datasets. Our results demonstrate the potential of our approach for enhancing the quality and precision of scene graph-based image manipulation.
Published: 2023

21. 3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

Author: Chen, Jieneng, Mei, Jieru, Li, Xianhang, Lu, Yongyi, Yu, Qihang, Wei, Qingyue, Luo, Xiangde, Xie, Yutong, Adeli, Ehsan, Wang, Yan, Lungren, Matthew, Xing, Lei, Lu, Le, Yuille, Alan, and Zhou, Yuyin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning. The u-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tasks. However, U-Net's convolution-based operations inherently limit its ability to model long-range dependencies effectively. To address these limitations, researchers have turned to Transformers, renowned for their global self-attention mechanisms, as alternative architectures. One popular network is our previous TransUNet, which leverages Transformers' self-attention to complement U-Net's localized information with the global context. In this paper, we extend the 2D TransUNet architecture to a 3D network by building upon the state-of-the-art nnU-Net architecture and fully exploring Transformers' potential in both the encoder and decoder design. We introduce two key components: 1) A Transformer encoder that tokenizes image patches from a convolutional neural network (CNN) feature map, enabling the extraction of global contexts, and 2) A Transformer decoder that adaptively refines candidate regions by utilizing cross-attention between candidate proposals and U-Net features. Our investigations reveal that different medical tasks benefit from distinct architectural designs. The Transformer encoder excels in multi-organ segmentation, where the relationship among organs is crucial. On the other hand, the Transformer decoder proves more beneficial for dealing with small and challenging segmented targets such as tumor segmentation. Extensive experiments showcase the significant potential of integrating a Transformer-based encoder and decoder into the u-shaped medical image segmentation architecture. TransUNet outperforms competitors in various medical applications.
Comment: Code and models are available at https://github.com/Beckschen/3D-TransUNet
Published: 2023

22. Metadata-Conditioned Generative Models to Synthesize Anatomically-Plausible 3D Brain MRIs

Author: Peng, Wei, Bosschieter, Tomas, Ouyang, Jiahong, Paul, Robert, Adeli, Ehsan, Zhao, Qingyu, and Pohl, Kilian M.
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Generative AI models hold great potential in creating synthetic brain MRIs that advance neuroimaging studies by, for example, enriching data diversity. However, the mainstay of AI research only focuses on optimizing the visual quality (such as signal-to-noise ratio) of the synthetic MRIs while lacking insights into their relevance to neuroscience. To gain these insights with respect to T1-weighted MRIs, we first propose a new generative model, BrainSynth, to synthesize metadata-conditioned (e.g., age- and sex-specific) MRIs that achieve state-of-the-art visual quality. We then extend our evaluation with a novel procedure to quantify anatomical plausibility, i.e., how well the synthetic MRIs capture macrostructural properties of brain regions, and how accurately they encode the effects of age and sex. Results indicate that more than half of the brain regions in our synthetic MRIs are anatomically accurate, i.e., with a small effect size between real and synthetic MRIs. Moreover, the anatomical plausibility varies across cortical regions according to their geometric complexity. As is, our synthetic MRIs can significantly improve the training of a Convolutional Neural Network to identify accelerated aging effects in an independent study. These results highlight the opportunities of using generative AI to aid neuroimaging research and point to areas for further improvement.
Published: 2023

23. LSOR: Longitudinally-Consistent Self-Organized Representation Learning

Author: Ouyang, Jiahong, Zhao, Qingyu, Adeli, Ehsan, Peng, Wei, Zaharchuk, Greg, and Pohl, Kilian M.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Interpretability is a key issue when applying deep learning models to longitudinal brain MRIs. One way to address this issue is by visualizing the high-dimensional latent spaces generated by deep learning via self-organizing maps (SOM). SOM separates the latent space into clusters and then maps the cluster centers to a discrete (typically 2D) grid preserving the high-dimensional relationship between clusters. However, learning SOM in a high-dimensional latent space tends to be unstable, especially in a self-supervision setting. Furthermore, the learned SOM grid does not necessarily capture clinically interesting information, such as brain age. To resolve these issues, we propose the first self-supervised SOM approach that derives a high-dimensional, interpretable representation stratified by brain age solely based on longitudinal brain MRIs (i.e., without demographic or cognitive information). Called Longitudinally-consistent Self-Organized Representation learning (LSOR), the method is stable during training as it relies on soft clustering (vs. the hard cluster assignments used by existing SOM). Furthermore, our approach generates a latent space stratified according to brain age by aligning trajectories inferred from longitudinal MRIs to the reference vector associated with the corresponding SOM cluster. When applied to longitudinal MRIs of the Alzheimer's Disease Neuroimaging Initiative (ADNI, N=632), LSOR generates an interpretable latent space and achieves comparable or higher accuracy than the state-of-the-art representations with respect to the downstream tasks of classification (static vs. progressive mild cognitive impairment) and regression (determining ADAS-Cog score of all subjects). The code is available at https://github.com/ouyangjiahong/longitudinal-som-single-modality.
Published: 2023

24. Rendering Humans from Object-Occluded Monocular Videos

Author: Xiang, Tiange, Sun, Adam, Wu, Jiajun, Adeli, Ehsan, and Fei-Fei, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D understanding and rendering of moving humans from monocular videos is a challenging task. Despite recent progress, the task remains difficult in real-world scenarios, where obstacles may block the camera view and cause partial occlusions in the captured videos. Existing methods cannot handle such defects for two reasons. First, the standard rendering strategy relies on point-point mapping, which could lead to dramatic disparities between the visible and occluded areas of the body. Second, the naive direct regression approach does not consider any feasibility criteria (i.e., prior information) for rendering under occlusions. To tackle the above drawbacks, we present OccNeRF, a neural rendering method that achieves better rendering of humans in severely occluded scenes. As direct solutions to the two drawbacks, we propose surface-based rendering by integrating geometry and visibility priors. We validate our method on both simulated and real-world occlusions and demonstrate our method's superiority. Comment: ICCV 2023, project page: https://cs.stanford.edu/~xtiange/projects/occnerf/
Published: 2023

25. An Explainable Geometric-Weighted Graph Attention Network for Identifying Functional Networks Associated with Gait Impairment

Author: Nerrise, Favour, Zhao, Qingyu, Poston, Kathleen L., Pohl, Kilian M., and Adeli, Ehsan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Image and Video Processing, Quantitative Biology - Neurons and Cognition
Abstract: One of the hallmark symptoms of Parkinson's Disease (PD) is the progressive loss of postural reflexes, which eventually leads to gait difficulties and balance problems. Identifying disruptions in brain function associated with gait impairment could be crucial in better understanding PD motor progression, thus advancing the development of more effective and personalized therapeutics. In this work, we present an explainable, geometric, weighted-graph attention neural network (xGW-GAT) to identify functional networks predictive of the progression of gait difficulties in individuals with PD. xGW-GAT predicts the multi-class gait impairment on the MDS Unified PD Rating Scale (MDS-UPDRS). Our computational- and data-efficient model represents functional connectomes as symmetric positive definite (SPD) matrices on a Riemannian manifold to explicitly encode pairwise interactions of entire connectomes, based on which we learn an attention mask yielding individual- and group-level explainability. Applied to our resting-state functional MRI (rs-fMRI) dataset of individuals with PD, xGW-GAT identifies functional connectivity patterns associated with gait impairment in PD and offers interpretable explanations of functional subnetworks associated with motor impairment. Our model successfully outperforms several existing methods while simultaneously revealing clinically-relevant connectivity patterns. The source code is available at https://github.com/favour-nerrise/xGW-GAT. Comment: Accepted by the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). MICCAI Student-Author Registration (STAR) Award. 11 pages, 2 figures, 1 table, appendix. Source Code: https://github.com/favour-nerrise/xGW-GAT
Published: 2023

26. HomE: Homography-Equivariant Video Representation Learning

Author: Sriram, Anirudh, Gaidon, Adrien, Wu, Jiajun, Niebles, Juan Carlos, Fei-Fei, Li, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent advances in self-supervised representation learning have enabled more efficient and robust model performance without relying on extensive labeled data. However, most works are still focused on images, with few working on videos and even fewer on multi-view videos, where more powerful inductive biases can be leveraged for self-supervision. In this work, we propose a novel method for representation learning of multi-view videos, where we explicitly model the representation space to maintain Homography Equivariance (HomE). Our method learns an implicit mapping between different views, culminating in a representation space that maintains the homography relationship between neighboring views. We evaluate our HomE representation via action recognition and pedestrian intent prediction as downstream tasks. On action classification, our method obtains 96.4% 3-fold accuracy on the UCF101 dataset, better than most state-of-the-art self-supervised learning methods. Similarly, on the STIP dataset, we outperform the state-of-the-art by 6% for pedestrian intent prediction one second into the future while also obtaining an accuracy of 91.2% for pedestrian action (cross vs. not-cross) classification. Code is available at https://github.com/anirudhs123/HomE. Comment: 10 pages, 4 figures, 4 tables
Published: 2023

27. SCOPE: Structural Continuity Preservation for Medical Image Segmentation

Author: Yeganeh, Yousef, Farshad, Azade, Guevercin, Goktug, Abu-zer, Amr, Xiao, Rui, Tang, Yongjian, Adeli, Ehsan, and Navab, Nassir
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Although the preservation of shape continuity and physiological anatomy is a natural assumption in the segmentation of medical images, it is often neglected by deep learning methods that mostly aim for the statistical modeling of input data as pixels rather than interconnected structures. In biological structures, however, organs are not separate entities; for example, in reality, a severed vessel is an indication of an underlying problem, but traditional segmentation models are not designed to strictly enforce the continuity of anatomy, potentially leading to inaccurate medical diagnoses. To address this issue, we propose a graph-based approach that enforces the continuity and connectivity of anatomical topology in medical images. Our method encodes the continuity of shapes as a graph constraint, ensuring that the network's predictions maintain this continuity. We evaluate our method on two public benchmarks on retinal vessel segmentation, showing significant improvements in connectivity metrics compared to traditional methods while getting better or on-par performance on segmentation metrics.
Published: 2023

28. DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation

Author: Yeganeh, Yousef, Farshad, Azade, Weinberger, Peter, Ahmadi, Seyed-Ahmad, Adeli, Ehsan, and Navab, Nassir
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Although purely transformer-based architectures have shown promising performance in many computer vision tasks, many hybrid models consisting of CNN and transformer blocks have been introduced to fit more specialized tasks. Nevertheless, despite the performance gain of both pure and hybrid transformer-based architectures compared to CNNs in medical imaging segmentation, their high training cost and complexity make it challenging to use them in real scenarios. In this work, we propose simple architectures based on purely convolutional layers, and show that by just taking advantage of the attention map visualizations obtained from a self-supervised pretrained vision transformer network (e.g., DINO) one can outperform complex transformer-based networks at much lower computational cost. The proposed architecture is composed of two encoder branches with the original image as input in one branch and the attention map visualizations of the same image from multiple self-attention heads from a pre-trained DINO model (as multiple channels) in the other branch. The results of our experiments on two publicly available medical imaging datasets show that the proposed pipeline outperforms U-Net and the state-of-the-art medical image segmentation models.
Published: 2023

29. Vision-based Estimation of Fatigue and Engagement in Cognitive Training Sessions

Author: Wang, Yanchen, Turnbull, Adam, Xu, Yunlong, Heffner, Kathi, Lin, Feng Vankee, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Computerized cognitive training (CCT) is a scalable, well-tolerated intervention that has promise for slowing cognitive decline. Outcomes from CCT are limited by a lack of effective engagement, which is decreased by factors such as mental fatigue, particularly in older adults at risk for dementia. There is a need for scalable, automated measures that can monitor mental fatigue during CCT. Here, we develop and validate a novel Recurrent Video Transformer (RVT) method for monitoring real-time mental fatigue in older adults with mild cognitive impairment from video-recorded facial gestures during CCT. The RVT model achieved the highest balanced accuracy (78%) and precision (0.82) compared to the prior state-of-the-art models for binary and multi-class classification of mental fatigue and was additionally validated via significant association (p=0.023) with CCT reaction time. By leveraging dynamic temporal information, the RVT model demonstrates the potential to accurately measure real-time mental fatigue, laying the foundation for future personalized CCT that increases effective engagement. Comment: 23 pages, 6 figures
Published: 2023

30. Generating Realistic Brain MRIs via a Conditional Diffusion Probabilistic Model

Author: Peng, Wei, Adeli, Ehsan, Bosschieter, Tomas, Park, Sang Hyun, Zhao, Qingyu, and Pohl, Kilian M.
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: As acquiring MRIs is expensive, neuroscience studies struggle to attain a sufficient number of them for properly training deep learning models. This challenge could be reduced by MRI synthesis, for which Generative Adversarial Networks (GANs) are popular. GANs, however, are commonly unstable and struggle with creating diverse and high-quality data. A more stable alternative is Diffusion Probabilistic Models (DPMs) with a fine-grained training strategy. To overcome their need for extensive computational resources, we propose a conditional DPM (cDPM) with a memory-efficient process that generates realistic-looking brain MRIs. To this end, we train a 2D cDPM to generate an MRI subvolume conditioned on another subset of slices from the same MRI. By generating slices using arbitrary combinations between condition and target slices, the model only requires limited computational resources to learn interdependencies between slices even if they are spatially far apart. After having learned these dependencies via an attention network, a new anatomy-consistent 3D brain MRI is generated by repeatedly applying the cDPM. Our experiments demonstrate that our method can generate high-quality 3D MRIs that share a similar distribution to real MRIs while still diversifying the training set. The code is available at https://github.com/xiaoiker/mask3DMRI_diffusion and also will be released as part of MONAI, at https://github.com/Project-MONAI/GenerativeModels.
Published: 2022

31. SOM2LM: Self-Organized Multi-Modal Longitudinal Maps

Author: Ouyang, Jiahong, Zhao, Qingyu, Adeli, Ehsan, Zaharchuk, Greg, Pohl, Kilian M., Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Linguraru, Marius George, editor, Dou, Qi, editor, Feragen, Aasa, editor, Giannarou, Stamatia, editor, Glocker, Ben, editor, Lekadir, Karim, editor, and Schnabel, Julia A., editor
Published: 2024

32. SCOPE: Structural Continuity Preservation for Retinal Vessel Segmentation

Author: Yeganeh, Yousef, Güvercin, Göktuğ, Xiao, Rui, Abuzer, Amr, Adeli, Ehsan, Farshad, Azade, Navab, Nassir, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ahmadi, Seyed-Ahmad, editor, and Pereira, Sérgio, editor
Published: 2024

33. Medical Image Segmentation Review: The success of U-Net

Author: Azad, Reza, Aghdam, Ehsan Khodapanah, Rauland, Amelie, Jia, Yiwei, Avval, Atlas Haddadi, Bozorgpour, Afshin, Karimijafarbigloo, Sanaz, Cohen, Joseph Paul, Adeli, Ehsan, and Merhof, Dorit
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Automatic medical image segmentation is a crucial topic in the medical domain and, in turn, a critical component of the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success in all medical image modalities. Over the years, the U-Net model has received tremendous attention from academic and industrial researchers. Several extensions of this network have been proposed to address the scale and complexity created by medical tasks. Addressing the deficiency of the naive U-Net model is the foremost step for vendors to utilize the proper U-Net variant model for their business. Having a compendium of different variants in one place makes it easier for builders to identify the relevant research. It will also help ML researchers understand the challenges that biological tasks pose to the model. To address this, we discuss the practical aspects of the U-Net model and suggest a taxonomy to categorize each network variant. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and famous designs on well-known datasets. We provide a comprehensive implementation library with trained models for future research. In addition, for ease of future studies, we created an online list of U-Net papers with their possible official implementation. All information is gathered in the https://github.com/NITR098/Awesome-U-Net repository. Comment: Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence Journal
Published: 2022

34. Joint Graph Convolution for Analyzing Brain Structural and Functional Connectome

Author: Li, Yueting, Wei, Qingyue, Adeli, Ehsan, Pohl, Kilian M., and Zhao, Qingyu
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The white-matter (micro-)structural architecture of the brain promotes synchrony among neuronal populations, giving rise to richly patterned functional connections. A fundamental problem for systems neuroscience is determining the best way to relate structural and functional networks quantified by diffusion tensor imaging and resting-state functional MRI. As one of the state-of-the-art approaches for network analysis, graph convolutional networks (GCN) have been separately used to analyze functional and structural networks, but have not been applied to explore inter-network relationships. In this work, we propose to couple the two networks of an individual by adding inter-network edges between corresponding brain regions, so that the joint structure-function graph can be directly analyzed by a single GCN. The weights of inter-network edges are learnable, reflecting non-uniform structure-function coupling strength across the brain. We apply our Joint-GCN to predict age and sex of 662 participants from the public dataset of the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA) based on their functional and micro-structural white-matter networks. Our results support that the proposed Joint-GCN outperforms existing multi-modal graph learning approaches for analyzing structural and functional networks.
Published: 2022

35. SoMoFormer: Multi-Person Pose Forecasting with Transformers

Author: Vendrow, Edward, Kumar, Satyajit, Adeli, Ehsan, and Rezatofighi, Hamid
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Human pose forecasting is a challenging problem involving complex human body motion and posture dynamics. When there are multiple people in the environment, one's motion may also be influenced by the motion and dynamic movements of others. Although there are several previous works targeting the problem of multi-person dynamic pose forecasting, they often model the entire pose sequence as time series (ignoring the underlying relationship between joints) or only output the future pose sequence of one person at a time. In this paper, we present a new method, called Social Motion Transformer (SoMoFormer), for multi-person 3D pose forecasting. Our transformer architecture uniquely models human motion input as a joint sequence rather than a time sequence, allowing us to perform attention over joints while predicting an entire future motion sequence for each joint in parallel. We show that with this problem reformulation, SoMoFormer naturally extends to multi-person scenes by using the joints of all people in a scene as input queries. Using learned embeddings to denote the type of joint, person identity, and global position, our model learns the relationships between joints and between people, attending more strongly to joints from the same or nearby people. SoMoFormer outperforms state-of-the-art methods for long-term motion prediction on the SoMoF benchmark as well as the CMU-Mocap and MuPoTS-3D datasets. Code will be made available after publication. Comment: 10 pages, 6 figures. Submitted to WACV 2023. Our method was submitted to the SoMoF benchmark leaderboard dated March 2022. See https://somof.stanford.edu/result/217/
Published: 2022

36. Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding

Author: Su, Stephen, Kwong, Samuel, Zhao, Qingyu, Huang, De-An, Niebles, Juan Carlos, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: There has been an increasing interest in multi-task learning for video understanding in recent years. In this work, we propose a generalized notion of multi-task learning by incorporating both auxiliary tasks that the model should perform well on and adversarial tasks that the model should not perform well on. We employ Necessary Condition Analysis (NCA) as a data-driven approach for deciding what category these tasks should fall in. Our novel proposed framework, Adversarial Multi-Task Neural Networks (AMT), penalizes adversarial tasks, determined by NCA to be scene recognition in the Holistic Video Understanding (HVU) dataset, to improve action recognition. This upends the common assumption that the model should always be encouraged to do well on all tasks in multi-task learning. Simultaneously, AMT still retains all the benefits of multi-task learning as a generalization of existing methods and uses object recognition as an auxiliary task to aid action recognition. We introduce two challenging Scene-Invariant test splits of HVU, where the model is evaluated on action-scene co-occurrences not encountered in training. We show that our approach improves accuracy by ~3% and encourages the model to attend to action features instead of correlation-biasing scene features.
Published: 2022

37. Multiple Instance Neuroimage Transformer

Author: Singla, Ayush, Zhao, Qingyu, Do, Daniel K., Zhou, Yuyin, Pohl, Kilian M., and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: For the first time, we propose using a multiple instance learning based convolution-free transformer model, called Multiple Instance Neuroimage Transformer (MINiT), for the classification of T1-weighted (T1w) MRIs. We first present several variants of transformer models adopted for neuroimages. These models extract non-overlapping 3D blocks from the input volume and perform multi-headed self-attention on a sequence of their linear projections. MINiT, on the other hand, treats each of the non-overlapping 3D blocks of the input MRI as its own instance, splitting it further into non-overlapping 3D patches, on which multi-headed self-attention is computed. As a proof-of-concept, we evaluate the efficacy of our model by training it to identify sex from T1w-MRIs of two public datasets: Adolescent Brain Cognitive Development (ABCD) and the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA). The learned attention maps highlight voxels contributing to identifying sex differences in brain morphometry. The code is available at https://github.com/singlaayush/MINIT.
Published: 2022

38. TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation

Author: Azad, Reza, Heidari, Moein, Shariatnia, Moein, Aghdam, Ehsan Khodapanah, Karimijafarbigloo, Sanaz, Adeli, Ehsan, and Merhof, Dorit
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Convolutional neural networks (CNNs) have been the de facto standard in a diverse set of computer vision tasks for many years. In particular, deep neural networks based on seminal architectures such as U-shaped models with skip-connections or atrous convolution with pyramid pooling have been tailored to a wide range of medical image analysis tasks. The main advantage of such architectures is that they are well suited to retaining versatile local features. However, as a general consensus, CNNs fail to capture long-range dependencies and spatial correlations due to the intrinsically confined receptive field size of convolution operations. Alternatively, the Transformer, profiting from global information modelling that stems from the self-attention mechanism, has recently attained remarkable performance in natural language processing and computer vision. Nevertheless, previous studies prove that both local and global features are critical for a deep model in dense prediction, such as segmenting complicated structures with disparate shapes and configurations. To this end, this paper proposes TransDeepLab, a novel DeepLab-like pure Transformer for medical image segmentation. Specifically, we exploit hierarchical Swin-Transformer with shifted windows to extend the DeepLabv3 and model the Atrous Spatial Pyramid Pooling (ASPP) module. A thorough search of the relevant literature indicates that we are the first to model the seminal DeepLab model with a pure Transformer-based model. Extensive experiments on various medical image segmentation tasks verify that our approach performs better than or on par with most contemporary works that combine Vision Transformer and CNN-based methods, along with a significant reduction of model complexity. The codes and trained models are publicly available at https://github.com/rezazad68/transdeeplab
Published: 2022

39. Bridging the Gap between Deep Learning and Hypothesis-Driven Analysis via Permutation Testing

Author: Paschali, Magdalini, Zhao, Qingyu, Adeli, Ehsan, and Pohl, Kilian M.
Subjects: Computer Science - Machine Learning
Abstract: A fundamental approach in neuroscience research is to test hypotheses based on neuropsychological and behavioral measures, i.e., whether certain factors (e.g., related to life events) are associated with an outcome (e.g., depression). In recent years, deep learning has become a potential alternative approach for conducting such analyses by predicting an outcome from a collection of factors and identifying the most "informative" ones driving the prediction. However, this approach has had limited impact as its findings are not linked to statistical significance of factors supporting hypotheses. In this article, we propose a flexible and scalable approach based on the concept of permutation testing that integrates hypothesis testing into the data-driven deep learning analysis. We apply our approach to the yearly self-reported assessments of 621 adolescent participants of the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA) to predict negative valence, a symptom of major depressive disorder according to the NIMH Research Domain Criteria (RDoC). Our method successfully identifies categories of risk factors that further explain the symptom. Comment: Accepted at the 5th workshop on PRedictive Intelligence in Medicine (PRIME 2022) - MICCAI 2022
Published: 2022

40. A Penalty Approach for Normalizing Feature Distributions to Build Confounder-Free Models

Author: Vento, Anthony, Zhao, Qingyu, Paul, Robert, Pohl, Kilian M., and Adeli, Ehsan
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Translating machine learning algorithms into clinical applications requires addressing challenges related to interpretability, such as accounting for the effect of confounding variables (or metadata). Confounding variables affect the relationship between input training data and target outputs. When we train a model on such data, confounding variables will bias the distribution of the learned features. A recent promising solution, MetaData Normalization (MDN), estimates the linear relationship between the metadata and each feature based on a non-trainable closed-form solution. However, this estimation is constrained by the sample size of a mini-batch and thereby may cause the approach to be unstable during training. In this paper, we extend the MDN method by applying a Penalty approach (referred to as PMDN). We cast the problem into a bi-level nested optimization problem. We then approximate this optimization problem using a penalty method so that the linear parameters within the MDN layer are trainable and learned on all samples. This enables PMDN to be plugged into any architecture, even those unfit to run batch-level operations, such as transformers and recurrent models. We show improvement in model accuracy and greater independence from confounders using PMDN over MDN in a synthetic experiment and a multi-label, multi-site dataset of magnetic resonance images (MRIs).
Published: 2022

41. GaitForeMer: Self-Supervised Pre-Training of Transformers via Human Motion Forecasting for Few-Shot Gait Impairment Severity Estimation

Author: Endo, Mark, Poston, Kathleen L., Sullivan, Edith V., Fei-Fei, Li, Pohl, Kilian M., and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Parkinson's disease (PD) is a neurological disorder that has a variety of observable motor-related symptoms such as slow movement, tremor, muscular rigidity, and impaired posture. PD is typically diagnosed by evaluating the severity of motor impairments according to scoring systems such as the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). Automated severity prediction using video recordings of individuals provides a promising route for non-intrusive monitoring of motor impairments. However, the limited size of PD gait data hinders model ability and clinical potential. Because of this clinical data scarcity and inspired by the recent advances in self-supervised large-scale language models like GPT-3, we use human motion forecasting as an effective self-supervised pre-training task for the estimation of motor impairment severity. We introduce GaitForeMer, Gait Forecasting and impairment estimation transforMer, which is first pre-trained on public datasets to forecast gait movements and then applied to clinical data to predict MDS-UPDRS gait impairment severity. Our method outperforms previous approaches that rely solely on clinical data by a large margin, achieving an F1 score of 0.76, precision of 0.79, and recall of 0.75. Using GaitForeMer, we show how public human movement data repositories can assist clinical use cases through learning universal motion representations. The code is available at https://github.com/markendo/GaitForeMer. Comment: Accepted as a conference paper at MICCAI (Medical Image Computing and Computer Assisted Intervention) 2022
Published: 2022

42. Combining Counterfactuals With Shapley Values To Explain Image Models

Author: Lahiri, Aditya, Alipour, Kamran, Adeli, Ehsan, and Salimi, Babak
Subjects: Computer Science - Machine Learning
Abstract: With the widespread use of sophisticated machine learning models in sensitive applications, understanding their decision-making has become an essential task. Models trained on tabular data have witnessed significant progress in explanations of their underlying decision making processes by virtue of having a small number of discrete features. However, applying these methods to high-dimensional inputs such as images is not a trivial task. Images are composed of pixels at an atomic level and do not carry any interpretability by themselves. In this work, we seek to use annotated high-level interpretable features of images to provide explanations. We leverage the Shapley value framework from Game Theory, which has garnered wide acceptance in general XAI problems. By developing a pipeline to generate counterfactuals and subsequently using it to estimate Shapley values, we obtain contrastive and interpretable explanations with strong axiomatic guarantees.
Published: 2022

43. Explaining Image Classifiers Using Contrastive Counterfactuals in Generative Latent Spaces

Author: Alipour, Kamran, Lahiri, Aditya, Adeli, Ehsan, Salimi, Babak, and Pazzani, Michael
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Despite their high accuracies, modern complex image classifiers cannot be trusted for sensitive tasks due to their unknown decision-making process and potential biases. Counterfactual explanations are very effective in providing transparency for these black-box algorithms. Nevertheless, generating counterfactuals that can have a consistent impact on classifier outputs and yet expose interpretable feature changes is a very challenging task. We introduce a novel method to generate causal and yet interpretable counterfactual explanations for image classifiers using pretrained generative models without any re-training or conditioning. The generative models in this technique are not bound to be trained on the same data as the target classifier. We use this framework to obtain contrastive and causal sufficiency and necessity scores as global explanations for black-box classifiers. On the task of face attribute classification, we show how different attributes influence the classifier output by providing both causal and contrastive feature attributions, and the corresponding counterfactual images.
Published: 2022

44. PrivHAR: Recognizing Human Actions From Privacy-preserving Lens

Author: Hinojosa, Carlos, Marquez, Miguel, Arguello, Henry, Adeli, Ehsan, Fei-Fei, Li, and Niebles, Juan Carlos
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: The accelerated use of digital cameras prompts an increasing concern about privacy and security, particularly in applications such as action recognition. In this paper, we propose an optimizing framework to provide robust visual privacy protection along the human action recognition pipeline. Our framework parameterizes the camera lens to successfully degrade the quality of the videos to inhibit privacy attributes and protect against adversarial attacks while maintaining relevant features for activity recognition. We validate our approach with extensive simulations and hardware experiments. Comment: Oral paper presented at European Conference on Computer Vision (ECCV) 2022, in Tel Aviv, Israel
Published: 2022

45. Affective Medical Estimation and Decision Making via Visualized Learning and Deep Learning

Author: Eslami, Mohammad, Tabarestani, Solale, Adeli, Ehsan, Elwyn, Glyn, Elze, Tobias, Wang, Mengyu, Zebardast, Nazlee, Navab, Nassir, and Adjouadi, Malek
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Sophisticated machine learning (ML) techniques have yielded promising results, especially in medical applications, where they have been investigated for different tasks to enhance the decision-making process. Since visualization is such an effective tool for human comprehension, memorization, and judgment, we present a first-of-its-kind estimation approach, which we refer to as Visualized Learning for Machine Learning (VL4ML), that not only assists physicians and clinicians in making reasoned medical decisions but also lets them appreciate the visualized uncertainty, which could raise doubt about the appropriate classification or prediction. For the proof of concept, and to demonstrate the generalized nature of this visualized estimation approach, five different case studies are examined for different types of tasks including classification, regression, and longitudinal prediction. A survey analysis with more than 100 individuals is also conducted to assess users' feedback on this visualized estimation method. The experiments and the survey demonstrate the practical merits of VL4ML, which include: (1) visually appreciating clinical/medical estimations; (2) getting closer to the patients' preferences; (3) improving doctor-patient communication; and (4) visualizing the uncertainty introduced through the black-box effect of the deployed ML algorithm. All source code is shared via a GitHub repository.
Published: 2022

46. An advanced spatio-temporal convolutional recurrent neural network for storm surge predictions

Author: Adeli, Ehsan, Sun, Luning, Wang, Jianxun, and Taflanidis, Alexandros A.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science
Abstract: In this research paper, we study the capability of artificial neural network models to emulate storm surge based on the storm track/size/intensity history, leveraging a database of synthetic storm simulations. Traditionally, Computational Fluid Dynamics (CFD) solvers are employed to numerically solve the storm surge governing equations, which are Partial Differential Equations and are generally very costly to simulate. This study presents a neural network model that can predict storm surge, informed by a database of synthetic storm simulations. This model can serve as a fast and affordable emulator for the very expensive CFD solvers. The neural network model is trained with the storm track parameters used to drive the CFD solvers, and the output of the model is the time-series evolution of the predicted storm surge across multiple nodes within the spatial domain of interest. Once the model is trained, it can be deployed for further predictions based on new storm track inputs. The developed neural network model is a time-series model: a Long Short-Term Memory (LSTM) network, a variant of the Recurrent Neural Network, enriched with Convolutional Neural Networks. The convolutional neural network is employed to capture the spatial correlation of the data. The temporal and spatial correlations of the data are therefore captured by the combination of these models, the ConvLSTM model. As the problem is a sequence-to-sequence time-series problem, an encoder-decoder ConvLSTM model is designed. Some other techniques are also employed during model training to improve model performance. The results show that the proposed convolutional recurrent neural network outperforms the Gaussian Process implementation for the examined synthetic storm database.
Published: 2022

47. Intervertebral Disc Labeling With Learning Shape Information, A Look Once Approach

Author: Azad, Reza, Heidari, Moein, Cohen-Adad, Julien, Adeli, Ehsan, and Merhof, Dorit
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Accurate and automatic segmentation of intervertebral discs from medical images is a critical task for the assessment of spine-related diseases such as osteoporosis, vertebral fractures, and intervertebral disc herniation. To date, various approaches have been developed in the literature that routinely rely on detecting the discs as the primary step. A disadvantage of many cohort studies is that the localization algorithm also yields false-positive detections. In this study, we aim to alleviate this problem by proposing a novel U-Net-based structure to predict a set of candidates for intervertebral disc locations. In our design, we integrate the image shape information (image gradients) to encourage the model to learn rich and generic geometrical information. This additional signal guides the model to selectively emphasize the contextual representation and suppress the less discriminative features. On the post-processing side, to further decrease the false-positive rate, we propose a permutation-invariant 'look once' model, which accelerates the candidate recovery procedure. In comparison with previous studies, our proposed approach does not need to perform the selection in an iterative fashion. The proposed method was evaluated on the spine generic public multi-center dataset and demonstrated superior performance compared to previous work. We have provided the implementation code at https://github.com/rezazad68/intervertebral-lookonce
Published: 2022

48. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers

Author: Chen, Jieneng, Mei, Jieru, Li, Xianhang, Lu, Yongyi, Yu, Qihang, Wei, Qingyue, Luo, Xiangde, Xie, Yutong, Adeli, Ehsan, Wang, Yan, Lungren, Matthew P., Zhang, Shaoting, Xing, Lei, Lu, Le, Yuille, Alan, and Zhou, Yuyin
Published: 2024

49. The Transition From Homogeneous to Heterogeneous Machine Learning in Neuropsychiatric Research

Author: Zhao, Qingyu, Nooner, Kate B., Tapert, Susan F., Adeli, Ehsan, Pohl, Kilian M., Kuceyeski, Amy, and Sabuncu, Mert R.
Published: 2025

50. An advanced spatio-temporal convolutional recurrent neural network for storm surge predictions

Author: Adeli, Ehsan, Sun, Luning, Wang, Jianxun, and Taflanidis, Alexandros A.
Published: 2023
