Generative Adversarial Networks (GANs) have become one of the most successful and popular generative models in recent years, with a wide variety of applications across the robust intelligence domains, such as image manipulation, text and audio synthesis, style transfer, and semi-supervised learning, to name a few. The main advantage of GANs over their classical counterparts stems from their use of Deep Neural Networks (DNNs), which can exploit the ongoing revolution in the availability of data and computational power to effectively discover complex patterns. Yet with this exceptional power comes an exceptional limitation: the black-box behavior associated with DNNs. This lack of understanding and clarity not only places the profound promise of GANs under a shadow of mistrust, but also greatly hinders any effort to improve their efficiency. As such, studying the limitations and biases of GANs is perhaps as important as advancing their design and performance, if not more so. The main focus of this dissertation is to study two fundamental limitations of GANs, one geometric and one spectral. We investigate these limitations in depth, both empirically and theoretically, unveil their causes and consequences across different applications, and finally provide solutions to them.

We start by providing an introduction to density estimation and generative modeling in Chapter 1. In this chapter, we review different approaches to density estimation, highlight the advantages and disadvantages of each, and discuss the issues that motivated the development of modern DNN-based generative models. The main goal of this chapter is to draw both a historical and a pragmatic line from classical density estimation methods to modern GANs.

Chapters 2 and 3 elaborate and extend on the results presented in Khayatkhoei et al. (2018). In Chapter 2, we expose and study the limitation of GANs in learning distributions with disconnected support. We first discuss why having a disconnected support is not a singular, pathological phenomenon, but rather a common property of many real-world data distributions. We then illustrate, theoretically and empirically, the difficulties GANs face in learning such distributions, and their ramifications for the practitioner.
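To give a concrete sense of the geometric obstruction, the following is a minimal sketch of the underlying topological argument; the notation here is illustrative, and the formal statements and proofs appear in Chapter 2. Let $g : \mathcal{Z} \to \mathcal{X}$ denote the generator, a continuous function (as is any DNN composed of continuous activations), and let the latent space $\mathcal{Z} \subseteq \mathbb{R}^d$ be connected, as is the support of the standard Gaussian and uniform priors used in practice. Since the continuous image of a connected set is connected, the model support $g(\mathcal{Z})$ is always connected. If the data distribution is supported on two components separated by a positive distance,
\[
\mathrm{supp}(p_{\mathrm{data}}) = \mathcal{M}_1 \cup \mathcal{M}_2, \qquad d(\mathcal{M}_1, \mathcal{M}_2) > 0,
\]
then no such $g$ can match the support exactly: if $g(\mathcal{Z})$ covers both components, connectedness forces it to also contain points outside $\mathcal{M}_1 \cup \mathcal{M}_2$, that is, off-support samples in the empty region between the components.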
In Chapter 3, we propose and evaluate an approach for dealing with the geometric limitation discussed in the previous chapter. This model is based on an ensemble of generative DNNs, each of which learns to focus on one connected component of the distribution's support. Moreover, we propose a prior learning approach to address the problem of choosing the ``best'' ensemble for a given distribution. The final GAN model, denoted DM-WGAN, trains end-to-end, can learn distributions supported on connected and disconnected manifolds, and infers the required number of members in the ensemble automatically, without any explicit supervision. We conclude this chapter by reviewing several existing variants of GANs and their relation to the introduced geometric limitation and our proposed solution.

Chapters 4 and 5 elaborate and extend on the results presented in Khayatkhoei and Elgammal (2020). In Chapter 4, we uncover another fundamental limitation of GANs: a spatial frequency bias. Specifically, we show, empirically and theoretically, that GANs' performance is not indifferent to the frequency of the underlying signal that carries a distribution. This chapter provides insight into which datasets and domains are more prone to sub-optimal learning when GANs are used, and, perhaps more importantly, into what part of a signal is more likely to be missed by GANs. These findings are particularly crucial for applications that use GANs to manipulate high-resolution data, such as medical and satellite imaging, or to augment or extrapolate data, as in semi-supervised learning and simulation.

In Chapter 5, we propose an efficient approach for matching the spatial frequency bias of GANs to the known biases of a distribution. This approach, denoted Frequency Shifted Generators, utilizes the observation that the spatial frequency bias is not fixed, and can be efficiently translated to construct a generative DNN that is specifically targeted at a desired spatial frequency (the classical modulation property behind such a shift is sketched at the end of this section). We also show that it is possible to construct an ensemble of such shifted generators, each focusing on a specific frequency, to address the spatial frequency bias in a more general sense.

Finally, in Chapter 6, we connect our separate discussions of the two fundamental limitations in the previous chapters, and discuss the broader impact of our findings on the bigger picture of distribution learning and generative modeling. We particularly comment on open questions and directions for future research into the limitations of GANs, and, more generally, of DNNs.
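As an informal gloss on the frequency shift referenced above, recall the classical Fourier modulation property: multiplying a signal by a complex exponential carrier translates its spectrum,
\[
\mathcal{F}\bigl\{ e^{i 2\pi u_0 x} f(x) \bigr\}(u) = \hat{f}(u - u_0),
\]
so a band of frequencies that a generator represents well near the origin can, in principle, be re-centered around any target frequency $u_0$ (for real-valued signals, a cosine carrier produces two symmetric shifted copies, $\tfrac{1}{2}\hat{f}(u - u_0) + \tfrac{1}{2}\hat{f}(u + u_0)$). This standard property only illustrates why such a shift is possible in principle; the actual construction of Frequency Shifted Generators is developed in Chapter 5.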