Author: "Pouransari, Hadi" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Pouransari, Hadi"' showing total 39 results

1. Promoting cross-modal representations to improve multimodal foundation models for physiological signals

Author: Fang, Ching, Sandino, Christopher, Mahasseni, Behrooz, Minxha, Juri, Pouransari, Hadi, Azemi, Erdrin, Moin, Ali, and Zippi, Ellen
Subjects: Computer Science - Machine Learning
Abstract: Many healthcare applications are inherently multimodal, involving several physiological signals. As sensors for these signals become more common, improving machine learning methods for multimodal healthcare data is crucial. Pretraining foundation models is a promising avenue for success. However, methods for developing foundation models in healthcare are still in early exploration and it is unclear which pretraining strategies are most effective given the diversity of physiological signals. This is partly due to challenges in multimodal health data: obtaining data across many patients is difficult and costly, there is a lot of inter-subject variability, and modalities are often heterogeneously informative across downstream tasks. Here, we explore these challenges in the PhysioNet 2018 dataset. We use a masked autoencoding objective to pretrain a multimodal model. We show that the model learns representations that can be linearly probed for a diverse set of downstream tasks. We hypothesize that cross-modal reconstruction objectives are important for successful multimodal training, as they encourage the model to integrate information across modalities. We demonstrate that modality dropout in the input space improves performance across downstream tasks. We also find that late-fusion models pretrained with contrastive learning objectives are less effective across multiple tasks. Finally, we analyze the model's representations, showing that attention weights become more cross-modal and temporally aligned with our pretraining strategy. The learned embeddings also become more distributed in terms of the modalities encoded by each unit. Overall, our work demonstrates the utility of multimodal foundation models with health data, even across diverse physiological data sources. We further argue that explicit methods for inducing cross-modality may enhance multimodal pretraining strategies.
Comment: NeurIPS 2024 AIM-FM Workshop
Published: 2024

2. Generalizable autoregressive modeling of time series through functional narratives

Author: Liu, Ran, Ma, Wenrui, Zippi, Ellen, Pouransari, Hadi, Xiao, Jingyun, Sandino, Chris, Mahasseni, Behrooz, Minxha, Juri, Azemi, Erdrin, Dyer, Eva L., and Moin, Ali
Subjects: Computer Science - Machine Learning
Abstract: Time series data are inherently functions of time, yet current transformers often learn time series by modeling them as mere concatenations of time periods, overlooking their functional properties. In this work, we propose a novel objective for transformers that learn time series by re-interpreting them as temporal functions. We build an alternative sequence of time series by constructing degradation operators of different intensity in the functional space, creating augmented variants of the original sample that are abstracted or simplified to different degrees. Based on the new set of generated sequences, we train an autoregressive transformer that progressively recovers the original sample from the most simplified variant. Analogous to the next word prediction task in languages that learns narratives by connecting different words, our autoregressive transformer aims to learn the Narratives of Time Series (NoTS) by connecting different functions in time. Theoretically, we justify the construction of the alternative sequence through its advantages in approximating functions. When learning time series data with transformers, constructing sequences of temporal functions allows for a broader class of approximable functions (e.g., differentiation) compared to sequences of time periods, leading to a 26% performance improvement in synthetic feature regression experiments. Experimentally, we validate NoTS in 3 different tasks across 22 real-world datasets, where we show that NoTS significantly outperforms other pre-training methods by up to 6%. Additionally, combining NoTS on top of existing transformer architectures can consistently boost performance. Our results demonstrate the potential of NoTS as a general-purpose dynamic learner, offering a viable alternative for developing foundation models for time series analysis.
Published: 2024

3. MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Author: Echterhoff, Jessica, Faghri, Fartash, Vemulapalli, Raviteja, Hu, Ting-Yao, Li, Chun-Liang, Tuzel, Oncel, and Pouransari, Hadi
Subjects: Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) are regularly updated to enhance performance, typically through changes in data or architecture. Within the update process, developers often prioritize improving overall performance metrics, paying less attention to maintaining compatibility with earlier model versions. Instance-level degradation (instance regression) of performance from one model version to the next can interfere with a user's mental model of the capabilities of a particular language model. Users having to adapt their mental model with every update can lead to dissatisfaction, especially when the new model has degraded compared to a prior version for a known use case (model update regression). We find that when pretrained LLM base models are updated, fine-tuned user-facing downstream task adapters experience negative flips -- previously correct instances are now predicted incorrectly. We observe model update regression between different model versions on a diverse set of tasks and models, even when the downstream task training procedures remain identical. We argue for the importance of maintaining model update compatibility during updates, and present evaluation metrics designed specifically for generative tasks, while also being applicable to discriminative tasks. We propose a training strategy to minimize the extent of instance regression in model updates, involving training of a compatibility adapter that can enhance task fine-tuned language models. We show that negative flips are reduced by up to 40%, e.g., when updating Llama 1 to Llama 2 with our proposed method.
Published: 2024
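
Illustration (not part of the catalog record): the abstract above defines a negative flip as an instance the old model predicted correctly and the updated model predicts incorrectly. A minimal sketch of counting them follows. The function name and the choice to normalize by the total number of instances are assumptions for illustration; the paper's own metrics, in particular those for generative tasks, may be defined differently.

```python
def negative_flip_rate(old_correct, new_correct):
    """Fraction of instances that were correct under the old model
    and are incorrect under the new model.

    Both arguments are equal-length sequences of booleans, one entry
    per evaluation instance. Normalizing by the total instance count
    is an assumption made for this sketch.
    """
    if len(old_correct) != len(new_correct):
        raise ValueError("both models must be scored on the same instances")
    if not old_correct:
        return 0.0
    # A negative flip: right before the update, wrong after it.
    flips = sum(1 for old, new in zip(old_correct, new_correct) if old and not new)
    return flips / len(old_correct)


if __name__ == "__main__":
    old = [True, True, False, True]
    new = [True, False, True, False]
    print(negative_flip_rate(old, new))
```

Note that aggregate accuracy can stay flat or improve while this rate is nonzero, which is the distinction between overall performance and instance-level regression that the abstract draws.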

4. DataComp-LM: In search of the next generation of training sets for language models

Author: Li, Jeffrey, Fang, Alex, Smyrnis, Georgios, Ivgi, Maor, Jordan, Matt, Gadre, Samir, Bansal, Hritik, Guha, Etash, Keh, Sedrick, Arora, Kushal, Garg, Saurabh, Xin, Rui, Muennighoff, Niklas, Heckel, Reinhard, Mercat, Jean, Chen, Mayee, Gururangan, Suchin, Wortsman, Mitchell, Albalak, Alon, Bitton, Yonatan, Nezhurina, Marianna, Abbas, Amro, Hsieh, Cheng-Yu, Ghosh, Dhruba, Gardner, Josh, Kilian, Maciej, Zhang, Hanlin, Shao, Rulin, Pratt, Sarah, Sanyal, Sunny, Ilharco, Gabriel, Daras, Giannis, Marathe, Kalyani, Gokaslan, Aaron, Zhang, Jieyu, Chandu, Khyathi, Nguyen, Thao, Vasiljevic, Igor, Kakade, Sham, Song, Shuran, Sanghavi, Sujay, Faghri, Fartash, Oh, Sewoong, Zettlemoyer, Luke, Lo, Kyle, El-Nouby, Alaaeldin, Pouransari, Hadi, Toshev, Alexander, Wang, Stephanie, Groeneveld, Dirk, Soldaini, Luca, Koh, Pang Wei, Jitsev, Jenia, Kollar, Thomas, Dimakis, Alexandros G., Carmon, Yair, Dave, Achal, Schmidt, Ludwig, and Shankar, Vaishaal
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline, enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.
Comment: Project page: https://www.datacomp.ai/dclm/
Published: 2024

5. Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum

Author: Pouransari, Hadi, Li, Chun-Liang, Chang, Jen-Hao Rick, Vasu, Pavan Kumar Anasosalu, Koc, Cem, Shankar, Vaishaal, and Tuzel, Oncel
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length. However, this method of concatenation can lead to cross-document attention within a sequence, which is neither a desirable learning signal nor computationally efficient. Additionally, training on long sequences becomes computationally prohibitive due to the quadratic cost of attention. In this study, we introduce dataset decomposition, a novel variable sequence length training technique, to tackle these challenges. We decompose a dataset into a union of buckets, each containing sequences of the same size extracted from a unique document. During training, we use variable sequence length and batch size, sampling simultaneously from all buckets with a curriculum. In contrast to the concat-and-chunk baseline, which incurs a fixed attention cost at every step of training, our proposed method incurs a penalty proportional to the actual document lengths at each step, resulting in significant savings in training time. We train an 8k context-length 1B model at the same cost as a 2k context-length model trained with the baseline approach. Experiments on a web-scale corpus demonstrate that our approach significantly enhances performance on standard language evaluations and long-context benchmarks, reaching target accuracy 3x faster compared to the baseline. Our method not only enables efficient pretraining on long sequences but also scales effectively with dataset size. Lastly, we shed light on a critical yet less studied aspect of training large language models: the distribution and curriculum of sequence lengths, which results in a non-negligible difference in performance.
Published: 2024
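
Illustration (not part of the catalog record): the abstract above describes decomposing a dataset into buckets, each holding same-length sequences taken from a single document. A minimal sketch of one way to do this follows. The power-of-two chunk sizes, the function name `decompose`, and the `max_len` parameter are assumptions made for illustration; the abstract does not specify how bucket sizes are chosen.

```python
from collections import defaultdict


def decompose(documents, max_len=8192):
    """Split each tokenized document into chunks and group chunks by length.

    Assumption for this sketch: chunk sizes are powers of two, obtained by
    repeatedly taking the largest power of two that fits in the remaining
    tokens (capped at max_len, which should itself be a power of two).
    Every chunk comes from exactly one document, so no sequence mixes
    tokens from different documents.
    """
    buckets = defaultdict(list)
    for doc in documents:
        start = 0
        remaining = len(doc)
        while remaining > 0:
            # Largest power of two <= remaining, capped at max_len.
            size = min(max_len, 1 << (remaining.bit_length() - 1))
            buckets[size].append(doc[start:start + size])
            start += size
            remaining -= size
    return dict(buckets)


if __name__ == "__main__":
    docs = [list(range(13)), list(range(4))]
    for size, chunks in sorted(decompose(docs, max_len=8).items()):
        print(size, chunks)
```

A training loop could then draw each batch from a single bucket, so that all sequences in a batch share one length; the sampling curriculum over buckets mentioned in the abstract is not sketched here.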

6. CLIP with Quality Captions: A Strong Pretraining for Vision Tasks

Author: Vasu, Pavan Kumar Anasosalu, Pouransari, Hadi, Faghri, Fartash, and Tuzel, Oncel
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: CLIP models perform remarkably well on zero-shot classification and retrieval tasks. But recent studies have shown that learnt representations in CLIP are not well suited for dense prediction tasks like object detection, semantic segmentation or depth estimation. More recently, multi-stage training methods for CLIP models were introduced to mitigate the weak performance of CLIP on downstream tasks. In this work, we find that simply improving the quality of captions in image-text datasets improves the quality of CLIP's visual representations, resulting in significant improvement on downstream dense prediction vision tasks. In fact, we find that CLIP pretraining with good quality captions can surpass recent supervised, self-supervised and weakly supervised pretraining methods. We show that when a CLIP model with ViT-B/16 as image encoder is trained on well aligned image-text pairs, it obtains 12.1% higher mIoU and 11.5% lower RMSE on semantic segmentation and depth estimation tasks over recent state-of-the-art Masked Image Modeling (MIM) pretraining methods like Masked Autoencoder (MAE). We find that mobile architectures also benefit significantly from CLIP pretraining. A recent mobile vision architecture, MCi2, with CLIP pretraining obtains performance similar to Swin-L, pretrained on ImageNet-22k, on the semantic segmentation task while being 6.1× smaller. Moreover, we show that improving caption quality results in 10× data efficiency when finetuning for dense prediction tasks.
Published: 2024

7. Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

Author: Vemulapalli, Raviteja, Pouransari, Hadi, Faghri, Fartash, Mehta, Sachin, Farajtabar, Mehrdad, Rastegari, Mohammad, and Tuzel, Oncel
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to train a small task-specific model for a new target task with limited labeled training data?", and propose a simple task-oriented knowledge transfer approach as a highly effective solution to this problem. Our experimental results on five target tasks show that the proposed approach outperforms task-agnostic VFM distillation, web-scale CLIP pretraining, supervised ImageNet pretraining, and self-supervised DINO pretraining by up to 11.6%, 22.1%, 13.7%, and 29.8%, respectively. Furthermore, the proposed approach also demonstrates up to 9x, 4x and 15x reduction in pretraining compute cost when compared to task-agnostic VFM distillation, ImageNet pretraining and DINO pretraining, respectively, while outperforming them. We also show that the dataset used for transferring knowledge has a significant effect on the final target task performance, and introduce a retrieval-augmented knowledge transfer strategy that uses web-scale image retrieval to curate effective transfer sets.
Comment: International Conference on Machine Learning, 2024
Published: 2023

8. MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Author: Vasu, Pavan Kumar Anasosalu, Pouransari, Hadi, Faghri, Fartash, Vemulapalli, Raviteja, and Tuzel, Oncel
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Contrastive pretraining of image-text foundation models, such as CLIP, demonstrated excellent zero-shot performance and improved robustness on a wide range of downstream tasks. However, these models utilize large transformer-based encoders with significant memory and latency overhead, which pose challenges for deployment on mobile devices. In this work, we introduce MobileCLIP -- a new family of efficient image-text models optimized for runtime performance along with a novel and efficient training approach, namely multi-modal reinforced training. The proposed training approach leverages knowledge transfer from an image captioning model and an ensemble of strong CLIP encoders to improve the accuracy of efficient models. Our approach avoids train-time compute overhead by storing the additional knowledge in a reinforced dataset. MobileCLIP sets a new state-of-the-art latency-accuracy tradeoff for zero-shot classification and retrieval tasks on several datasets. Our MobileCLIP-S2 variant is 2.3× faster while more accurate compared to the previous best CLIP model based on ViT-B/16. We further demonstrate the effectiveness of our multi-modal reinforced training by training a CLIP model based on ViT-B/16 image backbone and achieving +2.9% average performance improvement on 38 evaluation benchmarks compared to the previous best. Moreover, we show that the proposed approach achieves 10×-1000× improved learning efficiency when compared with non-reinforced CLIP training. Code and models are available at https://github.com/apple/ml-mobileclip .
Comment: CVPR 2024
Published: 2023

9. TiC-CLIP: Continual Training of CLIP Models

Author: Garg, Saurabh, Farajtabar, Mehrdad, Pouransari, Hadi, Vemulapalli, Raviteja, Mehta, Sachin, Tuzel, Oncel, Shankar, Vaishaal, and Faghri, Fartash
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Keeping large foundation models up to date on the latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset, contains over 12.7B timestamped image-text pairs spanning 9 years (2014-2022). We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses ≈8% zero-shot accuracy on our curated retrieval task from 2021-2022 compared with more recently trained models in the OpenCLIP repository. We then study how to efficiently train models on time-continuous data. We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by 2.5× when compared to the standard practice of retraining from scratch. Code is available at https://github.com/apple/ml-tic-clip.
Comment: ICLR 2024
Published: 2023

10. SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

Author: Wang, Haoxiang, Vasu, Pavan Kumar Anasosalu, Faghri, Fartash, Vemulapalli, Raviteja, Farajtabar, Mehrdad, Mehta, Sachin, Rastegari, Mohammad, Tuzel, Oncel, and Pouransari, Hadi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that absorbs their expertise. Our method integrates techniques of multi-task learning, continual learning, and distillation. Further, it demands significantly less computational cost compared to traditional multi-task training from scratch, and it only needs a small fraction of the pre-training datasets that were initially used to train individual models. By applying our method to SAM and CLIP, we obtain SAM-CLIP: a unified model that combines the capabilities of SAM and CLIP into a single vision transformer. Compared with deploying SAM and CLIP independently, our merged model, SAM-CLIP, reduces storage and compute costs for inference, making it well-suited for edge device applications. We show that SAM-CLIP not only retains the foundational strengths of SAM and CLIP, but also introduces synergistic functionalities, notably in zero-shot semantic segmentation, where SAM-CLIP establishes new state-of-the-art results on 5 benchmarks. It outperforms previous models that are specifically designed for this task by a large margin, including +6.8% and +5.9% mean IoU improvement on Pascal-VOC and COCO-Stuff datasets, respectively.
Published: 2023

11. CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

Author: Salehi, Mohammadreza, Farajtabar, Mehrdad, Horton, Maxwell, Faghri, Fartash, Pouransari, Hadi, Vemulapalli, Raviteja, Tuzel, Oncel, Farhadi, Ali, Rastegari, Mohammad, and Mehta, Sachin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Contrastive language image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities. This paper studies the following question: Can we augment CLIP training with task-specific vision models from model zoos to improve its visual representations? Towards this end, we leverage open-source task-specific vision models to generate pseudo-labels for an uncurated and noisy image-text dataset. Subsequently, we train CLIP models on these pseudo-labels in addition to the contrastive training on image and text pairs. This simple setup shows substantial improvements of up to 16.3% across different vision tasks, including segmentation, detection, depth estimation, and surface normal estimation. Importantly, these enhancements are achieved without compromising CLIP's existing capabilities, including its proficiency in promptable zero-shot classification.
Published: 2023

12. Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Author: Liu, Ran, Zippi, Ellen L., Pouransari, Hadi, Sandino, Chris, Nie, Jingping, Goh, Hanlin, Azemi, Erdrin, and Moin, Ali
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Signal Processing
Abstract: Leveraging multimodal information from biosignals is vital for building a comprehensive representation of people's physical and mental states. However, multimodal biosignals often exhibit substantial distributional shifts between pretraining and inference datasets, stemming from changes in task specification or variations in modality compositions. To achieve effective pretraining in the presence of potential distributional shifts, we propose a frequency-aware masked autoencoder (bioFAME) that learns to parameterize the representation of biosignals in the frequency space. bioFAME incorporates a frequency-aware transformer, which leverages a fixed-size Fourier-based operator for global token mixing, independent of the length and sampling rate of inputs. To maintain the frequency components within each input channel, we further employ a frequency-maintain pretraining strategy that performs masked autoencoding in the latent space. The resulting architecture effectively utilizes multimodal information during pretraining, and can be seamlessly adapted to diverse tasks and modalities at test time, regardless of input size and order. We evaluated our approach on a diverse set of transfer experiments on unimodal time series, achieving an average of 5.5% improvement in classification accuracy over the previous state-of-the-art. Furthermore, we demonstrated that our architecture is robust in modality mismatch scenarios, including unpredicted modality dropout or substitution, proving its practical utility in real-world applications. Code is available at https://github.com/apple/ml-famae .
Comment: Extended version of ICLR 2024 Learning from Time Series for Health workshop
Published: 2023

13. Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement

Author: Faghri, Fartash, Pouransari, Hadi, Mehta, Sachin, Farajtabar, Mehrdad, Farhadi, Ali, Rastegari, Mohammad, and Tuzel, Oncel
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-based models and a large-scale study of distillation with state-of-the-art models with various data augmentations. We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+. Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks (e.g., segmentation and detection). As an example, the accuracy of ResNet-50 improves by 1.7% on the ImageNet validation set, 3.5% on ImageNetV2, and 10.0% on ImageNet-R. Expected Calibration Error (ECE) on the ImageNet validation set is also reduced by 9.9%. Using this backbone with Mask-RCNN for object detection on MS-COCO, the mean average precision improves by 0.8%. We reach similar gains for MobileNets, ViTs, and Swin-Transformers. For MobileNetV3 and Swin-Tiny, we observe significant improvements on ImageNet-R/A/C of up to 20% improved robustness. Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+ reach up to 3.4% improved accuracy. The code, datasets, and pretrained models are available at https://github.com/apple/ml-dr.
Comment: Accepted at International Conference on Computer Vision (ICCV) 2023. v2: Camera-ready version with new Tables 9 and 10. v3: Correction to Table 7-Avg. column
Published: 2023

14. FastFill: Efficient Compatible Model Update

Author: Jaeckle, Florian, Faghri, Fartash, Farhadi, Ali, Tuzel, Oncel, and Pouransari, Hadi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: In many retrieval systems the original high dimensional data (e.g., images) is mapped to a lower dimensional feature through a learned embedding model. The task of retrieving the most similar data from a gallery set to a given query data is performed through a similarity comparison on features. When the embedding model is updated, it might produce features that are not comparable/compatible with features already in the gallery computed with the old model. Subsequently, all features in the gallery need to be re-computed using the new embedding model -- a computationally expensive process called backfilling. Recently, compatible representation learning methods have been proposed to avoid backfilling. Despite their relative success, there is an inherent trade-off between the new model performance and its compatibility with the old model. In this work, we introduce FastFill: a compatible model update process using feature alignment and policy based partial backfilling to promptly elevate retrieval performance. We show that previous backfilling strategies suffer from decreased performance and demonstrate the importance of both the training objective and the ordering in online partial backfilling. We propose a new training method for feature alignment between old and new embedding models using uncertainty estimation. Compared to previous works, we obtain significantly improved backfilling results on a variety of datasets: mAP on ImageNet (+4.4%), Places-365 (+2.7%), and VGG-Face2 (+1.3%). Further, we demonstrate that when updating a biased model with FastFill, the minority subgroup accuracy gap promptly vanishes with a small fraction of partial backfilling.
Comment: To appear in The Eleventh International Conference on Learning Representations
Published: 2023

15. APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations

Author: Rosenfeld, Elan, Nakkiran, Preetum, Pouransari, Hadi, Tuzel, Oncel, and Faghri, Fartash
Subjects: Computer Science - Machine Learning
Abstract: Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets. In this work, we ask whether it is possible to achieve similar results with substantially less training time and data. We achieve this by taking advantage of existing pretrained unimodal encoders and careful curation of alignment data relevant to the downstream task of interest. We study a natural approach to aligning existing encoders via small auxiliary functions, and we find that this method is competitive with (or outperforms) state of the art in many settings while being less prone to overfitting, less costly to train, and more robust to distribution shift. With a properly chosen alignment distribution, our method surpasses prior state of the art for ImageNet zero-shot classification on public data while using two orders of magnitude less time and data and training 77% fewer parameters.
Published: 2022

16. Forward Compatible Training for Large-Scale Embedding Retrieval Systems

Author: Ramanujan, Vivek, Vasu, Pavan Kumar Anasosalu, Farhadi, Ali, Tuzel, Oncel, and Pouransari, Hadi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In visual retrieval systems, updating the embedding model requires recomputing features for every piece of data. This expensive process is referred to as backfilling. Recently, the idea of backward compatible training (BCT) was proposed. To avoid the cost of backfilling, BCT modifies training of the new model to make its representations compatible with those of the old model. However, BCT can significantly hinder the performance of the new model. In this work, we propose a new learning paradigm for representation learning: forward compatible training (FCT). In FCT, when the old model is trained, we also prepare for a future unknown version of the model. We propose learning side-information, an auxiliary feature for each sample which facilitates future updates of the model. To develop a powerful and flexible framework for model compatibility, we combine side-information with a forward transformation from old to new embeddings. Training of the new model is not modified, hence, its accuracy is not degraded. We demonstrate significant retrieval accuracy improvement compared to BCT for various datasets: ImageNet-1k (+18.1%), Places-365 (+5.4%), and VGG-Face2 (+8.3%). FCT obtains model compatibility when the new and old models are trained across different datasets, losses, and architectures.
Comment: 14 pages with appendix. In proceedings at the conference on Computer Vision and Pattern Recognition 2022
Published: 2021

17. Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

Author: Pouransari, Hadi, Javaheripi, Mojan, Sharma, Vinay, and Tuzel, Oncel
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method, that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set including uncertain samples. We conduct rigorous evaluations on regression and classification tasks and show that compared to the standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%. This leads to major accuracy improvements compared to the empirical risk minimization-based training for various recent neural network architectures: 16% regression error reduction on the MPIIGaze dataset, +3.4% to +9.1% improvement in top-1 classification accuracy on the CIFAR100 dataset, and +2.9% top-1 improvement on the ImageNet dataset.
Published: 2020

18. Least squares binary quantization of neural networks

Author: Pouransari, Hadi, Tu, Zhucheng, and Tuzel, Oncel
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Quantizing weights and activations of deep neural networks results in significant improvement in inference efficiency at the cost of lower accuracy. A source of the accuracy gap between full precision and quantized models is the quantization error. In this work, we focus on binary quantization, in which values are mapped to -1 and 1. We provide a unified framework to analyze different scaling strategies. Inspired by the Pareto-optimality of 2-bit versus 1-bit quantization, we introduce a novel 2-bit quantization with provably least squares error. Our quantization algorithms can be implemented efficiently on hardware using bitwise operations. We present proofs to show that our proposed methods are optimal, and also provide empirical error analysis. We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed least squares quantization algorithms.
Published: 2020

19. Democratizing Production-Scale Distributed Deep Learning

Author: Ma, Minghuang, Pouransari, Hadi, Chao, Daniel, Adya, Saurabh, Serrano, Santiago Akle, Qin, Yi, Gimnicher, Dan, and Walsh, Dominic
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: The interest and demand for training deep neural networks have been experiencing rapid growth, spanning a wide range of applications in both academia and industry. However, training them distributed and at scale remains difficult due to the complex ecosystem of tools and hardware involved. One consequence is that the responsibility of orchestrating these complex components is often left to one-off scripts and glue code customized for specific problems. To address these restrictions, we introduce Alchemist, an internal service built at Apple from the ground up for easy, fast, and scalable distributed training. We discuss its design, implementation, and examples of running different flavors of distributed training. We also present case studies of its internal adoption in the development of autonomous systems, where training times have been reduced by 10x to keep up with the ever-growing data collection.
Published: 2018

20. A distributed-memory hierarchical solver for general sparse linear systems

Author: Chen, Chao, Pouransari, Hadi, Rajamanickam, Sivasankaran, Boman, Erik G., and Darve, Eric
Subjects: Mathematics - Numerical Analysis, Computer Science - Mathematical Software, Computer Science - Numerical Analysis, 65F50
Abstract: We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We present various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
Published: 2017

21. Particle-to-fluid heat transfer in particle-laden turbulence

Author: Pouransari, Hadi and Mani, Ali
Subjects: Physics - Fluid Dynamics
Abstract: Preferential concentration of inertial particles by turbulence is a well recognized phenomenon. This study investigates how this phenomenon impacts the mean heat transfer between the fluid phase and the particle phase. Using direct numerical simulations of homogeneous and isotropic turbulent flows coupled with Lagrangian point particle tracking, we explore this phenomenon over a wide range of input parameters. Among the nine independent dimensionless numbers defining this problem, we show that particle Stokes number, defined based on large eddy time, and a newly identified number called the heat mixing parameter have the most significant effect on particle-to-gas heat transfer, while variation in other non-dimensional numbers can be ignored. An investigation of regimes with significant particle mass loading suggests that the mean heat transfer from particles to gas is hardly affected by momentum two-way coupling. Using our numerical results, we propose an algebraic reduced order model for heat transfer in particle-laden turbulence.
Published: 2017
Full Text: View/download PDF

22. Sparse Hierarchical Solvers with Guaranteed Convergence

Author: Yang, Kai, Pouransari, Hadi, and Darve, Eric
Subjects: Mathematics - Numerical Analysis
Abstract: Solving sparse linear systems from discretized PDEs is challenging. Direct solvers have in many cases quadratic complexity (depending on geometry), while iterative solvers require problem-dependent preconditioners to be robust and efficient. Approximate factorization preconditioners, such as incomplete LU factorization, provide cheap approximations to the system matrix. However, even a highly accurate preconditioner may have deteriorating performance when the condition number of the system matrix increases. By increasing the accuracy on low-frequency errors, we propose a novel hierarchical solver with improved robustness with respect to the condition number of the linear system. This solver retains the linear computational cost and memory footprint of the original algorithm.
Published: 2016

23. Parallel variable-density particle-laden turbulence simulation

Author: Pouransari, Hadi, Mortazavi, Milad, and Mani, Ali
Subjects: Physics - Computational Physics, Physics - Fluid Dynamics
Abstract: We have developed a fully parallel C++/MPI based simulation code for variable-density particle-laden turbulent flows. The fluid is represented through a uniform Eulerian staggered grid, while particles are modeled using a Lagrangian point-particle framework. Spatial discretization is second-order accurate, and time integration is fourth-order accurate. Two-way coupling of the particles with the background flow is considered in both momentum and energy equations. The code is fully modular and abstracted, and can easily be extended or modified. We have considered two different boundary conditions. We have also developed a novel parallel linear solver for the variable density Poisson equation that arises in the calculation. Comment: In 2015, Annual Research Briefs, Center for Turbulence Research, Stanford University
Published: 2016

24. Fast hierarchical solvers for sparse matrices using extended sparsification and low-rank approximation

Author: Pouransari, Hadi, Coulier, Pieter, and Darve, Eric
Subjects: Mathematics - Numerical Analysis, Computer Science - Data Structures and Algorithms, Computer Science - Numerical Analysis
Abstract: Inversion of sparse matrices with standard direct solve schemes is robust, but computationally expensive. Iterative solvers, on the other hand, demonstrate better scalability but need to be used with an appropriate preconditioner (e.g., ILU, AMG, Gauss-Seidel) for proper convergence. The choice of an effective preconditioner is highly problem dependent. We propose a novel fully algebraic sparse matrix solve algorithm, which has linear complexity with the problem size. Our scheme is based on the Gauss elimination. For a given matrix, we approximate the LU factorization with a tunable accuracy determined a priori. This method can be used as a stand-alone direct solver with linear complexity and tunable accuracy, or it can be used as a black-box preconditioner in conjunction with iterative methods such as GMRES. The proposed solver is based on the low-rank approximation of fill-ins generated during the elimination. Similar to H-matrices, fill-ins corresponding to blocks that are well-separated in the adjacency graph are represented via a hierarchical structure. The linear complexity of the algorithm is guaranteed if the blocks corresponding to well-separated clusters of variables are numerically low-rank.
Published: 2015
Full Text: View/download PDF

25. Optimizing the adaptive fast multipole method for fractal sets

Author: Pouransari, Hadi and Darve, Eric
Subjects: Mathematics - Numerical Analysis, 28A80, 65F99, 70F10
Abstract: We have performed a detailed analysis of the fast multipole method (FMM) in the adaptive case, in which the depth of the FMM tree is non-uniform. Previous works in this area have focused mostly on special types of adaptive distributions, for example when points accumulate on a 2D manifold or accumulate around a few points in space. Instead, we considered a more general situation in which fractal sets, e.g., Cantor sets and generalizations, are used to create adaptive sets of points. Such sets are characterized by their dimension, a number between 0 and 3. We introduced a mathematical framework to define a converging sequence of octrees, and based on that, demonstrated how to increase $N \to \infty$. A new complexity analysis for the adaptive FMM is introduced. It is shown that the $\mathcal{O}(N)$ complexity is achievable for any distribution of particles, when a modified adaptive FMM is exploited. We analyzed how the FMM performs for fractal point distributions, and how optimal parameters can be picked, e.g., the criterion used to stop the subdivision of an FMM cell. A new subdividing double-threshold method is introduced, and better performance demonstrated. Parameters in the FMM are modeled as a function of particle distribution dimension, and the optimal values are obtained. A three-dimensional, kernel-independent, black-box adaptive FMM is implemented and used for all calculations.
Published: 2015
Full Text: View/download PDF

26. The inverse fast multipole method: using a fast approximate direct solver as a preconditioner for dense linear systems

Author: Coulier, Pieter, Pouransari, Hadi, and Darve, Eric
Subjects: Mathematics - Numerical Analysis, Computer Science - Numerical Analysis
Abstract: Although some preconditioners are available for solving dense linear systems, there are still many matrices for which preconditioners are lacking, in particular in cases where the size of the matrix $N$ becomes very large. Hence, there remains a great need to develop general purpose preconditioners whose cost scales well with the matrix size $N$. In this paper, we propose a preconditioner with broad applicability and with cost $\mathcal{O}(N)$ for dense matrices, when the matrix is given by a smooth kernel. Extending the method using the same framework to general $\mathcal{H}^2$-matrices is relatively straightforward. These preconditioners have a controlled accuracy (machine accuracy can be achieved if needed) and scale linearly with $N$. They are based on an approximate direct solve of the system. The linear scaling of the algorithm is achieved by means of two key ideas. First, the $\mathcal{H}^2$-structure of the dense matrix is exploited to obtain an extended sparse system of equations. Second, fill-ins arising when performing the elimination are compressed as low-rank matrices if they correspond to well-separated interactions. This ensures that the sparsity pattern of the extended sparse matrix is preserved throughout the elimination, hence resulting in a very efficient algorithm with $\mathcal{O}(N \log(1/\varepsilon)^2)$ computational cost and $\mathcal{O}(N \log(1/\varepsilon))$ memory requirement, for an error tolerance $0 < \varepsilon < 1$. The solver is inexact, although the error can be controlled and made as small as needed. These solvers are related to ILU in the sense that the fill-in is controlled. However, in ILU, most of the fill-in is simply discarded whereas here it is approximated using low-rank blocks, with a prescribed tolerance. Numerical examples are discussed to demonstrate the linear scaling of the method and to illustrate its effectiveness as a preconditioner. Comment: Revised version submitted to the SIAM Journal on Scientific Computing. 35 pages, 29 figures
Published: 2015

27. A distributed-memory hierarchical solver for general sparse linear systems

Author: Chen, Chao, Pouransari, Hadi, Rajamanickam, Sivasankaran, Boman, Erik G., and Darve, Eric
Published: 2018
Full Text: View/download PDF

28. Particle-laden flows forced by the disperse phase: Comparison between Lagrangian and Eulerian simulations

Author: Vié, Aymeric, Pouransari, Hadi, Zamansky, Rémi, and Mani, Ali
Published: 2016
Full Text: View/download PDF

29. Forward Compatible Training for Large-Scale Embedding Retrieval Systems

Author: Ramanujan, Vivek, primary, Anasosalu Vasu, Pavan Kumar, additional, Farhadi, Ali, additional, Tuzel, Oncel, additional, and Pouransari, Hadi, additional
Published: 2022
Full Text: View/download PDF

30. Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

Author: Pouransari, Hadi, primary, Javaheripi, Mojan, additional, Sharma, Vinay, additional, and Tuzel, Oncel, additional
Published: 2021
Full Text: View/download PDF

31. Least squares binary quantization of neural networks

Author: Pouransari, Hadi, primary, Tu, Zhucheng, additional, and Tuzel, Oncel, additional
Published: 2020
Full Text: View/download PDF

32. Sparse hierarchical solvers with guaranteed convergence

Author: Yang, Kai, primary, Pouransari, Hadi, additional, and Darve, Eric, additional
Published: 2019
Full Text: View/download PDF

33. Particle-to-fluid heat transfer in particle-laden turbulence

Author: Pouransari, Hadi, primary and Mani, Ali, additional
Published: 2018
Full Text: View/download PDF

34. The Inverse Fast Multipole Method: Using a Fast Approximate Direct Solver as a Preconditioner for Dense Linear Systems

Author: Coulier, Pieter, primary, Pouransari, Hadi, additional, and Darve, Eric, additional
Published: 2017
Full Text: View/download PDF

35. Effects of Preferential Concentration on Heat Transfer in Particle-Based Solar Receivers

Author: Pouransari, Hadi, primary and Mani, Ali, additional
Published: 2016
Full Text: View/download PDF

36. Intelligent rescuer robot for detecting victims accurately in natural disasters

Author: Pouransari, Alireza, primary, Pouransari, Hadi, additional, and Inallou, Mohammad Madadpour, additional
Published: 2015
Full Text: View/download PDF

37. Optimizing the Adaptive Fast Multipole Method for Fractal Sets

Author: Pouransari, Hadi, primary and Darve, Eric, additional
Published: 2015
Full Text: View/download PDF

38. Fast hierarchical solvers for sparse matrices using extended sparsification and low-rank approximation

Author: Pouransari, Hadi, Coulier, Pieter, and Darve, Eric
Subjects: *SPARSE matrices, *LU factorization, *GENERALIZED minimal residual method
Abstract: Inversion of sparse matrices with standard direct solve schemes is robust but computationally expensive. Iterative solvers, on the other hand, demonstrate better scalability but need to be used with an appropriate preconditioner (e.g., ILU, AMG, Gauss-Seidel) for proper convergence. The choice of an effective preconditioner is highly problem dependent. We propose a novel fully algebraic sparse matrix solve algorithm. The computational complexity is linear under the assumption that fill-in blocks have bounded rank. Our scheme is based on the Gauss elimination. For a given matrix, we approximate the LU factorization with a tunable accuracy determined a priori. This method can be used as a stand-alone direct solver, or it can be used as a black-box preconditioner in conjunction with iterative methods such as GMRES. The proposed solver is based on the low-rank approximation of fill-ins generated during the elimination. Similar to H-matrices, fill-ins corresponding to blocks that are well-separated in the adjacency graph are represented via a hierarchical structure. The linear complexity of the algorithm is guaranteed if the blocks corresponding to well-separated clusters of variables are numerically low-rank. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

39. Effects of Preferential Concentration on Heat Transfer in Particle-Based Solar Receivers.

Author: Pouransari, Hadi and Mani, Ali
Subjects: *HEAT transfer, *SOLAR receivers, *SOLAR energy
Abstract: The working principle of particle-based solar receivers is to utilize the absorptivity of a dispersed particle phase in an otherwise optically transparent carrier fluid. In comparison to their traditional counterparts, which use a solid surface for radiation absorption, particle-based receivers offer a number of opportunities for improved efficiency and heat transfer uniformity. The physical phenomena at the core of such receivers involve coupling between particle transport, fluid turbulence, and radiative heat transfer. Previous analyses of particle-based solar receivers ignored delicate aspects associated with this three-way coupling. Namely, these investigations considered the flow fields only in the mean sense and ignored turbulent fluctuations and the consequent particle preferential concentration. In the present work, we have performed three-dimensional direct numerical simulations of turbulent flows coupled with radiative heating and particle transport over a range of particle Stokes numbers. Our study demonstrates that the particle preferential concentration has strong implications for the heat transfer statistics. We demonstrate that, for a typical setting, the preferential concentration of particles reduces the effective heat transfer between particles and the gas by as much as 25%. Therefore, we conclude that a regime with Stokes number of order unity is the least preferred for heat transfer to the carrier fluid. We also provide a 1D model to capture the effect of particle spatial distribution in heat transfer. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF
