Author: "Zeng, Belinda" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zeng, Belinda"' showing total 32 results

1. Diffusion Models For Multi-Modal Generative Modeling

Author: Chen, Changyou, Ding, Han, Sisman, Bunyamin, Xu, Yi, Xie, Ouye, Yao, Benjamin Z., Tran, Son Dinh, and Zeng, Belinda
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of multi-modal generative training for more generalizable modeling? In this paper, we propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space. We define the forward diffusion process to be driven by an information aggregation from multiple types of task-data, e.g., images for a generation task and labels for a classification task. In the reverse process, we enforce information sharing by parameterizing a shared backbone denoising network with additional modality-specific decoder heads. Such a structure can simultaneously learn to generate different types of multi-modal data with a multi-task loss, which is derived from a new multi-modal variational lower bound that generalizes the standard diffusion model. We propose several multimodal generation settings to verify our framework, including image transition, masked-image training, joint image-label and joint image-representation generative modeling. Extensive experimental results on ImageNet indicate the effectiveness of our framework for various multi-modal generative modeling, which we believe is an important research direction worthy of more future explorations.
Comment: Published as a conference paper at ICLR 2024
Published: 2024

2. GraphStorm: all-in-one graph machine learning framework for industry applications

Author: Zheng, Da, Song, Xiang, Zhu, Qi, Zhang, Jian, Vasiloudis, Theodore, Ma, Runjie, Zhang, Houyu, Wang, Zichen, Adeshina, Soji, Nisa, Israt, Mottini, Alejandro, Cui, Qingjun, Rangwala, Huzefa, Zeng, Belinda, Faloutsos, Christos, and Karypis, George
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remains challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction and model training and inference with just a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications after its release in May 2023. It is open-sourced on GitHub: https://github.com/awslabs/graphstorm.
Published: 2024

3. VidLA: Video-Language Alignment at Scale

Author: Rizve, Mamshad Nayeem, Fei, Fan, Unnikrishnan, Jayakrishnan, Tran, Son, Yao, Benjamin Z., Zeng, Belinda, Shah, Mubarak, and Chilimbi, Trishul
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos. By employing a simple two-tower architecture, we are able to initialize our video-language model with pretrained image-text foundation models, thereby boosting the final performance. Second, existing video-language alignment works struggle due to the lack of semantically aligned large-scale training data. To overcome it, we leverage recent LLMs to curate the largest video-language dataset to date with better visual grounding. Furthermore, unlike existing video-text datasets which only contain short clips, our dataset is enriched with video clips of varying durations to aid our temporally hierarchical data tokens in extracting better representations at varying temporal scales. Overall, empirical results show that our proposed approach surpasses state-of-the-art methods on multiple retrieval benchmarks, especially on longer videos, and performs competitively on classification benchmarks.
Comment: Accepted to CVPR 2024
Published: 2024

4. Robust Multi-Task Learning with Excess Risks

Author: He, Yifei, Zhou, Shiji, Zhang, Guojun, Yun, Hyokun, Xu, Yi, Zeng, Belinda, Chilimbi, Trishul, and Zhao, Han
Subjects: Computer Science - Machine Learning
Abstract: Multi-task learning (MTL) considers learning a joint model for multiple tasks by optimizing a convex combination of all task losses. To solve the optimization problem, existing methods use an adaptive weight updating scheme, where task weights are dynamically adjusted based on their respective losses to prioritize difficult tasks. However, these algorithms face a great challenge whenever label noise is present, in which case excessive weights tend to be assigned to noisy tasks that have relatively large Bayes optimal errors, thereby overshadowing other tasks and causing performance to drop across the board. To overcome this limitation, we propose Multi-Task Learning with Excess Risks (ExcessMTL), an excess risk-based task balancing method that updates the task weights by their distances to convergence instead. Intuitively, ExcessMTL assigns higher weights to worse-trained tasks that are further from convergence. To estimate the excess risks, we develop an efficient and accurate method with Taylor approximation. Theoretically, we show that our proposed algorithm achieves convergence guarantees and Pareto stationarity. Empirically, we evaluate our algorithm on various MTL benchmarks and demonstrate its superior performance over existing methods in the presence of label noise. Our code is available at https://github.com/yifei-he/ExcessMTL.
Comment: ICML 2024 camera-ready version
Published: 2024

5. Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

Author: Xing, Yue, Lin, Xiaofeng, Song, Qifan, Xu, Yi, Zeng, Belinda, and Cheng, Guang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models. Existing literature, e.g., \cite{kim2020adversarial}, empirically observes that the downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoretical results reveal that feature purification plays an important role in connecting the adversarial robustness of the pre-trained model and the downstream tasks in two-layer neural networks. Specifically, we show that (i) with adversarial training, each hidden node tends to pick only one (or a few) feature; (ii) without adversarial training, the hidden nodes can be vulnerable to attacks. This observation is valid for both supervised pre-training and contrastive learning. With purified nodes, it turns out that clean training is enough to achieve adversarial robustness in downstream tasks.
Comment: To appear in AISTATS 2024
Published: 2024

6. ForeSeer: Product Aspect Forecasting Using Temporal Graph Embedding

Author: Liu, Zixuan, Hiranandani, Gaurush, Qian, Kun, Huang, Eddie W., Xu, Yi, Zeng, Belinda, Subbian, Karthik, and Wang, Sheng
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Developing text mining approaches to mine aspects from customer reviews has been well-studied due to its importance in understanding customer needs and product attributes. In contrast, it remains unclear how to predict the future emerging aspects of a new product that currently has little review information. This task, which we named product aspect forecasting, is critical for recommending new products, but also challenging because of the missing reviews. Here, we propose ForeSeer, a novel textual mining and product embedding approach progressively trained on temporal product graphs for this novel product aspect forecasting task. ForeSeer transfers reviews from similar products on a large product graph and exploits these reviews to predict aspects that might emerge in future reviews. A key novelty of our method is to jointly provide review, product, and aspect embeddings that are both time-sensitive and less affected by extremely imbalanced aspect frequencies. We evaluated ForeSeer on a real-world product review system containing 11,536,382 reviews and 11,000 products over 3 years. We observe that ForeSeer substantially outperformed existing approaches with at least 49.1% AUPRC improvement under the real setting where aspect associations are not given. ForeSeer further improves future link prediction on the product graph and the review aspect association prediction. Collectively, ForeSeer offers a novel framework for review forecasting by effectively integrating review text, product network, and temporal information, opening up new avenues for online shopping recommendation and e-commerce applications.
Published: 2023

7. Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications

Author: Xie, Han, Zheng, Da, Ma, Jun, Zhang, Houyu, Ioannidis, Vassilis N., Song, Xiang, Ping, Qing, Wang, Sheng, Yang, Carl, Xu, Yi, Zeng, Belinda, and Chilimbi, Trishul
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Model pre-training on large text corpora has been demonstrated effective for various downstream applications in the NLP domain. In the graph mining domain, a similar analogy can be drawn for pre-training graph models on large graphs in the hope of benefiting downstream graph applications, which has also been explored by several recent studies. However, no existing study has ever investigated the pre-training of text plus graph models on large heterogeneous graphs with abundant textual information (a.k.a. large graph corpora) and then fine-tuning the model on different related downstream applications with different graph schemas. To address this problem, we propose a framework of graph-aware language model pre-training (GALM) on a large graph corpus, which incorporates large language models and graph neural networks, and a variety of fine-tuning methods on downstream applications. We conduct extensive experiments on Amazon's real internal datasets and large public datasets. Comprehensive empirical results and in-depth analysis demonstrate the effectiveness of our proposed methods along with lessons learned.
Comment: To be published in the KDD 2023 proceedings as a full paper
Published: 2023

8. Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Author: Jiang, Qian, Chen, Changyou, Zhao, Han, Chen, Liqun, Ping, Qing, Tran, Son Dinh, Xu, Yi, Zeng, Belinda, and Chilimbi, Trishul
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Contrastive loss has been increasingly used in learning representations from multiple modalities. In the limit, the nature of the contrastive loss encourages modalities to exactly match each other in the latent space. Yet it remains an open question how the modality alignment affects the downstream task performance. In this paper, based on an information-theoretic argument, we first prove that exact modality alignment is sub-optimal in general for downstream prediction tasks. Hence we advocate that the key of better performance lies in meaningful latent modality structures instead of perfect modality alignment. To this end, we propose three general approaches to construct latent modality structures. Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization. Extensive experiments are conducted on two popular multi-modal representation learning frameworks: the CLIP-based two-tower model and the ALBEF-based fusion model. We test our model on a variety of tasks including zero/few-shot image classification, image-text retrieval, visual question answering, visual reasoning, and visual entailment. Our method achieves consistent improvements over existing methods, demonstrating the effectiveness and generalizability of our proposed approach on latent modality structure regularization.
Comment: 14 pages, 8 figures, CVPR 2023 accepted
Published: 2023

9. Efficient and effective training of language and graph neural network models

Author: Ioannidis, Vassilis N., Song, Xiang, Zheng, Da, Zhang, Houyu, Ma, Jun, Xu, Yi, Zeng, Belinda, Chilimbi, Trishul, and Karypis, George
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Can we combine heterogeneous graph structure with text to learn high-quality semantic and behavioural representations? Graph neural networks (GNNs) encode numerical node attributes and graph structure to achieve impressive performance in a variety of supervised learning tasks. Current GNN approaches are challenged by textual features, which typically need to be encoded to a numerical vector before being provided to the GNN, which may incur some information loss. In this paper, we put forth an efficient and effective framework termed language model GNN (LM-GNN) to jointly train large-scale language models and graph neural networks. The effectiveness of our framework is achieved by applying stage-wise fine-tuning of the BERT model first with heterogeneous graph information and then with a GNN model. Several system and design optimizations are proposed to enable scalable and efficient training. LM-GNN accommodates node and edge classification as well as link prediction tasks. We evaluate the performance of the LM-GNN framework on different datasets and showcase the effectiveness of the proposed approach. LM-GNN provides competitive results in an Amazon query-purchase-product application.
Published: 2022

10. DynaMaR: Dynamic Prompt with Mask Token Representation

Author: Sun, Xiaodi, Rajagopalan, Sunny, Nigam, Priyanka, Lu, Weiyi, Xu, Yi, Zeng, Belinda, and Chilimbi, Trishul
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Recent research has shown that large language models pretrained using unsupervised approaches can achieve significant performance improvement on many downstream tasks. Typically when adapting these language models to downstream tasks, like a classification or regression task, we employ a fine-tuning paradigm in which the sentence representation from the language model is input to a task-specific head; the model is then fine-tuned end-to-end. However, with the emergence of models like GPT-3, prompt-based fine-tuning has been proven to be a successful approach for few-shot tasks. Inspired by this work, we study discrete prompt technologies in practice. There are two issues that arise with the standard prompt approach. First, it can overfit on the prompt template. Second, it requires manual effort to formulate the downstream task as a language model problem. In this paper, we propose an improvement to prompt-based fine-tuning that addresses these two issues. We refer to our approach as DynaMaR -- Dynamic Prompt with Mask Token Representation. Results show that DynaMaR can achieve an average improvement of 10% in few-shot settings and improvement of 3.7% in data-rich settings over the standard fine-tuning approach on four e-commerce applications.
Published: 2022

11. Multi-modal Alignment using Representation Codebook

Author: Duan, Jiali, Chen, Liqun, Tran, Son, Yang, Jinyu, Xu, Yi, Zeng, Belinda, and Chilimbi, Trishul
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Aligning signals from different modalities is an important step in vision-language representation learning as it affects the performance of later stages such as cross-modality fusion. Since image and text typically reside in different regions of the feature space, directly aligning them at instance level is challenging especially when features are still evolving during training. In this paper, we propose to align at a higher and more stable level using cluster representation. Specifically, we treat image and text as two "views" of the same entity, and encode them into a joint vision-language coding space spanned by a dictionary of cluster centers (codebook). We contrast positive and negative samples via their cluster assignments while simultaneously optimizing the cluster centers. To further smooth out the learning process, we adopt a teacher-student distillation paradigm, where the momentum teacher of one view guides the student learning of the other. We evaluated our approach on common vision language benchmarks and obtain new SoTA on zero-shot cross modality retrieval while being competitive on various other transfer tasks.
Comment: Accepted by CVPR 2022
Published: 2022

12. Vision-Language Pre-Training with Triple Contrastive Learning

Author: Yang, Jinyu, Duan, Jiali, Tran, Son, Xu, Yi, Chanda, Sampath, Chen, Liqun, Zeng, Belinda, Chilimbi, Trishul, and Huang, Junzhou
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-language representation learning largely benefits from image-text alignment through contrastive losses (e.g., InfoNCE loss). The success of this alignment strategy is attributed to its capability in maximizing the mutual information (MI) between an image and its matched text. However, simply performing cross-modal alignment (CMA) ignores data potential within each modality, which may result in degraded representations. For instance, although CMA-based models are able to map image-text pairs close together in the embedding space, they fail to ensure that similar inputs from the same modality stay close by. This problem can get even worse when the pre-training data is noisy. In this paper, we propose triple contrastive learning (TCL) for vision-language pre-training by leveraging both cross-modal and intra-modal self-supervision. Besides CMA, TCL introduces an intra-modal contrastive objective to provide complementary benefits in representation learning. To take advantage of localized and structural information from image and text input, TCL further maximizes the average MI between local regions of image/text and their global summary. To the best of our knowledge, ours is the first work that takes into account local structure information for multi-modality representation learning. Experimental evaluations show that our approach is competitive and achieves the new state of the art on various common down-stream vision-language tasks such as image-text retrieval and visual question answering.
Comment: CVPR 2022; code: https://github.com/uta-smile/TCL
Published: 2022

13. Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning

Author: He, Xuanli, Keivanloo, Iman, Xu, Yi, He, Xiang, Zeng, Belinda, Rajagopalan, Santosh, and Chilimbi, Trishul
Subjects: Computer Science - Computation and Language
Abstract: Pre-training and then fine-tuning large language models is commonly used to achieve state-of-the-art performance in natural language processing (NLP) tasks. However, most pre-trained models suffer from low inference speed. Deploying such large models to applications with latency constraints is challenging. In this work, we focus on accelerating the inference via conditional computations. To achieve this, we propose a novel idea, Magic Pyramid (MP), to reduce both width-wise and depth-wise computation via token pruning and early exiting for Transformer-based models, particularly BERT. The former manages to save the computation via removing non-salient tokens, while the latter can fulfill the computation reduction by terminating the inference early before reaching the final layer, if the exiting condition is met. Our empirical studies demonstrate that compared to previous state of arts, MP is not only able to achieve a speed-adjustable inference but also to surpass token pruning and early exiting by reducing up to 70% giga floating point operations (GFLOPs) with less than 0.5% accuracy drop. Token pruning and early exiting express distinctive preferences to sequences with different lengths. However, MP is capable of achieving an average of 8.06x speedup on two popular text classification tasks, regardless of the sizes of the inputs.
Comment: 8 pages
Published: 2021

14. MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling

Author: Arici, Tarik, Seyfioglu, Mehmet Saygin, Neiman, Tal, Xu, Yi, Tran, Son, Chilimbi, Trishul, Zeng, Belinda, and Tutar, Ismail
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Vision-and-Language Pre-training (VLP) improves model performance for downstream tasks that require image and text inputs. Current VLP approaches differ on (i) model architecture (especially image embedders), (ii) loss functions, and (iii) masking policies. Image embedders are either deep models like ResNet or linear projections that directly feed image-pixels into the transformer. Typically, in addition to the Masked Language Modeling (MLM) loss, alignment-based objectives are used for cross-modality interaction, and RoI feature regression and classification tasks for Masked Image-Region Modeling (MIRM). Both alignment and MIRM objectives mostly do not have ground truth. Alignment-based objectives require pairings of image and text and heuristic objective functions. MIRM relies on object detectors. Masking policies either do not take advantage of multi-modality or are strictly coupled with alignments generated by other models. In this paper, we present Masked Language and Image Modeling (MLIM) for VLP. MLIM uses two loss functions: Masked Language Modeling (MLM) loss and image reconstruction (RECON) loss. We propose Modality Aware Masking (MAM) to boost cross-modality interaction and take advantage of MLM and RECON losses that separately capture text and image reconstruction quality. Using MLM + RECON tasks coupled with MAM, we present a simplified VLP methodology and show that it has better downstream task performance on a proprietary e-commerce multi-modal dataset.
Published: 2021

15. Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

Author: Chen, Junya, Gan, Zhe, Li, Xuan, Guo, Qing, Chen, Liqun, Gao, Shuyang, Chung, Tagyoung, Xu, Yi, Zeng, Belinda, Lu, Wenlian, Li, Fan, Carin, Lawrence, and Tao, Chenyang
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Theory, Computer Science - Machine Learning
Abstract: InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years. However, these contrastive schemes are notoriously resource demanding, as their effectiveness breaks down with small-batch training (i.e., the log-K curse, where K is the batch size). In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel simple, non-trivial contrastive objective named FlatNCE, which fixes this issue. Unlike InfoNCE, our FlatNCE no longer explicitly appeals to a discriminative classification goal for contrastive learning. Theoretically, we show FlatNCE is the mathematical dual formulation of InfoNCE, thus bridging the classical literature on energy modeling; and empirically, we demonstrate that, with minimal modification of code, FlatNCE enables immediate performance boost independent of the subject-matter engineering efforts. The significance of this work is furthered by the powerful generalization of contrastive learning techniques, and the introduction of new tools to monitor and diagnose contrastive training. We substantiate our claims with empirical evidence on CIFAR10, ImageNet, and other datasets, where FlatNCE consistently outperforms InfoNCE.
Published: 2021

16. Web-Scale Semantic Product Search with Large Language Models

Author: Muhamed, Aashiq, Srinivasan, Sriram, Teo, Choon-Hui, Cui, Qingjun, Zeng, Belinda, Chilimbi, Trishul, and Vishwanathan, S. V. N.
Editors: Kashima, Hisashi, Ide, Tsuyoshi, and Peng, Wen-Chih
Founding Editors: Goos, Gerhard, and Hartmanis, Juris
Editorial Board Members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti
Published: 2023

17. Web-Scale Semantic Product Search with Large Language Models

Author: Muhamed, Aashiq, primary, Srinivasan, Sriram, additional, Teo, Choon-Hui, additional, Cui, Qingjun, additional, Zeng, Belinda, additional, Chilimbi, Trishul, additional, and Vishwanathan, S. V. N., additional
Published: 2023

18. SST: Semantic and Structural Transformers for Hierarchy-aware Language Models in E-commerce

Author: Samel, Karan, primary, Zhang, Houyu, additional, Ma, Jun, additional, Jiang, Haoming, additional, Ping, Qing, additional, Wang, Sheng, additional, Xu, Yi, additional, Zeng, Belinda, additional, and Chilimbi, Trishul, additional
Published: 2023

19. Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications

Author: Xie, Han, primary, Zheng, Da, additional, Ma, Jun, additional, Zhang, Houyu, additional, Ioannidis, Vassilis N., additional, Song, Xiang, additional, Ping, Qing, additional, Wang, Sheng, additional, Yang, Carl, additional, Xu, Yi, additional, Zeng, Belinda, additional, and Chilimbi, Trishul, additional
Published: 2023

20. Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning

Author: Jiang, Qian, primary, Chen, Changyou, additional, Zhao, Han, additional, Chen, Liqun, additional, Ping, Qing, additional, Tran, Son Dinh, additional, Xu, Yi, additional, Zeng, Belinda, additional, and Chilimbi, Trishul, additional
Published: 2023

21. OssCSE: Overcoming Surface Structure Bias in Contrastive Learning for Unsupervised Sentence Embedding

Author: Shi, Zhan, primary, Wang, Guoyin, additional, Bai, Ke, additional, Li, Jiwei, additional, Li, Xiang, additional, Cui, Qingjun, additional, Zeng, Belinda, additional, Chilimbi, Trishul, additional, and Zhu, Xiaodan, additional
Published: 2023

22. ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models

Author: Zhang, Jianyi, primary, Muhamed, Aashiq, additional, Anantharaman, Aditya, additional, Wang, Guoyin, additional, Chen, Changyou, additional, Zhong, Kai, additional, Cui, Qingjun, additional, Xu, Yi, additional, Zeng, Belinda, additional, Chilimbi, Trishul, additional, and Chen, Yiran, additional
Published: 2023

23. Multi-modal Alignment using Representation Codebook

Author: Duan, Jiali, primary, Chen, Liqun, additional, Tran, Son, additional, Yang, Jinyu, additional, Xu, Yi, additional, Zeng, Belinda, additional, and Chilimbi, Trishul, additional
Published: 2022

24. Vision-Language Pre-Training with Triple Contrastive Learning

Author: Yang, Jinyu, primary, Duan, Jiali, additional, Tran, Son, additional, Xu, Yi, additional, Chanda, Sampath, additional, Chen, Liqun, additional, Zeng, Belinda, additional, Chilimbi, Trishul, additional, and Huang, Junzhou, additional
Published: 2022

25. DCAF-BERT: A Distilled Cachable Adaptable Factorized Model For Improved Ads CTR Prediction

Author: Muhamed, Aashiq, primary, Singh, Jaspreet, additional, Zheng, Shuai, additional, Keivanloo, Iman, additional, Perera, Sujan, additional, Mracek, James, additional, Xu, Yi, additional, Cui, Qingjun, additional, Rajagopalan, Santosh, additional, Zeng, Belinda, additional, and Chilimbi, Trishul, additional
Published: 2022

26. MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling

Author: Arici, Tarik, primary, Seyfioglu, Mehmet Saygin, additional, Neiman, Tal, additional, Xu, Yi, additional, Tran, Son, additional, Chilimbi, Trishul, additional, Zeng, Belinda, additional, and Tutar, Ismail, additional
Published: 2022

27. DynaMaR: Dynamic Prompt with Mask Token Representation

Author: Sun, Xiaodi, primary, Rajagopalan, Sunny, additional, Nigam, Priyanka, additional, Lu, Weiyi, additional, Xu, Yi, additional, Keivanloo, Iman, additional, Zeng, Belinda, additional, and Chilimbi, Trishul, additional
Published: 2022

28. Asynchronous Convergence in Multi-Task Learning via Knowledge Distillation from Converged Tasks

Author: Lu, Weiyi, primary, Rajagopalan, Sunny, additional, Nigam, Priyanka, additional, Singh, Jaspreet, additional, Sun, Xiaodi, additional, Xu, Yi, additional, Zeng, Belinda, additional, and Chilimbi, Trishul, additional
Published: 2022

29. Top-Down Attention in End-to-End Spoken Language Understanding

Author: Chen, Yixin, primary, Lu, Weiyi, additional, Mottini, Alejandro, additional, Li, Li Erran, additional, Droppo, Jasha, additional, Du, Zheng, additional, and Zeng, Belinda, additional
Published: 2021

30. Semantic Aligned Multi-modal Transformer for Vision-Language Understanding: A Preliminary Study on Visual QA

Author: Ding, Han, primary, Li, Li Erran, additional, Hu, Zhiting, additional, Xu, Yi, additional, Hakkani-Tur, Dilek, additional, Du, Zheng, additional, and Zeng, Belinda, additional
Published: 2021

31. CAM: Uninteresting Speech Detector

Author: Lu, Weiyi, primary, Xu, Yi, additional, Yang, Peng, additional, and Zeng, Belinda, additional
Published: 2020

32. Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications.

Author: Xie H, Ioannidis VN, Yang C, Zheng D, Song X, Xu Y, Ma J, Ping Q, Zeng B, Zhang H, Wang S, and Chilimbi T
Abstract: Model pre-training on large text corpora has been demonstrated effective for various downstream applications in the NLP domain. In the graph mining domain, a similar analogy can be drawn for pre-training graph models on large graphs in the hope of benefiting downstream graph applications, which has also been explored by several recent studies. However, no existing study has ever investigated the pre-training of text plus graph models on large heterogeneous graphs with abundant textual information (a.k.a. large graph corpora) and then fine-tuning the model on different related downstream applications with different graph schemas. To address this problem, we propose a framework of graph-aware language model pre-training (GaLM) on a large graph corpus, which incorporates large language models and graph neural networks, and a variety of fine-tuning methods on downstream applications. We conduct extensive experiments on Amazon's real internal datasets and large public datasets. Comprehensive empirical results and in-depth analysis demonstrate the effectiveness of our proposed methods along with lessons learned.
Published: 2023
Full Text: View/download PDF
