Author: "Duan, Nan" / Topic: computer science - machine learning - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Duan, Nan"' showing total 18 results


1. LongCoder: A Long-Range Pre-trained Language Model for Code Completion

Author: Guo, Daya, Xu, Canwen, Duan, Nan, Yin, Jian, and McAuley, Julian
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task. LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens - bridge tokens and memory tokens - to improve performance and efficiency. Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction, while memory tokens are included to highlight important statements that may be invoked later and need to be memorized, such as package imports and definitions of classes, functions, or structures. We conduct experiments on a newly constructed dataset that contains longer code context and the publicly available CodeXGLUE benchmark. Experimental results demonstrate that LongCoder achieves superior performance on code completion tasks compared to previous models while maintaining comparable efficiency in terms of computational resources during inference. All the code and data are available at https://github.com/microsoft/CodeBERT.
Comment: ICML 2023
Published: 2023

2. ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Author: Xu, Xiao, Li, Bei, Wu, Chenfei, Tseng, Shao-Yen, Bhiwandiwalla, Anahita, Rosenman, Shachar, Lal, Vasudev, Che, Wanxiang, and Duan, Nan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Two-Tower Vision-Language (VL) models have shown promising improvements on various downstream VL tasks. Although the most advanced work improves performance by building bridges between encoders, it suffers from ineffective layer-by-layer utilization of uni-modal representations and cannot flexibly exploit different levels of uni-modal semantic knowledge. In this work, we propose ManagerTower, a novel VL model architecture that gathers and combines the insights of pre-trained uni-modal experts at different levels. The managers introduced in each cross-modal layer can adaptively aggregate uni-modal semantic knowledge to facilitate more comprehensive cross-modal alignment and fusion. ManagerTower outperforms previous strong baselines both with and without Vision-Language Pre-training (VLP). With only 4M VLP data, ManagerTower achieves superior performances on various downstream VL tasks, especially 79.15% accuracy on VQAv2 Test-Std, 86.56% IR@1 and 95.64% TR@1 on Flickr30K. Code and checkpoints are available at https://github.com/LooperXX/ManagerTower.
Comment: Accepted by ACL 2023 Main Conference, Oral
Published: 2023

3. Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

Author: Lin, Zhenghao, Gong, Yeyun, Shen, Yelong, Wu, Tong, Fan, Zhihao, Lin, Chen, Duan, Nan, and Chen, Weizhu
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE. GENIE is a large-scale pretrained diffusion language model that consists of an encoder and a diffusion-based decoder, which can generate text by gradually transforming a random noise sequence into a coherent text sequence. To pre-train GENIE on a large-scale language corpus, we design a new continuous paragraph denoise objective, which encourages the diffusion-decoder to reconstruct a clean text paragraph from a corrupted version, while preserving the semantic and syntactic coherence. We evaluate GENIE on four downstream text generation benchmarks, namely XSum, CNN/DailyMail, Gigaword, and CommonGen. Our experimental results show that GENIE achieves comparable performance with the state-of-the-art autoregressive models on these benchmarks, and generates more diverse text samples. The code and models of GENIE are available at https://github.com/microsoft/ProphetNet/tree/master/GENIE.
Comment: Previous version title -> GENIE: Large Scale Pre-training for Text Generation with Diffusion Model
Published: 2022

4. CodeExp: Explanatory Code Document Generation

Author: Cui, Haotian, Wang, Chenglong, Huang, Junjie, Inala, Jeevana Priya, Mytkowicz, Todd, Wang, Bo, Gao, Jianfeng, and Duan, Nan
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, I.2.2, I.2.7
Abstract: Developing models that can automatically generate detailed code explanations can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code that do not capture implementation-level choices essential for these scenarios. To fill in this gap, we propose the code explanation generation task. We first conducted a human study to identify the criteria for high-quality explanatory docstrings for code. Based on that, we collected and refined a large-scale code docstring corpus and formulated automatic evaluation metrics that best match human assessments. Finally, we present a multi-stage fine-tuning strategy and baseline models for the task. Our experiments show that (1) our refined training dataset lets models achieve better performance in the explanation generation tasks compared to larger unrefined data (15x larger), and (2) fine-tuned models can generate well-structured long docstrings comparable to human-written ones. We envision our training dataset, human-evaluation protocol, recommended metrics, and fine-tuning strategy can boost future code explanation research. The code and annotated data are available at https://github.com/subercui/CodeExp.
Comment: Accepted in Findings of EMNLP 2022
Published: 2022

5. BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning

Author: Xu, Xiao, Wu, Chenfei, Rosenman, Shachar, Lal, Vasudev, Che, Wanxiang, and Duan, Nan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years. Current VL models either use lightweight uni-modal encoders and learn to extract, align and fuse both modalities simultaneously in a deep cross-modal encoder, or feed the last-layer uni-modal representations from the deep pre-trained uni-modal encoders into the top cross-modal encoder. Both approaches potentially restrict vision-language representation learning and limit model performance. In this paper, we propose BridgeTower, which introduces multiple bridge layers that build a connection between the top layers of uni-modal encoders and each layer of the cross-modal encoder. This enables effective bottom-up cross-modal alignment and fusion between visual and textual representations of different semantic levels of pre-trained uni-modal encoders in the cross-modal encoder. Pre-trained with only 4M images, BridgeTower achieves state-of-the-art performance on various downstream vision-language tasks. In particular, on the VQAv2 test-std set, BridgeTower achieves an accuracy of 78.73%, outperforming the previous state-of-the-art model METER by 1.09% with the same pre-training data and almost negligible additional parameters and computational costs. Notably, when further scaling the model, BridgeTower achieves an accuracy of 81.15%, surpassing models that are pre-trained on orders-of-magnitude larger datasets. Code and checkpoints are available at https://github.com/microsoft/BridgeTower.
Comment: Accepted by AAAI 2023, Oral
Published: 2022

6. VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers

Author: Aflalo, Estelle, Du, Meng, Tseng, Shao-Yen, Liu, Yongfei, Wu, Chenfei, Duan, Nan, and Lal, Vasudev
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems. However, although visualization and interpretability tools have become available for NLP models, internal mechanisms of vision and multimodal transformers remain largely opaque. With the success of these transformers, it is increasingly critical to understand their inner workings, as unraveling these black-boxes will lead to more capable and trustworthy models. To contribute to this quest, we propose VL-InterpreT, which provides novel interactive visualizations for interpreting the attentions and hidden representations in multimodal transformers. VL-InterpreT is a task agnostic and integrated tool that (1) tracks a variety of statistics in attention heads throughout all layers for both vision and language components, (2) visualizes cross-modal and intra-modal attentions through easily readable heatmaps, and (3) plots the hidden representations of vision and language tokens as they pass through the transformer layers. In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretraining vision-language multimodal transformer-based model, in the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. Furthermore, we also present a few interesting findings about multimodal transformer behaviors that were learned through our tool.
Comment: Best Demo Award at CVPR 2022
Published: 2022

7. LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

Author: Xu, Canwen, Guo, Daya, Duan, Nan, and McAuley, Julian
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL) that iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, including 18 datasets of 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
Comment: ACL 2022 (Findings)
Published: 2022

8. Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy

Author: Clement, Colin B., Lu, Shuai, Liu, Xiaoyu, Tufano, Michele, Drain, Dawn, Duan, Nan, Sundaresan, Neel, and Svyatkovskiy, Alexey
Subjects: Computer Science - Machine Learning, Computer Science - Software Engineering
Abstract: Statistical language modeling and translation with transformers have found many successful applications in program understanding and generation tasks, setting high benchmarks for tools in modern software development environments. The finite context window of these neural models means, however, that they will be unable to leverage the entire relevant context of large files and packages for any given task. While there are many efforts to extend the context window, we introduce an architecture-independent approach for leveraging the syntactic hierarchies of source code for incorporating entire file-level context into a fixed-length window. Using concrete syntax trees of each source file, we extract syntactic hierarchies and integrate them into the context window by selectively removing from view more specific, less relevant scopes for a given task. We evaluate this approach on code generation tasks and joint translation of natural language and source code in the Python programming language, achieving a new state-of-the-art in code completion and summarization for Python in the CodeXGLUE benchmark. We also introduce new CodeXGLUE benchmarks for user-experience-motivated tasks: code completion with normalized literals, and method body completion/code summarization conditioned on file-level context.
Comment: EMNLP 2021 camera ready
Published: 2021

9. Learning to Complete Code with Sketches

Author: Guo, Daya, Svyatkovskiy, Alexey, Yin, Jian, Duan, Nan, Brockschmidt, Marc, and Allamanis, Miltiadis
Subjects: Computer Science - Machine Learning, Computer Science - Software Engineering
Abstract: Code completion is usually cast as a language modelling problem, i.e., continuing an input in a left-to-right fashion. However, in practice, some parts of the completion (e.g., string literals) may be very hard to predict, whereas subsequent parts directly follow from the context. To handle this, we instead consider the scenario of generating code completions with "holes" inserted in places where a model is uncertain. We develop Grammformer, a Transformer-based model that guides code generation by the programming language grammar, and compare it to a variety of more standard sequence models. We train the models on code completion for C# and Python given partial code context. To evaluate models, we consider both ROUGE as well as a new metric RegexAcc that measures success of generating completions matching long outputs with as few holes as possible. In our experiments, Grammformer generates 10-50% more accurate completions compared to traditional generative models and 37-50% longer sketches compared to sketch-generating baselines trained with similar techniques.
Comment: Published in ICLR 2022
Published: 2021

10. FastSeq: Make Sequence Generation Faster

Author: Yan, Yu, Hu, Fei, Chen, Jiusheng, Bhendawade, Nikhil, Ye, Ting, Gong, Yeyun, Duan, Nan, Cui, Desheng, Chi, Bingyu, and Zhang, Ruofei
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Transformer-based models have made tremendous impacts in natural language generation. However, the inference speed is a bottleneck due to large model size and the intensive computing involved in the auto-regressive decoding process. We develop the FastSeq framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate 4-9x inference speed gain. Additionally, FastSeq is easy to use with a simple one-line code change. The source code is available at https://github.com/microsoft/fastseq.
Comment: ACL 2021 Demo Track
Published: 2021

11. EL-Attention: Memory Efficient Lossless Attention for Generation

Author: Yan, Yu, Chen, Jiusheng, Qi, Weizhen, Bhendawade, Nikhil, Gong, Yeyun, Duan, Nan, and Zhang, Ruofei
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The Transformer model with multi-head attention requires caching intermediate results for efficient inference in generation tasks. However, the cache brings new memory-related costs and prevents leveraging a larger batch size for faster speed. We propose memory-efficient lossless attention (called EL-attention) to address this issue. It avoids heavy operations for building multi-head keys and values, so no cache for them is needed. EL-attention constructs an ensemble of attention results by expanding the query while keeping the key and value shared. It produces the same result as multi-head attention with less GPU memory and faster inference speed. We conduct extensive experiments on Transformer, BART, and GPT-2 for summarization and question generation tasks. The results show EL-attention speeds up existing models by 1.6x to 5.3x without accuracy loss.
Comment: ICML 2021. Version 2: add pseudocode
Published: 2021

12. No Answer is Better Than Wrong Answer: A Reflection Model for Document Level Machine Reading Comprehension

Author: Wang, Xuguang, Shou, Linjun, Gong, Ming, Duan, Nan, and Jiang, Daxin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The Natural Questions (NQ) benchmark set brings new challenges to Machine Reading Comprehension: the answers are not only at different levels of granularity (long and short), but also of richer types (including no-answer, yes/no, single-span and multi-span). In this paper, we target this challenge and handle all answer types systematically. In particular, we propose a novel approach called Reflection Net which leverages a two-step training procedure to identify the no-answer and wrong-answer cases. Extensive experiments are conducted to verify the effectiveness of our approach. At the time of paper writing (May 20, 2020), our approach achieved the top 1 on both the long and short answer leaderboards, with F1 scores of 77.2 and 64.1, respectively.
Comment: Accepted by Findings of EMNLP 2020
Published: 2020

13. XGPT: Cross-modal Generative Pre-Training for Image Captioning

Author: Xia, Qiaolin, Huang, Haoyang, Duan, Nan, Zhang, Dongdong, Ji, Lei, Sui, Zhifang, Cui, Edward, Bharti, Taroon, Liu, Xin, and Zhou, Ming
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: While many BERT-based cross-modal pre-trained models produce excellent results on downstream understanding tasks like image-text retrieval and VQA, they cannot be applied to generation tasks directly. In this paper, we propose XGPT, a new method of Cross-modal Generative Pre-Training for Image Captioning that is designed to pre-train text-to-image caption generators through three novel generation tasks, including Image-conditioned Masked Language Modeling (IMLM), Image-conditioned Denoising Autoencoding (IDA), and Text-conditioned Image Feature Generation (TIFG). As a result, the pre-trained XGPT can be fine-tuned without any task-specific architecture modifications to create state-of-the-art models for image captioning. Experiments show that XGPT obtains new state-of-the-art results on the benchmark datasets, including COCO Captions and Flickr30k Captions. We also use XGPT to generate new image captions as data augmentation for the image retrieval task and achieve significant improvement on all recall metrics.
Comment: 12 pages, 3 figures, 7 tables
Published: 2020

14. UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Author: Luo, Huaishao, Ji, Lei, Shi, Botian, Huang, Haoyang, Duan, Nan, Li, Tianrui, Li, Jason, Bharti, Taroon, and Zhou, Ming
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: With the recent success of the pre-training technique for NLP and image-linguistic tasks, some video-linguistic pre-training works are gradually developed to improve video-text related downstream tasks. However, most of the existing multimodal models are pre-trained for understanding tasks, leading to a pretrain-finetune discrepancy for generation tasks. This paper proposes UniVL: a Unified Video and Language pre-training model for both multimodal understanding and generation. It comprises four components, including two single-modal encoders, a cross encoder, and a decoder with the Transformer backbone. Five objectives, including video-text joint, conditioned masked language model (CMLM), conditioned masked frame model (CMFM), video-text alignment, and language reconstruction, are designed to train each of the components. We further develop two pre-training strategies, stage by stage pre-training (StagedP) and enhanced video representation (EnhancedV), to make the training process of UniVL more effective. Pre-training is carried out on a sizeable instructional video dataset, HowTo100M. Experimental results demonstrate that UniVL can learn strong video-text representation and achieves state-of-the-art results on five downstream tasks.
Published: 2020

15. K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

Author: Wang, Ruize, Tang, Duyu, Duan, Nan, Wei, Zhongyu, Huang, Xuanjing, Ji, Jianshu, Cao, Guihong, Jiang, Daxin, and Zhou, Ming
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, the historically injected knowledge would be flushed away. To address this, we propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models. Taking RoBERTa as the backbone model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, thus multiple adapters can be efficiently trained in a distributed way. As a case study, we inject two kinds of knowledge in this work, including (1) factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata and (2) linguistic knowledge obtained via dependency parsing. Results on three knowledge-driven tasks, including relation classification, entity typing, and question answering, demonstrate that each adapter improves the performance and the combination of both adapters brings further improvements. Further analysis indicates that K-Adapter captures more versatile knowledge than RoBERTa.
Published: 2020

16. A Tensorized Transformer for Language Modeling

Author: Ma, Xindian, Zhang, Peng, Zhang, Shuai, Duan, Nan, Hou, Yuexian, Song, Dawei, and Zhou, Ming
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, as a key component of Transformer, limits the effective deployment of the model to a resource-limited setting. In this paper, based on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103 and One-billion) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention can not only largely compress the model parameters but also obtain performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition.
Comment: Accepted by NeurIPS 2019
Published: 2019

17. R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Author: Lu, Pan, Ji, Lei, Zhang, Wei, Duan, Nan, Zhou, Ming, and Wang, Jianyong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning as it requires understanding both visual and textual modalities. Existing methods mainly rely on extracting image and question features to learn their joint feature embedding via multimodal fusion or attention mechanism. Some recent studies utilize external VQA-independent models to detect candidate entities or attributes in images, which serve as semantic knowledge complementary to the VQA task. However, these candidate entities or attributes might be unrelated to the VQA task and have limited semantic capacities. To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA. Specifically, we build up a Relation-VQA (R-VQA) dataset based on the Visual Genome dataset via a semantic similarity module, in which each data consists of an image, a corresponding question, a correct answer and a supporting relation fact. A well-defined relation detector is then adopted to predict visual question-related relation facts. We further propose a multi-step attention model composed of visual attention and semantic attention sequentially to extract related visual knowledge and semantic knowledge. We conduct comprehensive experiments on the two benchmark datasets, demonstrating that our model achieves state-of-the-art performance and verifying the benefit of considering visual relation facts.
Comment: 10 pages, 5 figures, accepted as an oral paper in SIGKDD 2018
Published: 2018

18. APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning

Author: Sun, Jiashuo, Zhang, Hang, Lin, Chen, Gong, Yeyun, Guo, Jian, and Duan, Nan
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Long-form numerical reasoning in financial analysis aims to generate a reasoning program to calculate the correct answer for a given question. Previous work followed a retriever-generator framework, where the retriever selects key facts from a long-form document, and the generator generates a reasoning program based on retrieved facts. However, they treated all facts equally without considering the different contributions of facts with and without numbers. Meanwhile, program consistency was ignored under supervised training, resulting in lower training accuracy and diversity. To solve these problems, we propose APOLLO to improve the long-form numerical reasoning framework. For the retriever, we adopt a number-aware negative sampling strategy to enable the retriever to be more discriminative on key numerical facts. For the generator, we design consistency-based reinforcement learning and a target program augmentation strategy based on the consistency of program execution results. Experimental results on the FinQA and ConvFinQA leaderboards verify the effectiveness of our proposed method, achieving the new state-of-the-art.
Comment: 12 pages, 5 figures
Published: 2022
