Author: "Huang, Yongfeng" / Topic: computation and language (cs.cl) - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Huang, Yongfeng"' showing total 14 results

1. Solving Math Word Problems via Cooperative Reasoning induced Language Models

Author: Zhu, Xinyu, Wang, Junjie, Zhang, Lin, Zhang, Yuxiang, Huang, Yongfeng, Gan, Ruyi, Zhang, Jiaxing, and Yang, Yujiu
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Large-scale pre-trained language models (PLMs) bring new opportunities to challenging problems, especially those that need high-level intelligence, such as math word problems (MWPs). However, directly applying existing PLMs to MWPs can fail because the generation process lacks sufficient supervision and thus lacks the fast adaptivity of humans. We notice that human reasoning has a dual reasoning framework that consists of an immediate reaction system (system 1) and a delicate reasoning system (system 2), where the entire reasoning is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for generating reasoning paths, and the verifiers are used to supervise the evaluation in order to obtain reliable feedback for the generator. We evaluate our CoRe framework on several mathematical reasoning datasets and achieve decent improvement over state-of-the-art methods, up to a 9.6% increase over the best baselines. Our code is available at https://github.com/TianHongZXY/CoRe. Comment: Accepted to ACL 2023 Main Conference; Camera Ready
Published: 2022

2. Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence

Author: Zhang, Jiaxing, Gan, Ruyi, Wang, Junjie, Zhang, Yuxiang, Zhang, Lin, Yang, Ping, Gao, Xinyu, Wu, Ziwei, Dong, Xiaoqun, He, Junqing, Zhuo, Jianheng, Yang, Qi, Huang, Yongfeng, Li, Xiayu, Wu, Yanghan, Lu, Junyu, Zhu, Xinyu, Chen, Weifeng, Han, Ting, Pan, Kunhao, Wang, Rui, Wang, Hao, Wu, Xiaojun, Zeng, Zhongshen, and Chen, Chongpei
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Nowadays, foundation models have become one of the fundamental infrastructures in artificial intelligence, paving the way to general intelligence. However, the reality presents two urgent challenges: existing foundation models are dominated by the English-language community; users are often given limited resources and thus cannot always use foundation models. To support the development of the Chinese-language community, we introduce an open-source project, called Fengshenbang, which is led by the research center for Cognitive Computing and Natural Language (CCNL). Our project has comprehensive capabilities, including large pre-trained models, user-friendly APIs, benchmarks, datasets, and others. We wrap all these in three sub-projects: the Fengshenbang Model, the Fengshen Framework, and the Fengshen Benchmark. An open-source roadmap, Fengshenbang, aims to re-evaluate the open-source community of Chinese pre-trained large-scale models, prompting the development of the entire Chinese large-scale model community. We also want to build a user-centered open-source ecosystem to allow individuals to access the desired models to match their computing resources. Furthermore, we invite companies, colleges, and research institutions to collaborate with us to build the large-scale open-source model-based ecosystem. We hope that this project will be the foundation of Chinese cognitive intelligence. Comment: Added the Chinese version and is now a bilingual paper
Published: 2022

3. Unified and Effective Ensemble Knowledge Distillation

Author: Wu, Chuhan, Wu, Fangzhao, Qi, Tao, and Huang, Yongfeng
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, ComputingMethodologies_PATTERNRECOGNITION, Computer Science - Computation and Language, ComputingMilieux_COMPUTERSANDEDUCATION, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Ensemble knowledge distillation can extract knowledge from multiple teacher models and encode it into a single student model. Many existing methods learn and distill the student model on labeled data only. However, the teacher models are usually learned on the same labeled data, and their predictions have high correlations with ground-truth labels. Thus, they cannot provide sufficient knowledge complementary to task labels for student teaching. Distilling on unseen unlabeled data has the potential to enhance the knowledge transfer from the teachers to the student. In this paper, we propose a unified and effective ensemble knowledge distillation method that distills a single student model from an ensemble of teacher models on both labeled and unlabeled data. Since different teachers may have diverse prediction correctness on the same sample, on labeled data we weight the predictions of different teachers according to their correctness. In addition, we weight the distillation loss based on the overall prediction correctness of the teacher ensemble to distill high-quality knowledge. On unlabeled data, there is no ground truth to evaluate prediction correctness. Fortunately, the disagreement among teachers is an indication of sample hardness, and thereby we weight the distillation loss based on teachers' disagreement to emphasize knowledge distillation on important samples. Extensive experiments on four datasets show the effectiveness of our proposed ensemble distillation method.
Published: 2022

4. NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better

Author: Wu, Chuhan, Wu, Fangzhao, Qi, Tao, Huang, Yongfeng, and Xie, Xing
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Effectively finetuning pretrained language models (PLMs) is critical for their success in downstream tasks. However, PLMs risk overfitting the pretraining tasks and data, which usually have a gap with the target downstream tasks. Such a gap may be difficult for existing PLM finetuning methods to overcome and may lead to suboptimal performance. In this paper, we propose a very simple yet effective method named NoisyTune to help better finetune PLMs on downstream tasks by adding some noise to the parameters of PLMs before finetuning. More specifically, we propose a matrix-wise perturbing method which adds different uniform noises to different parameter matrices based on their standard deviations. In this way, the varied characteristics of different types of parameters in PLMs can be considered. Extensive experiments on both the GLUE English benchmark and the XTREME multilingual benchmark show NoisyTune can consistently empower the finetuning of different PLMs on different downstream tasks. Comment: ACL 2022
Published: 2022
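
The matrix-wise perturbation described in the NoisyTune abstract can be illustrated with a minimal sketch. It assumes each parameter matrix is perturbed by uniform noise scaled by that matrix's own standard deviation; the function name `noisy_tune`, the `intensity` parameter, its default value, and the symmetric noise range are hypothetical choices made for illustration and are not taken from the record above.

```python
import random
import statistics


def noisy_tune(params, intensity=0.15, seed=0):
    """Return a perturbed copy of `params` (name -> matrix as a list of rows).

    Hypothetical sketch: every matrix gets uniform noise drawn from
    [-intensity/2, intensity/2], scaled by that matrix's standard deviation.
    """
    rng = random.Random(seed)
    perturbed = {}
    for name, matrix in params.items():
        flat = [x for row in matrix for x in row]
        # Population standard deviation of this matrix only, so matrices
        # with different value scales receive differently sized noise.
        std = statistics.pstdev(flat)
        perturbed[name] = [
            [x + rng.uniform(-intensity / 2, intensity / 2) * std for x in row]
            for row in matrix
        ]
    return perturbed


if __name__ == "__main__":
    weights = {"dense": [[1.0, 2.0], [3.0, 4.0]], "bias": [[0.5, 0.5]]}
    print(noisy_tune(weights, intensity=0.2, seed=1))
```

In this sketch a constant matrix (standard deviation zero) is left unchanged, and the input dictionary is not modified; the perturbed copy would then be used as the starting point for ordinary finetuning.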

5. AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling

Author: Tu, Haoqin, Yang, Zhongliang, Yang, Jinshuai, and Huang, Yongfeng
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Variational Auto-Encoder (VAE) has become the de-facto learning paradigm in achieving representation learning and generation for natural language at the same time. Nevertheless, existing VAE-based language models either employ elementary RNNs, which are not powerful enough to handle complex tasks in the multi-task situation, or fine-tune two pre-trained language models (PLMs) for any downstream task, which is a huge drain on resources. In this paper, we propose the first VAE framework empowered with adaptive GPT-2s (AdaVAE). Different from existing systems, we unify both the encoder and decoder of the VAE model using GPT-2s with adaptive parameter-efficient components, and further introduce a Latent Attention operation to better construct latent space from transformer models. Experiments from multiple dimensions validate that AdaVAE is competent to effectively organize language in three related tasks (language modeling, representation modeling and guided text generation) even with less than 15% activated parameters in training. Our code is available at https://github.com/ImKeTT/AdaVAE.
Published: 2022
Full Text: View/download PDF

6. Quality-aware News Recommendation

Author: Wu, Chuhan, Wu, Fangzhao, Qi, Tao, and Huang, Yongfeng
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, InformationSystems_MISCELLANEOUS, Computation and Language (cs.CL), Information Retrieval (cs.IR), Computer Science - Information Retrieval
Abstract: News recommendation is a core technique used by many online news platforms. Recommending high-quality news to users is important for keeping good user experiences and news platforms' reputations. However, existing news recommendation methods mainly aim to optimize news clicks while ignoring the quality of the news they recommend, which may lead to recommending news with uninformative content or even clickbait. In this paper, we propose a quality-aware news recommendation method named QualityRec that can effectively improve the quality of recommended news. In our approach, we first propose an effective news quality evaluation method based on the distributions of users' reading dwell time on news. Next, we propose to incorporate news quality information into user interest modeling by designing a content-quality attention network to select clicked news based on both news semantics and qualities. We further train the recommendation model with an auxiliary news quality prediction task to learn a quality-aware recommendation model, and we add a recommendation quality regularization loss to encourage the model to recommend higher-quality news. Extensive experiments on two real-world datasets show that QualityRec can effectively improve the overall quality of recommended news and reduce the recommendation of low-quality news, with even slightly better recommendation accuracy.
Published: 2022
Full Text: View/download PDF

7. NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application

Author: Wu, Chuhan, Wu, Fangzhao, Yu, Yang, Qi, Tao, Huang, Yongfeng, and Liu, Qi
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Computation and Language (cs.CL)
Abstract: Pre-trained language models (PLMs) like BERT have made great progress in NLP. News articles usually contain rich textual information, and PLMs have the potential to enhance news text modeling for various intelligent news applications like news recommendation and retrieval. However, most existing PLMs are huge, with hundreds of millions of parameters. Many online news applications need to serve millions of users with low latency tolerance, which poses huge challenges to incorporating PLMs in these scenarios. Knowledge distillation techniques can compress a large PLM into a much smaller one while keeping good performance. However, existing language models are pre-trained and distilled on general corpora like Wikipedia, which have some gaps with the news domain and may be suboptimal for news intelligence. In this paper, we propose NewsBERT, which can distill PLMs for efficient and effective news intelligence. In our approach, we design a teacher-student joint learning and distillation framework to collaboratively learn both teacher and student models, where the student model can learn from the learning experience of the teacher model. In addition, we propose a momentum distillation method by incorporating the gradients of the teacher model into the update of the student model to better transfer useful knowledge learned by the teacher model. Extensive experiments on two real-world datasets with three tasks show that NewsBERT can effectively improve the model performance in various intelligent news applications with much smaller models.
Published: 2021

8. Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer

Author: Wu, Chuhan, Wu, Fangzhao, Qi, Tao, Jiao, Binxing, Jiang, Daxin, Huang, Yongfeng, and Xie, Xing
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Transformer has achieved great success in NLP. However, the quadratic complexity of the self-attention mechanism in Transformer makes it inefficient in handling long sequences. Many existing works try to accelerate Transformers by computing sparse self-attention instead of a dense one, which usually attends to tokens at certain positions or randomly selected tokens. However, manually selected or random tokens may be uninformative for context modeling. In this paper, we propose Smart Bird, which is an efficient and effective Transformer with learnable sparse attention. In Smart Bird, we first compute a sketched attention matrix with a single-head low-dimensional Transformer, which aims to find potentially important interactions between tokens. We then sample token pairs based on their probability scores derived from the sketched attention matrix to generate different sparse attention index matrices for different attention heads. Finally, we select token embeddings according to the index matrices to form the input of sparse attention networks. Extensive experiments on six benchmark datasets for different tasks validate the efficiency and effectiveness of Smart Bird in text modeling.
Published: 2021
Full Text: View/download PDF

9. An unsupervised extractive summarization method based on multi-round computation

Author: Tao, Dehao, Xiong, Yingzhu, Yang, Zhongliang, Huang, Yongfeng, He, Jin, and Song, Kevin
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Text summarization methods have long attracted much attention. In recent years, deep learning has been applied to text summarization and has proved effective. Most of the current text summarization methods based on deep learning are supervised methods which need large-scale datasets. However, large-scale datasets are difficult to obtain in practical applications. In this paper, an unsupervised extractive text summarization method based on multi-round calculation is proposed. Based on the directed graph algorithm, we change the common method, which calculates the sentence ranking in a single pass, to multi-round calculation, and we dynamically optimize the relation of sentences after each round of calculation to reduce the redundancy of summarization. Experiments are carried out on four datasets, each separately containing Chinese, English, long and short texts. The experimental results show that our method has better performance than other unsupervised methods.
Published: 2021
Full Text: View/download PDF

10. Fastformer: Additive Attention Can Be All You Need

Author: Wu, Chuhan, Wu, Fangzhao, Qi, Tao, Huang, Yongfeng, and Xie, Xing
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Transformer is a powerful model for text understanding. However, it is inefficient due to its quadratic complexity with respect to input sequence length. Although there are many methods for Transformer acceleration, they are still either inefficient on long sequences or not effective enough. In this paper, we propose Fastformer, which is an efficient Transformer model based on additive attention. In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use an additive attention mechanism to model global contexts, and then further transform each token representation based on its interaction with global context representations. In this way, Fastformer can achieve effective context modeling with linear complexity. Extensive experiments on five datasets show that Fastformer is much more efficient than many existing Transformer models and can meanwhile achieve comparable or even better long text modeling performance. Comment: Add results on Bing Ad CVR prediction
Published: 2021
Full Text: View/download PDF
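
The idea in the Fastformer abstract, pooling tokens into a global context with additive attention and then letting each token interact with that context, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the scaling by the square root of the dimension, the element-wise products, the residual addition of the query, and all names (`additive_pool`, `fastformer_like`, `w_q`, `w_k`) are hypothetical, and any learned output transformation is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_pool(vectors, w):
    """Pool N vectors into one: score each with w, softmax, weighted sum."""
    d = len(w)
    scores = [sum(wi * xi for wi, xi in zip(w, vec)) / math.sqrt(d)
              for vec in vectors]
    weights = softmax(scores)
    return [sum(a * vec[j] for a, vec in zip(weights, vectors))
            for j in range(d)]


def fastformer_like(queries, keys, values, w_q, w_k):
    """Each step touches every token once, so cost grows linearly with N."""
    global_q = additive_pool(queries, w_q)
    # Each key interacts with the global query element-wise (assumed form).
    mixed_keys = [[g * k for g, k in zip(global_q, key)] for key in keys]
    global_k = additive_pool(mixed_keys, w_k)
    # Each value interacts with the global key; the query is added back.
    return [[g * v + q for g, v, q in zip(global_k, val, qry)]
            for val, qry in zip(values, queries)]


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    k = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(fastformer_like(q, k, v, w_q=[0.3, -0.2], w_k=[0.1, 0.4]))
```

No N-by-N attention matrix is ever formed in this sketch, which is where the linear cost in sequence length comes from.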

11. Improving Attention Mechanism with Query-Value Interaction

Author: Wu, Chuhan, Wu, Fangzhao, Qi, Tao, and Huang, Yongfeng
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Attention mechanism has played critical roles in various state-of-the-art NLP models such as Transformer and BERT. It can be formulated as a ternary function that maps the input queries, keys and values into an output by using a summation of values weighted by the attention weights derived from the interactions between queries and keys. Similar to query-key interactions, there is also inherent relatedness between queries and values, and incorporating query-value interactions has the potential to enhance the output by learning customized values according to the characteristics of queries. However, the query-value interactions are ignored by existing attention methods, which may not be optimal. In this paper, we propose to improve the existing attention mechanism by incorporating query-value interactions. We propose a query-value interaction function which can learn query-aware attention values, and combine them with the original values and attention weights to form the final output. Extensive experiments on four datasets for different tasks show that our approach can consistently improve the performance of many attention-based models by incorporating query-value interactions.
Published: 2020

12. FedNER: Privacy-preserving Medical Named Entity Recognition with Federated Learning

Author: Ge, Suyu, Wu, Fangzhao, Wu, Chuhan, Qi, Tao, Huang, Yongfeng, and Xie, Xing
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Medical named entity recognition (NER) has wide applications in intelligent healthcare. Sufficient labeled data is critical for training an accurate medical NER model. However, the labeled data in a single medical platform is usually limited. Although labeled datasets may exist in many different medical platforms, they cannot be directly shared since medical data is highly privacy-sensitive. In this paper, we propose a privacy-preserving medical NER method based on federated learning, which can leverage the labeled data in different platforms to boost the training of a medical NER model and remove the need to exchange raw data among different platforms. Since the labeled data in different platforms usually has some differences in entity type and annotation criteria, instead of constraining different platforms to share the same model, we decompose the medical NER model in each platform into a shared module and a private module. The private module is used to capture the characteristics of the local data in each platform, and is updated using local labeled data. The shared module is learned across different medical platforms to capture the shared NER knowledge. Its local gradients from different platforms are aggregated to update the global shared module, which is further delivered to each platform to update their local shared modules. Experiments on three publicly available datasets validate the effectiveness of our method.
Published: 2020
Full Text: View/download PDF
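
The shared/private decomposition in the FedNER abstract can be pictured with one toy training round. This is only a sketch under assumptions: gradients are aggregated by a plain unweighted average, updates use simple gradient descent, and the names `average_gradients`, `federated_round`, and the dictionary keys are hypothetical; the record above does not specify the aggregation rule.

```python
def average_gradients(client_grads):
    """Element-wise mean of equally sized gradient lists (assumed rule)."""
    n = len(client_grads)
    size = len(client_grads[0])
    return [sum(g[i] for g in client_grads) / n for i in range(size)]


def federated_round(global_shared, clients, lr=0.1):
    """Run one round and return the updated global shared parameters.

    Each client is a dict holding 'private' parameters plus the gradients
    'shared_grad' and 'private_grad' it computed on its own local data.
    Only shared-module gradients leave a platform; raw data never does.
    """
    avg = average_gradients([c["shared_grad"] for c in clients])
    new_shared = [w - lr * g for w, g in zip(global_shared, avg)]
    for c in clients:
        # The private module is updated with local gradients only.
        c["private"] = [w - lr * g
                        for w, g in zip(c["private"], c["private_grad"])]
        # The updated global shared module is delivered back to the client.
        c["shared"] = list(new_shared)
    return new_shared


if __name__ == "__main__":
    platforms = [
        {"private": [1.0], "shared_grad": [1.0, 0.0], "private_grad": [2.0]},
        {"private": [5.0], "shared_grad": [3.0, 2.0], "private_grad": [-2.0]},
    ]
    print(federated_round([1.0, 2.0], platforms, lr=0.5))
    print([p["private"] for p in platforms])
```

Keeping the private parameters out of the aggregation is what lets each platform retain its own entity types and annotation conventions while still contributing to the shared module.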

13. Graph-Stega: Semantic Controllable Steganographic Text Generation Guided by Knowledge Graph

Author: Yang, Zhongliang, Gong, Baitao, Li, Yamin, Yang, Jinshuai, Hu, Zhiwen, and Huang, Yongfeng
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computation and Language (cs.CL), Cryptography and Security (cs.CR)
Abstract: Most of the existing text generative steganographic methods are based on coding the conditional probability distribution of each word during the generation process, and then selecting specific words according to the secret information, so as to achieve information hiding. Such methods have limitations that may bring potential security risks. Firstly, with the increase of embedding rate, these models will choose words with lower conditional probability, which will reduce the quality of the generated steganographic texts; secondly, they cannot control the semantic expression of the final generated steganographic text. This paper proposes a new text generative steganography method which is quite different from the existing models. We use a Knowledge Graph (KG) to guide the generation of steganographic sentences. On the one hand, we hide the secret information by coding the path in the knowledge graph, rather than the conditional probability of each generated word; on the other hand, we can control the semantic expression of the generated steganographic text to a certain extent. The experimental results show that the proposed model can guarantee both the quality of the generated text and its semantic expression, which is a supplement and improvement to the current text generation steganography.
Published: 2020
Full Text: View/download PDF

14. Neural Chinese Word Segmentation with Dictionary Knowledge

Author: Liu, Junxin, Wu, Fangzhao, Wu, Chuhan, Huang, Yongfeng, and Xie, Xing
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning, Machine Learning (stat.ML), Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Chinese word segmentation (CWS) is an important task for Chinese NLP. Recently, many neural network based methods have been proposed for CWS. However, these methods require a large number of labeled sentences for model training, and usually cannot utilize the useful information in Chinese dictionaries. In this paper, we propose two methods to exploit the dictionary information for CWS. The first one is based on pseudo labeled data generation, and the second one is based on multi-task learning. The experimental results on two benchmark datasets validate that our approach can effectively improve the performance of Chinese word segmentation, especially when training data is insufficient. Comment: This paper has been accepted by The Seventh CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2018)
Published: 2018
Full Text: View/download PDF
