25 results for "Li, Mingchen"
Search Results
2. W-procer: Weighted Prototypical Contrastive Learning for Medical Few-Shot Named Entity Recognition
- Author
-
Li, Mingchen, Ye, Yang, Yeung, Jeremy, Zhou, Huixue, Chu, Huaiyuan, and Zhang, Rui
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Science - Computation and Language ,Computation and Language (cs.CL) ,Machine Learning (cs.LG) - Abstract
Contrastive learning has become a popular solution for few-shot named entity recognition (NER). The conventional configuration strives to reduce the distance between tokens with the same labels and to increase the distance between tokens with different labels. In the medical domain, however, many entities are annotated as OUTSIDE (O), and current contrastive learning methods undesirably push them apart from entities that are not labeled as OUTSIDE (O), ending up with a noisy prototype for the semantic representation of the label, even though many OUTSIDE (O) entities are relevant to the labeled entities. To address this challenge, we propose a novel method named Weighted Prototypical Contrastive Learning for Medical Few-Shot Named Entity Recognition (W-PROCER). Our approach primarily revolves around constructing a prototype-based contrastive loss and a weighting network. These components play a crucial role in helping the model differentiate the negative samples from OUTSIDE (O) tokens and in enhancing the discrimination ability of contrastive learning. Experimental results show that our proposed W-PROCER framework significantly outperforms strong baselines on three medical benchmark datasets., Under Review
- Published
- 2023
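The abstract above describes two components: a prototype-based contrastive loss and a weighting network that tempers OUTSIDE (O) negatives. Below is a minimal PyTorch sketch of that reading of the idea; it is not the authors' released code, and `weight_net` (e.g. a `torch.nn.Linear(d, 1)`) is an assumed stand-in for the paper's weighting network.

```python
# Hypothetical sketch of a weighted prototype-based contrastive loss,
# loosely following the W-PROCER abstract; details are assumptions.
import torch
import torch.nn.functional as F

def weighted_prototype_contrastive_loss(token_emb, labels, weight_net, tau=0.1):
    """token_emb: (N, d) token embeddings; labels: (N,), with 0 = OUTSIDE (O)."""
    token_emb = F.normalize(token_emb, dim=-1)
    classes = labels.unique()
    classes = classes[classes != 0]                      # entity classes only
    protos = torch.stack([token_emb[labels == c].mean(0) for c in classes])
    protos = F.normalize(protos, dim=-1)                 # (C, d) label prototypes
    sim = token_emb @ protos.T / tau                     # (N, C) similarities

    o_mask = labels == 0
    # a learned weight decides how strongly each O token acts as a negative
    w = torch.sigmoid(weight_net(token_emb[o_mask])).squeeze(-1)
    loss = token_emb.new_zeros(())
    for i, c in enumerate(classes):
        pos = sim[labels == c, i].exp().sum()            # tokens vs own prototype
        neg = (w * sim[o_mask, i].exp()).sum()           # weighted O negatives
        loss = loss - (pos / (pos + neg)).log()
    return loss / max(len(classes), 1)
```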
3. Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking
- Author
-
Li, Mingchen and Huang, Lifu
- Subjects
FOS: Computer and information sciences ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence - Abstract
Open domain entity state tracking aims to predict reasonable state changes of entities (i.e., [attribute] of [entity] was [before_state] and [after_state] afterwards) given the action descriptions. It is important to many reasoning tasks that support human everyday activities. It is challenging, however, because the model needs to predict an arbitrary number of entity state changes caused by the action, most of the entities are only implicitly relevant to the action, and their attributes as well as states come from open vocabularies. To tackle these challenges, we propose a novel end-to-end Knowledge Informed framework for open domain Entity State Tracking, namely KIEST, which explicitly retrieves the relevant entities and attributes from an external knowledge graph (i.e., ConceptNet) and incorporates them to autoregressively generate all the entity state changes with a novel dynamic knowledge grained encoder-decoder framework. To enforce logical coherence among the predicted entities, attributes, and states, we design a new constrained decoding strategy and employ a coherence reward to improve the decoding process. Experimental results show that our proposed KIEST framework significantly outperforms the strong baselines on the public benchmark dataset OpenPI., Published as a conference paper at SIGIR 2023
- Published
- 2023
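The record above mentions a constrained decoding strategy that keeps generation consistent with knowledge retrieved from ConceptNet. Here is a generic, hypothetical illustration of vocabulary-constrained greedy decoding, not the paper's implementation; `step_logits_fn` is an assumed stand-in for any autoregressive decoder.

```python
# Illustrative vocabulary-constrained greedy decoding: the decoder may only
# emit tokens from a set approved by the retrieved knowledge. Not KIEST itself.
import torch

def constrained_greedy_decode(step_logits_fn, allowed_ids, bos_id, eos_id, max_len=32):
    """step_logits_fn(prefix_ids) -> (vocab_size,) logits for the next token."""
    prefix = [bos_id]
    allowed = torch.tensor(sorted(set(allowed_ids) | {eos_id}))
    for _ in range(max_len):
        logits = step_logits_fn(prefix)
        mask = torch.full_like(logits, float("-inf"))
        mask[allowed] = 0.0                  # un-mask only the approved tokens
        next_id = int((logits + mask).argmax())
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix
```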
4. Federated Multi-Sequence Stochastic Approximation with Local Hypergradient Estimation
- Author
-
Tarzanagh, Davoud Ataee, Li, Mingchen, Sharma, Pranay, and Oymak, Samet
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Science - Distributed, Parallel, and Cluster Computing ,Distributed, Parallel, and Cluster Computing (cs.DC) ,Machine Learning (cs.LG) - Abstract
Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods). However, designing provably-efficient federated algorithms for MSA has been an elusive question even for the special case of double sequence approximation (DSA). Towards this goal, we develop FedMSA, the first federated algorithm for MSA, and establish its near-optimal communication complexity. As core novelties, (i) FedMSA enables the provable estimation of hypergradients in BLO and MCO via local client updates, which has been a notable bottleneck in prior theory, and (ii) our convergence guarantees are sensitive to the heterogeneity level of the problem. We also incorporate momentum and variance reduction techniques to achieve further acceleration, leading to near-optimal rates. Finally, we provide experiments that support our theory and demonstrate the empirical benefits of FedMSA. As an example, FedMSA enables order-of-magnitude savings in communication rounds compared to prior federated BLO schemes.
- Published
- 2023
- Full Text
- View/download PDF
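Double sequence approximation (DSA), the special case named above, can be pictured as two coupled stochastic updates running on different timescales. A toy, non-federated illustration follows (FedMSA additionally runs such updates locally on clients between communication rounds):

```python
# Toy double-sequence stochastic approximation: two coupled noisy updates
# whose joint fixed point is (0, 0). Illustration only, not FedMSA.
import numpy as np

rng = np.random.default_rng(0)
x, y = 1.0, 1.0
for k in range(1, 5001):
    a, b = 0.5 / k**0.6, 0.5 / k**0.9                 # inner sequence moves faster
    x -= a * (x - y + 0.01 * rng.standard_normal())   # inner: tracks y
    y -= b * (y + x + 0.01 * rng.standard_normal())   # outer: uses current x
print(round(x, 3), round(y, 3))                       # both near the fixed point 0
```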
5. FedYolo: Augmenting Federated Learning with Pretrained Transformers
- Author
-
Zhang, Xuechen, Li, Mingchen, Chang, Xiangyu, Chen, Jiasi, Roy-Chowdhury, Amit K., Suresh, Ananda Theertha, and Oymak, Samet
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Science - Distributed, Parallel, and Cluster Computing ,Distributed, Parallel, and Cluster Computing (cs.DC) ,Machine Learning (cs.LG) - Abstract
The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks, achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally, highlighting the potential of fully-local learning. (2) Modularity, by design, enables >100× less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs. Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): the clients load a full PTF model once, and all future updates are accomplished through communication-efficient modules with limited catastrophic forgetting, where each task is assigned to its own module., 20 pages, 18 figures
- Published
- 2023
- Full Text
- View/download PDF
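The modularity claim above comes from sending only small modules instead of the full network. A minimal, hypothetical PyTorch sketch of the pattern, with arbitrary layer sizes (not the FedYolo codebase):

```python
# Freeze a pretrained backbone, train and communicate only a small residual
# adapter plus head. Sizes are arbitrary; illustration of the idea only.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
for p in backbone.parameters():
    p.requires_grad = False                   # loaded once, never re-sent

adapter = nn.Sequential(nn.Linear(512, 16), nn.ReLU(), nn.Linear(16, 512))
head = nn.Linear(512, 10)

def forward(x):
    h = backbone(x)
    return head(h + adapter(h))               # residual adapter on frozen features

sent = sum(p.numel() for m in (adapter, head) for p in m.parameters())
total = sent + sum(p.numel() for p in backbone.parameters())
print(f"communicated: {sent} of {total} parameters")
```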
6. TemPL: A Novel Deep Learning Model for Zero-Shot Prediction of Protein Stability and Activity Based on Temperature-Guided Language Modeling
- Author
-
Tan, Pan, Li, Mingchen, Zhang, Liang, Hu, Zhiqiang, and Hong, Liang
- Subjects
FOS: Biological sciences ,Quantitative Biology - Quantitative Methods ,Quantitative Methods (q-bio.QM) - Abstract
We introduce TemPL, a novel deep learning approach for zero-shot prediction of protein stability and activity, harnessing temperature-guided language modeling. By assembling an extensive dataset of 96 million sequence-host bacterial strain optimal growth temperatures (OGTs) and ΔTm data for point mutations under consistent experimental conditions, we effectively compared TemPL with state-of-the-art models. Notably, TemPL demonstrated superior performance in predicting protein stability. An ablation study was conducted to elucidate the influence of OGT prediction and language modeling modules on TemPL's performance, revealing the importance of integrating both components. Consequently, TemPL offers considerable promise for protein engineering applications, facilitating the design of mutation sequences with enhanced stability and activity.
- Published
- 2023
- Full Text
- View/download PDF
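Zero-shot fitness prediction with protein language models commonly scores a point mutation by the log-odds of the mutant versus wild-type residue at the mutated position; TemPL's distinctive ingredient, per the abstract, is conditioning on optimal growth temperature. The sketch below shows only the generic log-odds recipe, a model-agnostic assumption about this family of methods, not TemPL's code:

```python
# Generic masked-LM mutation scoring: higher log-odds suggests the model
# finds the mutant residue more plausible in context than the wild type.
import math

def mutation_log_odds(log_probs_at_pos, wt_aa, mut_aa):
    """log_probs_at_pos: dict amino acid -> log p(aa | context) from any LM."""
    return log_probs_at_pos[mut_aa] - log_probs_at_pos[wt_aa]

# toy example: a model that slightly prefers V over A at this site
print(mutation_log_odds({"A": math.log(0.2), "V": math.log(0.3)}, "A", "V"))
```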
7. How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain
- Author
-
Li, Mingchen and Zhang, Rui
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,Computation and Language (cs.CL) - Abstract
Recent advancements in language models (LMs) have led to the emergence of powerful models such as Small LMs (e.g., T5) and Large LMs (e.g., GPT-4). These models have demonstrated exceptional capabilities across a wide range of tasks, such as named entity recognition (NER) in the general domain. (We define SLMs as pre-trained models with fewer parameters compared to models like GPT-3/3.5/4, such as T5, BERT, and others.) Nevertheless, their efficacy in the medical domain remains uncertain, and medical NER always demands high accuracy because of the particularity of the field. This paper aims to provide a thorough investigation comparing the performance of LMs in medical few-shot NER, to answer how far LMs are from 100% few-shot NER in the medical domain, and moreover to explore an effective entity recognizer to help improve NER performance. Based on our extensive experiments conducted on 16 NER models spanning from 2018 to 2023, our findings clearly indicate that LLMs outperform SLMs in few-shot medical NER tasks, given the presence of suitable examples and appropriate logical frameworks. Despite the overall superiority of LLMs in few-shot medical NER tasks, it is important to note that they still encounter some challenges, such as misidentification, wrong template prediction, etc. Building on these findings, we introduce a simple and effective method called \textsc{RT} (Retrieving and Thinking), which serves as a retriever, finding relevant examples, and as a thinker, employing a step-by-step reasoning process. Experimental results show that our proposed \textsc{RT} framework significantly outperforms the strong open baselines on two open medical benchmark datasets., Comment: the first manuscript. arXiv admin note: text overlap with arXiv:2305.18624
- Published
- 2023
- Full Text
- View/download PDF
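The RT recipe above has two moving parts: retrieve similar labeled examples, then prompt the model to reason step by step. A bare-bones sketch under assumed interfaces follows; any sentence embedder supplies the vectors, and the prompt wording is invented, not the paper's.

```python
# Retrieve-then-think sketch: nearest-neighbor example selection followed by
# a step-by-step NER prompt. Embeddings and wording are placeholders.
import numpy as np

def retrieve(query_vec, example_vecs, k=3):
    sims = example_vecs @ query_vec / (
        np.linalg.norm(example_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]

def build_prompt(query, examples, idx):
    """examples: list of (sentence, entities) pairs."""
    shots = "\n".join(f"Sentence: {examples[i][0]}\nEntities: {examples[i][1]}"
                      for i in idx)
    return (f"{shots}\nSentence: {query}\n"
            "Let's find the entities step by step.\nEntities:")
```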
8. MC-GEN: Multi-level Clustering for Private Synthetic Data Generation
- Author
-
Li, Mingchen, Zhuang, Di, and Chang, J. Morris
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Science - Cryptography and Security ,Cryptography and Security (cs.CR) ,Machine Learning (cs.LG) - Abstract
With the development of machine learning and data science, data sharing between companies and research institutes has become common as a way to avoid data scarcity. However, sharing original datasets that contain private information can cause privacy leakage. A reliable solution is to utilize private synthetic datasets which preserve the statistical information of the original datasets. In this paper, we propose MC-GEN, a privacy-preserving synthetic data generation method under differential privacy guarantees for machine learning classification tasks. MC-GEN applies multi-level clustering and a differentially private generative model to improve the utility of synthetic data. In the experimental evaluation, we evaluated the effects of the parameters and the effectiveness of MC-GEN. The results showed that MC-GEN achieves significant effectiveness under certain privacy guarantees on multiple classification tasks. Moreover, we compared MC-GEN with three existing methods. The results showed that MC-GEN outperforms the other methods in terms of utility.
- Published
- 2022
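Here is a toy version of the pipeline the abstract describes: cluster, estimate per-cluster statistics, perturb them, and sample. The noise calibration below is schematic (no sensitivity analysis), so it illustrates the shape of MC-GEN rather than its privacy accounting:

```python
# Schematic DP-flavored synthetic data: cluster, perturb per-cluster Gaussian
# statistics with Laplace noise, then sample. Not a certified DP mechanism.
import numpy as np
from sklearn.cluster import KMeans

def dp_synthesize(X, n_clusters=3, eps=1.0, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(X)
    parts = []
    for c in range(n_clusters):
        Xc = X[labels == c]
        scale = 1.0 / (eps * max(len(Xc), 1))        # schematic, not calibrated
        mu = Xc.mean(0) + rng.laplace(0.0, scale, X.shape[1])
        sd = np.abs(Xc.std(0) + rng.laplace(0.0, scale, X.shape[1]))
        parts.append(rng.normal(mu, sd, Xc.shape))   # synthetic cluster points
    return np.vstack(parts)
```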
9. A Hierarchical N-Gram Framework for Zero-Shot Link Prediction
- Author
-
Li, Mingchen, Chen, Junfan, Mensah, Samuel, Aletras, Nikolaos, Yang, Xiulong, and Ye, Yang
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Computation and Language (cs.CL) - Abstract
Due to the incompleteness of knowledge graphs (KGs), zero-shot link prediction (ZSLP) which aims to predict unobserved relations in KGs has attracted recent interest from researchers. A common solution is to use textual features of relations (e.g., surface name or textual descriptions) as auxiliary information to bridge the gap between seen and unseen relations. Current approaches learn an embedding for each word token in the text. These methods lack robustness as they suffer from the out-of-vocabulary (OOV) problem. Meanwhile, models built on character n-grams have the capability of generating expressive representations for OOV words. Thus, in this paper, we propose a Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which considers the dependencies among character n-grams of the relation surface name for ZSLP. Our approach works by first constructing a hierarchical n-gram graph on the surface name to model the organizational structure of n-grams that leads to the surface name. A GramTransformer, based on the Transformer architecture, is then presented to model the hierarchical n-gram graph and construct the relation embedding for ZSLP. Experimental results show that the proposed HNZSLP achieves state-of-the-art performance on two ZSLP datasets., Comment: Published as a conference paper at EMNLP Findings 2022
- Published
- 2022
- Full Text
- View/download PDF
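The core data structure above, a hierarchical n-gram graph over a relation's surface name, is simple to build: nodes are character n-grams, and edges connect each n-gram to the longer n-grams it extends. A small sketch of one plausible construction follows (the GramTransformer that consumes the graph is not shown):

```python
# Build a hierarchical character n-gram graph for a relation surface name.
# Each n-gram links to the two (n-1)-grams it contains.
def ngram_graph(surface_name, max_n=3):
    nodes, edges = set(), set()
    for n in range(1, max_n + 1):
        for i in range(len(surface_name) - n + 1):
            g = surface_name[i:i + n]
            nodes.add(g)
            if n > 1:
                edges.add((surface_name[i:i + n - 1], g))   # left parent
                edges.add((surface_name[i + 1:i + n], g))   # right parent
    return nodes, edges

nodes, edges = ngram_graph("bornIn")
print(len(nodes), "n-grams,", len(edges), "hierarchy edges")
```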
10. Provable and Efficient Continual Representation Learning
- Author
-
Li, Yingcong, Li, Mingchen, Asif, M. Salman, and Oymak, Samet
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Machine Learning (cs.LG) - Abstract
In continual learning (CL), the goal is to design models that can learn a sequence of tasks without catastrophic forgetting. While there is a rich set of techniques for CL, relatively little understanding exists on how representations built by previous tasks benefit new tasks that are added to the network. To address this, we study the problem of continual representation learning (CRL) where we learn an evolving representation as new tasks arrive. Focusing on zero-forgetting methods where tasks are embedded in subnetworks (e.g., PackNet), we first provide experiments demonstrating CRL can significantly boost sample efficiency when learning new tasks. To explain this, we establish theoretical guarantees for CRL by providing sample complexity and generalization error bounds for new tasks by formalizing the statistical benefits of previously-learned representations. Our analysis and experiments also highlight the importance of the order in which we learn the tasks. Specifically, we show that CL benefits if the initial tasks have large sample size and high "representation diversity". Diversity ensures that adding new tasks incurs small representation mismatch and can be learned with few samples while training only a few additional nonzero weights. Finally, we ask whether one can ensure each task subnetwork to be efficient during inference time while retaining the benefits of representation learning. To this end, we propose an inference-efficient variation of PackNet called Efficient Sparse PackNet (ESPN) which employs joint channel & weight pruning. ESPN embeds tasks in channel-sparse subnets requiring up to 80% fewer FLOPs to compute while approximately retaining accuracy and is very competitive with a variety of baselines. In summary, this work takes a step towards data and compute-efficient CL with a representation learning perspective. GitHub page: https://github.com/ucr-optml/CtRL
- Published
- 2022
- Full Text
- View/download PDF
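Zero-forgetting methods like PackNet, which the abstract builds on, embed each task in a binary subnetwork so earlier tasks are never overwritten. A compact sketch of the masking bookkeeping follows; magnitude-based claiming here is a simplification, and ESPN additionally prunes channels:

```python
# PackNet-style subnetwork claiming: after each task, freeze the largest
# still-free weights as that task's mask. Masks are disjoint by construction.
import torch

W = torch.randn(64, 64) * 0.01                # shared weight tensor
free = torch.ones_like(W, dtype=torch.bool)   # weights no task has claimed

def claim_subnetwork(keep_frac=0.3):
    global free
    score = W.abs() * free                    # rank only the free weights
    k = int(keep_frac * int(free.sum()))
    thresh = score[free].topk(k).values.min()
    mask = (score >= thresh) & free           # this task's binary subnetwork
    free &= ~mask                             # claimed weights are frozen
    return mask

m1, m2 = claim_subnetwork(), claim_subnetwork()
print(int(m1.sum()), int(m2.sum()), bool((m1 & m2).any()))  # disjoint: False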
11. Locally Differentially Private Distributed Deep Learning via Knowledge Distillation
- Author
-
Zhuang, Di, Li, Mingchen, and Chang, J. Morris
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Science - Cryptography and Security ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Cryptography and Security (cs.CR) ,Machine Learning (cs.LG) - Abstract
Deep learning often requires a large amount of data. In real-world applications, e.g., healthcare applications, the data collected by a single organization (e.g., hospital) is often limited, and the majority of massive and diverse data is often segregated across multiple organizations. This motivates researchers to conduct distributed deep learning, where the data user would like to build DL models using the data segregated across multiple different data owners. However, this could lead to severe privacy concerns due to the sensitive nature of the data, and thus data owners may be reluctant to participate. We propose LDP-DL, a privacy-preserving distributed deep learning framework via local differential privacy and knowledge distillation, where each data owner learns a teacher model using its own (local) private dataset, and the data user learns a student model to mimic the output of the ensemble of the teacher models. In the experimental evaluation, a comprehensive comparison has been made among our proposed approach (i.e., LDP-DL), DP-SGD, PATE and DP-FL, using three popular deep learning benchmark datasets (i.e., CIFAR10, MNIST and FashionMNIST). The experimental results show that LDP-DL consistently outperforms the other competitors in terms of privacy budget and model accuracy., Comment: 10 pages, 6 figures, 1 table. Submitted to IEEE Transactions on Knowledge and Data Engineering
- Published
- 2022
- Full Text
- View/download PDF
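For intuition about private teacher-ensemble distillation, here is a PATE-style noisy-vote aggregator, one of the baselines the record compares against, chosen because it is the simplest to show. LDP-DL itself perturbs on the data-owner side (local DP) rather than at the aggregator, and the noise scale below is illustrative:

```python
# PATE-style private aggregation of teacher votes (shown for intuition;
# LDP-DL uses local, owner-side perturbation instead).
import numpy as np

def private_label(teacher_preds, n_classes, eps=1.0, seed=0):
    rng = np.random.default_rng(seed)
    votes = np.bincount(teacher_preds, minlength=n_classes).astype(float)
    votes += rng.laplace(0.0, 1.0 / eps, n_classes)   # noisy vote histogram
    return int(votes.argmax())                        # label the student trains on

print(private_label(np.array([2, 2, 1, 2, 0]), n_classes=3))
```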
12. Semantic Structure based Query Graph Prediction for Question Answering over Knowledge Graph
- Author
-
Li, Mingchen and Ji, Shihao
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Computation and Language (cs.CL) - Abstract
Building query graphs from natural language questions is an important step in complex question answering over knowledge graph (Complex KGQA). In general, a question can be correctly answered if its query graph is built correctly and the right answer is then retrieved by issuing the query graph against the KG. Therefore, this paper focuses on query graph generation from natural language questions. Existing approaches for query graph generation ignore the semantic structure of a question, resulting in a large number of noisy query graph candidates that undermine prediction accuracies. In this paper, we define six semantic structures from common questions in KGQA and develop a novel Structure-BERT to predict the semantic structure of a question. By doing so, we can first filter out noisy candidate query graphs, and then rank the remaining candidates with a BERT-based ranking model. Extensive experiments on two popular benchmarks MetaQA and WebQuestionsSP (WSP) demonstrate the effectiveness of our method compared to state-of-the-art approaches., Comment: Published as a conference paper at COLING 2022
- Published
- 2022
- Full Text
- View/download PDF
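The pipeline above is a filter-then-rank design: predict one of the six semantic structures, discard candidate query graphs that do not match, and rank the survivors. A minimal sketch with placeholder callables standing in for the Structure-BERT classifier and the BERT ranker:

```python
# Filter-then-rank KGQA sketch; structure_clf and ranker are placeholders
# for the paper's BERT-based models, and candidates carry a .structure tag.
def answer(question, candidates, structure_clf, ranker):
    s = structure_clf(question)                       # one of six structures
    filtered = [g for g in candidates if g.structure == s]
    return max(filtered or candidates, key=lambda g: ranker(question, g))
```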
13. FedNest: Federated Bilevel, Minimax, and Compositional Optimization
- Author
-
Tarzanagh, Davoud Ataee, Li, Mingchen, Thrampoulidis, Christos, and Oymak, Samet
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Optimization and Control (math.OC) ,FOS: Mathematics ,Mathematics - Optimization and Control ,Machine Learning (cs.LG) - Abstract
Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems -- including adversarial robustness, hyperparameter tuning, and actor-critic -- fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose FedNest: a federated alternating stochastic gradient method to address general nested problems. We establish provable convergence rates for FedNest in the presence of heterogeneous data and introduce variations for bilevel, minimax, and compositional optimization. FedNest introduces multiple innovations including federated hypergradient computation and variance reduction to address inner-level heterogeneity. We complement our theory with experiments on hyperparameter & hyper-representation learning and minimax optimization that demonstrate the benefits of our method in practice. Code is available at https://github.com/ucr-optml/FedNest., ICML 2022 (accepted as a long presentation), 34 pages, 6 figures
- Published
- 2022
- Full Text
- View/download PDF
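The key quantity in nested problems like these is the hypergradient, the derivative of the outer objective through the inner solution. Below is a toy, non-federated example where everything is closed-form; FedNest's contribution is estimating this quantity from heterogeneous local client updates, which the toy does not attempt:

```python
# Toy bilevel hypergradient descent. Inner: y*(x) = argmin_y 0.5*(y - A*x)^2,
# so y*(x) = A*x and dy*/dx = A. Outer: f(x, y) = 0.5*(y - b)^2 + 0.5*x^2.
A, b = 2.0, 3.0
x = 0.0
for _ in range(200):
    y = A * x                              # solve the inner problem exactly
    hypergrad = x + (y - b) * A            # df/dx + (df/dy) * (dy*/dx)
    x -= 0.05 * hypergrad
print(x, A * x)                            # -> x* = A*b/(1+A^2) = 1.2, y* = 2.4
```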
14. AutoBalance: Optimized Loss Functions for Imbalanced Data
- Author
-
Li, Mingchen, Zhang, Xuechen, Thrampoulidis, Christos, Chen, Jiasi, and Oymak, Samet
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Machine Learning (cs.LG) - Abstract
Imbalanced datasets are commonplace in modern machine learning problems. The presence of under-represented classes or groups with sensitive attributes results in concerns about generalization and fairness. Such concerns are further exacerbated by the fact that large capacity deep nets can perfectly fit the training data and appear to achieve perfect accuracy and fairness during training, but perform poorly on test data. To address these challenges, we propose AutoBalance, a bi-level optimization framework that automatically designs a training loss function to optimize a blend of accuracy and fairness-seeking objectives. Specifically, a lower-level problem trains the model weights, and an upper-level problem tunes the loss function by monitoring and optimizing the desired objective over the validation data. Our loss design enables personalized treatment for classes/groups by employing a parametric cross-entropy loss and individualized data augmentation schemes. We evaluate the benefits and performance of our approach for the application scenarios of imbalanced and group-sensitive classification. Extensive empirical evaluations demonstrate the benefits of AutoBalance over state-of-the-art approaches. Our experimental findings are complemented with theoretical insights on loss function design and the benefits of train-validation split. All code is available open-source.
- Published
- 2022
- Full Text
- View/download PDF
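The "parametric cross-entropy loss" the abstract mentions can be pictured as per-class logit scalings and offsets that the upper level tunes on validation data. A small sketch with invented parameter names (w, l), not the authors' exact parameterization:

```python
# Parametric cross-entropy: per-class multiplicative (w) and additive (l)
# logit adjustments are upper-level variables tuned on validation data.
import torch
import torch.nn.functional as F

def parametric_ce(logits, targets, w, l):
    return F.cross_entropy(logits * w + l, targets)

num_classes = 10
w = torch.ones(num_classes, requires_grad=True)    # upper-level loss parameters
l = torch.zeros(num_classes, requires_grad=True)
loss = parametric_ce(torch.randn(4, num_classes), torch.tensor([1, 2, 3, 4]), w, l)
loss.backward()                                    # gradients reach (w, l) too
```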
15. Closure to 'Two-Step Mixing Process Elaboration of the Hot-Mix Asphalt Mixture Based on Surface Energy Theory' by Liping Liu, Mingchen Li, and Qingbing Lu
- Author
-
Li, Mingchen, Liu, Liping, and Lu, Qingbing
- Subjects
Materials science ,Asphalt pavement ,Mechanics of Materials ,Scientific method ,Two step ,Closure (topology) ,Thermodynamics ,General Materials Science ,Building and Construction ,Surface energy ,Mixing (physics) ,Elaboration ,Civil and Structural Engineering - Published
- 2021
16. Generalization Guarantees for Neural Architecture Search with Train-Validation Split
- Author
-
Oymak, Samet, Li, Mingchen, and Soltanolkotabi, Mahdi
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Neural Architecture Search (NAS) is a popular method for automatically designing optimized architectures for high-performance deep learning. In this approach, it is common to use bilevel optimization where one optimizes the model weights over the training data (inner problem) and various hyperparameters such as the configuration of the architecture over the validation data (outer problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the inner problem is often overparameterized and can easily achieve zero loss. Thus, a priori it seems impossible to distinguish the right hyperparameters based on training loss alone, which motivates a better understanding of the role of the train-validation split. To this aim, this work establishes the following results. (1) We show that refined properties of the validation loss such as risk and hyper-gradients are indicative of those of the true test loss. This reveals that the outer problem helps select the most generalizable model and prevent overfitting with a near-minimal validation sample size. This is established for continuous search spaces which are relevant for differentiable schemes. Extensions to transfer learning are developed in terms of the mismatch between training & validation distributions. (2) We establish generalization bounds for NAS problems with an emphasis on an activation search problem. When optimized with gradient-descent, we show that the train-validation procedure returns the best (model, architecture) pair even if all architectures can perfectly fit the training data to achieve zero error. (3) Finally, we highlight connections between NAS, multiple kernel learning, and low-rank matrix learning. The latter leads to novel insights where the solution of the outer problem can be accurately learned via efficient spectral methods to achieve near-minimal risk., ICML 2021
- Published
- 2021
- Full Text
- View/download PDF
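A toy analogue of the paper's main point, offered only as intuition: among models flexible enough to drive training loss to (near) zero, a held-out validation split still ranks them by true risk. Polynomial degree stands in for "architecture" here:

```python
# Train loss alone cannot pick the right capacity; validation loss can.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(20)   # 20 noisy training points
xv = rng.uniform(-1, 1, 200)
yv = np.sin(3 * xv)                                 # clean validation target

for deg in (5, 9, 19):          # "architectures"; deg 19 interpolates exactly
    c = np.polyfit(x, y, deg)
    tr = np.mean((np.polyval(c, x) - y) ** 2)
    va = np.mean((np.polyval(c, xv) - yv) ** 2)
    print(f"deg={deg:2d}  train mse={tr:.4f}  val mse={va:.4f}")
```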
17. sj-pdf-1-jtr-10.1177_00472875211036194 – Supplemental material for Decomposition Methods for Tourism Demand Forecasting: A Comparative Study
- Author
-
Zhang, Chengyuan, Li, Mingchen, Sun, Shaolong, Tang, Ling, and Wang, Shouyang
- Subjects
FOS: Economics and business ,150310 Organisation and Management Theory ,150402 Hospitality Management - Abstract
Supplemental material, sj-pdf-1-jtr-10.1177_00472875211036194 for Decomposition Methods for Tourism Demand Forecasting: A Comparative Study by Chengyuan Zhang, Mingchen Li, Shaolong Sun, Ling Tang and Shouyang Wang in Journal of Travel Research
- Published
- 2021
- Full Text
- View/download PDF
19. Two-Step Mixing Process Elaboration of the Hot-Mix Asphalt Mixture Based on Surface Energy Theory
- Author
-
Lu, Qingbing, Li, Mingchen, and Liu, Liping
- Subjects
Materials science ,Two step ,Mixing (process engineering) ,Building and Construction ,Adhesion ,Surface energy ,Asphalt pavement ,Mechanics of Materials ,Asphalt ,Scientific method ,General Materials Science ,Composite material ,Elaboration ,Civil and Structural Engineering - Abstract
Hot-mix asphalt (HMA) is a multicomponent mixture composed of asphalt, coarse and fine aggregates, fillers, and other necessary additives. The bitumen–aggregate adhesion and performance of ...
- Published
- 2020
20. Exploring Weight Importance and Hessian Bias in Model Pruning
- Author
-
Li, Mingchen, Sattar, Yahya, Thrampoulidis, Christos, and Oymak, Samet
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Computer Science::Databases ,Machine Learning (cs.LG) - Abstract
Model pruning is an essential procedure for building compact and computationally-efficient machine learning models. A key feature of a good pruning algorithm is that it accurately quantifies the relative importance of the model weights. While model pruning has a rich history, we still don't have a full grasp of the pruning mechanics even for relatively simple problems involving linear models or shallow neural nets. In this work, we provide a principled exploration of pruning by building on a natural notion of importance. For linear models, we show that this notion of importance is captured by covariance scaling which connects to the well-known Hessian-based pruning. We then derive asymptotic formulas that allow us to precisely compare the performance of different pruning methods. For neural networks, we demonstrate that the importance can be at odds with larger magnitudes and proper initialization is critical for magnitude-based pruning. Specifically, we identify settings in which weights become more important despite becoming smaller, which in turn leads to a catastrophic failure of magnitude-based pruning. Our results also elucidate that implicit regularization in the form of Hessian structure has a catalytic role in identifying the important weights, which dictate the pruning performance., 28 pages
- Published
- 2020
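The abstract's central claim, that importance can be at odds with magnitude, already shows up in a two-feature linear model: a covariance-scaled score of the form w_i² · E[x_i²] (the Hessian-flavored notion described above) can reverse the magnitude ranking. A small numerical illustration:

```python
# A small weight on a high-variance feature matters more than a large weight
# on a near-constant feature; magnitude pruning picks the wrong one.
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.5, 2.0])                   # small weight, large weight
var = np.array([100.0, 0.01])              # feature variances
X = rng.standard_normal((10000, 2)) * np.sqrt(var)
y = X @ w

for keep in (0, 1):
    w_pruned = np.where(np.arange(2) == keep, w, 0.0)
    mse = np.mean((X @ w_pruned - y) ** 2)
    print(f"keep w[{keep}]: |w|={w[keep]}  score={w[keep]**2 * var[keep]:.2f}  "
          f"mse={mse:.3f}")
```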
21. Multi-Fusion Chinese WordNet (MCW) : Compound of Machine Learning and Manual Correction
- Author
-
Li, Mingchen, Zhou, Zili, and Wang, Yanna
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,Artificial Intelligence (cs.AI) ,I.7.0 ,Computer Science - Artificial Intelligence ,Computation and Language (cs.CL) - Abstract
Princeton WordNet (PWN) is a lexical-semantic network based on cognitive linguistics, which promotes the development of natural language processing. Based on PWN, five Chinese wordnets have been developed to address problems of syntax and semantics: Northeastern University Chinese WordNet (NEW), Sinica Bilingual Ontological WordNet (BOW), Southeast University Chinese WordNet (SEW), Taiwan University Chinese WordNet (CWN), and Chinese Open WordNet (COW). In using them, we found that these wordnets have low accuracy and coverage and cannot completely portray the semantic network of PWN. We therefore built a new Chinese wordnet, Multi-Fusion Chinese WordNet (MCW), to make up for those shortcomings. The key idea is to extend the SEW with the help of the Oxford and Xinhua bilingual dictionaries, and then correct it. More specifically, we used machine learning and manual adjustment in our corrections, guided by two standards formulated for this work. We conducted experiments on three tasks, including relatedness calculation, word similarity, and word sense disambiguation, to compare lemma accuracy; coverage was also compared. The results indicate that MCW gains in both coverage and accuracy via our method. However, it still has room for improvement, especially with lemmas. In the future, we will continue to enhance the accuracy of MCW and expand the concepts in it., Comment: 7 pages. CICLing 2019: International Conference on Computational Linguistics and Intelligent Text Processing
- Published
- 2020
- Full Text
- View/download PDF
22. Short Text Classification via Knowledge powered Attention with Similarity Matrix based CNN
- Author
-
Li, Mingchen, Clinton, Gabtone., Miao, Yijia, and Gao, Feng
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,I.7.0 ,Computation and Language (cs.CL) - Abstract
Short text is becoming more and more popular on the web, in forms such as chat messages, SMS, and product reviews. Accurately classifying short text is an important and challenging task. A number of studies have difficulty addressing this problem because of word ambiguity and data sparsity. To address this issue, we propose a knowledge-powered attention with similarity matrix based convolutional neural network (KASM) model, which computes comprehensive information by utilizing knowledge and a deep neural network. We use a knowledge graph (KG) to enrich the semantic representation of short text; specifically, parent-entity information is introduced in our model. Meanwhile, we consider the literal-level word interaction between the short text and the representation of the label, and utilize a similarity matrix based convolutional neural network (CNN) to extract it. To measure the importance of knowledge, we introduce attention mechanisms to choose the important information. Experimental results on five standard datasets show that our model significantly outperforms state-of-the-art methods., Comment: there is an error in this paper
- Published
- 2020
- Full Text
- View/download PDF
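The similarity-matrix CNN the abstract describes operates on a grid of token-level similarities between the short text and the label description. Below is a dimension-level sketch with arbitrary sizes, illustrating the tensor plumbing rather than the full KASM model:

```python
# Cosine-similarity matrix between text and label tokens, convolved to
# extract literal-level matching features. Sizes are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

text = F.normalize(torch.randn(1, 12, 64), dim=-1)    # (batch, text_len, d)
label = F.normalize(torch.randn(1, 5, 64), dim=-1)    # (batch, label_len, d)
sim = torch.einsum("btd,bld->btl", text, label)        # (1, 12, 5) similarities

conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
features = conv(sim.unsqueeze(1))                      # (1, 8, 12, 5)
pooled = features.amax(dim=(2, 3))                     # max-pool match features
print(pooled.shape)                                    # torch.Size([1, 8])
```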
23. Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
- Author
-
Oymak, Samet, Fabian, Zalan, Li, Mingchen, and Soltanolkotabi, Mahdi
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Optimization and Control (math.OC) ,Statistics - Machine Learning ,FOS: Mathematics ,Machine Learning (stat.ML) ,Mathematics - Optimization and Control ,Machine Learning (cs.LG) - Abstract
Modern neural network architectures often generalize well despite containing many more parameters than the size of the training dataset. This paper explores the generalization capabilities of neural networks trained via gradient descent. We develop a data-dependent optimization and generalization theory which leverages the low-rank structure of the Jacobian matrix associated with the network. Our results help demystify why training and generalization are easier on clean and structured datasets and harder on noisy and unstructured datasets, as well as how the network size affects the evolution of the train and test errors during training. Specifically, we use a control knob to split the Jacobian spectrum into "information" and "nuisance" spaces associated with the large and small singular values. We show that over the information space learning is fast and one can quickly train a model with zero training loss that can also generalize well. Over the nuisance space training is slower and early stopping can help with generalization at the expense of some bias. We also show that the overall generalization capability of the network is controlled by how well the label vector is aligned with the information space. A key feature of our results is that even constant width neural nets can provably generalize for sufficiently nice datasets. We conduct various numerical experiments on deep networks that corroborate our theoretical findings and demonstrate that: (i) the Jacobian of typical neural networks exhibits low-rank structure with a few large singular values and many small ones, leading to a low-dimensional information space, (ii) over the information space learning is fast and most of the label vector falls on this space, and (iii) label noise falls on the nuisance space and impedes optimization/generalization.
- Published
- 2019
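Finding (i), that typical network Jacobians have a few large singular values and many small ones, is easy to check empirically on a toy network by stacking per-sample gradients of the output and inspecting the singular values:

```python
# Empirical peek at the Jacobian spectrum of a small random net: expect a
# handful of dominant singular values and a fast-decaying tail.
import torch

net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
X = torch.randn(128, 10)

rows = []
for x in X:                                   # one Jacobian row per sample
    net.zero_grad()
    net(x).squeeze().backward()
    rows.append(torch.cat([p.grad.flatten() for p in net.parameters()]))
J = torch.stack(rows)                         # (num samples, num parameters)
s = torch.linalg.svdvals(J)
print((s / s[0])[:6])                         # normalized leading spectrum
```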
24. Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
- Author
-
Li, Mingchen, Soltanolkotabi, Mahdi, and Oymak, Samet
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
Modern neural networks are typically trained in an over-parameterized regime where the parameters of the model far exceed the size of the training data. Such neural networks in principle have the capacity to (over)fit any set of labels including pure noise. Despite this, somewhat paradoxically, neural network models trained via first-order methods continue to predict well on yet unseen test data. This paper takes a step towards demystifying this phenomenon. Under a rich dataset model, we show that gradient descent is provably robust to noise/corruption on a constant fraction of the labels despite overparameterization. In particular, we prove that: (i) in the first few iterations, where the updates are still in the vicinity of the initialization, gradient descent only fits the correct labels, essentially ignoring the noisy labels; (ii) to start to overfit the noisy labels, the network must stray rather far from the initialization, which can only occur after many more iterations. Together, these results show that gradient descent with early stopping is provably robust to label noise and shed light on the empirical robustness of deep networks as well as commonly adopted heuristics to prevent overfitting.
- Published
- 2019
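Here is a small demo in the spirit of the result, with hedged expectations: overparameterized logistic regression on data with 30% flipped labels, where clean test accuracy typically peaks well before the noisy labels are fully fit (the exact curve depends on the seed and sizes chosen here):

```python
# Early stopping under label noise: track clean test accuracy while fitting
# noisy labels with an overparameterized (d >> n) logistic model.
import numpy as np

rng = np.random.default_rng(0)
n, d, flip = 50, 300, 0.3
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d) / np.sqrt(d)
y = np.sign(X @ w_star)
y_noisy = y * np.where(rng.random(n) < flip, -1.0, 1.0)   # corrupt 30% of labels
Xt = rng.standard_normal((2000, d))
yt = np.sign(Xt @ w_star)                                 # clean test set

w = np.zeros(d)
for step in range(1, 5001):
    m = np.clip(y_noisy * (X @ w), -30, 30)
    p = 1.0 / (1.0 + np.exp(m))                           # logistic residuals
    w += 0.5 * (X * (y_noisy * p)[:, None]).mean(0)       # gradient step
    if step in (10, 100, 1000, 5000):
        print(step, round(float(np.mean(np.sign(Xt @ w) == yt)), 3))
```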
25. SOLVING THE CHINESE PHYSICAL PROBLEM BASED ON DEEP LEARNING AND KNOWLEDGE GRAPH
- Author
-
Li, Mingchen, Zhou, Zili, and Wang, Yanna
- Subjects
Theoretical computer science ,Knowledge graph ,Computer science ,business.industry ,Deep learning ,Graph (abstract data type) ,Artificial intelligence ,business - Abstract
In recent years, problem solving, automatic proof, and human-like test-taking have become hot spots of research. This paper focuses on solving physics problems posed in Chinese. Analysis of a physics corpus shows that physics problems are made up of n-tuples which contain concepts and the relations between concepts, and that these n-tuples can be expressed in the form of a UP-graph (the graph of understanding the problem), which is the semantic expression of the physics problem. The UP-graph is the basis of problem solving and is generated by using a physics knowledge graph (PKG). However, current knowledge graphs are hard to use in problem solving because they cannot store methods for solving problems. This paper therefore presents a model of a PKG which contains concepts and relations; in the model, concepts and relations are split into terms and unique IDs, and methods can be easily stored in the PKG as concepts. Based on the PKG, we propose DKP-solving, a novel approach for solving physics problems. The approach combines rules, statistical methods, and knowledge reasoning effectively by integrating deep learning and the knowledge graph. Experimental results over a dataset of real physics texts indicate that DKP-solving is effective for physics problem solving.
- Published
- 2019