"Feng, Yunzhen" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Feng, Yunzhen"' showing total 13 results

1. Strong Model Collapse

Author: Dohmatob, Elvis, Feng, Yunzhen, Subramonian, Arjun, and Kempe, Julia
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Within the scaling laws paradigm, which underpins the training of large neural networks like ChatGPT and Llama, we consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon, a critical performance degradation due to synthetic data in the training corpus. Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse: larger and larger training sets do not enhance performance. We further investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse. In a simplified regime where neural networks are approximated via random projections of tunable size, we both theoretically and empirically show that larger models can amplify model collapse. Interestingly, our theory also indicates that, beyond the interpolation threshold (which can be extremely high for very large datasets), larger models may mitigate the collapse, although they do not entirely prevent it. Our theoretical findings are empirically verified through experiments on language models and feed-forward neural networks for images.
Published: 2024

2. Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

Author: Feng, Yunzhen, Dohmatob, Elvis, Yang, Pu, Charton, Francois, and Kempe, Julia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Large Language Models (LLMs) are increasingly trained on data generated by other LLMs, either because generated text and images become part of the pre-training corpus, or because synthesized data is used as a replacement for expensive human annotation. This raises concerns about model collapse, a drop in model performance when training sets include generated data. Considering that it is easier for both humans and machines to tell good examples from bad ones than to generate high-quality samples, we investigate the use of verification on synthesized data to prevent model collapse. We provide a theoretical characterization using Gaussian mixtures, linear classifiers, and linear verifiers to derive conditions with measurable proxies to assess whether the verifier can effectively select synthesized data that leads to optimal performance. We experiment with two practical tasks -- computing matrix eigenvalues with transformers and news summarization with LLMs -- which both exhibit model collapse when trained on generated data, and show that verifiers, even imperfect ones, can indeed be harnessed to prevent model collapse and that our proposed proxy measure strongly correlates with performance.
Published: 2024

3. Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

Author: Feng, Yunzhen, Rudner, Tim G. J., Tsilivis, Nikolaos, and Kempe, Julia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Statistics - Methodology, Statistics - Machine Learning
Abstract: Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs and conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.
Published: 2024

4. Do Efficient Transformers Really Save Computation?

Author: Yang, Kai, Ackermann, Jan, He, Zhenyu, Feng, Guhao, Zhang, Bohang, Feng, Yunzhen, Ye, Qiwei, He, Di, and Wang, Liwei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: As transformer-based language models are trained on increasingly large datasets and with vast numbers of parameters, finding more efficient alternatives to the standard Transformer has become very valuable. While many efficient Transformers and Transformer alternatives have been proposed, none provide theoretical guarantees that they are a suitable replacement for the standard Transformer. This makes it challenging to identify when to use a specific model and what directions to prioritize for further investigation. In this paper, we aim to understand the capabilities and limitations of efficient Transformers, specifically the Sparse Transformer and the Linear Transformer. We focus on their reasoning capability as exhibited by Chain-of-Thought (CoT) prompts and follow previous works to model them as Dynamic Programming (DP) problems. Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size. Nonetheless, we identify a class of DP problems for which these models can be more efficient than the standard Transformer. We confirm our theoretical results through experiments on representative DP tasks, adding to the understanding of efficient Transformers' practical strengths and weaknesses.
Published: 2024

5. Model Collapse Demystified: The Case of Regression

Author: Dohmatob, Elvis, Feng, Yunzhen, and Kempe, Julia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: In the era of proliferation of large language and image generation models, the phenomenon of "model collapse" refers to the situation whereby, as a model is trained recursively on data generated from previous generations of itself over time, its performance degrades until the model eventually becomes completely useless, i.e., the model collapses. In this work, we study this phenomenon in the setting of high-dimensional regression and obtain analytic formulae which quantitatively outline this phenomenon in a broad range of regimes. In the special case of polynomial decaying spectral and source conditions, we obtain modified scaling laws which exhibit new crossover phenomena from fast to slow rates. We also propose a simple strategy based on adaptive regularization to mitigate model collapse. Our theoretical results are validated with experiments.
Published: 2024

6. A Tale of Tails: Model Collapse as a Change of Scaling Laws

Author: Dohmatob, Elvis, Feng, Yunzhen, Yang, Pu, Charton, Francois, and Kempe, Julia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: As AI model size grows, neural scaling laws have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data. In this paper we ask: How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus? Will future models still improve, or be doomed to degenerate up to total (model) collapse? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the large language model Llama2.
Published: 2024

7. Embarassingly Simple Dataset Distillation

Author: Feng, Yunzhen, Vedantam, Ramakrishna, and Kempe, Julia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Dataset distillation extracts a small set of synthetic training samples from a large dataset with the goal of achieving competitive performance on test data when trained on this sample. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational back-propagation through time method, we study the pronounced variance in the gradients, computational burden, and long-term dependencies. We introduce an improved method: Random Truncated Backpropagation Through Time (RaT-BPTT) to address them. RaT-BPTT incorporates a truncation coupled with a random window, effectively stabilizing the gradients and speeding up the optimization while covering long dependencies. This allows us to establish new state-of-the-art for a variety of standard dataset benchmarks. A deeper dive into the nature of distilled data unveils pronounced intercorrelation. In particular, subsets of distilled datasets tend to exhibit much worse performance than directly distilled smaller datasets of the same size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.
Comment: Short version appears at NeurIPS 2023 WANT workshop
Published: 2023

8. Transferred Discrepancy: Quantifying the Difference Between Representations

Author: Feng, Yunzhen, Zhai, Runtian, He, Di, Wang, Liwei, and Dong, Bin
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Understanding what information neural networks capture is an essential problem in deep learning, and studying whether different models capture similar features is an initial step to achieve this goal. Previous works sought to define metrics over the feature matrices to measure the difference between two models. However, different metrics sometimes lead to contradictory conclusions, and there has been no consensus on which metric is suitable to use in practice. In this work, we propose a novel metric that goes beyond previous approaches. Recall that one of the most practical scenarios of using the learned representations is to apply them to downstream tasks. We argue that we should design the metric based on a similar principle. For that, we introduce the transferred discrepancy (TD), a new metric that defines the difference between two representations based on their downstream-task performance. Through an asymptotic analysis, we show how TD correlates with downstream tasks and the necessity to define metrics in such a task-dependent fashion. In particular, we also show that under specific conditions, the TD metric is closely related to previous metrics. Our experiments show that TD can provide fine-grained information for varied downstream tasks, and for the models trained from different initializations, the learned features are not the same in terms of downstream-task predictions. We find that TD may also be used to evaluate the effectiveness of different training strategies. For example, we demonstrate that the models trained with proper data augmentations that improve the generalization capture more similar features in terms of TD, while those with data augmentations that hurt the generalization will not. This suggests a training strategy that leads to more robust representation also trains models that generalize better.
Comment: 23 pages, 3 figures
Published: 2020

9. Enhancing Certified Robustness via Smoothed Weighted Ensembling

Author: Liu, Chizhou, Feng, Yunzhen, Wang, Ranran, and Dong, Bin
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: Randomized smoothing has achieved state-of-the-art certified robustness against $l_2$-norm adversarial attacks. However, how to find the optimal base classifier for randomized smoothing is not fully resolved. In this work, we employ a Smoothed WEighted ENsembling (SWEEN) scheme to improve the performance of randomized smoothed classifiers. We show the ensembling generality that SWEEN can help achieve optimal certified robustness. Furthermore, theoretical analysis proves that the optimal SWEEN model can be obtained from training under mild assumptions. We also develop an adaptive prediction algorithm to reduce the prediction and certification cost of SWEEN models. Extensive experiments show that SWEEN models outperform the upper envelope of their corresponding candidate models by a large margin. Moreover, SWEEN models constructed using a few small models can achieve comparable performance to a single large model with a notable reduction in training time.
Published: 2020

10. Numerical analysis of aortic hemodynamics under the support of venoarterial extracorporeal membrane oxygenation and intra-aortic balloon pump

Author: Gu, Kaiyun, Guan, Zhiyuan, Lin, Xuanqi, Feng, Yunzhen, Feng, Jieli, Yang, Yujie, Zhang, Zhe, Chang, Yu, Ling, Yunpeng, and Wan, Feng
Published: 2019
Full Text: View/download PDF

11. Repeat administration of human umbilical cord mesenchymal stem cells improves left ventricular diastolic function in mice with heart failure with preserved ejection fraction

Author: Feng, Yunzhen, Xin, Yuanfeng, Tang, Wenjie, Zhang, Pengfei, Jiang, Yun, Li, Hao, Gong, Yanshan, Chen, Feng, Xu, Zhifeng, Liu, Zhongmin, and Gao, Ling
Published: 2024
Full Text: View/download PDF

12. Hybrid repair of aberrant right subclavian artery with aortic dissection caused by Kommerell diverticulum

Author: Li, Tieyan, Zou, Lin, Feng, Yunzhen, Fan, Guoliang, and Xin, Yuanfeng
Published: 2021
Full Text: View/download PDF

13. Hemodynamic Effects of Concentric and Eccentric Outflow Graft of LVADs on the Aortic Valve

Author: Song, Zhiming, Zhang, Lufeng, Ding, Yagang, Wan, Qing, Feng, Yunzhen, Wang, Heqing, Fan, Guoliang, Zhang, Yangyang, and Wan, Feng
Published: 2020
Full Text: View/download PDF
