Author: "Li, Shanda" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Li, Shanda"' showing total 15 results

1. Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

Author: Wu, Yangzhen, Sun, Zhiqing, Li, Shanda, Welleck, Sean, and Yang, Yiming
Subjects: Computer Science - Artificial Intelligence
Abstract: While the scaling laws of large language model (LLM) training have been extensively studied, optimal inference configurations of LLMs remain underexplored. We study inference scaling laws and compute-optimal inference, focusing on the trade-offs between model sizes and generating additional tokens with different inference strategies. As a first step towards understanding and designing compute-optimal inference methods, we study cost-performance trade-offs for inference strategies such as greedy search, majority voting, best-of-$n$, weighted voting, and two different tree search algorithms, using different model sizes and compute budgets. Our findings indicate that smaller models (e.g., Llemma-7B) can outperform larger models given the same computation budgets, and that smaller models paired with advanced inference algorithms yield Pareto-optimal cost-performance trade-offs. For instance, the Llemma-7B model, equipped with our novel tree search algorithm, consistently outperforms Llemma-34B with standard majority voting on the MATH benchmark across all FLOPs budgets. We hope these findings contribute to a broader understanding of inference scaling laws for LLMs.
Published: 2024
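
The sampling-based strategies named in the abstract above (majority voting, best-of-$n$, weighted voting) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the sample answers, and the scores are all hypothetical, and the scores stand in for whatever reward or verifier model assigns a value to each sampled solution.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, scores):
    """Return the answer of the single highest-scoring sample."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]

def weighted_vote(answers, scores):
    """Sum the scores per distinct answer and return the answer with the largest total."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)

# Hypothetical final answers from five samples, with made-up scores.
answers = ["12", "15", "12", "15", "15"]
scores = [0.9, 0.3, 0.8, 0.2, 0.1]

print(majority_vote(answers))          # counts only
print(best_of_n(answers, scores))      # single best score
print(weighted_vote(answers, scores))  # score mass per answer
```

With these made-up inputs, majority voting picks the answer that appears most often, while the two score-aware strategies can prefer a less frequent answer that the scorer rates more highly.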

2. Functional Interpolation for Relative Positions Improves Long Context Transformers

Author: Li, Shanda, You, Chong, Guruganesh, Guru, Ainslie, Joshua, Ontanon, Santiago, Zaheer, Manzil, Sanghai, Sumit, Yang, Yiming, Kumar, Sanjiv, and Bhojanapalli, Srinadh
Subjects: Computer Science - Machine Learning
Abstract: Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models. Though the Transformer architecture has fundamentally no limits on the input sequence lengths it can process, the choice of position encoding used during training can limit the performance of these models on longer inputs. We propose a novel functional relative position encoding with progressive interpolation, FIRE, to improve Transformer generalization to longer contexts. We theoretically prove that this can represent some of the popular relative position encodings, such as T5's RPE, Alibi, and Kerple. We next empirically show that FIRE models have better generalization to longer contexts on both zero-shot language modeling and long text benchmarks.
Comment: 26 pages; ICLR 2024 camera ready version
Published: 2023

3. Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers

Author: Choromanski, Krzysztof Marcin, Li, Shanda, Likhosherstov, Valerii, Dubey, Kumar Avinava, Luo, Shengjie, He, Di, Yang, Yiming, Sarlos, Tamas, Weingarten, Thomas, and Weller, Adrian
Subjects: Computer Science - Machine Learning
Abstract: We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding mechanisms (RPEs). These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces. FLTs construct the optimal RPE mechanism implicitly by learning its spectral representation. As opposed to other architectures combining efficient low-rank linear attention with RPEs, FLTs remain practical in terms of their memory usage and do not require additional assumptions about the structure of the RPE mask. Besides, FLTs allow for applying certain structural inductive bias techniques to specify masking strategies, e.g. they provide a way to learn the so-called local RPEs introduced in this paper and give accuracy gains as compared with several other linear Transformers for language modeling. We also thoroughly test FLTs on other data modalities and tasks, such as image classification, 3D molecular modeling, and learnable optimizers. To the best of our knowledge, for 3D molecular data, FLTs are the first Transformer architectures providing linear attention and incorporating RPE masking.
Comment: AISTATS 2024
Published: 2023

4. Is $L^2$ Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network?

Author: Wang, Chuwei, Li, Shanda, He, Di, and Wang, Liwei
Subjects: Computer Science - Machine Learning, Mathematics - Numerical Analysis
Abstract: The Physics-Informed Neural Network (PINN) approach is a new and promising way to solve partial differential equations using deep learning. The $L^2$ Physics-Informed Loss is the de facto standard in training Physics-Informed Neural Networks. In this paper, we challenge this common practice by investigating the relationship between the loss function and the approximation quality of the learned solution. In particular, we leverage the concept of stability from the partial differential equation literature to study the asymptotic behavior of the learned solution as the loss approaches zero. With this concept, we study an important class of high-dimensional non-linear PDEs in optimal control, the Hamilton-Jacobi-Bellman (HJB) equation, and prove that for general $L^p$ Physics-Informed Loss, a wide class of HJB equations is stable only if $p$ is sufficiently large. Therefore, the commonly used $L^2$ loss is not suitable for training PINN on those equations, while $L^{\infty}$ loss is a better choice. Based on the theoretical insight, we develop a novel PINN training algorithm to minimize the $L^{\infty}$ loss for HJB equations which is in a similar spirit to adversarial training. The effectiveness of the proposed algorithm is empirically demonstrated through experiments. Our code is released at https://github.com/LithiumDA/L_inf-PINN.
Published: 2022

5. Your Transformer May Not be as Powerful as You Expect

Author: Luo, Shengjie, Li, Shanda, Zheng, Shuxin, Liu, Tie-Yan, Wang, Liwei, and He, Di
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of the RPE-based Transformers is largely unexplored. In this work, we mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions. One may naturally assume the answer is in the affirmative -- RPE-based Transformers are universal function approximators. However, we present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is. One key reason lies in that most RPEs are placed in the softmax attention that always generates a right stochastic matrix. This restricts the network from capturing positional information in the RPEs and limits its capacity. To overcome the problem and make the model more powerful, we first present sufficient conditions for RPE-based Transformers to achieve universal function approximation. With the theoretical guidance, we develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies the conditions. Therefore, the corresponding URPE-based Transformers become universal function approximators. Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications. The code will be made publicly available at https://github.com/lsj2408/URPE.
Comment: 22 pages; NeurIPS 2022, Camera Ready Version
Published: 2022

6. Learning Physics-Informed Neural Networks without Stacked Back-propagation

Author: He, Di, Li, Shanda, Shi, Wenlei, Gao, Xiaotian, Zhang, Jia, Bian, Jiang, Wang, Liwei, and Liu, Tie-Yan
Subjects: Computer Science - Machine Learning
Abstract: Physics-Informed Neural Network (PINN) has become a commonly used machine learning approach to solve partial differential equations (PDE). However, when facing high-dimensional second-order PDE problems, PINN will suffer from severe scalability issues since its loss includes second-order derivatives, the computational cost of which will grow along with the dimension during stacked back-propagation. In this work, we develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks. In particular, we parameterize the PDE solution by the Gaussian smoothed model and show that, derived from Stein's Identity, the second-order derivatives can be efficiently calculated without back-propagation. We further discuss the model capacity and provide variance reduction methods to address key limitations in the derivative estimation. Experimental results show that our proposed method can achieve competitive error compared to standard PINN training but is significantly faster. Our code is released at https://github.com/LithiumDA/PINN-without-Stacked-BP.
Comment: AISTATS 2023
Published: 2022

7. Can Vision Transformers Perform Convolution?

Author: Li, Shanda, Chen, Xiangning, He, Di, and Hsieh, Cho-Jui
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Several recent studies have demonstrated that attention-based networks, such as Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) on several computer vision tasks without using convolutional layers. This naturally leads to the following question: can a self-attention layer of ViT express any convolution operation? In this work, we prove that a single ViT layer with image patches as the input can perform any convolution operation constructively, where the multi-head attention mechanism and the relative positional encoding play essential roles. We further provide a lower bound on the number of heads for Vision Transformers to express CNNs. Corresponding with our analysis, experimental results show that the construction in our proof can help inject convolutional bias into Transformers and significantly improve the performance of ViT in low data regimes.
Published: 2021

8. A visual detection algorithm for autonomous driving road environment perception

Author: Cong, Peichao, Feng, Hao, Li, Shanda, Li, Tianheng, Xu, Yutao, and Zhang, Xin
Published: 2024
Full Text: View/download PDF

9. Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Author: Luo, Shengjie, Li, Shanda, Cai, Tianle, He, Di, Peng, Dinglan, Zheng, Shuxin, Ke, Guolin, Wang, Liwei, and Liu, Tie-Yan
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful attention modules that go beyond the dot-then-exponentiate style, e.g., Transformers with relative positional encoding (RPE). Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing. In this paper, we propose a novel way to accelerate attention calculation for Transformers with RPE on top of the kernelized attention. Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using Fast Fourier Transform (FFT). With FFT, our method achieves $\mathcal{O}(n\log n)$ time complexity. Interestingly, we further demonstrate that properly using relative positional encoding can mitigate the training instability problem of vanilla kernelized attention. On a wide range of tasks, we empirically show that our models can be trained from scratch without any optimization issues. The learned model performs better than many efficient Transformer variants and is faster than standard Transformer in the long-sequence regime.
Comment: NeurIPS 2021, camera ready version
Published: 2021
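
The abstract above relies on the fact that a Toeplitz matrix can be multiplied by a vector in $\mathcal{O}(n\log n)$ time with the FFT. The sketch below shows only that standard primitive (circulant embedding), assuming NumPy is available; it is not the authors' attention algorithm, and the function names `toeplitz_matvec` and `dense_toeplitz` are hypothetical.

```python
import numpy as np

def toeplitz_matvec(first_col, first_row, x):
    """Multiply an n x n Toeplitz matrix by x in O(n log n) via circulant embedding."""
    n = len(x)
    # First column of a 2n x 2n circulant matrix whose top-left n x n block
    # is the Toeplitz matrix: [c_0..c_{n-1}, 0, r_{n-1}..r_1].
    circ = np.concatenate([first_col, [0.0], first_row[:0:-1]])
    x_padded = np.concatenate([x, np.zeros(n)])
    # Circulant matrix-vector product is a circular convolution, computed with FFTs.
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(x_padded))
    return y[:n].real

def dense_toeplitz(first_col, first_row):
    """Build the dense Toeplitz matrix for comparison (O(n^2) memory)."""
    n = len(first_col)
    return np.array([[first_col[i - j] if i >= j else first_row[j - i]
                      for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
n = 8
col = rng.normal(size=n)
row = rng.normal(size=n)
row[0] = col[0]  # the shared diagonal entry must agree
x = rng.normal(size=n)

fast = toeplitz_matvec(col, row, x)
slow = dense_toeplitz(col, row) @ x
print(np.allclose(fast, slow))
```

The dense product is included only as a reference check; for long sequences the FFT path avoids ever forming the $n \times n$ matrix.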

10. A Novel Interpolation Method for Soil Parameters Combining RBF Neural Network and IDW in the Pearl River Delta.

Author: Zhao, Zuoxi, Luo, Shuyuan, Zhao, Xuanxuan, Zhang, Jiaxing, Li, Shanda, Luo, Yangfan, and Dai, Jiuxiang
Abstract: Soil fertility is a critical factor in agricultural production, directly impacting crop growth, yield, and quality. To achieve precise agricultural management, accurate spatial interpolation of soil parameters is essential. This study developed a new interpolation prediction framework that combines Radial Basis Function (RBF) neural networks with Inverse Distance Weighting (IDW), termed the IDW-RBFNN. This framework initially uses the IDW method to apply preliminary weights based on distance to the data points, which are then used as input for the RBF neural network to form a training dataset. Subsequently, the RBF neural network further trains on these data to refine the interpolation results, achieving more precise spatial data interpolation. We compared the interpolation prediction accuracy of the IDW-RBFNN framework with ordinary Kriging (OK) and RBF methods under three different parameter settings. Ultimately, the IDW-RBFNN demonstrated lower error rates in terms of RMSE and MRE compared to direct RBF interpolation methods when adjusting settings based on different power values, even with a fixed number of data samples. As the sample size decreases, the interpolation accuracy of OK and RBF methods is significantly affected, while the error of IDW-RBFNN remains relatively low. Considering both interpolation accuracy and resource limitations, we recommend using the IDW-RBFNN method (p = 2) with at least 60 samples as the minimum sampling density to ensure high interpolation accuracy under resource constraints. Our method overcomes limitations of existing approaches that use fixed steady-state distance decay parameters, providing an effective tool for soil fertility monitoring in delta regions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
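
For orientation, the Inverse Distance Weighting step mentioned in the abstract above can be sketched as follows. This shows plain IDW only, under the usual definition (weights proportional to 1 / distance^p); the RBF neural network stage of the IDW-RBFNN framework is not reproduced, and the function name and sample data are hypothetical.

```python
import math

def idw_interpolate(points, values, query, power=2.0):
    """Estimate the value at `query` as a distance-weighted average of known samples."""
    weights = []
    for point, value in zip(points, values):
        distance = math.dist(point, query)
        if distance == 0.0:
            return value  # the query coincides with a sampled location
        weights.append(1.0 / distance ** power)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Made-up sample locations (x, y) and measured values.
points = [(0.0, 0.0), (2.0, 0.0)]
values = [1.0, 3.0]

print(idw_interpolate(points, values, (1.0, 0.0)))  # midpoint: equal weights
print(idw_interpolate(points, values, (0.5, 0.0)))  # closer to the first sample
```

A larger `power` makes nearby samples dominate the estimate; the abstract's recommended setting corresponds to `power=2.0` here.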

11. MYOLO: A Lightweight Fresh Shiitake Mushroom Detection Model Based on YOLOv3

Author: Cong, Peichao, primary, Feng, Hao, additional, Lv, Kunfeng, additional, Zhou, Jiachao, additional, and Li, Shanda, additional
Published: 2023
Full Text: View/download PDF

12. Research on Instance Segmentation Algorithm of Greenhouse Sweet Pepper Detection Based on Improved Mask RCNN

Author: Cong, Peichao, primary, Li, Shanda, additional, Zhou, Jiachao, additional, Lv, Kunfeng, additional, and Feng, Hao, additional
Published: 2023
Full Text: View/download PDF

13. Km-Mask Rcnn: A Lightweight Instance Segmentation Algorithm for Strawberries with Multiple Growth Cycles

Author: Cong, Peichao, primary, Yutao, Xu, additional, Li, Tianheng, additional, Li, Shanda, additional, Feng, Hao, additional, and Zhang, Xin, additional
Published: 2023
Full Text: View/download PDF

14. Citrus Tree Crown Segmentation of Orchard Spraying Robot Based on RGB-D Image and Improved Mask R-CNN

Author: Cong, Peichao, primary, Zhou, Jiachao, additional, Li, Shanda, additional, Lv, Kunfeng, additional, and Feng, Hao, additional
Published: 2022
Full Text: View/download PDF

15. Citrus Tree Crown Segmentation of Orchard Spraying Robot Based on RGB-D Image and Improved Mask R-CNN.

Author: Cong, Peichao, Zhou, Jiachao, Li, Shanda, Lv, Kunfeng, and Feng, Hao
Subjects: CROWNS (Botany), CONVOLUTIONAL neural networks, SPRAYING & dusting in agriculture, CITRUS, AUTONOMOUS robots, METAL spraying, ORCHARDS, TREE growth
Abstract: Orchard spraying robots must visually obtain citrus tree crown growth information to meet the variable growth-stage-based spraying requirements. However, the complex environments and growth characteristics of fruit trees affect the accuracy of crown segmentation. Therefore, we propose a feature-map-based squeeze-and-excitation UNet++ (MSEU) region-based convolutional neural network (R-CNN) citrus tree crown segmentation method that intakes red–green–blue-depth (RGB-D) images that are pixel aligned and visual distance-adjusted to eliminate noise. Our MSEU R-CNN achieves accurate crown segmentation using squeeze-and-excitation (SE) and UNet++. To fully fuse the feature map information, the SE block correlates image features and recalibrates their channel weights, and the UNet++ semantic segmentation branch replaces the original mask structure to maximize the interconnectivity between feature layers, achieving a near-real time detection speed of 5 fps. Its bounding box (bbox) and segmentation (seg) AP50 scores are 96.6 and 96.2%, respectively, and the bbox average recall and F1-score are 73.0 and 69.4%, which are 3.4, 2.4, 4.9, and 3.5% higher than the original model, respectively. Compared with bbox instant segmentation (BoxInst) and conditional convolutional frameworks (CondInst), the MSEU R-CNN provides better seg accuracy and speed than the previous-best Mask R-CNN. These results provide the means to accurately employ autonomous spraying robots. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
