Author: "Shidani, Amitis" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Shidani, Amitis"' showing total 8 results

1. Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Author: Ramapuram, Jason, Danieli, Federico, Dhekane, Eeshan, Weers, Floris, Busbridge, Dan, Ablin, Pierre, Likhomanenko, Tatiana, Digani, Jagrit, Gu, Zijin, Shidani, Amitis, and Webb, Russ
Subjects: Computer Science - Machine Learning
Abstract: Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. Recent work has explored alternatives to softmax attention in transformers, such as ReLU and sigmoid activations. In this work, we revisit sigmoid attention and conduct an in-depth theoretical and empirical analysis. Theoretically, we prove that transformers with sigmoid attention are universal function approximators and benefit from improved regularity compared to softmax attention. Through detailed empirical analysis, we identify stabilization of large initial attention norms during the early stages of training as a crucial factor for the successful training of models with sigmoid attention, outperforming prior attempts. We also introduce FLASHSIGMOID, a hardware-aware and memory-efficient implementation of sigmoid attention yielding a 17% inference kernel speed-up over FLASHATTENTION2 on H100 GPUs. Experiments across language, vision, and speech show that properly normalized sigmoid attention matches the strong performance of softmax attention on a wide range of domains and scales, which previous attempts at sigmoid attention were unable to fully achieve. Our work unifies prior art and establishes best practices for sigmoid attention as a drop-in softmax replacement in transformers.
Published: 2024
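
For illustration, the contrast the abstract above draws between softmax and sigmoid attention weights can be sketched in plain Python. This is a minimal sketch, not the authors' implementation and not FLASHSIGMOID: the function names, the 1/sqrt(d) score scaling, and the optional scalar `bias` are assumptions introduced here and do not come from the abstract.

```python
import math


def _dot(u, v):
    """Dot product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def _attend(queries, keys, values, weight_fn):
    """Map each query to a weighted sum of values; weight_fn turns scores into weights."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Scaled dot products between this query and every key (scaling is an assumption).
        scores = [_dot(q, k) / math.sqrt(d) for k in keys]
        weights = weight_fn(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def softmax_attention(queries, keys, values):
    """Weights are the softmax of the scores, so they sum to 1 for each query."""
    def softmax(scores):
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]
    return _attend(queries, keys, values, softmax)


def sigmoid_attention(queries, keys, values, bias=0.0):
    """Weights are an element-wise sigmoid of the scores; they need not sum to 1.

    `bias` is a hypothetical scalar offset added to every score before the sigmoid.
    """
    def sigmoid(scores):
        return [1.0 / (1.0 + math.exp(-(s + bias))) for s in scores]
    return _attend(queries, keys, values, sigmoid)


if __name__ == "__main__":
    q = [[0.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 0.0], [0.0, 1.0]]
    print(softmax_attention(q, k, v))
    print(sigmoid_attention(q, k, v, bias=-math.log(len(k))))
```

With an all-zero query, every score is 0, so softmax gives each of the two keys weight 1/2, while the sigmoid with `bias = -log(2)` gives each key weight 1/3. The sigmoid weights are computed independently per key and are not renormalized across the sequence.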

2. Poly-View Contrastive Learning

Author: Shidani, Amitis, Hjelm, Devon, Ramapuram, Jason, Webb, Russ, Dhekane, Eeshan Gunesh, and Busbridge, Dan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Theory, Statistics - Machine Learning
Abstract: Contrastive learning typically matches pairs of related views among a number of unrelated negative views. Views can be generated (e.g. by augmentations) or be observed. We investigate matching when there are more than two related views, which we call poly-view tasks, and derive new representation learning objectives using information maximization and sufficient statistics. We show that with unlimited computation, one should maximize the number of related views, and with a fixed compute budget, it is beneficial to decrease the number of unique samples whilst increasing the number of views of those samples. In particular, poly-view contrastive models trained for 128 epochs with batch size 256 outperform SimCLR trained for 1024 epochs at batch size 4096 on ImageNet1k, challenging the belief that contrastive models require large batch sizes and many training epochs.
Comment: Accepted to ICLR 2024. 42 pages, 7 figures, 3 tables, loss pseudo-code included in appendix
Published: 2024

3. Optimal Regret Bounds for Collaborative Learning in Bandits

Author: Shidani, Amitis and Vakili, Sattar
Subjects: Computer Science - Machine Learning, Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.
Comment: Algorithmic Learning Theory (ALT) 2024
Published: 2023

4. Ranking In Generalized Linear Bandits

Author: Shidani, Amitis, Deligiannidis, George, and Doucet, Arnaud
Subjects: Statistics - Machine Learning, Computer Science - Information Retrieval, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: We study the ranking problem in generalized linear bandits. At each time, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying an ordered list of the most attractive items is not always optimal as both position and item dependencies result in a complex reward function. A very naive example is the lack of diversity when all the most attractive items are from the same category. We model the position and item dependencies in the ordered list and design UCB and Thompson Sampling type algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies where position discount is a particular case, and connecting the ranking problem to graph theory.
Published: 2022

5. Chained Generalisation Bounds

Author: Clerico, Eugenio, Shidani, Amitis, Deligiannidis, George, and Doucet, Arnaud
Subjects: Statistics - Machine Learning, Computer Science - Information Theory, Computer Science - Machine Learning
Abstract: This work discusses how to derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique. By developing a general theoretical framework, we establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts, which can be obtained by lifting the regularity assumption from the loss onto its gradient. This allows us to re-derive the chaining mutual information bound from the literature, and to obtain novel chained information-theoretic generalisation bounds, based on the Wasserstein distance and other probability metrics. We show on some toy examples that the chained generalisation bound can be significantly tighter than its standard counterpart, particularly when the distribution of the hypotheses selected by the algorithm is very concentrated.
Keywords: Generalisation bounds; Chaining; Information-theoretic bounds; Mutual information; Wasserstein distance; PAC-Bayes.
Published: 2022

6. Ranking in Contextual Multi-Armed Bandits

Author: Shidani, Amitis, Deligiannidis, George, and Doucet, Arnaud
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG)
Abstract: We study a ranking problem in the contextual multi-armed bandit setting. A learning agent selects an ordered list of items at each time step and observes stochastic outcomes for each position. In online recommendation systems, showing an ordered list of the most attractive items would not be the best choice since both position and item dependencies result in a complicated reward function. A very naive example is the lack of diversity when all the most attractive items are from the same category. We model position and item dependencies in the ordered list and design UCB and Thompson Sampling type algorithms for this problem. We prove that the regret bound over $T$ rounds and $L$ positions is $\tilde{O}(L\sqrt{dT})$, which has the same order as the previous works with respect to $T$ and only increases linearly with $L$. Our work generalizes existing studies in several directions, including position dependencies where position discount is a particular case, and proposes a more general contextual bandit model.
Published: 2022

7. Vertex partitioning of graphs into odd induced subgraphs

Author: Aashtab, Arman, Akbari, Saieed, Ghanbari, Maryam, and Shidani, Amitis
Published: 2020

8. VERTEX PARTITIONING OF GRAPHS INTO ODD INDUCED SUBGRAPHS.

Author: AASHTAB, ARMAN, AKBARI, SAIEED, GHANBARI, MARYAM, and SHIDANI, AMITIS
Subjects: *GRAPH connectivity, *INDEPENDENT sets, *ODD numbers, *TREE graphs, *TIMBERLINE, *GRAPH algorithms, *PLANAR graphs
Abstract: A graph G is called an odd (even) graph if for every vertex v ∈ V(G), d_G(v) is odd (even). Let G be a graph of even order. Scott in 1992 proved that the vertices of every connected graph of even order can be partitioned into some odd induced forests. We denote the minimum number of odd induced subgraphs which partition V(G) by od(G). If all of the subgraphs are forests, then we denote it by od_F(G). In this paper, we show that if G is a connected subcubic graph of even order or G is a connected planar graph of even order, then od_F(G) ≤ 4. Moreover, we show that for every tree T of even order od_F(T) ≤ 2 and for every unicyclic graph G of even order od_F(G) ≤ 3. Also, we prove that if G is claw-free, then V(G) can be partitioned into at most Δ(G) - 1 induced forests and possibly one independent set. Furthermore, we demonstrate that the vertex set of the line graph of a tree can be partitioned into at most two odd induced subgraphs and possibly one independent set.
Published: 2023
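
The definitions in the abstract above (a graph is odd when every vertex has odd degree, and od(G) is the minimum number of odd induced subgraphs that partition the vertex set) can be checked on small examples. The sketch below is illustrative only: the function names and the edge-list representation are assumptions introduced here, and it verifies a given partition rather than computing od(G) or testing the forest condition.

```python
def induced_degrees(edges, part):
    """Degrees of the vertices in `part` within the subgraph induced by `part`."""
    part = set(part)
    deg = {v: 0 for v in part}
    for u, v in edges:
        # An edge belongs to the induced subgraph only if both endpoints are in the part.
        if u in part and v in part:
            deg[u] += 1
            deg[v] += 1
    return deg


def is_odd_induced(edges, part):
    """True if every vertex of the induced subgraph has odd degree."""
    return all(d % 2 == 1 for d in induced_degrees(edges, part).values())


def is_odd_partition(vertices, edges, parts):
    """True if `parts` partitions `vertices` and every part induces an odd graph."""
    covered = [v for p in parts for v in p]
    if sorted(covered) != sorted(vertices):
        return False  # some vertex is missing or appears in more than one part
    return all(is_odd_induced(edges, p) for p in parts)


if __name__ == "__main__":
    # Path on four vertices: 0 - 1 - 2 - 3 (a tree of even order).
    vertices = [0, 1, 2, 3]
    edges = [(0, 1), (1, 2), (2, 3)]
    print(is_odd_partition(vertices, edges, [[0, 1, 2, 3]]))
    print(is_odd_partition(vertices, edges, [[0, 1], [2, 3]]))
```

The path itself is not an odd graph, because its two inner vertices have degree 2. Splitting it into the parts {0, 1} and {2, 3} leaves a single edge in each part, so every vertex has degree 1. This is one instance consistent with the stated bound of 2 for trees of even order.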
