"bottleneck" / Publication Year Range: This year - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"bottleneck"' showing total 52,524 results

51. Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts

Author: Tan, Andong, Zhou, Fengtao, and Chen, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The concept bottleneck model (CBM) is an interpretable-by-design framework that makes decisions by first predicting a set of interpretable concepts, and then predicting the class label based on the given concepts. Existing CBMs are trained with a fixed set of concepts (concepts are either annotated by the dataset or queried from language models). However, this closed-world assumption is unrealistic in practice, as users may wonder about the role of any desired concept in decision-making after the model is deployed. Inspired by the large success of recent vision-language pre-trained models such as CLIP in zero-shot classification, we propose "OpenCBM" to equip the CBM with open vocabulary concepts via: (1) Aligning the feature space of a trainable image feature extractor with that of CLIP's image encoder via a prototype-based feature alignment; (2) Simultaneously training an image classifier on the downstream dataset; (3) Reconstructing the trained classification head via any set of user-desired textual concepts encoded by CLIP's text encoder. To reveal potentially missing concepts from users, we further propose to iteratively find the closest concept embedding to the residual parameters during the reconstruction until the residual is small enough. To the best of our knowledge, our "OpenCBM" is the first CBM with concepts of open vocabularies, providing users with unique benefits such as removing, adding, or replacing any desired concept to explain the model's prediction even after a model is trained. Moreover, our model significantly outperforms the previous state-of-the-art CBM by 9% in classification accuracy on the benchmark dataset CUB-200-2011., Comment: ECCV2024
Published: 2024
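
Illustrative note (not part of the catalog record): the two-stage prediction that the abstract above attributes to concept bottleneck models can be sketched as follows. This is a minimal toy, not the OpenCBM method; the weights, the function names, and the `interventions` argument are all hypothetical and chosen only to show concepts being predicted first, then the label, with an optional user correction of concept values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_concepts(x, concept_weights):
    """Stage 1: map input features to concept scores in (0, 1)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in concept_weights]

def predict_label(concepts, class_weights):
    """Stage 2: pick the class with the highest linear score over concepts only."""
    scores = [sum(w * c for w, c in zip(row, concepts)) for row in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)

def cbm_forward(x, concept_weights, class_weights, interventions=None):
    """Run both stages; `interventions` (hypothetical) maps concept index -> corrected value."""
    concepts = predict_concepts(x, concept_weights)
    for idx, value in (interventions or {}).items():
        concepts[idx] = value  # a user overrides a predicted concept
    return concepts, predict_label(concepts, class_weights)

# Toy weights (assumed for illustration): two features, two concepts, two classes.
concept_weights = [[2.0, 0.0], [0.0, 2.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, -1.0]

concepts, label = cbm_forward(x, concept_weights, class_weights)
_, corrected_label = cbm_forward(
    x, concept_weights, class_weights, interventions={0: 0.0, 1: 1.0}
)
```

Because the label depends on the input only through the concept vector, overriding the concepts changes the prediction without retraining, which is the property the CBM entries in this listing build on.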

52. Invariant Graph Learning Meets Information Bottleneck for Out-of-Distribution Generalization

Author: Mao, Wenyu, Wu, Jiancan, Liu, Haoyang, Sui, Yongduo, and Wang, Xiang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Graph out-of-distribution (OOD) generalization remains a major challenge in graph learning since graph neural networks (GNNs) often suffer from severe performance degradation under distribution shifts. Invariant learning, aiming to extract invariant features across varied distributions, has recently emerged as a promising approach for OOD generalization. Despite the great success of invariant learning in OOD problems for Euclidean data (e.g., images), the exploration within graph data remains constrained by the complex nature of graphs. Existing studies, such as data augmentation or causal intervention, either suffer from disruptions to invariance during the graph manipulation process or face reliability issues due to a lack of supervised signals for causal parts. In this work, we propose a novel framework, called Invariant Graph Learning based on Information bottleneck theory (InfoIGL), to extract the invariant features of graphs and enhance models' generalization ability to unseen distributions. Specifically, InfoIGL introduces a redundancy filter to compress task-irrelevant information related to environmental factors. Cooperating with our designed multi-level contrastive learning, we maximize the mutual information among graphs of the same class in the downstream classification tasks, preserving invariant features for prediction to a great extent. An appealing feature of InfoIGL is its strong generalization ability without depending on supervised signals of invariance. Experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance under OOD generalization for graph classification tasks. The source code is available at https://github.com/maowenyu-11/InfoIGL.
Published: 2024

53. Job Shop Scheduling with Integer Programming, Shifting Bottleneck, and Decision Diagrams: A Computational Study

Author: King, Brannon and Hildebrand, Robert
Subjects: Mathematics - Optimization and Control
Abstract: We study heuristic algorithms for job shop scheduling problems. We compare classical approaches, such as the shifting bottleneck heuristic, with novel strategies using decision diagrams. Balas' local refinement is used to improve feasible solutions. Heuristic approaches are combined with Mixed Integer Programming and Constraint Programming approaches. We discuss our results via computational experiments.
Published: 2024

54. White matter tract crossing and bottleneck regions in the fetal brain

Author: Calixto, Camilo, Soldatelli, Matheus D., Li, Bo, Pierotich, Lana, Gholipour, Ali, Warfield, Simon K., and Karimi, Davood
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: There is a growing interest in using diffusion MRI to study the white matter tracts and structural connectivity of the fetal brain. Recent progress in data acquisition and processing suggests that this imaging modality has a unique role in elucidating the normal and abnormal patterns of neurodevelopment in utero. However, there have been no efforts to quantify the prevalence of crossing tracts and bottleneck regions, important issues that have been extensively researched for adult brains. In this work, we determined the brain regions with crossing tracts and bottlenecks between 23 and 36 gestational weeks. We performed probabilistic tractography on 59 fetal brain scans and extracted a set of 51 distinct white matter tracts, which we grouped into 10 major tract bundle groups. We analyzed the results to determine the patterns of tract crossings and bottlenecks. Our results showed that 20-25% of the white matter voxels included two or three crossing tracts. Bottlenecks were more prevalent: 75-80% of the voxels were characterized as bottlenecks, with more than 40% of the voxels involving four or more tracts. The results of this study highlight the challenge of fetal brain tractography and structural connectivity assessment and call for innovative image acquisition and analysis methods to mitigate these problems.
Published: 2024

55. Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval

Author: Balloli, Vaibhav, Beery, Sara, and Bondi-Kelly, Elizabeth
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Image retrieval plays a pivotal role in applications from wildlife conservation to healthcare, for finding individual animals or relevant images to aid diagnosis. Although deep learning techniques for image retrieval have advanced significantly, their imperfect real-world performance often necessitates including human expertise. Human-in-the-loop approaches typically rely on humans completing the task independently and then combining their opinions with an AI model in various ways, as these models offer very little interpretability or correctability. To allow humans to intervene in the AI model instead, thereby saving human time and effort, we adapt the Concept Bottleneck Model (CBM) and propose CHAIR. CHAIR (a) enables humans to correct intermediate concepts, which helps improve embeddings generated, and (b) allows for flexible levels of intervention that accommodate varying levels of human expertise for better retrieval. To show the efficacy of CHAIR, we demonstrate that our method performs better than similar models on image retrieval metrics without any external intervention. Furthermore, we also showcase how human intervention helps further improve retrieval performance, thereby achieving human-AI complementarity., Comment: Accepted at Human-Centred AI Track at IJCAI 2024
Published: 2024

56. Integrating Clinical Knowledge into Concept Bottleneck Models

Author: Pang, Winnie, Ke, Xueyi, Tsutsui, Satoshi, and Wen, Bihan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Concept bottleneck models (CBMs), which predict human-interpretable concepts (e.g., nucleus shapes in cell images) before predicting the final output (e.g., cell type), provide insights into the decision-making processes of the model. However, training CBMs solely in a data-driven manner can introduce undesirable biases, which may compromise prediction performance, especially when the trained models are evaluated on out-of-domain images (e.g., those acquired using different devices). To mitigate this challenge, we propose integrating clinical knowledge to refine CBMs, better aligning them with clinicians' decision-making processes. Specifically, we guide the model to prioritize the concepts that clinicians also prioritize. We validate our approach on two datasets of medical images: white blood cell and skin images. Empirical validation demonstrates that incorporating medical guidance enhances the model's classification performance on unseen datasets with varying preparation methods, thereby increasing its real-world applicability., Comment: Accepted to MICCAI2024
Published: 2024

57. Concept Bottleneck Models Without Predefined Concepts

Author: Schrodi, Simon, Schur, Julian, Argus, Max, and Brox, Thomas
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: There has been considerable recent interest in interpretable concept-based models such as Concept Bottleneck Models (CBMs), which first predict human-interpretable concepts and then map them to output classes. To reduce reliance on human-annotated concepts, recent works have converted pretrained black-box models into interpretable CBMs post-hoc. However, these approaches predefine a set of concepts, assuming which concepts a black-box model encodes in its representations. In this work, we eliminate this assumption by leveraging unsupervised concept discovery to automatically extract concepts without human annotations or a predefined set of concepts. We further introduce an input-dependent concept selection mechanism that ensures only a small subset of concepts is used across all classes. We show that our approach improves downstream performance and narrows the performance gap to black-box models, while using significantly fewer concepts in the classification. Finally, we demonstrate how large vision-language models can intervene on the final model weights to correct model errors.
Published: 2024

58. A Deterministic Information Bottleneck Method for Clustering Mixed-Type Data

Author: Costa, Efthymios, Papatsouma, Ioanna, and Markos, Angelos
Subjects: Statistics - Methodology, Computer Science - Machine Learning, Statistics - Machine Learning, 62H30
Abstract: In this paper, we present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The method is a variant of the Deterministic Information Bottleneck algorithm which optimally compresses the data while retaining relevant information about the underlying structure. We compare the performance of the proposed method to that of three well-established clustering methods (KAMILA, K-Prototypes, and Partitioning Around Medoids with Gower's dissimilarity) on simulated and real-world datasets. The results demonstrate that the proposed approach represents a competitive alternative to conventional clustering techniques under specific conditions., Comment: Accepted at the 18th conference of the International Federation of Classification Societies (IFCS)
Published: 2024

59. Comment on Deterministic Information Bottleneck

Author: Marzen, Sarah
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: We make the case that although Deterministic Information Bottleneck may be a contribution to clustering, it should not be used to aid lossy compression without the addition of blocklength. We therefore suggest a new objective function that does so and leave its testing to future work., Comment: 2 pages; Comment on arXiv:1604.00268
Published: 2024

60. A DI-SRM Model for Production Bottleneck Prediction in Flexible Production Line

Author: Desheng Liu, Shuanglong Shi, Kun Li, and Zhiguo Dai
Subjects: production line, drift index, stacked model, production bottleneck prediction, Engineering (General). Civil engineering (General), TA1-2040, Chemical engineering, TP155-156, Physics, QC1-999
Abstract: In the context of Industrial Internet of Things (IIoT) flexible production lines, accurately predicting production bottlenecks is crucial for optimizing efficiency and resource allocation. However, the dynamic and uncertain nature of these processes poses significant challenges. This study introduces a novel bottleneck prediction method by integrating the Drift Index (DI) with a Stacked Regression Model (SRM). This study marks the first application of stacked ensemble learning techniques in predicting bottlenecks within IIoT-enabled flexible production lines, leading to notable improvements in prediction accuracy and model robustness. The proposed method utilizes both real-time and historical data collected from IoT devices, encompassing three core steps: bottleneck data analysis, quantification of the drift index, and construction of the stacked regression model. By incorporating multiple production parameters such as equipment utilization and queue length, the method employs advanced time series analysis to forecast potential bottleneck drifts. Experimental results confirm that the DI-SRM model achieves high prediction accuracy and real-time responsiveness, effectively addressing the challenges of dynamic production environments. This approach provides reliable decision support for production scheduling and resource allocation, thereby optimizing production efficiency and enhancing market competitiveness.
Published: 2024

61. Correlation Between In Vitro and In Vivo Gene-Expression Strengths is Dependent on Bottleneck Process

Author: Enomoto, Toshihiko, Ohtake, Kazumasa, Senda, Naoko, and Kiga, Daisuke
Published: 2024

62. Blockchain-Enabled Variational Information Bottleneck for Data Extraction Based on Mutual Information in Internet of Vehicles

Author: Zhang, Cui, Zhang, Wenjun, Wu, Qiong, Fan, Pingyi, Cheng, Nan, Chen, Wen, and Letaief, Khaled B.
Subjects: Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: The Internet of Vehicles (IoV) network can address the issue of limited computing resources and data processing capabilities of individual vehicles, but it also brings the risk of privacy leakage to vehicle users. Applying blockchain technology can establish secure data links within the IoV, solving the problems of insufficient computing resources for each vehicle and the security of data transmission over the network. However, with the development of the IoV, the amount of data interaction between multiple vehicles and between vehicles and base stations, roadside units, etc., is continuously increasing. There is a need to further reduce the interaction volume, and intelligent data compression is key to solving this problem. The VIB technique facilitates the training of encoding and decoding models, substantially diminishing the volume of data that needs to be transmitted. This paper introduces an innovative approach that integrates blockchain with VIB, referred to as BVIB, designed to lighten computational workloads and reinforce the security of the network. We first construct a new network framework by separating the encoding and decoding networks to address the computational burden issue, and then propose a new algorithm to enhance the security of IoV networks. We also discuss the impact of the data extraction rate on system latency to determine the most suitable data extraction rate. An experimental framework combining Python and C++ has been established to substantiate the efficacy of our BVIB approach. Comprehensive simulation studies indicate that the BVIB consistently excels in comparison to alternative foundational methodologies., Comment: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/BVIB-for-Data-Extraction-Based-on Mutual-Information-in-the-IoV
Published: 2024

63. Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations

Author: Rivas, Pablo, Bichler, Gisela, Cerny, Tomas, Giddens, Laurie, and Petter, Stacie
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, I.2.10, I.5.1, K.4.1, H.3.3, I.2.6
Abstract: Unbiased representation learning is still an object of study under specific applications and contexts. Novel architectures are usually crafted to resolve particular problems using mixtures of fundamental pieces. This paper presents different image feature extraction mechanisms that work together with residual connections to encode perceptual image information in an autoencoder configuration. We use image data that aims to support a larger research agenda dealing with issues regarding criminal activity in consumer-to-consumer online platforms. Preliminary results suggest that the proposed architecture can learn rich spaces using ours and other image datasets resolving important challenges that are identified., Comment: 2022 LXAI Workshop at the 39th International Conference on Machine Learning (ICML), Baltimore, Maryland
Published: 2024

64. Use-dependent Biases as Optimal Action under Information Bottleneck

Author: Deng, Hokin and Haith, Adrian
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Information Theory
Abstract: Use-dependent bias is a phenomenon in human sensorimotor behavior whereby movements become biased towards previously repeated actions. Despite being well-documented, the reason why this phenomenon occurs is not yet clearly understood. Here, we propose that use-dependent biases can be understood as a rational strategy for movement under limitations on the capacity to process sensory information to guide motor output. We adopt an information-theoretic approach to characterize sensorimotor information processing and determine how behavior should be optimized given limitations to this capacity. We show that this theory naturally predicts the existence of use-dependent biases. Our framework also generates two further predictions. The first prediction relates to handedness. The dominant hand is associated with enhanced dexterity and reduced movement variability compared to the non-dominant hand, which we propose relates to a greater capacity for information processing in regions that control movement of the dominant hand. Consequently, the dominant hand should exhibit smaller use-dependent biases compared to the non-dominant hand. The second prediction relates to how use-dependent biases are affected by movement speed. When moving faster, it is more challenging to correct for initial movement errors online during the movement. This should exacerbate costs associated with initial directional error and, according to our theory, reduce the extent of use-dependent biases compared to slower movements, and vice versa. We show that these two empirical predictions, the handedness effect and the speed-dependent effect, are confirmed by experimental data.
Published: 2024

65. Unsqueeze [CLS] Bottleneck to Learn Rich Representations

Author: Su, Qing and Ji, Shihao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Distillation-based self-supervised learning typically leads to more compressed representations due to its radical clustering process and the implementation of a sharper target distribution. To overcome this limitation and preserve more information from input, we introduce UDI, conceptualized as Unsqueezed Distillation-based self-supervised learning (SSL). UDI enriches the learned representation by encouraging multimodal prediction distilled from a consolidated profile of local predictions that are derived via stratified sampling. Our evaluations show that UDI not only promotes semantically meaningful representations at instance level, delivering superior or competitive results to state-of-the-art SSL methods in image classification, but also effectively preserves the nuisance of input, which yields significant improvement in dense prediction tasks, including object detection and segmentation. Additionally, UDI performs competitively in low-shot image classification, improving the scalability of joint-embedding pipelines. Various visualizations and ablation studies are presented to further elucidate the mechanisms behind UDI. Our source code is available at https://github.com/ISL-CV/udi., Comment: ECCV 2024
Published: 2024

66. Audio Conditioning for Music Generation via Discrete Bottleneck Features

Author: Rouard, Simon, Adi, Yossi, Copet, Jade, Roebel, Axel, and Défossez, Alexandre
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: While most music generation models use textual or parametric conditioning (e.g. tempo, harmony, musical genre), we propose to condition a language model based music generation system with audio input. Our exploration involves two distinct strategies. The first strategy, termed textual inversion, leverages a pre-trained text-to-music model to map audio input to corresponding "pseudowords" in the textual embedding space. For the second model we train a music language model from scratch jointly with a text conditioner and a quantized audio feature extractor. At inference time, we can mix textual and audio conditioning and balance them thanks to a novel double classifier free guidance method. We conduct automatic and human studies that validate our approach. We will release the code and we provide music samples on https://musicgenstyle.github.io in order to show the quality of our model., Comment: 6 pages, 2 figures, accepted at ISMIR 2024
Published: 2024

67. Plausible mechanism of drug resistance and side-effects of COVID-19 therapeutics: a bottleneck for its eradication

Author: Das, Swarnali, Nath, Sreyashi, Shahjahan, and Dey, Sanjay Kumar
Published: 2024

68. Investigation of Bottleneck Enzyme Through Flux Balance Analysis to Improve Glycolic Acid Production in Escherichia coli

Author: Kim, Jungyeon, Kim, Ye-Bin, Kim, Ju-Young, Seo, Min-Ju, Yeom, Soo-Jin, and Sung, Bong Hyun
Published: 2024

69. Stochastic Concept Bottleneck Models

Author: Vandenhirtz, Moritz, Laguna, Sonia, Marcinkevičs, Ričards, and Vogt, Julia E.
Subjects: Computer Science - Machine Learning
Abstract: Concept Bottleneck Models (CBMs) have emerged as a promising interpretable method whose final prediction is based on intermediate, human-understandable concepts rather than the raw input. Through time-consuming manual interventions, a user can correct wrongly predicted concept values to enhance the model's downstream performance. We propose Stochastic Concept Bottleneck Models (SCBMs), a novel approach that models concept dependencies. In SCBMs, a single-concept intervention affects all correlated concepts, thereby improving intervention effectiveness. Unlike previous approaches that model the concept relations via an autoregressive structure, we introduce an explicit, distributional parameterization that allows SCBMs to retain the CBMs' efficient training and inference procedure. Additionally, we leverage the parameterization to derive an effective intervention strategy based on the confidence region. We show empirically on synthetic tabular and natural image datasets that our approach improves intervention effectiveness significantly. Notably, we showcase the versatility and usability of SCBMs by examining a setting with CLIP-inferred concepts, alleviating the need for manual concept annotations., Comment: Published at 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Published: 2024

70. Semi-supervised Concept Bottleneck Models

Author: Hu, Lijie, Huang, Tianhao, Xie, Huanyi, Ren, Chenyang, Hu, Zhengyu, Yu, Lu, and Wang, Di
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Concept Bottleneck Models (CBMs) have garnered increasing attention due to their ability to provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts. However, the training of current CBMs heavily relies on the accuracy and richness of annotated concepts in the dataset. These concept labels are typically provided by experts, which can be costly and require significant resources and effort. Additionally, concept saliency maps frequently misalign with input saliency maps, causing concept predictions to correspond to irrelevant input features - an issue related to annotation alignment. To address these limitations, we propose a new framework called SSCBM (Semi-supervised Concept Bottleneck Model). Our SSCBM is suitable for practical situations where annotated data is scarce. By leveraging joint training on both labeled and unlabeled data and aligning the unlabeled data at the concept level, we effectively solve these issues. We propose a strategy to generate pseudo labels and an alignment loss. Experiments demonstrate that our SSCBM is both effective and efficient. With only 20% labeled data, we achieved 93.19% (96.39% in a fully supervised setting) concept accuracy and 75.51% (79.82% in a fully supervised setting) prediction accuracy., Comment: 17 pages
Published: 2024

71. SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery

Author: Xu, Jialang, Sirajudeen, Nazir, Boal, Matthew, Francis, Nader, Stoyanov, Danail, and Mazomenos, Evangelos
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Automated detection of surgical errors can improve robotic-assisted surgery. Despite promising progress, existing methods still face challenges in capturing rich temporal context to establish long-term dependencies while maintaining computational efficiency. In this paper, we propose a novel hierarchical model named SEDMamba, which incorporates the selective state space model (SSM) into surgical error detection, facilitating efficient long sequence modelling with linear complexity. SEDMamba enhances selective SSM with a bottleneck mechanism and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize surgical errors in long videos. The bottleneck mechanism compresses and restores features within their spatial dimension, thereby reducing computational complexity. FCTF utilizes multiple dilated 1D convolutional layers to merge temporal information across diverse scale ranges, accommodating errors of varying duration. Our work also contributes the first-of-its-kind, frame-level, in-vivo surgical error dataset to support error detection in real surgical cases. Specifically, we deploy the clinically validated observational clinical human reliability assessment tool (OCHRA) to annotate the errors during suturing tasks in an open-source radical prostatectomy dataset (SAR-RARP50). Experimental results demonstrate that our SEDMamba outperforms state-of-the-art methods with at least 1.82% AUC and 3.80% AP performance gains with significantly reduced computational complexity. The corresponding error annotations, code and models will be released at https://github.com/wzjialang/SEDMamba., Comment: 8 pages
Published: 2024

72. Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification

Author: Udo, Honori and Koshinaka, Takafumi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We revisit language bottleneck models as an approach to ensuring the explainability of deep learning models for image classification. Because of inevitable information loss incurred in the step of converting images into language, the accuracy of language bottleneck models is considered to be inferior to that of standard black-box models. Recent image captioners based on large-scale foundation models of Vision and Language, however, have the ability to accurately describe images in verbal detail to a degree that was previously believed to not be realistically possible. In a task of disaster image classification, we experimentally show that a language bottleneck model that combines a modern image captioner with a pre-trained language model can achieve image classification accuracy that exceeds that of black-box models. We also demonstrate that a language bottleneck model and a black-box model may be thought to extract different features from images and that fusing the two can create a synergistic effect, resulting in even higher classification accuracy., Comment: Accepted at IEEE ICIP 2024. arXiv admin note: substantial text overlap with arXiv:2305.02932
Published: 2024

73. Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck

Author: Seo, Sangwoo, Kim, Sungwon, Jung, Jihyeong, Lee, Yoonho, and Park, Chanyoung
Subjects: Computer Science - Machine Learning
Abstract: Temporal Graph Neural Networks (TGNN) have the ability to capture both the graph topology and dynamic dependencies of interactions within a graph over time. There has been a growing need to explain the predictions of TGNN models due to the difficulty in identifying how past events influence their predictions. Since the explanation model for a static graph cannot be readily applied to temporal graphs due to its inability to capture temporal dependencies, recent studies proposed explanation models for temporal graphs. However, existing explanation models for temporal graphs rely on post-hoc explanations, requiring separate models for prediction and explanation, which is limited in two aspects: efficiency and accuracy of explanation. In this work, we propose a novel built-in explanation framework for temporal graphs, called Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck (TGIB). TGIB provides explanations for event occurrences by introducing stochasticity in each temporal event based on the Information Bottleneck theory. Experimental results demonstrate the superiority of TGIB in terms of both the link prediction performance and explainability compared to state-of-the-art methods. This is the first work that simultaneously performs prediction and explanation for temporal graphs in an end-to-end manner., Comment: KDD 2024
Published: 2024

74. Activation Bottleneck: Sigmoidal Neural Networks Cannot Forecast a Straight Line

Author: Toller, Maximilian, Hussain, Hussain, and Geiger, Bernhard C
Subjects: Computer Science - Machine Learning
Abstract: A neural network has an activation bottleneck if one of its hidden layers has a bounded image. We show that networks with an activation bottleneck cannot forecast unbounded sequences such as straight lines, random walks, or any sequence with a trend: the difference between prediction and ground truth becomes arbitrarily large, regardless of the training procedure. Widely used neural network architectures such as LSTM and GRU suffer from this limitation. In our analysis, we characterize activation bottlenecks and explain why they prevent sigmoidal networks from learning unbounded sequences. We experimentally validate our findings and discuss modifications to network architectures which mitigate the effects of activation bottlenecks.
Published: 2024
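
Note: the boundedness argument in the abstract above can be illustrated with a minimal sketch. The one-hidden-unit network, its weights, and the function names below are hypothetical and are not taken from the paper; the sketch only shows that a sigmoidal hidden layer caps the output, so the error against the line y = t keeps growing.

```python
import math

def tiny_sigmoid_net(t, w1=0.5, b1=0.0, w2=3.0, b2=1.0):
    """Hypothetical network: output = w2 * sigmoid(w1 * t + b1) + b2."""
    hidden = 1.0 / (1.0 + math.exp(-(w1 * t + b1)))  # bounded: lies in [0, 1]
    return w2 * hidden + b2

def output_bound(w2=3.0, b2=1.0):
    """Upper bound on |output|, valid for every input t."""
    return abs(w2) + abs(b2)

# Target is the unbounded straight line y = t.
horizons = (10, 100, 1000)
errors = [abs(t - tiny_sigmoid_net(t)) for t in horizons]

for t, err in zip(horizons, errors):
    print(f"t={t:5d}  prediction={tiny_sigmoid_net(t):.3f}  error={err:.3f}")
```

Whatever values the weights take, the prediction stays within the bound while the target does not, so the error eventually exceeds any threshold.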

75. An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation

Author: Zhu, Kun, Feng, Xiaocheng, Du, Xiyuan, Gu, Yuxuan, Yu, Weijiang, Wang, Haotian, Chen, Qianglong, Chu, Zheng, Chen, Jingchang, and Qin, Bing
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content, but this achieves only suboptimal noise compression. In this paper, we propose to introduce the information bottleneck theory into retrieval-augmented generation. Our approach involves the filtration of noise by simultaneously maximizing the mutual information between compression and ground output, while minimizing the mutual information between compression and retrieved passage. In addition, we derive the formula of information bottleneck to facilitate its application in novel comprehensive evaluations, the selection of supervised fine-tuning data, and the construction of reinforcement learning rewards. Experimental results demonstrate that our approach achieves significant improvements across various question answering datasets, not only in terms of the correctness of answer generation but also in conciseness, with a $2.5\%$ compression rate., Comment: Accepted to ACL 2024
Published: 2024
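
Note: the trade-off described in the abstract above matches the shape of the generic information bottleneck Lagrangian. The form below is that textbook objective, not the formula derived in the paper; the symbols are assumptions chosen for illustration, with $X$ the retrieved passage, $\tilde{X}$ the compression, $Y$ the ground output, and $\beta$ a trade-off weight.

```latex
\min_{p(\tilde{x} \mid x)} \; I(\tilde{X}; X) \;-\; \beta \, I(\tilde{X}; Y), \qquad \beta > 0
```

The first term penalizes information kept from the retrieved passage (compression); the second rewards information about the ground output (relevance).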

76. AnyCBMs: How to Turn Any Black Box into a Concept Bottleneck Model

Author: Dominici, Gabriele, Barbiero, Pietro, Giannini, Francesco, Gjoreski, Martin, and Langheinrich, Marc
Subjects: Computer Science - Machine Learning
Abstract: Interpretable deep learning aims at developing neural architectures whose decision-making processes could be understood by their users. Among these techniques, Concept Bottleneck Models enhance the interpretability of neural networks by integrating a layer of human-understandable concepts. These models, however, necessitate training a new model from the beginning, consuming significant resources and failing to utilize already trained large models. To address this issue, we introduce "AnyCBM", a method that transforms any existing trained model into a Concept Bottleneck Model with minimal impact on computational resources. We provide both theoretical and experimental insights showing the effectiveness of AnyCBMs in terms of classification performance and effectiveness of concept-based interventions on downstream tasks.
Published: 2024

77. Revisiting Counterfactual Regression through the Lens of Gromov-Wasserstein Information Bottleneck

Author: Yang, Hao, Sun, Zexu, Xu, Hongteng, and Chen, Xu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: As a promising individualized treatment effect (ITE) estimation method, counterfactual regression (CFR) maps individuals' covariates to a latent space and predicts their counterfactual outcomes. However, the selection bias between control and treatment groups often imbalances the two groups' latent distributions and negatively impacts this method's performance. In this study, we revisit counterfactual regression through the lens of information bottleneck and propose a novel learning paradigm called Gromov-Wasserstein information bottleneck (GWIB). In this paradigm, we learn CFR by maximizing the mutual information between covariates' latent representations and outcomes while penalizing the kernelized mutual information between the latent representations and the covariates. We demonstrate that the upper bound of the penalty term can be implemented as a new regularizer consisting of $i)$ the fused Gromov-Wasserstein distance between the latent representations of different groups and $ii)$ the gap between the transport cost generated by the model and the cross-group Gromov-Wasserstein distance between the latent representations and the covariates. GWIB effectively learns the CFR model through alternating optimization, suppressing selection bias while avoiding trivial latent distributions. Experiments on ITE estimation tasks show that GWIB consistently outperforms state-of-the-art CFR methods. To promote the research community, we release our project at https://github.com/peteryang1031/Causal-GWIB., Comment: 19 pages
Published: 2024

78. Editable Concept Bottleneck Models

Author: Hu, Lijie, Ren, Chenyang, Hu, Zhengyu, Lin, Hongbin, Wang, Cheng-Long, Xiong, Hui, Zhang, Jingfeng, and Wang, Di
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on cases where the data, including concepts, are clean. In many scenarios, we always need to remove/insert some training data or new concepts from trained CBMs due to different reasons, such as privacy concerns, data mislabelling, spurious concepts, and concept annotation errors. Thus, the challenge of deriving efficient editable CBMs without retraining from scratch persists, particularly in large-scale applications. To address these challenges, we propose Editable Concept Bottleneck Models (ECBMs). Specifically, ECBMs support three different levels of data removal: concept-label-level, concept-level, and data-level. ECBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for re-training. Experimental results demonstrate the efficiency and effectiveness of our ECBMs, affirming their adaptability within the realm of CBMs., Comment: 36 pages
Published: 2024

79. IB-AdCSCNet: Adaptive Convolutional Sparse Coding Network Driven by Information Bottleneck

Author: Zou, He, Qin, Meng'en, Song, Yu, and Yang, Xiaohui
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In the realm of neural network models, the perpetual challenge remains in retaining task-relevant information while effectively discarding redundant data during propagation. In this paper, we introduce IB-AdCSCNet, a deep learning model grounded in information bottleneck theory. IB-AdCSCNet seamlessly integrates the information bottleneck trade-off strategy into deep networks by dynamically adjusting the trade-off hyperparameter $\lambda$ through gradient descent, updating it within the FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) framework. By optimizing the compressive excitation loss function induced by the information bottleneck principle, IB-AdCSCNet achieves an optimal balance between compression and fitting at a global level, approximating the globally optimal representation feature. This information bottleneck trade-off strategy driven by downstream tasks not only helps to learn effective features of the data, but also improves the generalization of the model. This study's contribution lies in presenting a model with consistent performance and offering a fresh perspective on merging deep learning with sparse representation theory, grounded in the information bottleneck concept. Experimental results on CIFAR-10 and CIFAR-100 datasets demonstrate that IB-AdCSCNet not only matches the performance of deep residual convolutional networks but also outperforms them when handling corrupted data. Through the inference of the IB trade-off, the model's robustness is notably enhanced.
Published: 2024

80. Partial information decomposition: redundancy as information bottleneck

Author: Kolchinsky, Artemy
Subjects: Computer Science - Information Theory, Statistics - Machine Learning
Abstract: The partial information decomposition (PID) aims to quantify the amount of redundant information that a set of sources provides about a target. Here, we show that this goal can be formulated as a type of information bottleneck (IB) problem, termed the "redundancy bottleneck" (RB). The RB formalizes a tradeoff between prediction and compression: it extracts information from the sources that best predict the target, without revealing which source provided the information. It can be understood as a generalization of "Blackwell redundancy", which we previously proposed as a principled measure of PID redundancy. The "RB curve" quantifies the prediction--compression tradeoff at multiple scales. This curve can also be quantified for individual sources, allowing subsets of redundant sources to be identified without combinatorial optimization. We provide an efficient iterative algorithm for computing the RB curve., Comment: Entropy, 2024
Published: 2024

81. Bottleneck-Minimal Indexing for Generative Document Retrieval

Author: Du, Xin, Xiu, Lixin, and Tanaka-Ishii, Kumiko
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: We apply an information-theoretic perspective to reconsider generative document retrieval (GDR), in which a document $x \in X$ is indexed by $t \in T$, and a neural autoregressive model is trained to map queries $Q$ to $T$. GDR can be considered to involve information transmission from documents $X$ to queries $Q$, with the requirement to transmit more bits via the indexes $T$. By applying Shannon's rate-distortion theory, the optimality of indexing can be analyzed in terms of the mutual information, and the design of the indexes $T$ can then be regarded as a bottleneck in GDR. After reformulating GDR from this perspective, we empirically quantify the bottleneck underlying GDR. Finally, using the NQ320K and MARCO datasets, we evaluate our proposed bottleneck-minimal indexing method in comparison with various previous indexing methods, and we show that it outperforms those methods., Comment: Accepted for ICML 2024
Published: 2024

82. Multi-scale Bottleneck Transformer for Weakly Supervised Multimodal Violence Detection

Author: Sun, Shengyang and Gong, Xiaojin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Weakly supervised multimodal violence detection aims to learn a violence detection model by leveraging multiple modalities such as RGB, optical flow, and audio, while only video-level annotations are available. In the pursuit of effective multimodal violence detection (MVD), information redundancy, modality imbalance, and modality asynchrony are identified as three key challenges. In this work, we propose a new weakly supervised MVD method that explicitly addresses these challenges. Specifically, we introduce a multi-scale bottleneck transformer (MSBT) based fusion module that employs a reduced number of bottleneck tokens to gradually condense information and fuse each pair of modalities and utilizes a bottleneck token-based weighting scheme to highlight more important fused features. Furthermore, we propose a temporal consistency contrast loss to semantically align pairwise fused features. Experiments on the largest-scale XD-Violence dataset demonstrate that the proposed method achieves state-of-the-art performance. Code is available at https://github.com/shengyangsun/MSBT., Comment: Accepted by ICME 2024
Published: 2024

83. Protecting Your LLMs with Information Bottleneck

Author: Liu, Zichuan, Wang, Zefan, Xu, Linjie, Wang, Jinyu, Song, Lei, Wang, Tianchun, Chen, Chunlin, Cheng, Wei, and Bian, Jiang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: The advent of large language models (LLMs) has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector), a defense mechanism grounded in the information bottleneck principle, and we modify the objective to avoid trivial solutions. The IBProtector selectively compresses and perturbs prompts, facilitated by a lightweight and trainable extractor, preserving only essential information for the target LLMs to respond with the expected answer. Moreover, we further consider a situation where the gradient is not visible to be compatible with any LLM. Our empirical evaluations show that IBProtector outperforms current defense methods in mitigating jailbreak attempts, without overly affecting response quality or inference speed. Its effectiveness and adaptability across various attack methods and target LLMs underscore the potential of IBProtector as a novel, transferable defense that bolsters the security of LLMs without requiring modifications to the underlying models., Comment: Accepted by Neural Information Processing Systems (NeurIPS 2024)
Published: 2024

84. Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets

Author: Arnold, Solvi, Suzuki, Reiji, Arita, Takaya, and Yamazaki, Kimitoshi
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence, I.2.6
Abstract: Advanced biological intelligence learns efficiently from an information-rich stream of stimulus information, even when feedback on behaviour quality is sparse or absent. Such learning exploits implicit assumptions about task domains. We refer to such learning as Domain-Adapted Learning (DAL). In contrast, AI learning algorithms rely on explicit externally provided measures of behaviour quality to acquire fit behaviour. This imposes an information bottleneck that precludes learning from diverse non-reward stimulus information, limiting learning efficiency. We consider the question of how biological evolution circumvents this bottleneck to produce DAL. We propose that species first evolve the ability to learn from reward signals, providing inefficient (bottlenecked) but broad adaptivity. From there, integration of non-reward information into the learning process can proceed via gradual accumulation of biases induced by such information on specific task domains. This scenario provides a biologically plausible pathway towards bottleneck-free, domain-adapted learning. Focusing on the second phase of this scenario, we set up a population of NNs with reward-driven learning modelled as Reinforcement Learning (A2C), and allow evolution to improve learning efficiency by integrating non-reward information into the learning process using a neuromodulatory update mechanism. On a navigation task in continuous 2D space, evolved DAL agents show a 300-fold increase in learning speed compared to pure RL agents. Evolution is found to eliminate reliance on reward information altogether, allowing DAL agents to learn from non-reward information exclusively, using local neuromodulation-based connection weight updates only. Code available at github.com/aislab/dal., Comment: Camera ready version. 9 pages, 5 figures
Published: 2024

85. Incremental Residual Concept Bottleneck Models

Author: Shang, Chenming, Zhou, Shiji, Zhang, Hengyuan, Ni, Xinzhe, Yang, Yujiu, and Wang, Yuwang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Concept Bottleneck Models (CBMs) map the black-box visual representations extracted by deep neural networks onto a set of interpretable concepts and use the concepts to make predictions, enhancing the transparency of the decision-making process. Multimodal pre-trained models can match visual representations with textual concept embeddings, allowing the interpretable concept bottleneck to be obtained without expert concept annotations. Recent research has focused on establishing the concept bank and selecting high-quality concepts. However, it is challenging to construct a comprehensive concept bank through humans or large language models, which severely limits the performance of CBMs. In this work, we propose the Incremental Residual Concept Bottleneck Model (Res-CBM) to address the challenge of concept completeness. Specifically, the residual concept bottleneck model employs a set of optimizable vectors to complete missing concepts, then the incremental concept discovery module converts the complemented vectors with unclear meanings into potential concepts in the candidate concept bank. Our approach can be applied to any user-defined concept bank, as a post-hoc processing method to enhance the performance of any CBM. Furthermore, to measure the descriptive efficiency of CBMs, the Concept Utilization Efficiency (CUE) metric is proposed. Experiments show that the Res-CBM outperforms the current state-of-the-art methods in terms of both accuracy and efficiency and achieves comparable performance to black-box models across multiple datasets.
Published: 2024

86. Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Author: Godey, Nathan, de la Clergerie, Éric, and Sagot, Benoît
Subjects: Computer Science - Computation and Language
Abstract: Recent advances in language modeling consist in pretraining highly parameterized neural networks on extremely large web-mined text corpora. Training and inference with such models can be costly in practice, which incentivizes the use of smaller counterparts. However, it has been observed that smaller models can suffer from saturation, characterized as a drop in performance at some advanced point in training followed by a plateau. In this paper, we find that such saturation can be explained by a mismatch between the hidden dimension of smaller models and the high rank of the target contextual probability distribution. This mismatch affects the performance of the linear prediction head used in such models through the well-known softmax bottleneck phenomenon. We measure the effect of the softmax bottleneck in various settings and find that models based on less than 1000 hidden dimensions tend to adopt degenerate latent representations in late pretraining, which leads to reduced evaluation performance.
Published: 2024
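
Note: the rank mismatch behind the softmax bottleneck mentioned in the abstract above can be sketched in a few lines. With hidden dimension d, the logit matrix (contexts by vocabulary) is a product of a d-column and a d-row factor, so its rank is at most d. The toy sizes, integer values, and helper names below are hypothetical and only illustrate the d = 1 case, where every 2x2 minor of the logit matrix must vanish.

```python
def logits_matrix(contexts, vocab_embeddings):
    """Logit for (context i, token j) is the dot product h_i . w_j."""
    return [[sum(h * w for h, w in zip(h_i, w_j)) for w_j in vocab_embeddings]
            for h_i in contexts]

def max_abs_2x2_minor(matrix):
    """Largest |2x2 minor|; it equals zero exactly when rank(matrix) <= 1."""
    rows, cols = len(matrix), len(matrix[0])
    best = 0
    for a in range(rows):
        for b in range(a + 1, rows):
            for c in range(cols):
                for d in range(c + 1, cols):
                    minor = matrix[a][c] * matrix[b][d] - matrix[a][d] * matrix[b][c]
                    best = max(best, abs(minor))
    return best

# Hidden dimension d = 1: four contexts and five tokens (integers keep it exact).
contexts = [[2], [-1], [3], [5]]
vocab_embeddings = [[1], [4], [-2], [7], [3]]
logits = logits_matrix(contexts, vocab_embeddings)

# A hypothetical full-rank target: each context prefers a different token.
full_rank_target = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

print(max_abs_2x2_minor(logits))            # rank <= 1, so all minors vanish
print(max_abs_2x2_minor(full_rank_target))  # nonzero minor, so rank >= 2
```

A d = 1 model can therefore never reproduce the second matrix exactly, which is the small-scale analogue of a low hidden dimension facing a high-rank target distribution.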

87. Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning

Author: Semenov, Andrei, Ivanov, Vladimir, Beznosikov, Aleksandr, and Gasnikov, Alexander
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, I.2.6, I.2.10, I.4.10, I.5.1, I.5.4, I.5.5
Abstract: We propose a novel architecture and method of explainable classification with Concept Bottleneck Models (CBMs). While SOTA approaches to the image classification task work as a black box, there is a growing demand for models that provide interpretable results. Such models often learn to predict the distribution over class labels using an additional description of the target instances, called concepts. However, existing bottleneck methods have a number of limitations: their accuracy is lower than that of a standard model, and CBMs require an additional set of concepts to leverage. We provide a framework for creating a Concept Bottleneck Model from a pre-trained multi-modal encoder and new CLIP-like architectures. By introducing a new type of layer known as Concept Bottleneck Layers, we outline three methods for training them: with $\ell_1$-loss, contrastive loss, and a loss function based on the Gumbel-Softmax distribution (Sparse-CBM), while the final FC layer is still trained with Cross-Entropy. We show a significant increase in accuracy using sparse hidden layers in CLIP-based bottleneck models, which means that a sparse representation of the concept activation vector is meaningful in Concept Bottleneck Models. Moreover, with our Concept Matrix Search algorithm we can improve CLIP predictions on complex datasets without any additional training or fine-tuning. The code is available at: https://github.com/Andron00e/SparseCBM., Comment: 23 pages, 1 algorithm, 36 figures
Published: 2024

88. The new fuzzy bottleneck model to improve the axle manufacturing system performance

Author: Sarı, Hacı and İç, Yusuf Tansel
Published: 2024

89. Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering

Author: Jiang, Jingjing, Liu, Ziyi, and Zheng, Nanning
Published: 2024

90. Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

Author: Jiang, Yangzhou, Lin, Yinxin, Wang, Yaoming, Li, Teng, Ke, Bilian, and Ni, Bingbing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Appearance-based supervised methods with full-face image input have made tremendous advances in recent gaze estimation tasks. However, the intensive human annotation requirement inhibits current methods from achieving industrial-level accuracy and robustness. Although current unsupervised pre-training frameworks have achieved success in many image recognition tasks, due to the deep coupling between facial and eye features, such frameworks are still deficient in extracting useful gaze features from the full face. To alleviate the above limitations, this work proposes a novel unsupervised/self-supervised gaze pre-training framework, which forces the full-face branch to learn a low-dimensional gaze embedding without gaze annotations, through collaborative feature contrast and squeeze modules. At the heart of this framework is an alternating eye-attended/unattended masking training scheme, which squeezes gaze-related information from the full-face branch into an eye-masked auto-encoder through an injection bottleneck design that successfully encourages the model to pay more attention to gaze direction rather than facial textures only, while still adopting the eye self-reconstruction objective. At the same time, a novel eye/gaze-related information contrastive loss has been designed to further boost the learned representation by forcing the model to focus on eye-centered regions. Extensive experimental results on several gaze benchmarks demonstrate that the proposed scheme achieves superior performance over the unsupervised state of the art., Comment: 12 pages, 6 figures, 7 tables
Published: 2024

91. Learning a Clinically-Relevant Concept Bottleneck for Lesion Detection in Breast Ultrasound

Author: Bunnell, Arianna, Glaser, Yannik, Valdez, Dustin, Wolfgruber, Thomas, Altamirano, Aleen, González, Carol Zamora, Hernandez, Brenda Y., Sadowski, Peter, and Shepherd, John A.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Detecting and classifying lesions in breast ultrasound images is a promising application of artificial intelligence (AI) for reducing the burden of cancer in regions with limited access to mammography. Such AI systems are more likely to be useful in a clinical setting if their predictions can be explained to a radiologist. This work proposes an explainable AI model that provides interpretable predictions using a standard lexicon from the American College of Radiology's Breast Imaging and Reporting Data System (BI-RADS). The model is a deep neural network featuring a concept bottleneck layer in which known BI-RADS features are predicted before making a final cancer classification. This enables radiologists to easily review the predictions of the AI system and potentially fix errors in real time by modifying the concept predictions. In experiments, a model is developed on 8,854 images from 994 women with expert annotations and histological cancer labels. The model outperforms state-of-the-art lesion detection frameworks with 48.9 average precision on the held-out testing set, and for cancer classification, concept intervention is shown to increase performance from 0.876 to 0.885 area under the receiver operating characteristic curve. Training and evaluation code is available at https://github.com/hawaii-ai/bus-cbm., Comment: Submitted version of manuscript accepted at MICCAI 2024. This preprint has not undergone peer review or any post-submission improvements or corrections
Published: 2024

92. Augmenting Human Expertise in Weighted Ensemble Simulations through Deep Learning based Information Bottleneck

Author: Wang, Dedi and Tiwary, Pratyush
Subjects: Physics - Computational Physics, Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics, Physics - Biological Physics, Physics - Chemical Physics
Abstract: The weighted ensemble (WE) method stands out as a widely used segment-based sampling technique renowned for its rigorous treatment of kinetics. The WE framework typically involves initially mapping the configuration space onto a low-dimensional collective variable (CV) space and then partitioning it into bins. The efficacy of WE simulations heavily depends on the selection of CVs and binning schemes. The recently proposed State Predictive Information Bottleneck (SPIB) method has emerged as a promising tool for automatically constructing CVs from data and guiding enhanced sampling in an iterative manner. In this work, we advance this data-driven pipeline by incorporating prior expert knowledge. Our hybrid approach combines SPIB-learned CVs to enhance sampling in explored regions with expert-based CVs to guide exploration in regions of interest, synergizing the strengths of both methods. Through benchmarking on alanine dipeptide and chignolin systems, we demonstrate that our hybrid approach effectively guides WE simulations to sample states of interest, and reduces run-to-run variances. Moreover, our integration of the SPIB model also enhances the analysis and interpretation of WE simulation data by effectively identifying metastable states and pathways, and offering direct visualization of dynamics.
Published: 2024

93. Challenging margin-based speaker embedding extractors by using the variational information bottleneck

Author: Stafylakis, Themos, Silnova, Anna, Rohdin, Johan, Plchot, Oldrich, and Burget, Lukas
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speaker embedding extractors are typically trained using a classification loss over the training speakers. During the last few years, the standard softmax/cross-entropy loss has been replaced by margin-based losses, yielding significant improvements in speaker recognition accuracy. Motivated by the fact that the margin merely reduces the logit of the target speaker during training, we consider a probabilistic framework that has a similar effect. The variational information bottleneck provides a principled mechanism for making deterministic nodes stochastic, resulting in an implicit reduction of the posterior of the target speaker. We experiment with a wide range of speaker recognition benchmarks and scoring methods and report results competitive with those obtained with the state-of-the-art Additive Angular Margin loss., Comment: Accepted at Interspeech 2024
Published: 2024

94. Spatially Resolved Gene Expression Prediction from Histology via Multi-view Graph Contrastive Learning with HSIC-bottleneck Regularization

Author: Chi, Changxi, Shi, Hang, Zhu, Qi, Zhang, Daoqiang, and Shao, Wei
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The rapid development of spatial transcriptomics (ST) enables the measurement of gene expression at spatial resolution, making it possible to simultaneously profile the gene expression, spatial locations of spots, and the matched histopathological images. However, the cost of collecting ST data is much higher than that of acquiring histopathological images, and thus several studies attempt to predict the gene expression on ST by leveraging the corresponding histopathological images. Most of the existing image-based gene prediction models treat the prediction task on each spot of ST data independently, which ignores the spatial dependency among spots. In addition, while the histology images share phenotypic characteristics with the ST data, it is still a challenge to extract such common information to help align paired image and expression representations. To address the above issues, we propose a Multi-view Graph Contrastive Learning framework with HSIC-bottleneck Regularization (ST-GCHB), aiming at learning a shared representation to help impute the gene expression of the queried imaging spots by considering their spatial dependency.
Published: 2024

95. Breaking the Attention Bottleneck

Author: Hilsenbek, Kalle
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Attention-based transformers have become the standard architecture in many deep learning fields, primarily due to their ability to model long-range dependencies and handle variable-length input sequences. However, the attention mechanism with its quadratic complexity is a significant bottleneck in the transformer architecture. This algorithm is only uni-directional in the decoder and converges to a static pattern in over-parametrized decoder-only models. I address this issue by developing a generative function as attention or activation replacement. It still has the auto-regressive character by comparing each token with the previous one. In my test setting with nanoGPT this yields a smaller loss while having a smaller model. The loss further drops by incorporating an average context vector. This concept of attention replacement is distributed under the GNU AGPL v3 license at https://gitlab.com/Bachstelze/causal_generation., Comment: 6 pages, 4 figures
Published: 2024

96. Is Value Learning Really the Main Bottleneck in Offline RL?

Author: Park, Seohong, Frans, Kevin, Levine, Sergey, and Kumar, Aviral
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: While imitation learning requires access to high-quality data, offline reinforcement learning (RL) should, in principle, perform similarly or better with substantially lower data quality by using a value function. However, current results indicate that offline RL often performs worse than imitation learning, and it is often unclear what holds back the performance of offline RL. Motivated by this observation, we aim to understand the bottlenecks in current offline RL algorithms. While poor performance of offline RL is typically attributed to an imperfect value function, we ask: is the main bottleneck of offline RL indeed in learning the value function, or something else? To answer this question, we perform a systematic empirical study of (1) value learning, (2) policy extraction, and (3) policy generalization in offline RL problems, analyzing how these components affect performance. We make two surprising observations. First, we find that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL, often more so than the value learning objective. For instance, we show that common value-weighted behavioral cloning objectives (e.g., AWR) do not fully leverage the learned value function, and switching to behavior-constrained policy gradient objectives (e.g., DDPG+BC) often leads to substantial improvements in performance and scalability. Second, we find that a big barrier to improving offline RL performance is often imperfect policy generalization on test-time states out of the support of the training data, rather than policy learning on in-distribution states. We then show that the use of suboptimal but high-coverage data or test-time policy training techniques can address this generalization issue in practice. Specifically, we propose two simple test-time policy improvement methods and show that these methods lead to better performance., Comment: NeurIPS 2024
Published: 2024

97. Enhancing Adversarial Transferability via Information Bottleneck Constraints

Author: Qi, Biqing, Gao, Junqi, Liu, Jianxing, Wu, Ligang, and Zhou, Bowen
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: From the perspective of information bottleneck (IB) theory, we propose a novel framework for performing black-box transferable adversarial attacks named IBTA, which leverages advancements in invariant features. Intuitively, diminishing the reliance of adversarial perturbations on the original data, under equivalent attack performance constraints, encourages a greater reliance on the invariant features that contribute most to classification, thereby enhancing the transferability of adversarial attacks. Building on this motivation, we redefine the optimization of transferable attacks using a novel theoretical framework that centers around IB. Specifically, to overcome the challenge of unoptimizable mutual information, we propose a simple and efficient mutual information lower bound (MILB) for approximating computation. Moreover, to quantitatively evaluate mutual information, we utilize the Mutual Information Neural Estimator (MINE) to perform a thorough analysis. Our experiments on the ImageNet dataset demonstrate the efficiency and scalability of IBTA and the derived MILB. Our code is available at https://github.com/Biqing-Qi/Enhancing-Adversarial-Transferability-via-Information-Bottleneck-Constraints.
Published: 2024

98. VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning

Author: Dutta, Oshin, Gupta, Ritvik, and Agarwal, Sumeet
Subjects: Computer Science - Machine Learning
Abstract: In recent years, there has been a growing emphasis on compressing large pre-trained transformer models for resource-constrained devices. However, traditional pruning methods often leave the embedding layer untouched, leading to model over-parameterization. Additionally, they require extensive compression time with large datasets to maintain performance in pruned models. To address these challenges, we propose VTrans, an iterative pruning framework guided by the Variational Information Bottleneck (VIB) principle. Our method compresses all structural components, including embeddings, attention heads, and layers, using VIB-trained masks. This approach retains only essential weights in each layer, ensuring compliance with specified model size or computational constraints. Notably, our method achieves up to 70% more compression than prior state-of-the-art approaches, both task-agnostic and task-specific. We further propose faster variants of our method: Fast-VTrans, utilizing only 3% of the data, and Faster-VTrans, a time-efficient alternative that involves exclusive finetuning of VIB masks, accelerating compression by up to 25 times with minimal performance loss compared to previous methods. Extensive experiments on BERT, RoBERTa, and GPT-2 models substantiate the efficacy of our method. Moreover, our method demonstrates scalability in compressing large models such as LLaMA-2-7B, achieving superior performance compared to previous pruning methods. Additionally, we use attention-based probing to qualitatively assess model redundancy and interpret the efficiency of our approach. Notably, our method considers heads with high attention to special and current tokens in the un-pruned model as foremost candidates for pruning, while retained heads are observed to attend more to task-critical keywords.
Published: 2024

99. High Throughput Polar Code Decoders with Information Bottleneck Quantization

Author: Kestel, Claus, Johannsen, Lucas, and Wehn, Norbert
Subjects: Computer Science - Information Theory
Abstract: In digital baseband processing, the forward error correction (FEC) unit is among the most demanding components in terms of computational complexity and power consumption. Hence, efficient implementation of FEC decoders is crucial for next-generation mobile broadband standards and an ongoing research topic. Quantization has a significant impact on the decoder area, power consumption and throughput. Thus, lower bit-widths are preferred for efficient implementations but degrade the error-correction capability. To address this issue, a non-uniform quantization based on the Information Bottleneck (IB) method was proposed that enables a low bit-width while maintaining the essential information. Many investigations on the use of the IB method for low-density parity-check (LDPC) code decoders exist and have shown its advantages from an implementation perspective. However, for polar code decoder implementations, there exists only one publication, which is not based on the state-of-the-art Fast-SSC decoding algorithm, and only synthesis implementation results without energy estimation are shown. In contrast, our paper presents several optimized Fast Simplified Successive-Cancellation (Fast-SSC) polar code decoder implementations using IB-based quantization with placement and routing results in an advanced 12 nm FinFET technology. Gains of up to 16% in area and 13% in energy efficiency are achieved with IB-based quantization at a Frame Error Rate (FER) of 10^-7 and a Polar Code of N = 1024, R = 0.5 compared to state-of-the-art decoders.
Published: 2024

100. Genetic Bottleneck and the Emergence of High Intelligence by Scaling-out and High Throughput

Author: Khan, Arifa, P, Saravanan, and K., Venkatesan S.
Subjects: Computer Science - Other Computer Science
Abstract: We study the biological evolution of low-latency natural neural networks for short-term survival, and its parallels in the development of the low-latency, high-performance Central Processing Unit (CPU) in computer design and architecture. The necessity of accurate, high-quality display of motion pictures led to the special processing units known as GPUs, just as special visual cortex regions of animals produced such low-latency computational capacity. The human brain, considered as nothing but a scaled-up version of a primate brain, evolved in response to a genomic bottleneck, producing a brain that is trainable and prunable by society and that, as a further extension, invents language, writing and the storage of narratives displaced in time and space. We conclude that the modern digital invention of social media and the archived collective common corpus has further evolved from simple CPU-based low-latency fast retrieval to high-throughput parallel processing of data using GPUs to train attention-based deep learning neural networks, producing Generative AI with aspects like toxicity, bias, memorization and hallucination that have intriguingly close parallels in humans and their society. We show how this paves the way for constructive approaches to eliminating such drawbacks from human society and its proxy and collective large-scale mirror, the Generative AI of the LLMs.
Published: 2024
