Author: "Chang, Xiangyu" - Searchworks@Jio Institute Digital Library Search Results

Your search for author "Chang, Xiangyu" returned 303 results

1. Randomized Spectral Clustering for Large-Scale Multi-Layer Networks

Author: Su, Wenqing, Guo, Xiao, Chang, Xiangyu, and Yang, Ying
Subjects: Statistics - Computation
Abstract: Large-scale multi-layer networks with large numbers of nodes, edges, and layers arise across various domains, which poses a great computational challenge for downstream analysis. In this paper, we develop an efficient randomized spectral clustering algorithm for community detection in multi-layer networks. We first use a random sampling strategy to sparsify the adjacency matrix of each layer. Then we use a random projection strategy to accelerate the eigen-decomposition of the sum of squared sparsified adjacency matrices over all layers. The communities are finally obtained by applying k-means to the eigenvectors. The algorithm not only has low time complexity but also saves storage space. Theoretically, we study the misclassification error rate of the proposed algorithm under multi-layer stochastic block models, which shows that the randomization does not deteriorate the error bound under certain conditions. Numerical studies on multi-layer networks with millions of nodes show the superior efficiency of the proposed algorithm, which achieves clustering results rapidly. A new R package called MLRclust is developed and made available to the public.
Published: 2025

2. Selective Attention: Enhancing Transformer through Principled Context Control

Author: Zhang, Xuechen, Chang, Xiangyu, Li, Mingchen, Roy-Chowdhury, Amit, Chen, Jiasi, and Oymak, Samet
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query. While self-attention has enjoyed major success, it notably treats all queries $q$ in the same way by applying the mapping $V^\top\text{softmax}(Kq)$, where $V,K$ are the value and key embeddings respectively. In this work, we argue that this uniform treatment hinders the ability to control contextual sparsity and relevance. As a solution, we introduce the $\textit{Selective Self-Attention}$ (SSA) layer that augments the softmax nonlinearity with a principled temperature scaling strategy. By controlling temperature, SSA adapts the contextual sparsity of the attention map to the query embedding and its position in the context window. Through theory and experiments, we demonstrate that this alleviates attention dilution, aids the optimization process, and enhances the model's ability to control softmax spikiness of individual queries. We also incorporate temperature scaling for value embeddings and show that it boosts the model's ability to suppress irrelevant/noisy tokens. Notably, SSA is a lightweight method which introduces less than 0.5% new parameters through a weight-sharing strategy and can be fine-tuned on existing LLMs. Extensive empirical evaluations demonstrate that SSA-equipped models achieve a noticeable and consistent accuracy improvement on language modeling benchmarks.
Published: 2024
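
The temperature-scaled mapping described in the abstract of item 2 can be illustrated with a minimal sketch. It computes $V^\top\text{softmax}(Kq/\tau)$ for a single query with a fixed scalar temperature $\tau$. The function names and the fixed `temperature` argument are assumptions made for illustration; the abstract describes a temperature that adapts to the query embedding and its position, which this sketch does not model.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax of scores divided by a temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attend(keys, values, query, temperature=1.0):
    """Single-query attention: V^T softmax(K q / temperature).

    `temperature` is a fixed scalar here (an assumption for illustration).
    """
    # One score per key: the inner product of the key with the query.
    scores = [sum(k_i * q_i for k_i, q_i in zip(key, query)) for key in keys]
    weights = softmax(scores, temperature)
    dim = len(values[0])
    # Weighted combination of the value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    query = [1.0, 0.5]
    for tau in (10.0, 1.0, 0.1):
        _, weights = attend(keys, values, query, temperature=tau)
        print(tau, [round(w, 3) for w in weights])
```

Lowering the temperature concentrates the attention weights on the highest-scoring key (a sparser, spikier map), while raising it spreads the weights more evenly.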

3. Towards Data Valuation via Asymmetric Data Shapley

Author: Zheng, Xi, Chang, Xiangyu, Jia, Ruoxi, and Tan, Yong
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning
Abstract: As data emerges as a vital driver of technological and economic advancements, a key challenge is accurately quantifying its value in algorithmic decision-making. The Shapley value, a well-established concept from cooperative game theory, has been widely adopted to assess the contribution of individual data sources in supervised machine learning. However, its symmetry axiom assumes all players in the cooperative game are homogeneous, which overlooks the complex structures and dependencies present in real-world datasets. To address this limitation, we extend the traditional data Shapley framework to asymmetric data Shapley, making it flexible enough to incorporate inherent structures within the datasets for structure-aware data valuation. We also introduce an efficient $k$-nearest neighbor-based algorithm for its exact computation. We demonstrate the practical applicability of our framework across various machine learning tasks and data market contexts. The code is available at: https://github.com/xzheng01/Asymmetric-Data-Shapley.
Published: 2024

4. AdapFair: Ensuring Continuous Fairness for Machine Learning Operations

Author: Huang, Yinghui, Tang, Zihao, and Chang, Xiangyu
Subjects: Computer Science - Machine Learning, Computer Science - Computers and Society
Abstract: The biases and discrimination of machine learning algorithms have attracted significant attention, leading to the development of various algorithms tailored to specific contexts. However, these solutions often fall short of addressing fairness issues inherent in machine learning operations. In this paper, we present a debiasing framework designed to find an optimal fair transformation of input data that maximally preserves data predictability. A distinctive feature of our approach is its flexibility and efficiency. It can be integrated with any downstream black-box classifier, providing continuous fairness guarantees with minimal retraining effort, even in the face of frequent data drifts, evolving fairness requirements, and batches of similar tasks. To achieve this, we leverage normalizing flows to enable efficient, information-preserving data transformation, ensuring that no critical information is lost during the debiasing process. Additionally, we incorporate the Wasserstein distance as the unfairness measure to guide the optimization of data transformations. Finally, we introduce an efficient optimization algorithm with closed-form gradient computations, making our framework scalable and suitable for dynamic, real-world environments.
Comment: 18 pages, 15 figures
Published: 2024

5. Uncertainty Quantification of Data Shapley via Statistical Inference

Author: Wu, Mengmeng, Liu, Zhihong, Li, Xiang, Jia, Ruoxi, and Chang, Xiangyu
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: As data plays an increasingly pivotal role in decision-making, the emergence of data markets underscores the growing importance of data valuation. Within the machine learning landscape, Data Shapley stands out as a widely embraced method for data valuation. However, a limitation of Data Shapley is its assumption of a fixed dataset, contrasting with the dynamic nature of real-world applications where data constantly evolves and expands. This paper establishes the relationship between Data Shapley and infinite-order U-statistics and addresses this limitation by quantifying the uncertainty of Data Shapley with changes in data distribution from the perspective of U-statistics. We make statistical inferences on data valuation to obtain confidence intervals for the estimations. We construct two different algorithms to estimate this uncertainty and provide recommendations for their applicable situations. We also conduct a series of experiments on various datasets to verify asymptotic normality and propose a practical trading scenario enabled by this method.
Published: 2024

6. Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

Author: Di, Hao, Ye, Haishan, Zhang, Yueling, Chang, Xiangyu, Dai, Guang, and Tsang, Ivor W.
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands $\mathcal{O}(d)$ function evaluations ($d$ is the dimension), which incurs substantial computational costs and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances. Compared to prior methods, ZPDVR relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) in expectation $\mathcal{O}(1)$ times per iteration, and achieves the optimal $\mathcal{O}(d(n + \kappa)\log (\frac{1}{\epsilon}))$ SZO query complexity in the strongly convex and smooth setting, where $\kappa$ represents the condition number and $\epsilon$ is the desired accuracy. Empirical results validate ZPDVR's linear convergence and demonstrate its superior performance over other related methods.
Published: 2024

7. Fedpower: privacy-preserving distributed eigenspace estimation

Author: Guo, Xiao, Li, Xiang, Chang, Xiangyu, Wang, Shusen, and Zhang, Zhihua
Published: 2024

8. Anderson Acceleration Without Restart: A Novel Method with $n$-Step Super Quadratic Convergence Rate

Author: Ye, Haishan, Lin, Dachao, Chang, Xiangyu, and Zhang, Zhihua
Subjects: Mathematics - Optimization and Control
Abstract: In this paper, we propose a novel Anderson acceleration method to solve nonlinear equations, which does not require a restart strategy to achieve numerical stability. We propose greedy and random versions of our algorithm. Specifically, the greedy version selects the direction to maximize a certain measure of progress for approximating the current Jacobian matrix. In contrast, the random version chooses a random Gaussian vector as the direction to update the approximate Jacobian. Furthermore, our algorithm, including both greedy and random versions, has an $n$-step super quadratic convergence rate, where $n$ is the dimension of the objective problem. For example, the explicit convergence rate of the random version can be presented as $\|\mathbf{x}_{k+n+1} - \mathbf{x}_*\| / \|\mathbf{x}_k - \mathbf{x}_*\|^2 = \mathcal{O}\left(\left(1-\frac{1}{n}\right)^{kn}\right)$ for any $k\geq 0$, where $\mathbf{x}_*$ is the optimum of the objective problem. This kind of convergence rate is new to Anderson acceleration and quasi-Newton methods. The experiments also validate the fast convergence rate of our algorithm.
Published: 2024

9. FLASH: Federated Learning Across Simultaneous Heterogeneities

Author: Chang, Xiangyu, Ahmed, Sk Miraj, Krishnamurthy, Srikanth V., Guler, Basak, Swami, Ananthram, Oymak, Samet, and Roy-Chowdhury, Amit K.
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The key premise of federated learning (FL) is to train ML models across a diverse set of data owners (clients) without exchanging local data. An overarching challenge to date is client heterogeneity, which may arise not only from variations in data distribution, but also in data quality, as well as compute/communication latency. An integrated view of these diverse and concurrent sources of heterogeneity is critical; for instance, low-latency clients may have poor data quality, and vice versa. In this work, we propose FLASH (Federated Learning Across Simultaneous Heterogeneities), a lightweight and flexible client selection algorithm that outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity by trading off the statistical information associated with the client's data quality, data distribution, and latency. FLASH is the first method, to our knowledge, for handling all these heterogeneities in a unified manner. To do so, FLASH models the learning dynamics through contextual multi-armed bandits (CMAB) and dynamically selects the most promising clients. Through extensive experiments, we demonstrate that FLASH achieves substantial and consistent improvements over state-of-the-art baselines (as much as 10% in absolute accuracy) thanks to its unified approach. Importantly, FLASH also outperforms federated aggregation methods that are designed to handle highly heterogeneous settings and even enjoys a performance boost when integrated with them.
Published: 2024

10. Plug-and-Play Transformer Modules for Test-Time Adaptation

Author: Chang, Xiangyu, Ahmed, Sk Miraj, Krishnamurthy, Srikanth V., Guler, Basak, Swami, Ananthram, Oymak, Samet, and Roy-Chowdhury, Amit K.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate customized tuned modules for each such domain. Toward addressing these challenges, this work introduces PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy. We pre-train a large set of modules, each specialized for a different source domain, effectively creating a "module store". Given a target domain with few-shot unlabeled data, we introduce an unsupervised test-time adaptation (TTA) method to (1) select a sparse subset of relevant modules from this store and (2) create a weighted combination of selected modules without tuning their weights. This plug-and-play nature enables us to harness multiple most-relevant source domains in a single inference call. Comprehensive evaluations demonstrate that PLUTO uniformly outperforms alternative TTA methods and that selecting $\leq 5$ modules suffices to extract most of the benefit. At a high level, our method equips pre-trained transformers with the capability to dynamically adapt to new domains, motivating a new paradigm for efficient and scalable domain adaptation.
Published: 2024

11. CONTRAST: Continual Multi-source Adaptation to Dynamic Distributions

Author: Ahmed, Sk Miraj, Niloy, Fahim Faisal, Chang, Xiangyu, Raychaudhuri, Dripta S., Oymak, Samet, and Roy-Chowdhury, Amit K.
Subjects: Computer Science - Machine Learning
Abstract: Adapting to dynamic data distributions is a practical yet challenging task. One effective strategy is to use a model ensemble, which leverages the diverse expertise of different models to transfer knowledge to evolving data distributions. However, this approach faces difficulties when the dynamic test distribution is available only in small batches and without access to the original source data. To address the challenge of adapting to dynamic distributions in such practical settings, we propose Continual Multi-source Adaptation to Dynamic Distributions (CONTRAST), a novel method that optimally combines multiple source models to adapt to the dynamic test data. CONTRAST has two distinguishing features. First, it efficiently computes the optimal combination weights to combine the source models to adapt to the test data distribution continuously as a function of time. Second, it identifies which of the source model parameters to update so that only the model most correlated with the target data is adapted, leaving the less correlated ones untouched; this mitigates the issue of "forgetting" the source model parameters by focusing only on the source model that exhibits the strongest correlation with the test batch distribution. Through theoretical analysis, we show that the proposed method is able to optimally combine the source models and prioritize updates to the model least prone to forgetting. Experimental analysis on diverse datasets demonstrates that the combination of multiple source models does at least as well as the best source (with hindsight knowledge), and performance does not degrade as the test data distribution changes over time (robust to forgetting).
Comment: NeurIPS 2024
Published: 2024

12. Optimal Decentralized Composite Optimization for Convex Functions

Author: Ye, Haishan and Chang, Xiangyu
Subjects: Mathematics - Optimization and Control
Abstract: In this paper, we focus on decentralized composite optimization for convex functions. Because of advantages such as robustness to the network and the absence of a communication bottleneck at a central server, decentralized optimization has attracted much research attention in the signal processing, control, and optimization communities. Many optimal algorithms have been proposed in recent years for the case where the objective function is smooth and (strongly) convex. However, it is still an open question whether one can design an optimal algorithm when there is a non-smooth regularization term. In this paper, we fill the gap between smooth decentralized optimization and decentralized composite optimization and propose the first algorithm that achieves both the optimal computation and communication complexities. Our experiments also validate the effectiveness and efficiency of our algorithm in both computation and communication.
Published: 2023

13. Identification of a disulfidptosis-related genes signature for diagnostic and immune infiltration characteristics in endometriosis

Author: Chang, Xiangyu and Miao, Jinwei
Published: 2024

14. ROS/mtROS promotes TNTs formation via the PI3K/AKT/mTOR pathway to protect against mitochondrial damages in glial cells induced by engineered nanomaterials

Author: Lin, Xinpei, Wang, Wei, Chang, Xiangyu, Chen, Cheng, Guo, Zhenkun, Yu, Guangxia, Shao, Wenya, Wu, Siying, Zhang, Qunwei, Zheng, Fuli, and Li, Huangyuan
Published: 2024

15. PPFL: A Personalized Federated Learning Framework for Heterogeneous Population

Author: Di, Hao, Yang, Yi, Ye, Haishan, and Chang, Xiangyu
Subjects: Computer Science - Machine Learning
Abstract: Personalization aims to characterize individual preferences and is widely applied across many fields. However, conventional personalized methods operate in a centralized manner and potentially expose the raw data when pooling individual information. In this paper, with privacy considerations, we develop a flexible and interpretable personalized framework within the paradigm of Federated Learning, called PPFL (Population Personalized Federated Learning). By leveraging canonical models to capture fundamental characteristics among the heterogeneous population and employing membership vectors to reveal clients' preferences, it models the heterogeneity as clients' varying preferences for these characteristics and provides substantial insights into client characteristics, which are lacking in existing Personalized Federated Learning (PFL) methods. Furthermore, we explore the relationship between our method and three main branches of PFL methods: multi-task PFL, clustered FL, and decoupling PFL, and demonstrate the advantages of PPFL. To solve PPFL (a non-convex constrained optimization problem), we propose a novel random block coordinate descent algorithm and present its convergence property. We conduct experiments on both pathological and practical datasets, and the results validate the effectiveness of PPFL.
Comment: 38 pages, 11 figures
Published: 2023

16. Causal Rule Learning: Enhancing the Understanding of Heterogeneous Treatment Effect via Weighted Causal Rules

Author: Wu, Ying, Liu, Hanzhong, Ren, Kai, and Chang, Xiangyu
Subjects: Computer Science - Machine Learning, Statistics - Methodology, Statistics - Machine Learning
Abstract: Interpretability is a key concern in estimating heterogeneous treatment effects using machine learning methods, especially for healthcare applications where high-stakes decisions are often made. Inspired by the Predictive, Descriptive, Relevant framework of interpretability, we propose causal rule learning, which finds a refined set of causal rules characterizing potential subgroups to estimate and enhance our understanding of heterogeneous treatment effects. Causal rule learning involves three phases: rule discovery, rule selection, and rule analysis. In the rule discovery phase, we utilize a causal forest to generate a pool of causal rules with corresponding subgroup average treatment effects. The selection phase then employs a D-learning method to select a subset of these rules to deconstruct individual-level treatment effects as a linear combination of the subgroup-level effects. This helps to answer a question ignored by previous literature: what if an individual simultaneously belongs to multiple groups with different average treatment effects? The rule analysis phase outlines a detailed procedure to further analyze each rule in the subset from multiple perspectives, revealing the most promising rules for further validation. The rules themselves, their corresponding subgroup treatment effects, and their weights in the linear combination give us more insights into heterogeneous treatment effects. Simulation and real-world data analysis demonstrate the superior performance of causal rule learning on the interpretable estimation of heterogeneous treatment effects when the ground truth is complex and the sample size is sufficient.
Published: 2023

17. Spectral co-Clustering in Multi-layer Directed Networks

Author: Su, Wenqing, Guo, Xiao, Chang, Xiangyu, and Yang, Ying
Subjects: Mathematics - Statistics Theory, Statistics - Applications
Abstract: Modern network analysis often involves multi-layer network data in which the nodes are aligned, and the edges on each layer represent one of the multiple relations among the nodes. Current literature on multi-layer network data is mostly limited to undirected relations. However, directed relations are more common and may introduce extra information. This study focuses on community detection (or clustering) in multi-layer directed networks. To take into account the asymmetry, a novel spectral-co-clustering-based algorithm is developed to detect co-clusters, which capture the sending patterns and receiving patterns of nodes, respectively. Specifically, the eigendecomposition of the debiased sum of Gram matrices over the layer-wise adjacency matrices is computed, followed by k-means, where the sum of Gram matrices is used to avoid possible cancellation of clusters caused by direct summation. Theoretical analysis of the algorithm under the multi-layer stochastic co-block model is provided, where the common assumption that the cluster number is coupled with the rank of the model is relaxed. After a systematic analysis of the eigenvectors of the population version of the algorithm, the misclassification rates are derived, which show that multiple layers bring benefits to the clustering performance. The experimental results on simulated data corroborate the theoretical predictions, and the analysis of a real-world trade network dataset provides interpretable results.
Published: 2023

18. FedYolo: Augmenting Federated Learning with Pretrained Transformers

Author: Zhang, Xuechen, Li, Mingchen, Chang, Xiangyu, Chen, Jiasi, Roy-Chowdhury, Amit K., Suresh, Ananda Theertha, and Oymak, Samet
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks, achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs, which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally, highlighting the potential of fully-local learning. (2) Modularity, by design, enables $>100\times$ less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs. Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic forgetting, where each task is assigned to its own module.
Comment: 20 pages, 18 figures
Published: 2023

19. Privacy-Preserving Community Detection for Locally Distributed Multiple Networks

Author: Guo, Xiao, Li, Xiang, Chang, Xiangyu, and Ma, Shujie
Subjects: Computer Science - Social and Information Networks, Computer Science - Machine Learning, Statistics - Methodology, Statistics - Machine Learning
Abstract: Modern multi-layer networks are commonly stored and analyzed in a local and distributed fashion because of privacy, ownership, and communication costs. The literature on model-based statistical methods for community detection based on these data is still limited. This paper proposes a new method for consensus community detection and estimation in a multi-layer stochastic block model using locally stored and computed network data with privacy protection. A novel algorithm named privacy-preserving Distributed Spectral Clustering (ppDSC) is developed. To preserve the edges' privacy, we adopt the randomized response (RR) mechanism to perturb the network edges, which satisfies the strong notion of differential privacy. The ppDSC algorithm is performed on the squared RR-perturbed adjacency matrices to prevent possible cancellation of communities among different layers. To remove the bias incurred by RR and the squared network matrices, we develop a two-step bias-adjustment procedure. Then we perform eigen-decomposition on the debiased matrices, aggregation of the local eigenvectors using an orthogonal Procrustes transformation, and k-means clustering. We provide theoretical analysis of the statistical errors of ppDSC in terms of eigenvector estimation. In addition, the blessings and curses of network heterogeneity are well explained by our bounds.
Published: 2023
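
The randomized response (RR) step mentioned in the abstract of item 19 can be sketched in its standard symmetric form. Each potential edge is reported truthfully with probability $q = e^{\epsilon}/(1+e^{\epsilon})$ and flipped otherwise, which satisfies $\epsilon$-edge differential privacy. This symmetric variant, the function names, and the first-moment debiasing shown here are assumptions made for illustration. The abstract describes a two-step bias adjustment that also handles the squared matrices, which this sketch does not reproduce.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting an entry truthfully under symmetric RR."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(adj, epsilon, rng):
    """Perturb a symmetric 0/1 adjacency matrix with no self-loops."""
    n = len(adj)
    q = keep_probability(epsilon)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() >= q:  # flip with probability 1 - q
                bit = 1 - bit
            out[i][j] = out[j][i] = bit
    return out


def debias(perturbed, epsilon):
    """First-moment debiasing.

    Since E[A'] = (2q - 1) A + (1 - q), the matrix
    (A' - (1 - q)) / (2q - 1) is an unbiased estimate of A off the diagonal.
    """
    q = keep_probability(epsilon)
    n = len(perturbed)
    return [
        [0.0 if i == j else (perturbed[i][j] - (1.0 - q)) / (2.0 * q - 1.0)
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    noisy = randomized_response(adjacency, epsilon=1.0, rng=rng)
    print(noisy)
    print(debias(noisy, epsilon=1.0))
```

A smaller `epsilon` gives stronger privacy but pushes $q$ toward $1/2$, so the debiasing divides by a smaller $2q-1$ and the estimate becomes noisier.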

20. 2D-Shapley: A Framework for Fragmented Data Valuation

Author: Liu, Zhihong, Just, Hoang Anh, Chang, Xiangyu, Chen, Xi, and Jia, Ruoxi
Subjects: Computer Science - Machine Learning
Abstract: Data valuation (quantifying the contribution of individual data sources to certain predictive behaviors of a model) is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing. Existing work has focused on evaluating data sources that share the same feature or sample space. How to value fragmented data sources, each of which contains only partial features and samples, remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for sample-wise data values, and fine-grained data issue diagnosis.
Comment: 25 pages, 13 figures, ICML 2023
Published: 2023

21. Subsampling-Based Modified Bayesian Information Criterion for Large-Scale Stochastic Block Models

Author: Deng, Jiayi, Huang, Danyang, Chang, Xiangyu, and Zhang, Bo
Subjects: Statistics - Methodology
Abstract: Identifying the number of communities is a fundamental problem in community detection, which has received increasing attention recently. However, rapid advances in technology have led to the emergence of large-scale networks in various disciplines, thereby making existing methods computationally infeasible. To address this challenge, we propose a novel subsampling-based modified Bayesian information criterion (SM-BIC) for identifying the number of communities in a network generated via the stochastic block model and degree-corrected stochastic block model. We first propose a node-pair subsampling method to extract an informative subnetwork from the entire network, and then we derive a purely data-driven criterion to identify the number of communities for the subnetwork. In this way, the SM-BIC can identify the number of communities based on the subsampled network instead of the entire dataset. This leads to important computational advantages over existing methods. We theoretically investigate the computational complexity and identification consistency of the SM-BIC. Furthermore, the advantages of the SM-BIC are demonstrated by extensive numerical studies.
Published: 2023

22. Learning Personalized Brain Functional Connectivity of MDD Patients from Multiple Sites via Federated Bayesian Networks

Author: Liu, Shuai, Guo, Xiao, Qi, Shun, Wang, Huaning, and Chang, Xiangyu
Subjects: Computer Science - Machine Learning, Quantitative Biology - Neurons and Cognition
Abstract: Identifying functional connectivity biomarkers of major depressive disorder (MDD) patients is essential to advance understanding of the disorder mechanisms and early intervention. However, due to the small sample size and the high dimension of available neuroimaging data, the performance of existing methods is often limited. Multi-site data could enhance the statistical power and sample size, but they are often subject to inter-site heterogeneity and data-sharing policies. In this paper, we propose a federated joint estimator, NOTEARS-PFL, for simultaneous learning of multiple Bayesian networks (BNs) with continuous optimization, to identify disease-induced alterations in MDD patients. We incorporate information shared between sites and site-specific information into the proposed federated learning framework to learn personalized BN structures by introducing the group fused lasso penalty. We develop an alternating direction method of multipliers in which, in the local update step, the neuroimaging data is processed at each local site. Then the learned network structures are transmitted to the center for the global update. In particular, we derive a closed-form expression for the local update step and use the iterative proximal projection method to deal with the group fused lasso penalty in the global update step. We evaluate the performance of the proposed method on both synthetic and real-world multi-site rs-fMRI datasets. The results suggest that the proposed NOTEARS-PFL is more effective and accurate than comparable methods.
Published: 2023

23. Snap-Shot Decentralized Stochastic Gradient Tracking Methods

Author: Ye, Haishan and Chang, Xiangyu
Subjects: Mathematics - Optimization and Control
Abstract: In decentralized optimization, $m$ agents form a network and only communicate with their neighbors, which gives advantages in data ownership, privacy, and scalability. At the same time, decentralized stochastic gradient descent (SGD) methods, as popular decentralized algorithms for training large-scale machine learning models, have shown their superiority over centralized counterparts. Distributed stochastic gradient tracking (DSGT) (Pu and Nedić, 2021) has been recognized as a popular, state-of-the-art decentralized SGD method because of its theoretical guarantees. However, the theoretical analysis of DSGT (Koloskova et al., 2021) shows that its iteration complexity is $\tilde{\mathcal{O}}\left(\frac{\bar{\sigma}^2}{m\mu\varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu(1 - \lambda_2(W))^{1/2} C_W \sqrt{\varepsilon}}\right)$, where $W$ is a doubly stochastic mixing matrix that represents the network topology and $C_W$ is a parameter that depends on $W$. This indicates that the convergence of DSGT is heavily affected by the topology of the communication network. To overcome this weakness of DSGT, we resort to the snap-shot gradient tracking technique and propose two novel algorithms. We further show that the two proposed algorithms are more robust to the topology of the communication network, under an algorithmic structure similar to DSGT and the same communication strategy. Their iteration complexities are $\mathcal{O}\left(\frac{\bar{\sigma}^2}{m\mu\varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu(1 - \lambda_2(W))\sqrt{\varepsilon}}\right)$ and $\mathcal{O}\left(\frac{\bar{\sigma}^2}{m\mu\varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu(1 - \lambda_2(W))^{1/2}\sqrt{\varepsilon}}\right)$, respectively, which reduce the impact of network topology (no $C_W$).
Published: 2022

24. Variance reduced Shapley value estimation for trustworthy data valuation

Author: Wu, Mengmeng, Jia, Ruoxi, Lin, Changle, Huang, Wei, and Chang, Xiangyu
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley and approximate it by means of the permutation sampling algorithm. To make up for the large estimation variance of the permutation sampling that hinders the development of the data marketplace, we propose a more robust data valuation method using stratified sampling, named variance reduced data Shapley (VRDS for short). We theoretically show how to stratify, how many samples are taken at each stratum, and the sample complexity analysis of VRDS. Finally, the effectiveness of VRDS is illustrated in different types of datasets and data removal applications.
Published: 2022

25. On the efficacy of higher-order spectral clustering under weighted stochastic block models

Author: Guo, Xiao, Zhang, Hai, and Chang, Xiangyu
Subjects: Statistics - Methodology
Abstract: Higher-order structures of networks, namely, small subgraphs of networks (also called network motifs), are widely known to be crucial and essential to the organization of networks. There have been a few works studying the community detection problem -- a fundamental problem in network analysis -- at the level of motifs. In particular, higher-order spectral clustering has been developed, where the notion of motif adjacency matrix is introduced as the input of the algorithm. However, it remains largely unknown how higher-order spectral clustering works and when it performs better than its edge-based counterpart. To elucidate these problems, we investigate higher-order spectral clustering from a statistical perspective. In particular, we theoretically study the clustering performance of higher-order spectral clustering under a weighted stochastic block model and compare the resulting bounds with the corresponding results of edge-based spectral clustering. It turns out that when the network is dense with a weak signal of weights, higher-order spectral clustering can indeed lead to a performance gain in clustering. We also use simulations and real data experiments to support the findings.
Published: 2022

26. Learning Multitask Gaussian Bayesian Networks

Author: Liu, Shuai, Qiu, Yixuan, Li, Baojuan, Wang, Huaning, and Chang, Xiangyu
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Major depressive disorder (MDD) requires study of brain functional connectivity alterations for patients, which can be uncovered by resting-state functional magnetic resonance imaging (rs-fMRI) data. We consider the problem of identifying alterations of brain functional connectivity for a single MDD patient. This is particularly difficult since the amount of data collected during an fMRI scan is too limited to provide sufficient information for individual analysis. Additionally, rs-fMRI data usually has the characteristics of incompleteness, sparsity, variability, high dimensionality and high noise. To address these problems, we propose a multitask Gaussian Bayesian network (MTGBN) framework capable of identifying individual disease-induced alterations for MDD patients. We assume that such disease-induced alterations show some degrees of similarity with the tool to learn such network structures from observations to understanding of how system are structured jointly from related tasks. First, we treat each patient in a class of observation as a task and then learn the Gaussian Bayesian networks (GBNs) of this data class by learning from all tasks that share a default covariance matrix that encodes prior knowledge. This setting can help us to learn more information from limited data. Next, we derive a closed-form formula of the complete likelihood function and use the Monte-Carlo Expectation-Maximization (MCEM) algorithm to search for the approximately best Bayesian network structures efficiently. Finally, we assess the performance of our methods with simulated and real-world rs-fMRI data.
Published: 2022

27. Off-design performance optimization for steam-water dual heat source ORC systems

Author: Wang, Shiqi, Chang, Xiangyu, Yuan, Zhongyuan, Ooi, Kim Tiow, and Yu, Nanyang
Published: 2024
Full Text: View/download PDF

28. Spectral co-clustering in multi-layer directed networks

Author: Su, Wenqing, Guo, Xiao, Chang, Xiangyu, and Yang, Ying
Published: 2024
Full Text: View/download PDF

29. Toward a Fairness-Aware Scoring System for Algorithmic Decision-Making

Author: Yang, Yi, Wu, Ying, Li, Mei, Chang, Xiangyu, and Tan, Yong
Subjects: Computer Science - Machine Learning, Statistics - Methodology
Abstract: Scoring systems, as a type of predictive model, have significant advantages in interpretability and transparency and facilitate quick decision-making. As such, scoring systems have been extensively used in a wide variety of industries such as healthcare and criminal justice. However, the fairness issues in these models have long been criticized, and the use of big data and machine learning algorithms in the construction of scoring systems heightens this concern. In this paper, we propose a general framework to create fairness-aware, data-driven scoring systems. First, we develop a social welfare function that incorporates both efficiency and group fairness. Then, we transform the social welfare maximization problem into the risk minimization task in machine learning, and derive a fairness-aware scoring system with the help of mixed integer programming. Lastly, several theoretical bounds are derived for providing parameter selection suggestions. Our proposed framework provides a suitable solution to address group fairness concerns in the development of scoring systems. It enables policymakers to set and customize their desired fairness requirements as well as other application-specific constraints. We test the proposed algorithm with several empirical data sets. Experimental evidence supports the effectiveness of the proposed scoring system in achieving the optimal welfare of stakeholders and in balancing the needs for interpretability, fairness, and efficiency.
Published: 2021

30. Statistical Estimation and Inference via Local SGD in Federated Learning

Author: Li, Xiang, Liang, Jiadong, Chang, Xiangyu, and Zhang, Zhihua
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Federated Learning (FL) makes a large number of edge computing devices (e.g., mobile phones) jointly learn a global model without data sharing. In FL, data are generated in a decentralized manner with high heterogeneity. This paper studies how to perform statistical estimation and inference in the federated setting. We analyze the so-called Local SGD, a multi-round estimation procedure that uses intermittent communication to improve communication efficiency. We first establish a {\it functional central limit theorem} that shows the averaged iterates of Local SGD weakly converge to a rescaled Brownian motion. We next provide two iterative inference methods: the {\it plug-in} and the {\it random scaling}. Random scaling constructs an asymptotically pivotal statistic for inference by using the information along the whole Local SGD path. Both methods are communication efficient and applicable to online data. Our theoretical and empirical results show that Local SGD simultaneously achieves both statistical efficiency and communication efficiency.
Published: 2021

31. Explicit Superlinear Convergence Rates of The SR1 Algorithm

Author: Ye, Haishan, Lin, Dachao, Zhang, Zhihua, and Chang, Xiangyu
Subjects: Mathematics - Optimization and Control
Abstract: We study the convergence rate of the famous Symmetric Rank-1 (SR1) algorithm which has wide applications in different scenarios. Although it has been extensively investigated, SR1 still lacks a non-asymptotic superlinear rate compared with other quasi-Newton methods such as DFP and BFGS. In this paper we address this problem. Inspired by the recent work on explicit convergence analysis of quasi-Newton methods, we obtain the first explicit non-asymptotic rates of superlinear convergence for the vanilla SR1 methods with a correction strategy to achieve numerical stability. Specifically, the vanilla SR1 with the correction strategy achieves the rates of the form $\left(\frac{4n\ln(e\kappa) }{k}\right)^{k/2}$ for general smooth strongly-convex functions where $k$ is the iteration counter, $\kappa$ is the condition number of the objective function and $n$ is the dimension of the problem. For the quadratic function, the vanilla SR1 algorithm can find the optimum of the objective function in at most $n$ steps.
Published: 2021

32. FedPower: Privacy-Preserving Distributed Eigenspace Estimation

Author: Guo, Xiao, Li, Xiang, Chang, Xiangyu, Wang, Shusen, and Zhang, Zhihua
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Eigenspace estimation is fundamental in machine learning and statistics, which has found applications in PCA, dimension reduction, and clustering, among others. The modern machine learning community usually assumes that data come from and belong to different organizations. The low communication power and the possible privacy breaches of data make the computation of eigenspace challenging. To address these challenges, we propose a class of algorithms called \textsf{FedPower} within the federated learning (FL) framework. \textsf{FedPower} leverages the well-known power method by alternating multiple local power iterations and a global aggregation step, thus improving communication efficiency. In the aggregation, we propose to weight each local eigenvector matrix with {\it Orthogonal Procrustes Transformation} (OPT) for better alignment. To ensure strong privacy protection, we add Gaussian noise in each iteration by adopting the notion of \emph{differential privacy} (DP). We provide convergence bounds for \textsf{FedPower} that are composed of different interpretable terms corresponding to the effects of Gaussian noise, parallelization, and random sampling of local machines. Additionally, we conduct experiments to demonstrate the effectiveness of our proposed algorithms.
Published: 2021

33. On the efficacy of higher-order spectral clustering under weighted stochastic block models

Author: Guo, Xiao, Zhang, Hai, and Chang, Xiangyu
Published: 2024
Full Text: View/download PDF

34. Towards explicit superlinear convergence rate for SR1

Author: Ye, Haishan, Lin, Dachao, Chang, Xiangyu, and Zhang, Zhihua
Published: 2023
Full Text: View/download PDF

35. Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Author: Chang, Xiangyu, Li, Yingcong, Oymak, Samet, and Thrampoulidis, Christos
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Deep networks are typically trained with many more parameters than the size of the training dataset. Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - building lightweight models. Specifically, it suggests that overparameterization benefits model pruning / sparsification. This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning in the overparameterized regime. The theory presented addresses the following core question: "should one train a small model from the beginning, or first train a large model and then prune?". We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning rather than simply training with the known informative features. This leads to a new double descent in the training of sparse models: growing the original model, while preserving the target sparsity, improves the test accuracy as one moves beyond the overparameterization threshold. Our analysis further reveals the benefit of retraining by relating it to feature correlations. We find that the above phenomena are already present in linear and random-features models. Our technical approach advances the toolset of high-dimensional analysis and precisely characterizes the asymptotic distribution of over-parameterized least-squares. The intuition gained by analytically studying simpler models is numerically verified on neural networks. Comment: to appear at AAAI 2021
Published: 2020

36. Kernel Interpolation of High Dimensional Scattered Data

Author: Lin, Shao-Bo, Chang, Xiangyu, and Sun, Xingping
Subjects: Mathematics - Numerical Analysis, Statistics - Machine Learning
Abstract: Data sites selected from modeling high-dimensional problems often appear scattered in non-paternalistic ways. Except for sporadic clustering at some spots, they become relatively far apart as the dimension of the ambient space grows. These features defy any theoretical treatment that requires local or global quasi-uniformity of distribution of data sites. Incorporating a recently-developed application of integral operator theory in machine learning, we propose and study in the current article a new framework to analyze kernel interpolation of high dimensional data, which features bounding stochastic approximation error by the spectrum of the underlying kernel matrix. Both theoretical analysis and numerical simulations show that spectra of kernel matrices are reliable and stable barometers for gauging the performance of kernel-interpolation methods for high dimensional data. Comment: 33 pages, 5 figures
Published: 2020

37. Randomized spectral co-clustering for large-scale directed networks

Author: Guo, Xiao, Qiu, Yixuan, Zhang, Hai, and Chang, Xiangyu
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Computer Science - Social and Information Networks, Statistics - Methodology
Abstract: Directed networks are broadly used to represent asymmetric relationships among units. Co-clustering aims to cluster the senders and receivers of directed networks simultaneously. In particular, the well-known spectral clustering algorithm could be modified as the spectral co-clustering to co-cluster directed networks. However, large-scale networks pose great computational challenges to it. In this paper, we leverage sketching techniques and derive two randomized spectral co-clustering algorithms, one \emph{random-projection-based} and the other \emph{random-sampling-based}, to accelerate the co-clustering of large-scale directed networks. We theoretically analyze the resulting algorithms under two generative models -- the stochastic co-block model and the degree-corrected stochastic co-block model, and establish their approximation error rates and misclustering error rates, indicating better bounds than the state-of-the-art results of co-clustering literature. Numerically, we design and conduct simulations to support our theoretical results and test the efficiency of the algorithms on real networks with up to millions of nodes. A publicly available R package \textsf{RandClust} is developed for better usability and reproducibility of the proposed methods.
Published: 2020

38. Uncertainty Quantification for Demand Prediction in Contextual Dynamic Pricing

Author: Wang, Yining, Chen, Xi, Chang, Xiangyu, and Ge, Dongdong
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Data-driven sequential decision has found a wide range of applications in modern operations management, such as dynamic pricing, inventory control, and assortment optimization. Most existing research on data-driven sequential decision focuses on designing an online policy to maximize the revenue. However, the research on uncertainty quantification on the underlying true model function (e.g., demand function), a critical problem for practitioners, has not been well explored. In this paper, using the problem of demand function prediction in dynamic pricing as the motivating example, we study the problem of constructing accurate confidence intervals for the demand function. The main challenge is that sequentially collected data leads to significant distributional bias in the maximum likelihood estimator or the empirical risk minimization estimate, making classical statistical approaches such as Wald's test no longer valid. We address this challenge by developing a debiased approach and provide the asymptotic normality guarantee of the debiased estimator. Based on this debiased estimator, we provide both point-wise and uniform confidence intervals of the demand function.
Published: 2020

39. Angle-Based Cost-Sensitive Multicategory Classification

Author: Yang, Yi, Guo, Yuxuan, and Chang, Xiangyu
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Many real-world classification problems come with costs which can vary for different types of misclassification. It is thus important to develop cost-sensitive classifiers which minimize the total misclassification cost. Although binary cost-sensitive classifiers have been well-studied, solving multicategory classification problems is still challenging. A popular approach to address this issue is to construct K classification functions for a K-class problem and remove the redundancy by imposing a sum-to-zero constraint. However, such a method usually results in higher computational complexity and inefficient algorithms. In this paper, we propose a novel angle-based cost-sensitive classification framework for multicategory classification without the sum-to-zero constraint. Loss functions that are included in the angle-based cost-sensitive classification framework are further justified to be Fisher consistent. To show the usefulness of the framework, two cost-sensitive multicategory boosting algorithms are derived as concrete instances. Numerical experiments demonstrate that the proposed boosting algorithms yield competitive classification performances against other existing boosting approaches.
Published: 2020

40. Multi-step reward ensemble methods for adaptive stock trading

Author: Zeng, Zhiyi, Ma, Cong, and Chang, Xiangyu
Published: 2023
Full Text: View/download PDF

41. Variance reduced Shapley value estimation for trustworthy data valuation

Author: Wu, Mengmeng, Jia, Ruoxi, Lin, Changle, Huang, Wei, and Chang, Xiangyu
Published: 2023
Full Text: View/download PDF

42. Back analysis of rock mass parameters in tunnel engineering using machine learning techniques

Author: Chang, Xiangyu, Wang, Hao, and Zhang, Yiming
Published: 2023
Full Text: View/download PDF

43. Randomized Spectral Clustering in Large-Scale Stochastic Block Models

Author: Zhang, Hai, Guo, Xiao, and Chang, Xiangyu
Subjects: Computer Science - Social and Information Networks, Computer Science - Machine Learning, Statistics - Methodology, Statistics - Machine Learning
Abstract: Spectral clustering has been one of the widely used methods for community detection in networks. However, large-scale networks bring computational challenges to the eigenvalue decomposition therein. In this paper, we study the spectral clustering using randomized sketching algorithms from a statistical perspective, where we typically assume the network data are generated from a stochastic block model that is not necessarily of full rank. To do this, we first use the recently developed sketching algorithms to obtain two randomized spectral clustering algorithms, namely, the random projection-based and the random sampling-based spectral clustering. Then we study the theoretical bounds of the resulting algorithms in terms of the approximation error for the population adjacency matrix, the misclassification error, and the estimation error for the link probability matrix. It turns out that, under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm. We also extend the results to degree-corrected stochastic block models. Numerical experiments support our theoretical findings and show the efficiency of randomized methods. A new R package called Rclust is developed and made available to the public.
Published: 2020

44. Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms

Author: Chang, Xiangyu and Lin, Shao-Bo
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper, we propose an adaptive stopping rule for kernel-based gradient descent (KGD) algorithms. We introduce the empirical effective dimension to quantify the increments of iterations in KGD and derive an implementable early stopping strategy. We analyze the performance of the adaptive stopping rule in the framework of learning theory. Using the recently developed integral operator approach, we rigorously prove the optimality of the adaptive stopping rule in terms of showing the optimal learning rates for KGD equipped with this rule. Furthermore, a sharp bound on the number of iterations in KGD equipped with the proposed early stopping rule is also given to demonstrate its computational advantage. Comment: There is a critical error in the proof
Published: 2020

45. Cobalt nanoparticles induce mitochondrial damage and β-amyloid toxicity via the generation of reactive oxygen species

Author: Chen, Jingrong, Chen, Cheng, Wang, Na, Wang, Chunyu, Gong, Zhaohui, Du, Jingxian, Lai, Honglin, Lin, Xinpei, Wang, Wei, Chang, Xiangyu, Aschner, Michael, Guo, Zhenkun, Wu, Siying, Li, Huangyuan, and Zheng, Fuli
Published: 2023
Full Text: View/download PDF

46. Predicting Depression Severity by Multi-Modal Feature Engineering and Fusion

Author: Samareh, Aven, Jin, Yan, Wang, Zhangyang, Chang, Xiangyu, and Huang, Shuai
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present our preliminary work to determine whether a patient's vocal acoustic, linguistic, and facial patterns could predict clinical ratings of depression severity, namely the Patient Health Questionnaire depression scale (PHQ-8). We propose a multimodal fusion model that combines three different modalities: audio, video, and text features. By training on the AVEC 2017 data set, our proposed model outperforms each single-modality prediction model and surpasses the data set baseline by a nice margin. Comment: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)
Published: 2017

47. Learning rates for classification with Gaussian kernels

Author: Lin, Shao-Bo, Zeng, Jinshan, and Chang, Xiangyu
Subjects: Computer Science - Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: This paper aims at refined error analysis for binary classification using support vector machine (SVM) with Gaussian kernel and convex loss. Our first result shows that for some loss functions such as the truncated quadratic loss and quadratic loss, SVM with Gaussian kernel can reach the almost optimal learning rate, provided the regression function is smooth. Our second result shows that, for a large number of loss functions, under some Tsybakov noise assumption, if the regression function is infinitely smooth, then SVM with Gaussian kernel can achieve the learning rate of order $m^{-1}$, where $m$ is the number of samples. Comment: This paper has been accepted by Neural Computation
Published: 2017

48. Buyer-supplier collaboration: A macro, micro, and congruence perspective

Author: Li, Mei, Falcone, Ellie, Sanders, Nada, Choi, Thomas Y., and Chang, Xiangyu
Published: 2022
Full Text: View/download PDF

49. Bayesian prediction of tunnel convergence combining empirical model and relevance vector machine

Author: Chang, Xiangyu, Wang, Hao, Zhang, Yiming, Wang, Feiqiu, and Li, Zhaozhong
Published: 2022
Full Text: View/download PDF

50. Asymmetric cryptosystem based on optical scanning cryptography and elliptic curve algorithm

Author: Chang, Xiangyu, Li, Wei, Yan, Aimin, Tsang, Peter Wai Ming, and Poon, Ting-Chung
Published: 2022
Full Text: View/download PDF
