Author: "Rangwala, Huzefa" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Rangwala, Huzefa"' showing total 531 results

Start Over Author "Rangwala, Huzefa"

531 results on '"Rangwala, Huzefa"'

1. Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling and Zero-Shot Transfer

Author: Pengmei, Zihan, Shen, Zhengyuan, Wang, Zichen, Collins, Marcus, and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Physics - Chemical Physics
Abstract: Constructing transferable descriptors for conformation representation of molecular and biological systems finds numerous applications in drug discovery, learning-based molecular dynamics, and protein mechanism analysis. Geometric graph neural networks (Geom-GNNs) with all-atom information have transformed atomistic simulations by serving as a general learnable geometric descriptors for downstream tasks including prediction of interatomic potential and molecular properties. However, common practices involve supervising Geom-GNNs on specific downstream tasks, which suffer from the lack of high-quality data and inaccurate labels leading to poor generalization and performance degradation on out-of-distribution (OOD) scenarios. In this work, we explored the possibility of using pre-trained Geom-GNNs as transferable and highly effective geometric descriptors for improved generalization. To explore their representation power, we studied the scaling behaviors of Geom-GNNs under self-supervised pre-training, supervised and unsupervised learning setups. We find that the expressive power of different architectures can differ on the pre-training task. Interestingly, Geom-GNNs do not follow the power-law scaling on the pre-training task, and universally lack predictable scaling behavior on the supervised tasks with quantum chemical labels important for screening and design of novel molecules. More importantly, we demonstrate how all-atom graph embedding can be organically combined with other neural architectures to enhance the expressive power. Meanwhile, the low-dimensional projection of the latent space shows excellent agreement with conventional geometrical descriptors., Comment: 15 pages, 4 figures, supporting information appended
Published: 2024

2. DeCaf: A Causal Decoupling Framework for OOD Generalization on Node Classification

Author: Han, Xiaoxue, Rangwala, Huzefa, and Ning, Yue
Subjects: Computer Science - Machine Learning
Abstract: Graph Neural Networks (GNNs) are susceptible to distribution shifts, creating vulnerability and security issues in critical domains. There is a pressing need to enhance the generalizability of GNNs on out-of-distribution (OOD) test data. Existing methods that target learning an invariant (feature, structure)-label mapping often depend on oversimplified assumptions about the data generation process, which do not adequately reflect the actual dynamics of distribution shifts in graphs. In this paper, we introduce a more realistic graph data generation model using Structural Causal Models (SCMs), allowing us to redefine distribution shifts by pinpointing their origins within the generation process. Building on this, we propose a casual decoupling framework, DeCaf, that independently learns unbiased feature-label and structure-label mappings. We provide a detailed theoretical framework that shows how our approach can effectively mitigate the impact of various distribution shifts. We evaluate DeCaf across both real-world and synthetic datasets that demonstrate different patterns of shifts, confirming its efficacy in enhancing the generalizability of GNNs.
Published: 2024

3. Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens

Author: Cen, Zhepeng, Liu, Yao, Zeng, Siliang, Chaudhar, Pratik, Rangwala, Huzefa, Karypis, George, and Fakoor, Rasool
Subjects: Computer Science - Machine Learning
Abstract: Language models are often trained to maximize the likelihood of the next token given past tokens in the training dataset. However, during inference time, they are utilized differently, generating text sequentially and auto-regressively by using previously generated tokens as input to predict the next one. Marginal differences in predictions at each step can cascade over successive steps, resulting in different distributions from what the models were trained for and potentially leading to unpredictable behavior. This paper proposes two simple approaches based on model own generation to address this discrepancy between the training and inference time. Our first approach is Batch-Scheduled Sampling, where, during training, we stochastically choose between the ground-truth token from the dataset and the model's own generated token as input to predict the next token. This is done in an offline manner, modifying the context window by interleaving ground-truth tokens with those generated by the model. Our second approach is Reference-Answer-based Correction, where we explicitly incorporate a self-correction capability into the model during training. This enables the model to effectively self-correct the gaps between the generated sequences and the ground truth data without relying on an external oracle model. By incorporating our proposed strategies during training, we have observed an overall improvement in performance compared to baseline methods, as demonstrated by our extensive experiments using summarization, general question-answering, and math question-answering tasks.
Published: 2024

4. AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

Author: Yang, Ke, Liu, Yao, Chaudhary, Sapana, Fakoor, Rasool, Chaudhari, Pratik, Karypis, George, and Rangwala, Huzefa
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Autonomy via agents using large language models (LLMs) for personalized, standardized tasks boosts human efficiency. Automating web tasks (like booking hotels within a budget) is increasingly sought after. Fulfilling practical needs, the web agent also serves as an important proof-of-concept example for various agent grounding scenarios, with its success promising advancements in many future applications. Prior research often handcrafts web agent strategies (e.g., prompting templates, multi-agent systems, search methods, etc.) and the corresponding in-context examples, which may not generalize well across all real-world scenarios. On the other hand, there has been limited study on the misalignment between a web agent's observation/action representation and the pre-training data of the LLM it's based on. This discrepancy is especially notable when LLMs are primarily trained for language completion rather than tasks involving embodied navigation actions and symbolic web elements. Our study enhances an LLM-based web agent by simply refining its observation and action space to better align with the LLM's capabilities. This approach enables our base agent to significantly outperform previous methods on a wide variety of web tasks. Specifically, on WebArena, a benchmark featuring general-purpose web interaction tasks, our agent AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively, and boosts the success rate by 26.6 points (+161%) over similar plain web agents with its observation and action space alignment. We achieve this without using in-context examples, new agent roles, online feedback or search strategies. AgentOccam's simple design highlights LLMs' impressive zero-shot performance on web tasks, and underlines the critical role of carefully tuning observation and action spaces for LLM-based agents.
Published: 2024

5. FeatNavigator: Automatic Feature Augmentation on Tabular Data

Author: Liang, Jiaming, Lei, Chuan, Qin, Xiao, Zhang, Jiani, Katsifodimos, Asterios, Faloutsos, Christos, and Rangwala, Huzefa
Subjects: Computer Science - Databases, Computer Science - Machine Learning
Abstract: Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and generalizability. While recent works have investigated automatic feature augmentation, most of them have limited capabilities in utilizing all useful features as many of them are in candidate tables not directly joinable with the base table. Worse yet, with numerous join paths leading to these distant features, existing solutions fail to fully exploit them within a reasonable compute budget. We present FeatNavigator, an effective and efficient framework that explores and integrates high-quality features in relational tables for ML models. FeatNavigator evaluates a feature from two aspects: (1) the intrinsic value of a feature towards an ML task (i.e., feature importance) and (2) the efficacy of a join path connecting the feature to the base table (i.e., integration quality). FeatNavigator strategically selects a small set of available features and their corresponding join paths to train a feature importance estimation model and an integration quality prediction model. Furthermore, FeatNavigator's search algorithm exploits both estimated feature importance and integration quality to identify the optimized feature augmentation plan. Our experimental results show that FeatNavigator outperforms state-of-the-art solutions on five public datasets by up to 40.1% in ML model performance., Comment: 15 pages, 41 figures
Published: 2024

6. GraphStorm: all-in-one graph machine learning framework for industry applications

Author: Zheng, Da, Song, Xiang, Zhu, Qi, Zhang, Jian, Vasiloudis, Theodore, Ma, Runjie, Zhang, Houyu, Wang, Zichen, Adeshina, Soji, Nisa, Israt, Mottini, Alejandro, Cui, Qingjun, Rangwala, Huzefa, Zeng, Belinda, Faloutsos, Christos, and Karypis, George
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remain challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction and model training and inference with just a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications after its release in May 2023. It is open-sourced in Github: https://github.com/awslabs/graphstorm.
Published: 2024

7. Learning Analytics Tools in Higher Education: Adoption at the Intersection of Institutional Commitment and Individual Action

Author: Klein, Carrie, Lester, Jaime, Rangwala, Huzefa, and Johri, Aditya
Published: 2018
Full Text: View/download PDF

8. DispaRisk: Auditing Fairness Through Usable Information

Author: Vasquez, Jonathan, Domeniconi, Carlotta, and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning
Abstract: Machine Learning algorithms (ML) impact virtually every aspect of human lives and have found use across diverse sectors including healthcare, finance, and education. Often, ML algorithms have been found to exacerbate societal biases present in datasets leading to adversarial impacts on subsets/groups of individuals and in many cases on minority groups. To effectively mitigate these untoward effects, it is crucial that disparities/biases are identified early in a ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduce complexity at later stages of model development. In this paper, we leverage recent advancements in usable information theory to introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's effectiveness by benchmarking it against commonly used datasets in fairness research. Our findings demonstrate DispaRisk's capabilities to identify datasets with a high risk of discrimination, detect model families prone to biases within an ML pipeline, and enhance the explainability of these bias risks. This work contributes to the development of fairer ML systems by providing a robust tool for early bias detection and mitigation. The code for our experiments is available in the following repository: https://github.com/jovasque156/disparisk
Published: 2024

9. OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Author: Kong, Kezhi, Zhang, Jiani, Shen, Zhengyuan, Srinivasan, Balasubramaniam, Lei, Chuan, Faloutsos, Christos, Rangwala, Huzefa, and Karypis, George
Subjects: Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to diversified data modalities and large table sizes. In this work, we propose OpenTab, an open-domain table reasoning framework powered by LLMs. Overall, OpenTab leverages table retriever to fetch relevant tables and then generates SQL programs to parse the retrieved tables efficiently. Utilizing the intermediate data derived from the SQL executions, it conducts grounded inference to produce accurate response. Extensive experimental evaluation shows that OpenTab significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of our proposed designs of the system., Comment: Accepted by ICLR 2024
Published: 2024

10. Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection

Author: Mavromatis, Costas, Srinivasan, Balasubramaniam, Shen, Zhengyuan, Zhang, Jiani, Rangwala, Huzefa, Faloutsos, Christos, and Karypis, George
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient as it does not require any parameter updates to the trained LLM, but only few annotated examples as input for the LLM. In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about, and performs semantic diversity-based example selection. Diversity-based sampling improves overall effectiveness, while uncertainty sampling improves budget efficiency and helps the LLM learn new information. Moreover, AdaICL poses its sampling strategy as a Maximum Coverage problem, that dynamically adapts based on the model's feedback and can be approximately solved via greedy algorithms. Extensive experiments on nine datasets and seven LLMs show that AdaICL improves performance by 4.4% accuracy points over SOTA (7.7% relative improvement), is up to 3x more budget-efficient than performing annotations uniformly at random, while it outperforms SOTA with 2x fewer ICL examples.
Published: 2023

11. NameGuess: Column Name Expansion for Tabular Data

Author: Zhang, Jiani, Shen, Zhengyuan, Srinivasan, Balasubramaniam, Wang, Shen, Rangwala, Huzefa, and Karypis, George
Subjects: Computer Science - Computation and Language, Computer Science - Databases, Computer Science - Machine Learning
Abstract: Recent advances in large language models have revolutionized many sectors, including the database industry. One common challenge when dealing with large volumes of tabular data is the pervasive use of abbreviated column names, which can negatively impact performance on various data search, access, and understanding tasks. To address this issue, we introduce a new task, called NameGuess, to expand column names (used in database schema) as a natural language generation problem. We create a training dataset of 384K abbreviated-expanded column pairs using a new data fabrication method and a human-annotated evaluation benchmark that includes 9.2K examples from real-world tables. To tackle the complexities associated with polysemy and ambiguity in NameGuess, we enhance auto-regressive language models by conditioning on table content and column header names -- yielding a fine-tuned model (with 2.7B parameters) that matches human performance. Furthermore, we conduct a comprehensive analysis (on multiple LLMs) to validate the effectiveness of table content in NameGuess and identify promising future opportunities. Code has been made available at https://github.com/amazon-science/nameguess., Comment: This work has been accepted to EMNLP'23
Published: 2023

12. Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

Author: Zhang, Hengrui, Zhang, Jiani, Srinivasan, Balasubramaniam, Shen, Zhengyuan, Qin, Xiao, Faloutsos, Christos, Rangwala, Huzefa, and Karypis, George
Subjects: Computer Science - Machine Learning
Abstract: Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular data. This paper introduces Tabsyn, a methodology that synthesizes tabular data by leveraging a diffusion model within a variational autoencoder (VAE) crafted latent space. The key advantages of the proposed Tabsyn include (1) Generality: the ability to handle a broad spectrum of data types by converting them into a single unified space and explicitly capture inter-column relations; (2) Quality: optimizing the distribution of latent embeddings to enhance the subsequent training of diffusion models, which helps generate high-quality synthetic data, (3) Speed: much fewer number of reverse steps and faster synthesis speed than existing diffusion-based methods. Extensive experiments on six datasets with five metrics demonstrate that Tabsyn outperforms existing methods. Specifically, it reduces the error rates by 86% and 67% for column-wise distribution and pair-wise column correlation estimations compared with the most competitive baselines., Comment: Accepted by ICLR 2024 (Oral Presentation). Code is available at: https://github.com/amazon-science/tabsyn
Published: 2023

13. BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs

Author: Wang, Zifeng, Wang, Zichen, Srinivasan, Balasubramaniam, Ioannidis, Vassilis N., Rangwala, Huzefa, and Anubhai, Rishita
Subjects: Computer Science - Machine Learning
Abstract: Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also identify BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we also show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs., Comment: ICLR 2024
Published: 2023

14. GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources

Author: Acharya, Angeela, Sikdar, Siddhartha, Das, Sanmay, and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Computer Science - Computational Engineering, Finance, and Science
Abstract: Individual-level data (microdata) that characterizes a population, is essential for studying many real-world problems. However, acquiring such data is not straightforward due to cost and privacy constraints, and access is often limited to aggregated data (macro data) sources. In this study, we examine synthetic data generation as a tool to extrapolate difficult-to-obtain high-resolution data by combining information from multiple easier-to-obtain lower-resolution data sources. In particular, we introduce a framework that uses a combination of univariate and multivariate frequency tables from a given target geographical location in combination with frequency tables from other auxiliary locations to generate synthetic microdata for individuals in the target location. Our method combines the estimation of a dependency graph and conditional probabilities from the target location with the use of a Gaussian copula to leverage the available information from the auxiliary locations. We perform extensive testing on two real-world datasets and demonstrate that our approach outperforms prior approaches in preserving the overall dependency structure of the data while also satisfying the constraints defined on the different variables., Comment: 10 pages, 6 figures, Accepted for the 2022 IEEE International Conference on Big Data
Published: 2022

15. Training self-supervised peptide sequence models on artificially chopped proteins

Author: Sadeh, Gil, Wang, Zichen, Grewal, Jasleen, Rangwala, Huzefa, and Price, Layne
Subjects: Quantitative Biology - Quantitative Methods, Computer Science - Machine Learning, Quantitative Biology - Biomolecules
Abstract: Representation learning for proteins has primarily focused on the global understanding of protein sequences regardless of their length. However, shorter proteins (known as peptides) take on distinct structures and functions compared to their longer counterparts. Unfortunately, there are not as many naturally occurring peptides available to be sequenced and therefore less peptide-specific data to train with. In this paper, we propose a new peptide data augmentation scheme, where we train peptide language models on artificially constructed peptides that are small contiguous subsets of longer, wild-type proteins; we refer to the training peptides as "chopped proteins". We evaluate the representation potential of models trained with chopped proteins versus natural peptides and find that training language models with chopped proteins results in more generalized embeddings for short protein sequences. These peptide-specific models also retain information about the original protein they were derived from better than language models trained on full-length proteins. We compare masked language model training objectives to three novel peptide-specific training objectives: next-peptide prediction, contrastive peptide selection and evolution-weighted MLM. We demonstrate improved zero-shot learning performance for a deep mutational scan peptides benchmark.
Published: 2022

16. Ex-Ante Assessment of Discrimination in Dataset

Author: Vasquez, Jonathan, Gitiaux, Xavier, and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Computer Science - Computers and Society
Abstract: Data owners face increasing liability for how the use of their data could harm under-priviliged communities. Stakeholders would like to identify the characteristics of data that lead to algorithms being biased against any particular demographic groups, for example, defined by their race, gender, age, and/or religion. Specifically, we are interested in identifying subsets of the feature space where the ground truth response function from features to observed outcomes differs across demographic groups. To this end, we propose FORESEE, a FORESt of decision trEEs algorithm, which generates a score that captures how likely an individual's response varies with sensitive attributes. Empirically, we find that our approach allows us to identify the individuals who are most likely to be misclassified by several classifiers, including Random Forest, Logistic Regression, Support Vector Machine, and k-Nearest Neighbors. The advantage of our approach is that it allows stakeholders to characterize risky samples that may contribute to discrimination, as well as, use the FORESEE to estimate the risk of upcoming samples.
Published: 2022

17. SoFaiR: Single Shot Fair Representation Learning

Author: Gitiaux, Xavier and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Computer Science - Computers and Society
Abstract: To avoid discriminatory uses of their data, organizations can learn to map them into a representation that filters out information related to sensitive attributes. However, all existing methods in fair representation learning generate a fairness-information trade-off. To achieve different points on the fairness-information plane, one must train different models. In this paper, we first demonstrate that fairness-information trade-offs are fully characterized by rate-distortion trade-offs. Then, we use this key result and propose SoFaiR, a single shot fair representation learning method that generates with one trained model many points on the fairness-information plane. Besides its computational saving, our single-shot approach is, to the extent of our knowledge, the first fair representation learning method that explains what information is affected by changes in the fairness / distortion properties of the representation. Empirically, we find on three datasets that SoFaiR achieves similar fairness-information trade-offs as its multi-shot counterparts.
Published: 2022

18. Improving Zero-Shot Event Extraction via Sentence Simplification

Author: Mehta, Sneha, Rangwala, Huzefa, and Ramakrishnan, Naren
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The success of sites such as ACLED and Our World in Data have demonstrated the massive utility of extracting events in structured formats from large volumes of textual data in the form of news, social media, blogs and discussion forums. Event extraction can provide a window into ongoing geopolitical crises and yield actionable intelligence. With the proliferation of large pretrained language models, Machine Reading Comprehension (MRC) has emerged as a new paradigm for event extraction in recent times. In this approach, event argument extraction is framed as an extractive question-answering task. One of the key advantages of the MRC-based approach is its ability to perform zero-shot extraction. However, the problem of long-range dependencies, i.e., large lexical distance between trigger and argument words and the difficulty of processing syntactically complex sentences plague MRC-based approaches. In this paper, we present a general approach to improve the performance of MRC-based event extraction by performing unsupervised sentence simplification guided by the MRC model itself. We evaluate our approach on the ICEWS geopolitical event extraction dataset, with specific attention to `Actor' and `Target' argument roles. We show how such context simplification can improve the performance of MRC-based event extraction by more than 5% for actor extraction and more than 10% for target extraction.
Published: 2022

19. Trade-offs between Group Fairness Metrics in Societal Resource Allocation

Author: Mashiat, Tasfia, Gitiaux, Xavier, Rangwala, Huzefa, Fowler, Patrick J., and Das, Sanmay
Subjects: Computer Science - Computers and Society, Computer Science - Computer Science and Game Theory
Abstract: We consider social resource allocations that deliver an array of scarce supports to a diverse population. Such allocations pervade social service delivery, such as provision of homeless services, assignment of refugees to cities, among others. At issue is whether allocations are fair across sociodemographic groups and intersectional identities. Our paper shows that necessary trade-offs exist for fairness in the context of scarcity; many reasonable definitions of equitable outcomes cannot hold simultaneously except under stringent conditions. For example, defining fairness in terms of improvement over a baseline inherently conflicts with defining fairness in terms of loss compared with the best possible outcome. Moreover, we demonstrate that the fairness trade-offs stem from heterogeneity across groups in intervention responses. Administrative records on homeless service delivery offer a real-world example. Building on prior work, we measure utilities for each household as the probability of reentry into homeless services if given three homeless services. Heterogeneity in utility distributions (conditional on received services) for several sociodemographic groups (e.g. single women with children versus without children) generates divergence across fairness metrics. We argue that such heterogeneity, and thus, fairness trade-offs pervade many social policy contexts.
Published: 2022

20. Causal Knowledge Guided Societal Event Forecasting

Author: Deng, Songgaojun, Rangwala, Huzefa, and Ning, Yue
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, 68T07
Abstract: Data-driven societal event forecasting methods exploit relevant historical information to predict future events. These methods rely on historical labeled data and cannot accurately predict events when data are limited or of poor quality. Studying causal effects between events goes beyond correlation analysis and can contribute to a more robust prediction of events. However, incorporating causality analysis in data-driven event forecasting is challenging due to several factors: (i) Events occur in a complex and dynamic social environment. Many unobserved variables, i.e., hidden confounders, affect both potential causes and outcomes. (ii) Given spatiotemporal non-independent and identically distributed (non-IID) data, modeling hidden confounders for accurate causal effect estimation is not trivial. In this work, we introduce a deep learning framework that integrates causal effect estimation into event forecasting. We first study the problem of Individual Treatment Effect (ITE) estimation from observational event data with spatiotemporal attributes and present a novel causal inference model to estimate ITEs. We then incorporate the learned event-related causal information into event prediction as prior knowledge. Two robust learning modules, including a feature reweighting module and an approximate constraint loss, are introduced to enable prior knowledge injection. We evaluate the proposed causal inference model on real-world event datasets and validate the effectiveness of proposed robust learning modules in event prediction by feeding learned causal information into different deep learning methods. Experimental results demonstrate the strengths of the proposed causal inference model for ITE estimation in societal events and showcase the beneficial properties of robust learning modules in societal event forecasting., Comment: 17 pages, 19 figures
Published: 2021

21. Using Online Textbook and In-Class Poll Data to Predict In-Class Performance

Author: Hunt-Isaak, Noah, Cherniavsky, Peter, Snyder, Mark, and Rangwala, Huzefa
Abstract: National failure rates seen in undergraduate introductory CS courses are quite high. In this paper, we develop a predictive model for student in-class performance in an introductory CS course. The model can serve as an early warning system, flagging struggling students who might benefit from additional support. We use a variety of features from the first few weeks of the course such as scores on assignments, interaction with the online textbook, and participation with the in-class polling system in order to train our models. We compare the performance of a number of machine learning algorithms on predicting final exam scores as well as final course grade. We find that the Support Vector Machine and AdaBoost are the most effective, and that we can achieve increasingly accurate predictions as we use data from further into the course. The regression coefficients give us insights into which features are most correlated with student success, suggesting that certain types of assignments are more indicative of learning than others. [For the full proceedings, see ED607784.]
Published: 2020

22. Towards Fair Educational Data Mining: A Case Study on Detecting At-Risk Students

Author: Hu, Qian and Rangwala, Huzefa
Abstract: Over the past decade, machine learning has become an integral part of educational technologies. With more and more applications such as students' performance prediction, course recommendation, dropout prediction and knowledge tracing relying upon machine learning models, there is increasing evidence and concerns about bias and unfairness of these models. Unfair models can lead to inequitable outcomes for some groups of students and negatively impact their learning. We show by real-world examples that educational data has embedded bias that leads to biased student modeling, which urges the development of fairness formalizations and fair algorithms for educational applications. Several formalizations of fairness have been proposed that can be classified into two types: (i) group fairness and (ii) individual fairness. Group fairness guarantees that groups are treated fairly as a whole, which might not be fair to some individuals. Thus individual fairness has been proposed to make sure fairness is achieved on individual level. In this work, we focus on developing an individually fair model for identifying students at-risk of underperforming. We propose a model which is based on the idea that the prediction for a student (identifying at-risk students) should not be influenced by his/her sensitive attributes. The proposed model is shown to effectively remove bias from these predictions and hence, making them useful in aiding all students. [For the full proceedings, see ED607784.]
Published: 2020

23. Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

Author: Krishnan, Jitin, Anastasopoulos, Antonios, Purohit, Hemant, and Rangwala, Huzefa
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks. In this work, we combine data augmentation approaches with a Teacher-Student training scheme to address this issue in a cross-lingual transfer setting for fine-tuning state-of-the-art pre-trained multilingual language models such as mBERT and XLM-R. We evaluate our method on transliterated Hindi and Malayalam, also introducing new datasets for benchmarking on real-world scenarios: one on sentiment classification in transliterated Malayalam, and another on crisis tweet classification in transliterated Hindi and Malayalam (related to the 2013 North India and 2018 Kerala floods). Our method yielded an average improvement of +5.6% on mBERT and +4.7% on XLM-R in F1 scores over their strong baselines., Comment: 12 pages, 5 tables, 7 Figures
Published: 2021

24. Asynchronous Federated Learning for Sensor Data with Concept Drift

Author: Chen, Yujing, Chai, Zheng, Cheng, Yue, and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Federated learning (FL) involves multiple distributed devices jointly training a shared model without any of the participants having to reveal their local data to a centralized server. Most of previous FL approaches assume that data on devices are fixed and stationary during the training process. However, this assumption is unrealistic because these devices usually have varying sampling rates and different system configurations. In addition, the underlying distribution of the device data can change dynamically over time, which is known as concept drift. Concept drift makes the learning process complicated because of the inconsistency between existing and upcoming data. Traditional concept drift handling techniques such as chunk based and ensemble learning-based methods are not suitable in the federated learning frameworks due to the heterogeneity of local devices. We propose a novel approach, FedConD, to detect and deal with the concept drift on local devices and minimize the effect on the performance of models in asynchronous FL. The drift detection strategy is based on an adaptive mechanism which uses the historical performance of the local models. The drift adaptation is realized by adjusting the regularization parameter of objective function on each local device. Additionally, we design a communication strategy on the server side to select local updates in a prudent fashion and speed up model convergence. Experimental evaluations on three evolving data streams and two image datasets show that \model~detects and handles concept drift, and also reduces the overall communication cost compared to other baseline methods.
Published: 2021

25. Fair Representations by Compression

Author: Gitiaux, Xavier and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Computer Science - Computers and Society
Abstract: Organizations that collect and sell data face increasing scrutiny for the discriminatory use of data. We propose a novel unsupervised approach to transform data into a compressed binary representation independent of sensitive attributes. We show that in an information bottleneck framework, a parsimonious representation should filter out information related to sensitive attributes if they are provided directly to the decoder. Empirical results show that the proposed method, \textbf{FBC}, achieves state-of-the-art accuracy-fairness trade-off. Explicit control of the entropy of the representation bit stream allows the user to move smoothly and simultaneously along both rate-distortion and rate-fairness curves. \end{abstract}
Published: 2021

26. Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling

Author: Krishnan, Jitin, Anastasopoulos, Antonios, Purohit, Hemant, and Rangwala, Huzefa
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Predicting user intent and detecting the corresponding slots from text are two key problems in Natural Language Understanding (NLU). In the context of zero-shot learning, this task is typically approached by either using representations from pre-trained multilingual transformers such as mBERT, or by machine translating the source data into the known target language and then fine-tuning. Our work focuses on a particular scenario where the target language is unknown during training. To this goal, we propose a novel method to augment the monolingual source data using multilingual code-switching via random translations to enhance a transformer's language neutrality when fine-tuning it for a downstream task. This method also helps discover novel insights on how code-switching with different language families around the world impact the performance on the target language. Experiments on the benchmark dataset of MultiATIS++ yielded an average improvement of +4.2% in accuracy for intent task and +1.8% in F1 for slot task using our method over the state-of-the-art across 8 different languages. Furthermore, we present an application of our method for crisis informatics using a new human-annotated tweet dataset of slot filling in English and Haitian Creole, collected during Haiti earthquake disaster.
Published: 2021

27. Estimating the Risk of Individual Discrimination of Classifiers

Author: Vasquez, Jonathan, Gitiaux, Xavier, Rangwala, Huzefa, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Kashima, Hisashi, editor, Ide, Tsuyoshi, editor, and Peng, Wen-Chih, editor
Published: 2023
Full Text: View/download PDF

28. Grade Prediction Based on Cumulative Knowledge and Co-Taken Courses

Author: Ren, Zhiyun, Ning, Xia, Lan, Andrew S., and Rangwala, Huzefa
Abstract: Over the past decade, low graduation and retention rates have plagued higher education institutions. To help students graduate on time and achieve optimal learning outcomes, many institutions provide advising services supported by educational technologies. Accurate grade prediction is an integral part of these services such as degree planning software, personalized advising systems and early warning systems that can identify students at-risk of dropping from their field of study. In this work, we present next-term grade prediction models based on students' cumulative knowledge and co-taken courses. The proposed models are based on a matrix factorization framework and incorporate a co-taken course interaction function to learn the influence from the co-taken courses on the target course. The co-taken course interaction function is formed by a neural network, which takes the knowledge difference between the co-taken courses and the target course as input, and outputs an influence value that will be used to predict students' grades on the target course. The experimental results on various datasets from a U.S. University demonstrate that the proposed models significantly outperform competitive baselines across different test sets. Furthermore, we analyze the proposed models' performance with different numbers of co-taken courses as well as different numbers of co-taken course subjects, and highlight with an application case study how a student might make decisions related to selection of courses. The codes are available at https://github.com/Zhiyun0411/EDM. [For the full proceedings, see ED599096.]
Published: 2019

29. Academic Performance Estimation with Attention-based Graph Convolutional Networks

Author: Hu, Qian and Rangwala, Huzefa
Abstract: Student's academic performance prediction empowers educational technologies including academic trajectory and degree planning, course recommender systems, early warning and advising systems. Given a student's past data (such as grades in prior courses), the task of student's performance prediction is to predict a student's grades in future courses. Academic programs are structured in a way that prior courses lay the foundation for future courses. The knowledge required by courses is obtained by taking multiple prior courses, which exhibits complex relationships modeled by graph structures. Traditional methods for student's performance prediction usually neglect the underlying relationships between multiple courses; and how students acquire knowledge across them. In addition, traditional methods do not provide interpretation for predictions needed for decision making. In this work, we propose a novel attentionbased graph convolutional networks model for student's performance prediction. We conduct extensive experiments on a real-world dataset obtained from a large public university. The experimental results show that our proposed model outperforms state-of-the-art approaches in terms of grade prediction. The proposed model also shows strong accuracy in identifying students who are at-risk of failing or dropping out so that timely intervention and feedback can be provided to the student. [For the full proceedings, see ED599096.]
Published: 2019

30. Metric-Free Individual Fairness with Cooperative Contextual Bandits

Author: Hu, Qian and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning
Abstract: Data mining algorithms are increasingly used in automated decision making across all walks of daily life. Unfortunately, as reported in several studies these algorithms inject bias from data and environment leading to inequitable and unfair solutions. To mitigate bias in machine learning, different formalizations of fairness have been proposed that can be categorized into group fairness and individual fairness. Group fairness requires that different groups should be treated similarly which might be unfair to some individuals within a group. On the other hand, individual fairness requires that similar individuals be treated similarly. However, individual fairness remains understudied due to its reliance on problem-specific similarity metrics. We propose a metric-free individual fairness and a cooperative contextual bandits (CCB) algorithm. The CCB algorithm utilizes fairness as a reward and attempts to maximize it. The advantage of treating fairness as a reward is that the fairness criterion does not need to be differentiable. The proposed algorithm is tested on multiple real-world benchmark datasets. The results show the effectiveness of the proposed algorithm at mitigating bias and at achieving both individual and group fairness.
Published: 2020

31. FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers

Author: Chai, Zheng, Chen, Yujing, Anwar, Ali, Zhao, Liang, Cheng, Yue, and Rangwala, Huzefa
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning, Computer Science - Networking and Internet Architecture
Abstract: Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) straggler problem, where the clients lag due to data or (computing and network) resource heterogeneity, and (2) communication bottleneck, where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only one dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based synchronous mechanisms to tackle the straggler problem. However, the asynchronous methods can easily create a network communication bottleneck, while tiering may introduce biases as tiering favors faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning method with Asynchronous Tiers under Non-i.i.d. data. FedAT synergistically combines synchronous intra-tier training and asynchronous cross-tier training. By bridging the synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training for further accuracy improvement. FedAT compresses the uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, therefore minimizing the communication cost. Results show that FedAT improves the prediction performance by up to 21.09%, and reduces the communication cost by up to 8.5x, compared to state-of-the-art FL methods., Comment: SC21
Published: 2020

32. Learning Smooth and Fair Representations

Author: Gitiaux, Xavier and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Organizations that own data face increasing legal liability for its discriminatory use against protected demographic groups, extending to contractual transactions involving third parties access and use of the data. This is problematic, since the original data owner cannot ex-ante anticipate all its future uses by downstream users. This paper explores the upstream ability to preemptively remove the correlations between features and sensitive attributes by mapping features to a fair representation space. Our main result shows that the fairness measured by the demographic parity of the representation distribution can be certified from a finite sample if and only if the chi-squared mutual information between features and representations is finite. Empirically, we find that smoothing the representation distribution provides generalization guarantees of fairness certificates, which improves upon existing fair representation learning approaches. Moreover, we do not observe that smoothing the representation distribution degrades the accuracy of downstream tasks compared to state-of-the-art methods in fair representation learning.
Published: 2020

33. Common-Knowledge Concept Recognition for SEVA

Author: Krishnan, Jitin, Coronado, Patrick, Purohit, Hemant, and Rangwala, Huzefa
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We build a common-knowledge concept recognition system for a Systems Engineer's Virtual Assistant (SEVA) which can be used for downstream tasks such as relation extraction, knowledge graph construction, and question-answering. The problem is formulated as a token classification task similar to named entity extraction. With the help of a domain expert and text processing methods, we construct a dataset annotated at the word-level by carefully defining a labelling scheme to train a sequence model to recognize systems engineering concepts. We use a pre-trained language model and fine-tune it with the labeled dataset of concepts. In addition, we also create some essential datasets for information such as abbreviations and definitions from the systems engineering domain. Finally, we construct a simple knowledge graph using these extracted concepts along with some hyponym relations., Comment: Source code available
Published: 2020

34. Generative Multi-Stream Architecture For American Sign Language Recognition

Author: Huh, Dom, Gurrapu, Sai, Olson, Frederick, Rangwala, Huzefa, Pathak, Parth, and Kosecka, Jana
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With advancements in deep model architectures, tasks in computer vision can reach optimal convergence provided proper data preprocessing and model parameter initialization. However, training on datasets with low feature-richness for complex applications limit and detriment optimal convergence below human performance. In past works, researchers have provided external sources of complementary data at the cost of supplementary hardware, which are fed in streams to counteract this limitation and boost performance. We propose a generative multi-stream architecture, eliminating the need for additional hardware with the intent to improve feature richness without risking impracticability. We also introduce the compact spatio-temporal residual block to the standard 3-dimensional convolutional model, C3D. Our rC3D model performs comparatively to the top C3D residual variant architecture, the pseudo-3D model, on the FASL-RGB dataset. Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
Published: 2020

35. FineHand: Learning Hand Shapes for American Sign Language Recognition

Author: Hosain, Al Amin, Santhalingam, Panneer Selvam, Pathak, Parth, Rangwala, Huzefa, and Kosecka, Jana
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: American Sign Language recognition is a difficult gesture recognition problem, characterized by fast, highly articulate gestures. These are comprised of arm movements with different hand shapes, facial expression and head movements. Among these components, hand shape is the vital, often the most discriminative part of a gesture. In this work, we present an approach for effective learning of hand shape embeddings, which are discriminative for ASL gestures. For hand shape recognition our method uses a mix of manually labelled hand shapes and high confidence predictions to train deep convolutional neural network (CNN). The sequential gesture component is captured by recursive neural network (RNN) trained on the embeddings learned in the first stage. We will demonstrate that higher quality hand shape models can significantly improve the accuracy of final video gesture classification in challenging conditions with variety of speakers, different illumination and significant motion blurr. We compare our model to alternative approaches exploiting different modalities and representations of the data and show improved video gesture recognition accuracy on GMU-ASL51 benchmark dataset
Published: 2020

36. Unsupervised and Interpretable Domain Adaptation to Rapidly Filter Tweets for Emergency Services

Author: Krishnan, Jitin, Purohit, Hemant, and Rangwala, Huzefa
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Social and Information Networks, Statistics - Machine Learning
Abstract: During the onset of a disaster event, filtering relevant information from the social web data is challenging due to its sparse availability and practical limitations in labeling datasets of an ongoing crisis. In this paper, we hypothesize that unsupervised domain adaptation through multi-task learning can be a useful framework to leverage data from past crisis events for training efficient information filtering models during the sudden onset of a new crisis. We present a novel method to classify relevant tweets during an ongoing crisis without seeing any new examples, using the publicly available dataset of TREC incident streams. Specifically, we construct a customized multi-task architecture with a multi-domain discriminator for crisis analytics: multi-task domain adversarial attention network. This model consists of dedicated attention layers for each task to provide model interpretability; critical for real-word applications. As deep networks struggle with sparse datasets, we show that this can be improved by sharing a base layer for multi-task learning and domain adversarial training. Evaluation of domain adaptation for crisis events is performed by choosing a target event as the test set and training on the rest. Our results show that the multi-task model outperformed its single task counterpart. For the qualitative evaluation of interpretability, we show that the attention layer can be used as a guide to explain the model predictions and empower emergency services for exploring accountability of the model, by showcasing the words in a tweet that are deemed important in the classification process. Finally, we show a practical implication of our work by providing a use-case for the COVID-19 pandemic., Comment: 8 pages, 4 Figures, 6 Tables, Source Code Available
Published: 2020

37. Diversity-Based Generalization for Unsupervised Text Classification under Domain Shift

Author: Krishnan, Jitin, Purohit, Hemant, and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Domain adaptation approaches seek to learn from a source domain and generalize it to an unseen target domain. At present, the state-of-the-art unsupervised domain adaptation approaches for subjective text classification problems leverage unlabeled target data along with labeled source data. In this paper, we propose a novel method for domain adaptation of single-task text classification problems based on a simple but effective idea of diversity-based generalization that does not require unlabeled target data but still matches the state-of-the-art in performance. Diversity plays the role of promoting the model to better generalize and be indiscriminate towards domain shift by forcing the model not to rely on same features for prediction. We apply this concept on the most explainable component of neural networks, the attention layer. To generate sufficient diversity, we create a multi-head attention model and infuse a diversity constraint between the attention heads such that each head will learn differently. We further expand upon our model by tri-training and designing a procedure with an additional diversity constraint between the attention heads of the tri-trained classifiers. Extensive evaluation using the standard benchmark dataset of Amazon reviews and a newly constructed dataset of Crisis events shows that our fully unsupervised method matches with the competing baselines that uses unlabeled target data. Our results demonstrate that machine learning architectures that ensure sufficient diversity can generalize better; encouraging future research to design ubiquitously usable learning models without using unlabeled target data., Comment: 16 pages, 3 figures, 5 Tables, Source Code Available
Published: 2020

38. Graph Message Passing with Cross-location Attentions for Long-term ILI Prediction

Author: Deng, Songgaojun, Wang, Shusen, Rangwala, Huzefa, Wang, Lijing, and Ning, Yue
Subjects: Computer Science - Machine Learning, Computer Science - Social and Information Networks, Statistics - Machine Learning
Abstract: Forecasting influenza-like illness (ILI) is of prime importance to epidemiologists and health-care providers. Early prediction of epidemic outbreaks plays a pivotal role in disease intervention and control. Most existing work has either limited long-term prediction performance or lacks a comprehensive ability to capture spatio-temporal dependencies in data. Accurate and early disease forecasting models would markedly improve both epidemic prevention and managing the onset of an epidemic. In this paper, we design a cross-location attention based graph neural network (Cola-GNN) for learning time series embeddings and location aware attentions. We propose a graph message passing framework to combine learned feature embeddings and an attention matrix to model disease propagation over time. We compare the proposed method with state-of-the-art statistical approaches and deep learning models on real-world epidemic-related datasets from United States and Japan. The proposed method shows strong predictive performance and leads to interpretable results for long-term epidemic predictions., Comment: 17 pages, 22 figures, 5 tables
Published: 2019

39. Low Rank Factorization for Compact Multi-Head Self-Attention

Author: Mehta, Sneha, Rangwala, Huzefa, and Ramakrishnan, Naren
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Effective representation learning from text has been an active area of research in the fields of NLP and text mining. Attention mechanisms have been at the forefront in order to learn contextual sentence representations. Current state-of-the-art approaches for many NLP tasks use large pre-trained language models such as BERT, XLNet and so on for learning representations. These models are based on the Transformer architecture that involves recurrent blocks of computation consisting of multi-head self-attention and feedforward networks. One of the major bottlenecks largely contributing to the computational complexity of the Transformer models is the self-attention layer, that is both computationally expensive and parameter intensive. In this work, we introduce a novel multi-head self-attention mechanism operating on GRUs that is shown to be computationally cheaper and more parameter efficient than self-attention mechanism proposed in Transformers for text classification tasks. The efficiency of our approach mainly stems from two optimizations; 1) we use low-rank matrix factorization of the affinity matrix to efficiently get multiple attention distributions instead of having separate parameters for each head 2) attention scores are obtained by querying a global context vector instead of densely querying all the words in the sentence. We evaluate the performance of the proposed model on tasks such as sentiment analysis from movie reviews, predicting business ratings from reviews and classifying news articles into topics. We find that the proposed approach matches or outperforms a series of strong baselines and is more parameter efficient than comparable multi-head approaches. We also perform qualitative analyses to verify that the proposed approach is interpretable and captures context-dependent word importance., Comment: 9 pages, 5 figures
Published: 2019

40. Asynchronous Online Federated Learning for Edge Devices with Non-IID Data

Author: Chen, Yujing, Ning, Yue, Slawski, Martin, and Rangwala, Huzefa
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: Federated learning (FL) is a machine learning paradigm where a shared central model is learned across distributed edge devices while the training data remains on these devices. Federated Averaging (FedAvg) is the leading optimization method for training non-convex models in this setting with a synchronized protocol. However, the assumptions made by FedAvg are not realistic given the heterogeneity of devices. In particular, the volume and distribution of collected data vary in the training process due to different sampling rates of edge devices. The edge devices themselves also vary in their available communication bandwidth and system configurations, such as memory, processor speed, and power requirements. This leads to vastly different training times as well as model/data transfer times. Furthermore, availability issues at edge devices can lead to a lack of contribution from specific edge devices to the federated model. In this paper, we present an Asynchronous Online Federated Learning (ASO-Fed) framework, where the edge devices perform online learning with continuous streaming local data and a central server aggregates model parameters from clients. Our framework updates the central model in an asynchronous manner to tackle the challenges associated with both varying computational loads at heterogeneous edge devices and edge devices that lag behind or dropout. We perform extensive experiments on a simulated benchmark image dataset and three real-world non-IID streaming datasets. The results demonstrate the effectiveness of \model~on converging fast and maintaining good prediction performance., Comment: 11 pages
Published: 2019

41. Sign Language Recognition Analysis using Multimodal Data

Author: Hosain, Al Amin, Santhalingam, Panneer Selvam, Pathak, Parth, Kosecka, Jana, and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Voice-controlled personal and home assistants (such as the Amazon Echo and Apple Siri) are becoming increasingly popular for a variety of applications. However, the benefits of these technologies are not readily accessible to Deaf or Hard-ofHearing (DHH) users. The objective of this study is to develop and evaluate a sign recognition system using multiple modalities that can be used by DHH signers to interact with voice-controlled devices. With the advancement of depth sensors, skeletal data is used for applications like video analysis and activity recognition. Despite having similarity with the well-studied human activity recognition, the use of 3D skeleton data in sign language recognition is rare. This is because unlike activity recognition, sign language is mostly dependent on hand shape pattern. In this work, we investigate the feasibility of using skeletal and RGB video data for sign language recognition using a combination of different deep learning architectures. We validate our results on a large-scale American Sign Language (ASL) dataset of 12 users and 13107 samples across 51 signs. It is named as GMUASL51. We collected the dataset over 6 months and it will be publicly released in the hope of spurring further machine learning research towards providing improved accessibility for digital assistants., Comment: conference : IEEE DSAA, 2019, Washington DC
Published: 2019

42. Federated Multi-task Hierarchical Attention Model for Sensor Analytics

Author: Chen, Yujing, Ning, Yue, Chai, Zheng, and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Sensors are an integral part of modern Internet of Things (IoT) applications. There is a critical need for the analysis of heterogeneous multivariate temporal data obtained from the individual sensors of these systems. In this paper we particularly focus on the problem of the scarce amount of training data available per sensor. We propose a novel federated multi-task hierarchical attention model (FATHOM) that jointly trains classification/regression models from multiple sensors. The attention mechanism of the proposed model seeks to extract feature representations from the input and learn a shared representation focused on time dimensions across multiple sensors. The underlying temporal and non-linear relationships are modeled using a combination of attention mechanism and long-short term memory (LSTM) networks. We find that our proposed method outperforms a wide range of competitive baselines in both classification and regression settings on activity recognition and environment monitoring datasets. We further provide visualization of feature representations learned by our model at the input sensor level and central time level.
Published: 2019

43. Multi-Differential Fairness Auditor for Black Box Classifiers

Author: Gitiaux, Xavier and Rangwala, Huzefa
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Machine learning algorithms are increasingly involved in sensitive decision-making process with adversarial implications on individuals. This paper presents mdfa, an approach that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier and demonstrate that individuals identified as African-American with little criminal history are three-times more likely to be considered at high risk of violent recidivism than similar individuals but not African-American.
Published: 2019

44. Reliable Deep Grade Prediction with Uncertainty Estimation

Author: Hu, Qian and Rangwala, Huzefa
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computers and Society
Abstract: Currently, college-going students are taking longer to graduate than their parental generations. Further, in the United States, the six-year graduation rate has been 59% for decades. Improving the educational quality by training better-prepared students who can successfully graduate in a timely manner is critical. Accurately predicting students' grades in future courses has attracted much attention as it can help identify at-risk students early so that personalized feedback can be provided to them on time by advisors. Prior research on students' grade prediction include shallow linear models; however, students' learning is a highly complex process that involves the accumulation of knowledge across a sequence of courses that can not be sufficiently modeled by these linear models. In addition to that, prior approaches focus on prediction accuracy without considering prediction uncertainty, which is essential for advising and decision making. In this work, we present two types of Bayesian deep learning models for grade prediction. The MLP ignores the temporal dynamics of students' knowledge evolution. Hence, we propose RNN for students' performance prediction. To evaluate the performance of the proposed models, we performed extensive experiments on data collected from a large public university. The experimental results show that the proposed models achieve better performance than prior state-of-the-art approaches. Besides more accurate results, Bayesian deep learning models estimate uncertainty associated with the predictions. We explore how uncertainty estimation can be applied towards developing a reliable educational early warning system. In addition to uncertainty, we also develop an approach to explain the prediction results, which is useful for advisors to provide personalized feedback to students.
Published: 2019

45. Estimating the Risk of Individual Discrimination of Classifiers

Author: Vasquez, Jonathan, primary, Gitiaux, Xavier, additional, and Rangwala, Huzefa, additional
Published: 2023
Full Text: View/download PDF

46. Grade Prediction with Temporal Course-Wise Influence

Author: Ren, Zhiyun, Ning, Xia, and Rangwala, Huzefa
Abstract: There is a critical need to develop new educational technology applications that analyze the data collected by universities to ensure that students graduate in a timely fashion (4 to 6 years); and they are well prepared for jobs in their respective fields of study. In this paper, we present a novel approach for analyzing historical educational records from a large, public university to perform next-term grade prediction; i.e., to estimate the grades that a student will get in a course that he/she will enroll in the next term. Accurate next-term grade prediction holds the promise for better student degree planning, personalized advising and automated interventions to ensure that students stay on track in their chosen degree program and graduate on time. We present a factorization-based approach called Matrix Factorization with Temporal Course-wise Influence that incorporates course-wise influence effects and temporal effects for grade prediction. In this model, students and courses are represented in a latent "knowledge" space. The grade of a student on a course is modeled as the similarity of their latent representation in the "knowledge" space. Course-wise influence is considered as an additional factor in the grade prediction. Our experimental results show that the proposed method outperforms several baseline approaches and infer meaningful patterns between pairs of courses within academic programs. [For the full proceedings, see ED596512.]
Published: 2017

47. ALE: Additive Latent Effect Models for Grade Prediction

Author: Ren, Zhiyun, Ning, Xia, and Rangwala, Huzefa
Subjects: Computer Science - Learning
Abstract: The past decade has seen a growth in the development and deployment of educational technologies for assisting college-going students in choosing majors, selecting courses and acquiring feedback based on past academic performance. Grade prediction methods seek to estimate a grade that a student may achieve in a course that she may take in the future (e.g., next term). Accurate and timely prediction of students' academic grades is important for developing effective degree planners and early warning systems, and ultimately improving educational outcomes. Existing grade pre- diction methods mostly focus on modeling the knowledge components associated with each course and student, and often overlook other factors such as the difficulty of each knowledge component, course instructors, student interest, capabilities and effort. In this paper, we propose additive latent effect models that incorporate these factors to predict the student next-term grades. Specifically, the proposed models take into account four factors: (i) student's academic level, (ii) course instructors, (iii) student global latent factor, and (iv) latent knowledge factors. We compared the new models with several state-of-the-art methods on students of various characteristics (e.g., whether a student transferred in or not). The experimental results demonstrate that the proposed methods significantly outperform the baselines on grade prediction problem. Moreover, we perform a thorough analysis on the importance of different factors and how these factors can practically assist students in course selection, and finally improve their academic performance., Comment: 9 pages, 3 figures
Published: 2018

48. Embedding Feature Selection for Large-scale Hierarchical Classification

Author: Naik, Azad and Rangwala, Huzefa
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features posing several big data challenges. Feature selection that aims to select the subset of discriminant features is an effective strategy to deal with large-scale HC problem. It speeds up the training process, reduces the prediction time and minimizes the memory requirements by compressing the total size of learned model weight vectors. Majority of the studies have also shown feature selection to be competent and successful in improving the classification accuracy by removing irrelevant features. In this work, we investigate various filter-based feature selection methods for dimensionality reduction to solve the large-scale HC problem. Our experimental evaluation on text and image datasets with varying distribution of features, classes and instances shows upto 3x order of speed-up on massive datasets and upto 45% less memory requirements for storing the weight vectors of learned model without any significant loss (improvement for some datasets) in the classification accuracy. Source Code: https://cs.gmu.edu/~mlbio/featureselection., Comment: IEEE International Conference on Big Data (IEEE BigData 2016)
Published: 2017

49. Classifying Documents within Multiple Hierarchical Datasets using Multi-Task Learning

Author: Naik, Azad, Charuvaka, Anveshi, and Rangwala, Huzefa
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: Multi-task learning (MTL) is a supervised learning paradigm in which the prediction models for several related tasks are learned jointly to achieve better generalization performance. When there are only a few training examples per task, MTL considerably outperforms the traditional Single task learning (STL) in terms of prediction accuracy. In this work we develop an MTL based approach for classifying documents that are archived within dual concept hierarchies, namely, DMOZ and Wikipedia. We solve the multi-class classification problem by defining one-versus-rest binary classification tasks for each of the different classes across the two hierarchical datasets. Instead of learning a linear discriminant for each of the different tasks independently, we use a MTL approach with relationships between the different tasks across the datasets established using the non-parametric, lazy, nearest neighbor approach. We also develop and evaluate a transfer learning (TL) approach and compare the MTL (and TL) methods against the standard single task learning and semi-supervised learning approaches. Our empirical results demonstrate the strength of our developed methods that show an improvement especially when there are fewer number of training examples per classification task., Comment: IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2013
Published: 2017

50. Inconsistent Node Flattening for Improving Top-down Hierarchical Classification

Author: Naik, Azad and Rangwala, Huzefa
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: Large-scale classification of data where classes are structurally organized in a hierarchy is an important area of research. Top-down approaches that exploit the hierarchy during the learning and prediction phase are efficient for large scale hierarchical classification. However, accuracy of top-down approaches is poor due to error propagation i.e., prediction errors made at higher levels in the hierarchy cannot be corrected at lower levels. One of the main reason behind errors at the higher levels is the presence of inconsistent nodes that are introduced due to the arbitrary process of creating these hierarchies by domain experts. In this paper, we propose two different data-driven approaches (local and global) for hierarchical structure modification that identifies and flattens inconsistent nodes present within the hierarchy. Our extensive empirical evaluation of the proposed approaches on several image and text datasets with varying distribution of features, classes and training instances per class shows improved classification performance over competing hierarchical modification approaches. Specifically, we see an improvement upto 7% in Macro-F1 score with our approach over best TD baseline. SOURCE CODE: http://www.cs.gmu.edu/~mlbio/InconsistentNodeFlattening, Comment: IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016
Published: 2017

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

531 results on '"Rangwala, Huzefa"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources