678 results for "Shroff, Gautam"
Search Results
2. ConceptSearch: Towards Efficient Program Search Using LLMs for Abstraction and Reasoning Corpus (ARC)
- Author
- Singhal, Kartik and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning
- Abstract
The Abstraction and Reasoning Corpus (ARC) poses a significant challenge to artificial intelligence, demanding broad generalization and few-shot learning capabilities that remain elusive for current deep learning methods, including large language models (LLMs). While LLMs excel in program synthesis, their direct application to ARC yields limited success. To address this, we introduce ConceptSearch, a novel function-search algorithm that leverages LLMs for program generation and employs a concept-based scoring method to guide the search efficiently. Unlike simplistic pixel-based metrics like Hamming distance, ConceptSearch evaluates programs on their ability to capture the underlying transformation concept reflected in the input-output examples. We explore three scoring functions: Hamming distance, a CNN-based scoring function, and an LLM-based natural language scoring function. Experimental results demonstrate the effectiveness of ConceptSearch, achieving a significant performance improvement over direct prompting with GPT-4. Moreover, our novel concept-based scoring exhibits up to 30% greater efficiency compared to Hamming distance, measured in terms of the number of iterations required to reach the correct solution. These findings highlight the potential of LLM-driven program search when integrated with concept-based guidance for tackling challenging generalization problems like ARC.
- Comment
- Pre-print of paper accepted at AAAI 2025
- Published
- 2024
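The pixel-based Hamming scorer that the entry above treats as a baseline is easy to make concrete. Below is a minimal sketch, assuming ARC grids are represented as 2-D integer arrays; the shape-mismatch penalty is an illustrative assumption, not the paper's exact definition.

```python
# Sketch of a Hamming-distance scoring function for ARC-style program search.
import numpy as np

def hamming_score(program, train_pairs):
    """Lower is better; `program` maps an input grid to an output grid."""
    total = 0.0
    for x, y in train_pairs:          # grids as 2-D integer arrays
        pred = program(x)
        if pred.shape != y.shape:     # shape mismatch: count as maximally wrong
            total += 1.0
        else:
            total += np.mean(pred != y)   # fraction of mismatched cells
    return total / len(train_pairs)

# Toy usage: the identity program on a task whose rule is "do nothing".
pairs = [(np.zeros((3, 3), dtype=int), np.zeros((3, 3), dtype=int))]
print(hamming_score(lambda g: g, pairs))   # 0.0
```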
3. Numin: Weighted-Majority Ensembles for Intraday Trading
- Author
- Mukherjee, Aniruddha, Singhal, Rekha, and Shroff, Gautam
- Subjects
- Computer Science - Computational Engineering, Finance, and Science
- Abstract
We consider the application of machine learning models for short-term intra-day trading in equities. We envisage a scenario wherein machine learning models are submitted by independent data scientists to predict discretised ten-candle returns every five minutes, in response to five-minute candlestick data provided to them in near real-time. An ensemble model combines these multiple models via a weighted-majority algorithm. The weights of each model are dynamically updated based on the performance of each model, and can also be used to reward model owners. Each model's performance is evaluated according to two different metrics over a recent time window: in addition to accuracy, we also consider a 'utility' metric that is a proxy for a model's potential profitability under a particular trading strategy. We present experimental results on real intra-day data showing that our weighted-majority ensemble techniques achieve improved accuracy as well as utility over any of the individual models, especially when using the utility metric to dynamically re-weight models over shorter time windows.
- Comment
- Accepted at ACM ICAIF'24
- Published
- 2024
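The weighted-majority mechanism described in the entry above follows the classic multiplicative-weights pattern. The sketch below is a minimal illustration for binary up/down calls; the penalty factor beta and the exact update rule are expository assumptions, not the paper's algorithm.

```python
# Sketch of a weighted-majority ensemble with multiplicative weight updates.
def weighted_majority(predictions, outcomes, beta=0.9):
    """predictions[t][i]: model i's 0/1 call at step t; outcomes[t]: realised label."""
    n_models = len(predictions[0])
    weights = [1.0] * n_models
    ensemble_calls = []
    for preds, outcome in zip(predictions, outcomes):
        # Weighted vote across models.
        vote = sum(w for w, p in zip(weights, preds) if p == 1)
        total = sum(weights)
        ensemble_calls.append(1 if vote >= total / 2 else 0)
        # Multiplicatively penalise models that were wrong this step.
        weights = [w * (beta if p != outcome else 1.0)
                   for w, p in zip(weights, preds)]
    return ensemble_calls, weights

calls, w = weighted_majority([[1, 0], [1, 1], [0, 1]], [1, 1, 1])
print(calls, w)   # the persistently wrong model ends with lower weight
```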
4. BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks
- Author
- Gandhi, Shubham, Patwardhan, Manasi, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Multiagent Systems; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Machine Learning; 68T42; I.2.1; I.2.2; I.2.5; I.2.7; I.2.8
- Abstract
Large Language Models (LLMs) excel in diverse applications including generation of code snippets, but often struggle with generating code for complex Machine Learning (ML) tasks. Although existing LLM single-agent based systems give varying performance depending on the task complexity, they rely solely on large, expensive models such as GPT-4. Our investigation reveals that no-cost and low-cost models such as Gemini-Pro, Mixtral and CodeLlama perform far worse than GPT-4 in a single-agent setting. With the motivation of developing a cost-efficient LLM based solution for solving ML tasks, we propose an LLM Multi-Agent based system which leverages a combination of experts using profiling, efficient retrieval of past observations, LLM cascades, and ask-the-expert calls. Through empirical analysis on ML engineering tasks in the MLAgentBench benchmark, we demonstrate the effectiveness of our system, using the no-cost Gemini as the base LLM, paired with GPT-4 in the cascade and as the expert serving occasional ask-the-expert calls for planning. With a 94.2% reduction in cost (from $0.931 to $0.054 per run, averaged over all tasks), our system yields a better average success rate of 32.95% over the tasks of MLAgentBench, compared to the 22.72% success rate of the GPT-4 single-agent system.
- Comment
- Presented at AIMLSystems '24
- Published
- 2024
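The LLM-cascade component described above can be sketched generically: query the cheapest model first and escalate only when a validation check fails. This is a hypothetical skeleton under stated assumptions; the model callables and the validity check are stand-ins, not the paper's interfaces.

```python
# Sketch of an LLM cascade: cheapest model first, escalate on failed validation.
from typing import Callable, List

def cascade(prompt: str,
            models: List[Callable[[str], str]],
            is_valid: Callable[[str], bool]) -> str:
    """Try models in order of cost; return the first answer that validates.

    `models` is ordered cheapest-first (e.g. [gemini, gpt4]); the last
    model's answer is returned even if validation fails.
    """
    answer = ""
    for model in models:
        answer = model(prompt)
        if is_valid(answer):
            return answer
    return answer

# Usage with trivial stand-in "models":
cheap = lambda p: "draft answer"
expensive = lambda p: "careful answer"
print(cascade("write a training loop", [cheap, expensive],
              is_valid=lambda a: "careful" in a))
```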
5. SmartFlow: Robotic Process Automation using LLMs
- Author
- Jain, Arushi, Paliwal, Shubham, Sharma, Monika, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Robotics; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than visual understanding of screen elements. In this context, we present SmartFlow, an AI-based RPA system that uses pre-trained large language models (LLMs) coupled with deep-learning based image understanding. Our system can adapt to new scenarios, including changes in the user interface and variations in input data, without the need for human intervention. SmartFlow uses computer vision and natural language processing to perceive visible elements on the graphical user interface (GUI) and convert them into a textual representation. This information is then utilized by LLMs to generate a sequence of actions that are executed by a scripting engine to complete an assigned task. To assess the effectiveness of SmartFlow, we have developed a dataset that includes a set of generic enterprise applications with diverse layouts, which we are releasing for research use. Our evaluations on this dataset demonstrate that SmartFlow exhibits robustness across different layouts and applications. SmartFlow can automate a wide range of business processes such as form filling, customer service, invoice processing, and back-office operations. SmartFlow can thus assist organizations in enhancing productivity by automating an even larger fraction of screen-based workflows. The demo-video and dataset are available at https://smartflow-4c5a0a.webflow.io/.
- Comment
- 32nd ACM International Conference on Information and Knowledge Management
- Published
- 2024
6. Acceleron: A Tool to Accelerate Research Ideation
- Author
- Nigam, Harshit, Patwardhan, Manasi, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
Several tools have recently been proposed for assisting researchers during various stages of the research life-cycle. However, these primarily concentrate on tasks such as retrieving and recommending relevant literature, reviewing and critiquing drafts, and writing research manuscripts. Our investigation reveals a significant gap in the availability of tools specifically designed to assist researchers during the challenging ideation phase of the research life-cycle. To aid with research ideation, we propose 'Acceleron', a research accelerator for different phases of the research life-cycle, specially designed to aid the ideation process. Acceleron guides researchers through the formulation of a comprehensive research proposal encompassing a novel research problem. The proposal's motivation is validated for novelty by identifying gaps in the existing literature and suggesting a plausible list of techniques to solve the proposed problem. We leverage the reasoning and domain-specific skills of Large Language Models (LLMs) to create an agent-based architecture incorporating colleague and mentor personas for LLMs. The LLM agents emulate the ideation process undertaken by researchers, engaging researchers in an interactive fashion to aid in the development of the research proposal. Notably, our tool addresses challenges inherent in LLMs, such as hallucinations, implements a two-stage aspect-based retrieval to manage precision-recall trade-offs, and tackles issues of unanswerability. As an evaluation, we illustrate the execution of our motivation-validation and method-synthesis workflows on proposals from the ML and NLP domains provided by three distinct researchers. Our observations and the evaluations provided by the researchers illustrate the efficacy of the tool in assisting researchers with appropriate inputs at distinct stages, leading to improved time efficiency.
- Comment
- Accepted at AI2ASE Workshop at AAAI'24 Conference. 13 pages and 4 figures
- Published
- 2024
7. Conservative Predictions on Noisy Financial Data
- Author
- Nabar, Omkar and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computational Engineering, Finance, and Science
- Abstract
Price movements in financial markets are well known to be very noisy. As a result, even if there are, on occasion, exploitable patterns that could be picked up by machine-learning algorithms, these are obscured by feature and label noise, rendering the predictions less useful and risky in practice. Traditional rule-learning techniques developed for noisy data, such as CN2, would seek only high-precision rules and refrain from making predictions where their antecedents did not apply. We apply a similar approach, where a model abstains from making a prediction on data points that it is uncertain about. During training, a cascade of such models is learned in sequence, similar to rule lists, with each model being trained only on data on which the previous model(s) were uncertain. Similar pruning of data takes place at test time, with (higher-accuracy) predictions being made, albeit only on a fraction (the support) of test-time data. In a financial prediction setting, such an approach allows decisions to be taken only when the ensemble model is confident, thereby reducing risk. We present results using traditional MLPs as well as differentiable decision trees, on synthetic data as well as real financial market data, to predict fixed-term returns using commonly used features. We submit that our approach is likely to result in better overall returns at a lower level of risk. In this context we introduce a utility metric to measure the average gain per trade, as well as the return adjusted for downside risk, both of which are improved significantly by our approach.
- Comment
- Accepted at ACM ICAIF 2023
- Published
- 2023
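The cascade of abstaining models described in the entry above can be sketched with off-the-shelf classifiers. This is a minimal illustration assuming scikit-learn-style models and a fixed confidence threshold; the paper's MLPs and differentiable decision trees would slot in similarly.

```python
# Sketch of a cascade where each level abstains on uncertain points and
# passes them to the next level for training.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_cascade(X, y, n_levels=3, threshold=0.8):
    """Each level trains only on points the previous levels were unsure about."""
    cascade = []
    for _ in range(n_levels):
        if len(X) == 0 or len(np.unique(y)) < 2:
            break
        model = MLPClassifier(max_iter=500).fit(X, y)
        conf = model.predict_proba(X).max(axis=1)
        uncertain = conf < threshold
        cascade.append(model)
        X, y = X[uncertain], y[uncertain]   # pass the hard points down
    return cascade

def predict_or_abstain(cascade, x, threshold=0.8):
    """Return a label from the first confident level, or None (abstain)."""
    for model in cascade:
        proba = model.predict_proba(x.reshape(1, -1))[0]
        if proba.max() >= threshold:
            return int(np.argmax(proba))
    return None
```

At test time, abstentions shrink the support of predictions exactly as the abstract describes: only a fraction of points receive (higher-accuracy) calls.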
8. Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting
- Author
- Arora, Aseem, Bhaisaheb, Shabbirhussain, Nigam, Harshit, Patwardhan, Manasi, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
- Abstract
Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm which performs offline sampling of a minimal set of few-shot exemplars from the training data, with complete coverage of SQL clauses, operators and functions, and maximal domain coverage within the allowed token length. This allows for synthesis of a fixed Generic Prompt (GP) with a diverse set of exemplars common across NL test queries, avoiding expensive test-time exemplar retrieval. We further auto-adapt the GP to the target database domain (DA-GP), to better handle cross-domain generalization, followed by decomposed Least-To-Most Prompting (LTMP-DA-GP) to handle cross-compositional generalization. The synthesis of LTMP-DA-GP is an offline task, to be performed once per new database with minimal human intervention. Our approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. We further showcase consistent performance improvement of LTMP-DA-GP over GP, across LLMs and databases of KaggleDBQA, highlighting the efficacy and model-agnostic benefits of our prompt-based adapt-and-decompose approach.
- Comment
- 22 pages
- Published
- 2023
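The offline, coverage-driven exemplar sampling described above can be approximated with a greedy set-cover pass. A minimal sketch under strong simplifying assumptions: keyword-level coverage stands in for full clause/operator/function coverage, and whitespace token counts stand in for the LLM tokenizer.

```python
# Sketch of greedy few-shot exemplar selection under a token budget.
def select_exemplars(pool, keywords, token_budget):
    """pool: list of (nl_question, sql) pairs; returns a compact covering set."""
    selected, covered, used_tokens = [], set(), 0
    # Consider exemplars with the most keyword overlap first.
    for nl, sql in sorted(pool, key=lambda ex: -len(set(ex[1].upper().split()) & keywords)):
        new = (set(sql.upper().split()) & keywords) - covered
        cost = len(nl.split()) + len(sql.split())     # crude token count
        if new and used_tokens + cost <= token_budget:
            selected.append((nl, sql))
            covered |= new
            used_tokens += cost
        if covered == keywords:
            break
    return selected

# Example usage with a toy pool:
pool = [("count users", "SELECT COUNT(*) FROM users"),
        ("names by age", "SELECT name FROM users ORDER BY age")]
keywords = {"SELECT", "COUNT(*)", "ORDER", "GROUP"}
print(select_exemplars(pool, keywords, token_budget=50))
```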
9. Neuro-symbolic Meta Reinforcement Learning for Trading
- Author
- Harini, S I, Shroff, Gautam, Srinivasan, Ashwin, Faldu, Prayushi, and Vig, Lovekesh
- Subjects
- Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Computer Science - Logic in Computer Science
- Abstract
We model short-duration (e.g. day) trading in financial markets as a sequential decision-making problem under uncertainty, with the added complication of continual concept-drift. We, therefore, employ meta reinforcement learning via the RL2 algorithm. It is also known that human traders often rely on frequently occurring symbolic patterns in price series. We employ logical program induction to discover symbolic patterns that occur frequently as well as recently, and explore whether using such features improves the performance of our meta reinforcement learning algorithm. We report experiments on real data indicating that meta-RL is better than vanilla RL and also benefits from learned symbolic features.
- Comment
- To appear in Muffin@AAAI'23
- Published
- 2023
10. Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning
- Author
- Hebbalaguppe, Ramya, Patra, Rishabh, Dash, Tirtharaj, Shroff, Gautam, and Vig, Lovekesh
- Subjects
- Computer Science - Machine Learning; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep neural networks (DNNs) are prone to miscalibrated predictions, often exhibiting a mismatch between the predicted output and the associated confidence scores. Contemporary model calibration techniques mitigate the problem of overconfident predictions by pushing down the confidence of the winning class while increasing the confidence of the remaining classes across all test samples. However, from a deployment perspective, an ideal model is desired to (i) generate well-calibrated predictions for high-confidence samples (with predicted probability, say, >0.95), and (ii) generate a higher proportion of legitimate high-confidence samples. To this end, we propose a novel regularization technique that can be used with classification losses, leading to state-of-the-art calibrated predictions at test time. From a deployment standpoint in safety-critical applications, only high-confidence samples from a well-calibrated model are of interest, as the remaining samples have to undergo manual inspection. Predictive-confidence reduction of these potentially "high-confidence samples" is a downside of existing calibration approaches. We mitigate this by proposing a dynamic train-time data-pruning strategy that prunes low-confidence samples every few epochs, providing an increase in "confident yet calibrated samples". We demonstrate state-of-the-art calibration performance across image classification benchmarks, reducing training time without much compromise in accuracy. We provide insights into why our dynamic pruning strategy, which prunes low-confidence training samples, leads to an increase in high-confidence samples at test time.
- Comment
- Accepted at the Winter Conference on Applications of Computer Vision (IEEE WACV) in the algorithms track. 8 pages main paper; 3 pages supplementary material
- Published
- 2022
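The dynamic train-time pruning loop described above is straightforward to sketch. The code below assumes a small PyTorch classifier trained full-batch; the pruning period, kept fraction, and optimizer settings are illustrative assumptions rather than the paper's recipe.

```python
# Sketch of dynamic data pruning: every few epochs, drop the training
# samples on which the model is currently least confident.
import torch
import torch.nn.functional as F

def train_with_pruning(model, X, y, epochs=30, prune_every=5, keep_frac=0.9):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        opt.zero_grad()
        logits = model(X)
        F.cross_entropy(logits, y).backward()
        opt.step()
        if (epoch + 1) % prune_every == 0:
            with torch.no_grad():
                conf = F.softmax(model(X), dim=1).max(dim=1).values
            keep = conf.argsort(descending=True)[: int(keep_frac * len(X))]
            X, y = X[keep], y[keep]       # drop lowest-confidence samples
    return model
```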
11. Neural Feature-Adaptation for Symbolic Predictions Using Pre-Training and Semantic Loss
- Author
- Shah, Vedant, Agrawal, Aditya, Vig, Lovekesh, Srinivasan, Ashwin, Shroff, Gautam, and Verlekar, Tanmay
- Subjects
- Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Computer Science - Logic in Computer Science
- Abstract
We are interested in neurosymbolic systems consisting of a high-level symbolic layer for explainable prediction in terms of human-intelligible concepts, and a low-level neural layer for extracting the symbols required to generate the symbolic explanation. Real data is often imperfect, meaning that even if the symbolic theory remains unchanged, we may still need to address the problem of mapping raw data to high-level symbols each time there is a change in the data acquisition environment or equipment. Manual (re-)annotation of the raw data each time this happens is laborious and expensive; and automated labelling methods are often imperfect, especially for complex problems. NEUROLOG proposed the use of a semantic loss function that allows an existing feature-based symbolic model to guide the extraction of feature-values from raw data, using 'abduction'. However, the experiments demonstrating the use of semantic loss through abduction appear to rely heavily on a domain-specific pre-processing step that enables a prior delineation of feature locations in the raw data. We examine the use of semantic loss in domains where such pre-processing is not possible, or is not obvious. We show that without any prior information about the features, the NEUROLOG approach can continue to predict accurately even with substantially incorrect feature predictions. We show also that prior information about the features in the form of even imperfect pre-training can help correct this situation. These findings are replicated on the original problem considered by NEUROLOG, without the use of feature-delineation. This suggests that symbolic explanations constructed for data in a domain could be re-used in a related domain, by 'feature-adaptation' of pre-trained neural extractors using the semantic loss function constrained by abductive feedback.
- Published
- 2022
12. Abstract 4134249: Cardiovascular Risk Factors and Associations of Chronic Inflammatory-Related Disease in the Multi-Ethnic Study of Atherosclerosis
- Author
- Manning, Evan, Shroff, Gautam, Jacobs, David, and Duprez, Daniel
- Published
- 2024
13. Knowledge-based Analogical Reasoning in Neuro-symbolic Latent Spaces
- Author
- Shah, Vishwa, Sharma, Aditya, Shroff, Gautam, Vig, Lovekesh, Dash, Tirtharaj, and Srinivasan, Ashwin
- Subjects
- Computer Science - Artificial Intelligence; Computer Science - Machine Learning
- Abstract
Analogical Reasoning problems challenge both connectionist and symbolic AI systems, as these entail a combination of background knowledge, reasoning and pattern recognition. While symbolic systems ingest explicit domain knowledge and perform deductive reasoning, they are sensitive to noise and require inputs to be mapped to preset symbolic features. Connectionist systems, on the other hand, can directly ingest rich input spaces such as images, text or speech and recognize patterns even with noisy inputs. However, connectionist models struggle to include explicit domain knowledge for deductive reasoning. In this paper, we propose a framework that combines the pattern recognition abilities of neural networks with symbolic reasoning and background knowledge for solving a class of Analogical Reasoning problems where the set of attributes and possible relations across them are known a priori. We take inspiration from the 'neural algorithmic reasoning' approach [DeepMind 2020] and use problem-specific background knowledge by (i) learning a distributed representation based on a symbolic model of the problem, (ii) training neural-network transformations reflective of the relations involved in the problem, and finally (iii) training a neural network encoder from images to the distributed representation in (i). These three elements enable us to perform search-based reasoning using neural networks as elementary functions manipulating distributed representations. We test this on visual analogy problems in Raven's Progressive Matrices, and achieve accuracy competitive with human performance and, in certain cases, superior to initial end-to-end neural-network based approaches. While recent neural models trained at scale yield SOTA results, our novel neuro-symbolic reasoning approach is a promising direction for this problem, and is arguably more general, especially for problems where domain knowledge is available.
- Comment
- 13 pages, 4 figures, accepted at the 16th International Workshop on Neural-Symbolic Learning and Reasoning as part of the 2nd International Joint Conference on Learning & Reasoning (IJCLR 2022)
- Published
- 2022
14. A Program-Synthesis Challenge for ARC-Like Tasks
- Author
- Challa, Aditya, Srinivasan, Ashwin, Bain, Michael, and Shroff, Gautam (in: Muggleton, Stephen H. and Tamaddoni-Nezhad, Alireza, editors)
- Published
- 2024
15. Continual Learning for Multivariate Time Series Tasks with Variable Input Dimensions
- Author
- Gupta, Vibhor, Narwariya, Jyoti, Malhotra, Pankaj, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning
- Abstract
We consider a sequence of related multivariate time series learning tasks, such as predicting failures for different instances of a machine from time series of multi-sensor data, or activity recognition tasks over different individuals from multiple wearable sensors. We focus on two under-explored practical challenges arising in such settings: (i) Each task may have a different subset of sensors, i.e., providing different partial observations of the underlying 'system'. This restriction can be due to different manufacturers in the former case, and people wearing more or fewer measurement devices in the latter. (ii) We are not allowed to store or re-access data from a task once it has been observed at the task level. This may be due to privacy considerations in the case of people, or legal restrictions placed by machine owners. Nevertheless, we would like to (a) improve performance on subsequent tasks using experience from completed tasks, as well as (b) continue to perform better on past tasks, e.g., update the model and improve predictions on even the first machine after learning from subsequently observed ones. We note that existing continual learning methods do not take into account variability in input dimensions arising due to different subsets of sensors being available across tasks, and struggle to adapt to such variable input dimensions (VID) tasks. In this work, we address this shortcoming of existing methods. To this end, we learn task-specific generative models and classifiers, and use these to augment data for target tasks. Since the input dimensions across tasks vary, we propose a novel conditioning module based on graph neural networks to aid a standard recurrent neural network. We evaluate the efficacy of the proposed approach on three publicly available datasets corresponding to two activity recognition tasks (classification) and one prognostics task (regression).
- Comment
- Accepted at ICDM 2021
- Published
- 2022
16. Learning to Liquidate Forex: Optimal Stopping via Adaptive Top-K Regression
- Author
- Garg, Diksha, Malhotra, Pankaj, Bhatia, Anil, Bhat, Sanjay, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning
- Abstract
We consider learning a trading agent acting on behalf of the treasury of a firm earning revenue in a foreign currency (FC) and incurring expenses in the home currency (HC). The goal of the agent is to maximize the expected HC at the end of the trading episode by deciding to hold or sell the FC at each time step in the trading episode. We pose this as an optimization problem, and consider a broad spectrum of approaches with the learning component ranging from supervised to imitation to reinforcement learning. We observe that most of the approaches considered struggle to improve upon simple heuristic baselines. We identify two key aspects of the problem that render standard solutions ineffective - i) while good forecasts of future FX rates can be highly effective in guiding good decisions, forecasting FX rates is difficult, and erroneous estimates tend to degrade the performance of trading agents instead of improving it, ii) the inherent non-stationary nature of FX rates renders a fixed decision-threshold highly ineffective. To address these problems, we propose a novel supervised learning approach that learns to forecast the top-K future FX rates instead of forecasting all the future FX rates, and bases the hold-versus-sell decision on the forecasts (e.g. hold if future FX rate is higher than current FX rate, sell otherwise). Furthermore, to handle the non-stationarity in the FX rates data which poses challenges to the i.i.d. assumption in supervised learning methods, we propose to adaptively learn decision-thresholds based on recent historical episodes. Through extensive empirical evaluation, we show that our approach is the only approach which is able to consistently improve upon a simple heuristic baseline. Further experiments show the inefficacy of state-of-the-art statistical and deep-learning-based forecasting methods as they degrade the performance of the trading agent.
- Comment
- Published at Workshop on AI in Financial Services: Adaptiveness, Resilience & Governance, AAAI-22
- Published
- 2022
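The decision rule described above, forecasting only the top-K future rates and comparing against an adaptively learned threshold, can be sketched as follows. The forecaster is abstracted away, and the percentile-based threshold is an illustrative assumption; the paper learns thresholds from recent historical episodes.

```python
# Sketch of a top-K-forecast hold-versus-sell rule with an adaptive threshold.
import numpy as np

def decide(current_rate, topk_forecast, recent_episode_returns):
    """topk_forecast: the model's K highest predicted future rates."""
    # Adaptive threshold: a percentile of recent per-episode returns
    # (illustrative stand-in for the paper's learned threshold).
    threshold = np.percentile(recent_episode_returns, 60)
    expected_gain = np.mean(topk_forecast) / current_rate - 1.0
    return "hold" if expected_gain > threshold else "sell"

print(decide(1.10, topk_forecast=[1.13, 1.12, 1.12],
             recent_episode_returns=[0.004, -0.002, 0.01, 0.0]))  # "hold"
```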
17. DRTCI: Learning Disentangled Representations for Temporal Causal Inference
- Author
- Gupta, Garima, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract
Medical professionals evaluating alternative treatment plans for a patient often encounter time-varying confounders, or covariates that affect both the future treatment assignment and the patient outcome. The recently proposed Counterfactual Recurrent Network (CRN) accounts for time-varying confounders by using adversarial training to balance recurrent historical representations of patient data. However, this work assumes that all time-varying covariates are confounding and thus attempts to balance the full state representation. Given that the actual subset of covariates that may in fact be confounding is in general unknown, recent work on counterfactual evaluation in the static, non-temporal setting has suggested that disentangling the covariate representation into separate factors, each of which influences either treatment selection, patient outcome, or both, can help isolate selection bias and restrict balancing efforts to the factors that influence outcome, leaving the remaining factors, which predict only treatment, unbalanced.
- Comment
- Accepted in Workshop on "The Neglected Assumptions in Causal Inference" at ICML 2021 (July)
- Published
- 2022
18. Solving Visual Analogies Using Neural Algorithmic Reasoning
- Author
- Sonwane, Atharv, Shroff, Gautam, Vig, Lovekesh, Srinivasan, Ashwin, and Dash, Tirtharaj
- Subjects
- Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract
We consider a class of visual analogical reasoning problems that involve discovering the sequence of transformations by which pairs of input/output images are related, so as to analogously transform future inputs. This program synthesis task can be easily solved via symbolic search. Using a variation of the 'neural analogical reasoning' approach of (Velickovic and Blundell 2021), we instead search for a sequence of elementary neural network transformations that manipulate distributed representations derived from a symbolic space, to which input images are directly encoded. We evaluate the extent to which our 'neural reasoning' approach generalizes for images with unseen shapes and positions.
- Comment
- 20 pages. Contains an extended abstract accepted at the AAAI-22 Student Abstract and Poster Program, along with relevant supplementary material
- Published
- 2021
19. Using Program Synthesis and Inductive Logic Programming to solve Bongard Problems
- Author
- Sonwane, Atharv, Chitlangia, Sharad, Dash, Tirtharaj, Vig, Lovekesh, Shroff, Gautam, and Srinivasan, Ashwin
- Subjects
- Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Programming Languages
- Abstract
The ability to recognise and make analogies is often used as a measure or test of human intelligence. The ability to solve Bongard problems is an example of such a test. It has also been postulated that the ability to rapidly construct novel abstractions is critical to being able to solve analogical problems. Given an image, the ability to construct a program that would generate that image is one form of abstraction, as exemplified in the Dreamcoder project. In this paper, we present a preliminary examination of whether programs constructed by Dreamcoder can be used for analogical reasoning to solve certain Bongard problems. We use Dreamcoder to discover programs that generate the images in a Bongard problem and represent each of these as a sequence of state transitions. We decorate the states using positional information in an automated manner and then encode the resulting sequence into logical facts in Prolog. We use inductive logic programming (ILP) to learn an (interpretable) theory for the abstract concept involved in an instance of a Bongard problem. Experiments on synthetically created Bongard problems for concepts such as 'above/below' and 'clockwise/counterclockwise' demonstrate that our end-to-end system can solve such problems. We study the importance and completeness of each component of our approach, highlighting its current limitations and pointing to directions for improvement in our formulation, as well as in elements of any Dreamcoder-like program synthesis system used for such an approach.
- Comment
- Equal contribution from the first two authors. Accepted at the 10th International Workshop on Approaches and Applications of Inductive Programming as a Work-in-Progress report
- Published
- 2021
20. Forecasting Market Prices using DL with Data Augmentation and Meta-learning: ARIMA still wins!
- Author
- Shah, Vedant and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning; Computer Science - Artificial Intelligence
- Abstract
Deep-learning techniques have been successfully used for time-series forecasting and have often shown superior performance on many standard benchmark datasets compared to traditional techniques. Here we present a comprehensive and comparative study of the performance of deep-learning techniques for forecasting prices in financial markets. We benchmark state-of-the-art deep-learning baselines, such as NBeats, on data from currency as well as stock markets. We also generate synthetic data using a fuzzy-logic based model of demand driven by technical rules such as moving averages, which are often used by traders. We benchmark the baseline techniques on this synthetic data as well as use it for data augmentation. We also apply gradient-based meta-learning to account for the non-stationarity of financial time-series. Our extensive experiments notwithstanding, the surprising result is that standard ARIMA models outperform deep learning, even with data augmentation or meta-learning. We conclude by speculating as to why this might be the case.
- Comment
- Camera-ready version for the ICBINB Workshop @ NeurIPS 2021
- Published
- 2021
21. Homelessness, Race/Ethnicity, and Cardiovascular Disease: a State-of-the-Evidence Summary and Structured Review of Race/Ethnicity Reporting
- Author
- Nyembo, Phillippe F., Bakker, Caitlin, Ayenew, Woubeshet, Shroff, Gautam R., Busch, Andrew M., and Vickery, Katherine Diaz
- Published
- 2023
22. CAMTA: Causal Attention Model for Multi-touch Attribution
- Author
- Kumar, Sachin, Gupta, Garima, Prasad, Ranjitha, Chatterjee, Arnab, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning
- Abstract
Advertising channels have evolved from conventional print media, billboards and radio advertising to online digital advertising (ad), where users are exposed to a sequence of ad campaigns via social networks, display ads, search, etc. While advertisers revisit the design of ad campaigns to concurrently serve the requirements emerging out of new ad channels, it is also critical for advertisers to estimate the contribution from touch-points (view, clicks, converts) on different channels, based on the sequence of customer actions. This process of contribution measurement is often referred to as multi-touch attribution (MTA). In this work, we propose CAMTA, a novel deep recurrent neural network architecture providing a causal attribution mechanism for user-personalised MTA in the context of observational data. CAMTA minimizes the selection bias in channel assignment across time-steps and touch-points. Furthermore, it utilizes the users' pre-conversion actions in a principled way in order to predict per-channel attribution. To quantitatively benchmark the proposed MTA model, we employ the real-world Criteo dataset and demonstrate the superior performance of CAMTA with respect to prediction accuracy as compared to several baselines. In addition, we provide results for budget allocation and user-behaviour modelling on the predicted channel attribution.
- Comment
- Accepted in ICDMW 2020
- Published
- 2020
23. Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation
- Author
- Garg, Diksha, Gupta, Priyanka, Malhotra, Pankaj, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning; Computer Science - Information Retrieval
- Abstract
Most of the existing deep reinforcement learning (RL) approaches for session-based recommendations either rely on costly online interactions with real users, or rely on potentially biased rule-based or data-driven user-behavior models for learning. In this work, we instead focus on learning recommendation policies in the pure batch or offline setting, i.e. learning policies solely from offline historical interaction logs or batch data generated from an unknown and sub-optimal behavior policy, without further access to data from the real-world or user-behavior models. We propose BCD4Rec: Batch-Constrained Distributional RL for Session-based Recommendations. BCD4Rec builds upon the recent advances in batch (offline) RL and distributional RL to learn from offline logs while dealing with the intrinsically stochastic nature of rewards from the users due to varied latent interest preferences (environments). We demonstrate that BCD4Rec significantly improves upon the behavior policy as well as strong RL and non-RL baselines in the batch setting in terms of standard performance metrics like Click Through Rates or Buy Rates. Other useful properties of BCD4Rec include: i. recommending items from the correct latent categories indicating better value estimates despite large action space (of the order of number of items), and ii. overcoming popularity bias in clicked or bought items typically present in the offline logs.
- Comment
- Presented at the Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2020
- Published
- 2020
24. Hi-CI: Deep Causal Inference in High Dimensions
- Author
- Sharma, Ankit, Gupta, Garima, Prasad, Ranjitha, Chatterjee, Arnab, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Statistics - Methodology; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
- Abstract
We address the problem of counterfactual regression using causal inference (CI) in observational studies consisting of high-dimensional covariates and high-cardinality treatments. Confounding bias, which leads to inaccurate treatment effect estimation, is attributed to covariates that affect both treatments and outcome. The presence of high-dimensional covariates exacerbates the impact of bias, as it is harder to isolate and measure the impact of these confounders. In the presence of high-cardinality treatment variables, CI is rendered ill-posed due to the increase in the number of counterfactual outcomes to be predicted. We propose Hi-CI, a deep neural network (DNN) based framework for estimating causal effects in the presence of a large number of covariates, and high-cardinality and continuous treatment variables. The proposed architecture comprises a decorrelation network and an outcome prediction network. In the decorrelation network, we learn a data representation in lower dimensions compared to the original covariates, which also addresses confounding bias. Subsequently, in the outcome prediction network, we learn an embedding of high-cardinality and continuous treatments, jointly with the data representation. We demonstrate the efficacy of causal effect prediction of the proposed Hi-CI network using synthetic and real-world NEWS datasets.
- Comment
- 23 pages, 5 figures, accepted at the Causal Discovery Workshop - KDD 2020
- Published
- 2020
25. Handling Variable-Dimensional Time Series with Graph Neural Networks
- Author
- Gupta, Vibhor, Narwariya, Jyoti, Malhotra, Pankaj, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning; Electrical Engineering and Systems Science - Signal Processing; Statistics - Machine Learning
- Abstract
Several applications of Internet of Things (IoT) technology involve capturing data from multiple sensors resulting in multi-sensor time series. Existing neural networks based approaches for such multi-sensor or multivariate time series modeling assume fixed input dimension or number of sensors. Such approaches can struggle in the practical setting where different instances of the same device or equipment such as mobiles, wearables, engines, etc. come with different combinations of installed sensors. We consider training neural network models from such multi-sensor time series, where the time series have varying input dimensionality owing to availability or installation of a different subset of sensors at each source of time series. We propose a novel neural network architecture suitable for zero-shot transfer learning allowing robust inference for multivariate time series with previously unseen combination of available dimensions or sensors at test time. Such a combinatorial generalization is achieved by conditioning the layers of a core neural network-based time series model with a "conditioning vector" that carries information of the available combination of sensors for each time series. This conditioning vector is obtained by summarizing the set of learned "sensor embedding vectors" corresponding to the available sensors in a time series via a graph neural network. We evaluate the proposed approach on publicly available activity recognition and equipment prognostics datasets, and show that the proposed approach allows for better generalization in comparison to a deep gated recurrent neural network baseline.
- Comment
- Accepted at AI4IoT@IJCAI'20 workshop
- Published
- 2020
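The conditioning mechanism described above can be sketched compactly in PyTorch. Note one deliberate simplification: the sensor embeddings are summarized by a plain mean rather than the paper's graph neural network, and all dimensions are illustrative.

```python
# Sketch of conditioning a recurrent model on a "sensor availability" vector.
import torch
import torch.nn as nn

class ConditionedGRU(nn.Module):
    def __init__(self, n_sensors=10, max_dim=10, emb=16, hidden=32):
        super().__init__()
        self.sensor_emb = nn.Embedding(n_sensors, emb)
        self.gru = nn.GRU(max_dim + emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, sensor_ids):
        # x: (batch, time, max_dim) with missing sensors zero-padded;
        # sensor_ids: (batch, n_available) indices of sensors actually present.
        cond = self.sensor_emb(sensor_ids).mean(dim=1)        # (batch, emb)
        cond = cond.unsqueeze(1).expand(-1, x.size(1), -1)    # broadcast over time
        out, _ = self.gru(torch.cat([x, cond], dim=-1))
        return self.head(out[:, -1])

model = ConditionedGRU()
y = model(torch.randn(4, 20, 10), torch.tensor([[0, 2, 5]] * 4))
print(y.shape)   # (4, 1)
```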
26. Graph Neural Networks for Leveraging Industrial Equipment Structure: An application to Remaining Useful Life Estimation
- Author
- Narwariya, Jyoti, Malhotra, Pankaj, TV, Vishnu, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning; Statistics - Machine Learning
- Abstract
Automated equipment health monitoring from streaming multisensor time-series data can be used to enable condition-based maintenance, avoid sudden catastrophic failures, and ensure high operational availability. We note that most complex machinery has a well-documented and readily accessible underlying structure capturing the inter-dependencies between sub-systems or modules. Deep learning models such as those based on recurrent neural networks (RNNs) or convolutional neural networks (CNNs) fail to explicitly leverage this potentially rich source of domain-knowledge into the learning procedure. In this work, we propose to capture the structure of a complex equipment in the form of a graph, and use graph neural networks (GNNs) to model multi-sensor time-series data. Using remaining useful life estimation as an application task, we evaluate the advantage of incorporating the graph structure via GNNs on the publicly available turbofan engine benchmark dataset. We observe that the proposed GNN-based RUL estimation model compares favorably to several strong baselines from literature such as those based on RNNs and CNNs. Additionally, we observe that the learned network is able to focus on the module (node) with impending failure through a simple attention mechanism, potentially paving the way for actionable diagnosis.
- Comment
- Accepted at the AAAI workshop DLGMA'20
- Published
- 2020
27. Privacy Guidelines for Contact Tracing Applications
- Author
- Shukla, Manish, A, Rajan M, Lodha, Sachin, Shroff, Gautam, and Raskar, Ramesh
- Subjects
- Computer Science - Machine Learning; Computer Science - Cryptography and Security; Computer Science - Computers and Society; Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Contact tracing is a very powerful method to implement and enforce social distancing to avoid spreading of infectious diseases. The traditional approach to contact tracing is time-consuming, manpower-intensive, dangerous and prone to error due to fatigue or lack of skill. Due to this, there has been an emergence of mobile-based applications for contact tracing. These applications primarily utilize a combination of GPS-based absolute location and Bluetooth-based relative location remitted from a user's smartphone to infer various insights. These applications have eased the task of contact tracing; however, they also have severe implications for users' privacy, for example, mass surveillance, personal information leakage and additionally revealing the behavioral patterns of the user. This impact on users' privacy leads to a trust deficit in these applications, and hence defeats their purpose. In this work we discuss the various scenarios which a contact tracing application should be able to handle. We highlight the privacy handling of some of the prominent contact tracing applications. Additionally, we describe the various threat actors who can disrupt their working, misuse end users' data, or hamper mass adoption. Finally, we present privacy guidelines for contact tracing applications from different stakeholders' perspectives. To the best of our knowledge, this is the first generic work which provides privacy guidelines for contact tracing applications.
- Comment
- 10 pages, 0 images
- Published
- 2020
28. MultiMBNN: Matched and Balanced Causal Inference with Neural Networks
- Author
- Sharma, Ankit, Gupta, Garima, Prasad, Ranjitha, Chatterjee, Arnab, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Statistics - Methodology; Computer Science - Machine Learning; Computer Science - Multiagent Systems
- Abstract
Causal inference (CI) in observational studies has received a lot of attention in healthcare, education, ad attribution, policy evaluation, etc. Confounding is a typical hazard, where the context affects both the treatment assignment and the response. In a multiple-treatment scenario, we propose the neural network based MultiMBNN, where we overcome confounding by employing generalized propensity score based matching and by learning balanced representations. We benchmark the performance on synthetic and real-world datasets using PEHE and the mean absolute percentage error over ATE as metrics. MultiMBNN outperforms state-of-the-art algorithms for CI such as TARNet and Perfect Match (PM).
- Comment
- 7 pages, 3 figures, accepted in ESANN 2020
- Published
- 2020
29. MetaCI: Meta-Learning for Causal Inference in a Heterogeneous Population
- Author
- Sharma, Ankit, Gupta, Garima, Prasad, Ranjitha, Chatterjee, Arnab, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning; Computer Science - Multiagent Systems; Statistics - Machine Learning
- Abstract
Performing inference on data obtained through observational studies is becoming extremely relevant due to the widespread availability of data in fields such as healthcare, education, retail, etc. Furthermore, this data is accrued from multiple homogeneous subgroups of a heterogeneous population, and hence generalizing the inference mechanism over such data is essential. We propose the MetaCI framework with the goal of answering counterfactual questions in the context of causal inference (CI), where the factual observations are obtained from several homogeneous subgroups. While the CI network is designed to generalize from the factual to the counterfactual distribution in order to tackle covariate shift, MetaCI employs the meta-learning paradigm to tackle the shift in data distributions between the training and test phases due to the presence of heterogeneity in the population, and due to drifts in the target distribution, also known as concept shift. We benchmark the performance of the MetaCI algorithm using the mean absolute percentage error over the average treatment effect as the metric, and demonstrate that meta-initialization yields significant gains compared to randomly initialized networks and other methods.
- Comment
- 10 pages, 4 figures, accepted in the CausalML Workshop - NeurIPS 2019
- Published
- 2019
30. Meta-Learning for Few-Shot Time Series Classification
- Author
- Narwariya, Jyoti, Malhotra, Pankaj, Vig, Lovekesh, Shroff, Gautam, and Tv, Vishnu
- Subjects
- Computer Science - Machine Learning; Statistics - Machine Learning
- Abstract
Deep neural networks (DNNs) have achieved state-of-the-art results on time series classification (TSC) tasks. In this work, we focus on leveraging DNNs in the often-encountered practical scenario where access to labeled training data is difficult, and where DNNs would be prone to overfitting. We leverage recent advancements in gradient-based meta-learning, and propose an approach to train a residual neural network with convolutional layers as a meta-learning agent for few-shot TSC. The network is trained on a diverse set of few-shot tasks sampled from various domains (e.g. healthcare, activity recognition, etc.) such that it can solve a target task from another domain using only a small number of training samples from the target task. Most existing meta-learning approaches are limited in practice, as they assume a fixed number of target classes across tasks. To overcome this limitation and train a common agent across domains, each having a different number of target classes, we utilize a triplet-loss based learning procedure that does not require any constraints to be enforced on the number of classes for the few-shot TSC tasks. To the best of our knowledge, we are the first to use meta-learning based pre-training for TSC. Our approach sets a new benchmark for few-shot TSC, outperforming several strong baselines on few-shot tasks sampled from 41 datasets in the UCR TSC Archive. We observe that pre-training under the meta-learning paradigm allows the network to quickly adapt to new unseen tasks with a small number of labeled instances.
- Comment
- CoDS COMAD 2020: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD
- Published
- 2019
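The triplet-loss procedure mentioned above avoids fixing the number of classes because it only ever compares same-class and different-class embedding pairs. A minimal PyTorch sketch, with an illustrative margin:

```python
# Sketch of a triplet loss over encoder embeddings: pull the anchor toward a
# same-class example and push it away from a different-class one.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """anchor/positive/negative: (batch, dim) embeddings from the encoder."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

a, p, n = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
print(triplet_loss(a, p, n))
```

Because the objective never references class indices, the same encoder can be meta-trained across tasks with 2, 5, or 50 classes alike.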
31. NISER: Normalized Item and Session Representations to Handle Popularity Bias
- Author
- Gupta, Priyanka, Garg, Diksha, Malhotra, Pankaj, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Information Retrieval; Computer Science - Machine Learning
- Abstract
The goal of session-based recommendation (SR) models is to utilize the information from past actions (e.g. item/product clicks) in a session to recommend items that a user is likely to click next. Recently it has been shown that the sequence of item interactions in a session can be modeled as graph-structured data to better account for complex item transitions. Graph neural networks (GNNs) can learn useful representations for such session-graphs, and have been shown to improve over sequential models such as recurrent neural networks [14]. However, we note that these GNN-based recommendation models suffer from popularity bias: the models are biased towards recommending popular items, and fail to recommend relevant long-tail items (less popular or less frequent items). Therefore, these models perform poorly for the less popular new items arriving daily in a practical online setting. We demonstrate that this issue is, in part, related to the magnitude or norm of the learned item and session-graph representations (embedding vectors). We propose a training procedure that mitigates this issue by using normalized representations. The models using normalized item and session-graph representations perform significantly better: i. for the less popular long-tail items in the offline setting, and ii. for the less popular newly introduced items in the online setting. Furthermore, our approach significantly improves upon existing state-of-the-art on three benchmark datasets.
- Comment
- Presented at 1st International Workshop on Graph Representation Learning and its Applications, CIKM 2019
- Published
- 2019
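The normalization fix described above can be sketched in a few lines: L2-normalize both session and item embeddings before computing recommendation scores, so that embedding magnitude (which tracks popularity) cannot dominate the ranking. The scale factor below is an illustrative assumption.

```python
# Sketch of scoring with normalized item and session representations.
import torch
import torch.nn.functional as F

def scores(session_repr, item_embs, scale=12.0):
    """session_repr: (batch, dim); item_embs: (n_items, dim)."""
    s = F.normalize(session_repr, p=2, dim=-1)   # unit-norm session vectors
    v = F.normalize(item_embs, p=2, dim=-1)      # unit-norm item vectors
    return scale * s @ v.t()                     # (batch, n_items) logits

logits = scores(torch.randn(2, 32), torch.randn(100, 32))
print(logits.shape)   # (2, 100); softmax over these gives next-click probs
```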
32. Meta-Learning for Black-box Optimization
- Author
- TV, Vishnu, Malhotra, Pankaj, Narwariya, Jyoti, Vig, Lovekesh, and Shroff, Gautam
- Subjects
- Computer Science - Machine Learning; Statistics - Machine Learning
- Abstract
Recently, neural networks trained as optimizers under the "learning to learn" or meta-learning framework have been shown to be effective for a broad range of optimization tasks including derivative-free black-box function optimization. Recurrent neural networks (RNNs) trained to optimize a diverse set of synthetic non-convex differentiable functions via gradient descent have been effective at optimizing derivative-free black-box functions. In this work, we propose RNN-Opt: an approach for learning RNN-based optimizers for optimizing real-parameter single-objective continuous functions under limited budget constraints. Existing approaches utilize an observed improvement based meta-learning loss function for training such models. We propose training RNN-Opt by using synthetic non-convex functions with known (approximate) optimal values by directly using discounted regret as our meta-learning loss function. We hypothesize that a regret-based loss function mimics typical testing scenarios, and would therefore lead to better optimizers compared to optimizers trained only to propose queries that improve over previous queries. Further, RNN-Opt incorporates simple yet effective enhancements during training and inference procedures to deal with the following practical challenges: i) Unknown range of possible values for the black-box function to be optimized, and ii) Practical and domain-knowledge based constraints on the input parameters. We demonstrate the efficacy of RNN-Opt in comparison to existing methods on several synthetic as well as standard benchmark black-box functions along with an anonymized industrial constrained optimization problem.
- Comment
- Accepted at ECML-PKDD 2019 Research Track
- Published
- 2019
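The discounted-regret meta-learning loss described above can be written down directly, since the synthetic training functions have known optima. The discount factor below is an illustrative assumption.

```python
# Sketch of a discounted-regret loss for training a learned optimizer on a
# synthetic function with known optimum f_opt (minimisation convention).
def discounted_regret(query_values, f_opt, gamma=0.98):
    """query_values: f(x_t) for the optimizer's sequence of queries."""
    loss, best = 0.0, float("inf")
    for t, v in enumerate(query_values):
        best = min(best, v)
        loss += (gamma ** t) * (best - f_opt)   # regret of the best-so-far
    return loss

print(discounted_regret([3.0, 1.5, 1.1], f_opt=1.0))
```

Unlike an observed-improvement loss, this penalizes the gap to the true optimum at every step, matching how the optimizer is judged at test time.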
33. One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis
- Author
- Sunder, Vishal, Srinivasan, Ashwin, Vig, Lovekesh, Shroff, Gautam, and Rahul, Rohit
- Subjects
- Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Computer Science - Logic in Computer Science
- Abstract
Our interest in this paper is in meeting a rapidly growing industrial demand for information extraction from images of documents such as invoices, bills, receipts etc. In practice users are able to provide a very small number of example images labeled with the information that needs to be extracted. We adopt a novel two-level neuro-deductive approach where (a) we use pre-trained deep neural networks to populate a relational database with facts about each document-image; and (b) we use a form of deductive reasoning, related to meta-interpretive learning of transition systems, to learn extraction programs: given task-specific transitions defined using the entities and relations identified by the neural detectors and a small number of instances (usually 1, sometimes 2) of images and the desired outputs, a resource-bounded meta-interpreter constructs proofs for the instance(s) via logical deduction; a set of logic programs that extract each desired entity is easily synthesized from such proofs. In most cases a single training example together with a noisy clone of itself suffices to learn a program-set that generalizes well on test documents, at which time the value of each entity is determined by a majority vote across its program-set. We demonstrate our two-level neuro-deductive approach on publicly available datasets ("Patent" and "Doctor's Bills") and also describe its use in a real-life industrial problem.
- Comment
- 11 pages, appears in the 13th International Workshop on Neural-Symbolic Learning and Reasoning at IJCAI 2019
- Published
- 2019
34. Fast Online 'Next Best Offers' using Deep Learning
- Author
- Singhal, Rekha, Shroff, Gautam, Kumar, Mukund, Roy, Sharod, Kadarkar, Sanket, Virk, Rupinder, Verma, Siddharth, and Tiwari, Vartika
- Subjects
- Computer Science - Machine Learning; Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Performance; Statistics - Machine Learning
- Abstract
In this paper, we present iPrescribe, a scalable low-latency architecture for recommending 'next-best-offers' in an online setting. The paper presents the design of iPrescribe and compares its performance for implementations using different real-time streaming technology stacks. iPrescribe uses an ensemble of deep learning and machine learning algorithms for prediction. We describe the scalable real-time streaming technology stack and optimized machine-learning implementations used to achieve a 90th-percentile recommendation latency of 38 milliseconds. Optimizations include a novel mechanism to efficiently deploy recurrent Long Short-Term Memory (LSTM) deep learning networks.
- Comment
- 7 pages, accepted in COMAD-CODS 2019
- Published
- 2019
35. ConvTimeNet: A Pre-trained Deep Convolutional Neural Network for Time Series Classification
- Author
-
Kashiparekh, Kathan, Narwariya, Jyoti, Malhotra, Pankaj, Vig, Lovekesh, and Shroff, Gautam
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Training deep neural networks often requires careful hyper-parameter tuning and significant computational resources. In this paper, we propose ConvTimeNet (CTN): an off-the-shelf deep convolutional neural network (CNN) trained on diverse univariate time series classification (TSC) source tasks. Once trained, CTN can be easily adapted to new TSC target tasks via a small amount of fine-tuning using labeled instances from the target tasks. We note that the length of convolutional filters is a key aspect when building a pre-trained model that can generalize to time series of different lengths across datasets. To achieve this, we incorporate filters of multiple lengths in all convolutional layers of CTN to capture temporal features at multiple time scales. We consider all 65 datasets with time series of lengths up to 512 points from the UCR TSC Benchmark for training and testing transferability of CTN: We train CTN on a randomly chosen subset of 24 datasets using a multi-head approach with a different softmax layer for each training dataset, and study generalizability and transferability of the learned filters on the remaining 41 TSC datasets. We observe significant gains in classification accuracy as well as computational efficiency when using pre-trained CTN as a starting point for subsequent task-specific fine-tuning compared to existing state-of-the-art TSC approaches. We also provide qualitative insights into the working of CTN by: i) analyzing the activations and filters of first convolution layer suggesting the filters in CTN are generically useful, ii) analyzing the impact of the design decision to incorporate multiple length decisions, and iii) finding regions of time series that affect the final classification decision via occlusion sensitivity analysis., Comment: Accepted at IJCNN 2019
- Published
- 2019
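A minimal PyTorch sketch of the multi-length-filter idea in the record above: parallel 1-D convolutions with different kernel lengths, concatenated along the channel axis. The layer widths and kernel sizes are assumptions; the published CTN architecture has more layers and its own specifics:

    import torch
    import torch.nn as nn

    class MultiScaleConvBlock(nn.Module):
        # One layer with filters of several lengths applied in parallel.
        def __init__(self, in_ch, out_ch_per_scale, kernel_sizes=(3, 5, 9)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv1d(in_ch, out_ch_per_scale, k, padding=k // 2)
                for k in kernel_sizes
            )

        def forward(self, x):            # x: (batch, channels, time)
            return torch.relu(torch.cat([b(x) for b in self.branches], dim=1))

    x = torch.randn(8, 1, 512)           # a batch of univariate series
    y = MultiScaleConvBlock(1, 16)(x)
    print(y.shape)                       # torch.Size([8, 48, 512])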
36. Transfer Learning for Clinical Time Series Analysis using Deep Neural Networks
- Author
-
Gupta, Priyanka, Malhotra, Pankaj, Narwariya, Jyoti, Vig, Lovekesh, and Shroff, Gautam
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Deep neural networks have shown promising results for various clinical prediction tasks. However, training deep networks such as those based on Recurrent Neural Networks (RNNs) requires large amounts of labeled data, significant hyper-parameter tuning effort and expertise, and high computational resources. In this work, we investigate to what extent transfer learning can address these issues when using deep RNNs to model multivariate clinical time series. We consider two scenarios for transfer learning using RNNs: i) domain adaptation, i.e., leveraging a deep RNN - namely, TimeNet - pre-trained for feature extraction on time series from diverse domains, and adapting it for feature extraction and subsequent target tasks in the healthcare domain; ii) task adaptation, i.e., pre-training a deep RNN - namely, HealthNet - on diverse tasks in the healthcare domain, and adapting it to new target tasks in the same domain. We evaluate the above approaches on the publicly available MIMIC-III benchmark dataset, and demonstrate that (a) computationally efficient linear models trained using features extracted via pre-trained RNNs outperform or, in the worst case, perform as well as deep RNNs and models based on statistical hand-crafted features trained specifically for the target task; (b) models obtained by adapting pre-trained models for target tasks are significantly more robust to the size of labeled data compared to task-specific RNNs, while also being computationally efficient. We therefore conclude that pre-trained deep models like TimeNet and HealthNet allow leveraging the advantages of deep learning for clinical time series analysis tasks, while minimizing dependence on hand-crafted features, dealing robustly with scarce labeled training data without overfitting, and reducing dependence on the expertise and resources required to train deep networks from scratch., Comment: Updated version of this work appeared in Journal of Healthcare Informatics Research, Vol. 4, 2020. arXiv admin note: text overlap with arXiv:1807.01705
- Published
- 2019
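A sketch, in Python, of finding (a) above: features from a frozen pre-trained RNN feed a simple linear classifier. The GRU here has random weights and the data is synthetic; in the paper the encoder weights come from TimeNet/HealthNet pre-training:

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.linear_model import LogisticRegression

    # Frozen GRU standing in for the pre-trained encoder.
    encoder = nn.GRU(input_size=12, hidden_size=64, batch_first=True)
    encoder.requires_grad_(False)

    def embed(batch):                       # batch: (n, time, features)
        with torch.no_grad():
            _, h = encoder(batch)           # final hidden state: (1, n, 64)
        return h.squeeze(0).numpy()

    X = embed(torch.randn(100, 48, 12))     # 100 synthetic patient stays
    y = np.random.randint(0, 2, size=100)   # synthetic binary labels
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.score(X, y))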
37. Data-driven Prognostics with Predictive Uncertainty Estimation using Ensemble of Deep Ordinal Regression Models
- Author
-
TV, Vishnu, Diksha, Malhotra, Pankaj, Vig, Lovekesh, and Shroff, Gautam
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Prognostics, or Remaining Useful Life (RUL) estimation, from multi-sensor time series data is useful to enable condition-based maintenance and ensure high operational availability of equipment. We propose a novel deep learning based approach for prognostics with uncertainty quantification that is useful in scenarios where: (i) access to labeled failure data is scarce due to the rarity of failures, (ii) future operational conditions are unobserved, and (iii) inherent noise is present in the sensor readings. All three scenarios are unavoidable sources of uncertainty in the RUL estimation process, often resulting in unreliable RUL estimates. To address (i), we formulate RUL estimation as an Ordinal Regression (OR) problem, and propose LSTM-OR, a deep Long Short-Term Memory (LSTM) network based approach to learn the OR function. We show that LSTM-OR naturally allows for the incorporation of censored operational instances in training along with the failed instances, leading to more robust learning. To address (ii), we propose a simple yet effective approach to quantify predictive uncertainty in RUL estimation models by training an ensemble of LSTM-OR models. Through empirical evaluation on the C-MAPSS turbofan engine benchmark datasets, we demonstrate that LSTM-OR is significantly better than the commonly used deep metric-regression based approaches for RUL estimation, especially when failed training instances are scarce. Further, our uncertainty quantification approach yields high-quality predictive uncertainty estimates while also leading to improved RUL estimates compared to single best LSTM-OR models., Comment: Accepted at International Journal of Prognostics and Health Management (IJPHM), 2019
- Published
- 2019
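One way the ordinal-regression framing admits censored instances, sketched in Python: encode RUL as K binary "RUL exceeds threshold t_k?" targets and mask the bits that a still-running (censored) machine cannot attest to. The exact encoding and thresholds are assumptions; the paper's LSTM-OR formulation may differ in detail:

    import numpy as np

    def ordinal_targets(rul, thresholds, failed):
        # For failed instances true RUL is known, so every bit is observed.
        # For censored instances only thresholds already survived are known;
        # the rest are NaN and would be excluded from the training loss.
        t = np.asarray(thresholds, dtype=float)
        y = (rul > t).astype(float)
        if not failed:
            y[t >= rul] = np.nan
        return y

    print(ordinal_targets(75, [10, 50, 100, 150], failed=True))   # [1. 1. 0. 0.]
    print(ordinal_targets(75, [10, 50, 100, 150], failed=False))  # [1. 1. nan nan]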
38. Abstract 15718: Valvulo-Arterial Impedance in ESKD Patients With Aortic Stenosis: Echocardiographic Correlates and Outcomes
- Author
-
Ogugua, Fredrick, Zellmer, Lucas, Mathew, Roy O, and Shroff, Gautam R
- Published
- 2023
- Full Text
- View/download PDF
39. Abstract 12119: Work Environment, Burnout, and Intent to Leave Current Job Among Cardiology Team Members: Results From the National Coping With COVID Survey
- Author
-
Mallick, Sanjoyita, Shroff, Gautam R, Linzer, Mark, Douglas, Pamela S, Sullivan, Erin, Brown, Roger, and Karim, Rehan
- Published
- 2023
- Full Text
- View/download PDF
40. Nontraditional Risk Factors for Progression Through Chronic Kidney Disease Risk Categories: The Coronary Artery Risk Development in Young Adults Study
- Author
-
Choi, Yuni, Jacobs, David R., Jr, Kramer, Holly J., Shroff, Gautam R., Chang, Alexander R., and Duprez, Daniel A.
- Published
- 2023
- Full Text
- View/download PDF
41. MEETING BOT: Reinforcement Learning for Dialogue Based Meeting Scheduling
- Author
-
D, Vishwanath, Vig, Lovekesh, Shroff, Gautam, and Agarwal, Puneet
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
In this paper, we present Meeting Bot, a reinforcement learning based conversational system that interacts with multiple users to schedule meetings. The system is able to interpret user utterances and map them to preferred time slots, which are then fed to a reinforcement learning (RL) system with the goal of converging on an agreeable time slot. The RL system is able to adapt to user preferences and environmental changes in the meeting arrival rate while still scheduling effectively. Learning is performed via policy gradient with exploration, utilizing an MLP as an approximator of the policy function. Results demonstrate that the system outperforms standard scheduling algorithms in terms of overall scheduling efficiency. Additionally, the system is able to adapt its strategy to situations where users consistently reject or accept meetings in certain slots (such as Friday afternoon versus Thursday morning), or where the meeting is called by members of more senior designation.
- Published
- 2018
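A toy PyTorch sketch of the learning setup named in the record above: an MLP policy over time slots updated with one REINFORCE (policy-gradient) step. The preference encoding, environment, and proxy reward are simplified stand-ins, not the paper's formulation:

    import torch
    import torch.nn as nn

    n_slots = 10
    policy = nn.Sequential(nn.Linear(n_slots, 32), nn.ReLU(), nn.Linear(32, n_slots))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    prefs = torch.rand(1, n_slots)                 # fraction of users free per slot
    dist = torch.distributions.Categorical(logits=policy(prefs))
    slot = dist.sample()                           # exploration via sampling
    reward = prefs[0, slot].detach()               # proxy: acceptance likelihood
    loss = -(dist.log_prob(slot) * reward).sum()   # policy-gradient objective
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"proposed slot {slot.item()}, reward {reward.item():.2f}")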
42. Deep Reader: Information extraction from Document images via relation extraction and Natural Language
- Author
-
D, Vishwanath, Rahul, Rohit, Sehgal, Gunjan, Swati, Chowdhury, Arindam, Sharma, Monika, Vig, Lovekesh, Shroff, Gautam, and Srinivasan, Ashwin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
Recent advances in Computer Vision, driven by state-of-the-art neural networks, have given a boost to Optical Character Recognition (OCR) accuracy. However, extracting characters/text alone is often insufficient for relevant information extraction, as documents also have a visual structure that is not captured by OCR. Extracting information from tables, charts, footnotes, boxes, and headings, and retrieving the corresponding structured representation for the document, remains a challenge and finds application in a large number of real-world use cases. In this paper, we propose a novel enterprise-based end-to-end framework called DeepReader which facilitates information extraction from document images via identification of visual entities and the population of a meta relational model across the different entities in the document image. The model schema allows for an easy-to-understand abstraction of the entities detected by the deep vision models and the relationships between them. DeepReader has a suite of state-of-the-art vision algorithms which are applied to recognize handwritten and printed text, eliminate noisy effects, identify the type of documents, and detect visual entities like tables, lines, and boxes. DeepReader maps the extracted entities into a rich relational schema so as to capture all the relevant relationships between entities (words, text boxes, lines, etc.) detected in the document. Relevant information and fields can then be extracted from the document by writing SQL queries on top of the relationship tables. A natural language based interface is added on top of the relational schema so that a non-technical user, specifying queries in natural language, can fetch the information with minimal effort. In this paper, we also demonstrate many different capabilities of DeepReader and report results on a real-world use case., Comment: Published in 3rd International Workshop on Robust Reading at Asian Conference of Computer Vision 2018
- Published
- 2018
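A miniature of the "SQL over detected entities" idea from the record above, using Python's sqlite3. The table and column names are hypothetical stand-ins for DeepReader's actual schema:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE words(doc_id TEXT, line_id INT, pos INT, text TEXT)")
    con.executemany("INSERT INTO words VALUES (?,?,?,?)", [
        ("d1", 4, 0, "Invoice"), ("d1", 4, 1, "No:"), ("d1", 4, 2, "A-1009"),
    ])
    # Field extraction as a query: the word following a label on the same line.
    row = con.execute("""
        SELECT v.text FROM words l JOIN words v
          ON v.doc_id = l.doc_id AND v.line_id = l.line_id AND v.pos = l.pos + 1
        WHERE l.text = 'No:'
    """).fetchone()
    print(row[0])   # A-1009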
43. Prosocial or Selfish? Agents with different behaviors for Contract Negotiation using Reinforcement Learning
- Author
-
Sunder, Vishal, Vig, Lovekesh, Chatterjee, Arnab, and Shroff, Gautam
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Multiagent Systems ,Statistics - Machine Learning - Abstract
We present an effective technique for training deep learning agents capable of negotiating over a set of clauses in a contract agreement using a simple communication protocol. We use Multi-Agent Reinforcement Learning to train both agents simultaneously as they negotiate with each other in the training environment. We also model selfish and prosocial behavior to varying degrees in these agents. Empirical evidence is provided showing consistency in agent behaviors. We further train a meta agent with a mixture of behaviors by learning an ensemble of different models using reinforcement learning. Finally, to ascertain the deployability of the negotiating agents, we conducted experiments pitting the trained agents against human players. Results demonstrate that the agents are able to hold their own against human players, often emerging as winners in the negotiation. Our experiments demonstrate that the meta agent is able to reasonably emulate human behavior., Comment: Proceedings of the 11th International Workshop on Automated Negotiations (held in conjunction with IJCAI 2018)
- Published
- 2018
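A common way to dial behavior between selfish and prosocial is to blend own and opponent utility in the reward; a one-function Python sketch, assumed here as a stand-in for the paper's exact formulation:

    def shaped_reward(own_utility, opponent_utility, alpha):
        # alpha = 0: purely selfish agent; alpha = 1: fully prosocial agent.
        return (1 - alpha) * own_utility + alpha * opponent_utility

    print(shaped_reward(0.8, 0.3, alpha=0.0))   # selfish: 0.8
    print(shaped_reward(0.8, 0.3, alpha=0.5))   # balanced: 0.55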
44. Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks
- Author
-
Gupta, Priyanka, Malhotra, Pankaj, Vig, Lovekesh, and Shroff, Gautam
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Deep neural networks have shown promising results for various clinical prediction tasks such as diagnosis, mortality prediction, and predicting the duration of stay in hospital. However, training deep networks -- such as those based on Recurrent Neural Networks (RNNs) -- requires large amounts of labeled data, high computational resources, and significant hyperparameter tuning effort. In this work, we investigate to what extent transfer learning can address these issues when using deep RNNs to model multivariate clinical time series. We consider transferring the knowledge captured in an RNN trained on several source tasks simultaneously, using a large labeled dataset, to build the model for a target task with limited labeled data. An RNN pre-trained on several tasks provides generic features, which are then used to build simpler linear models for new target tasks without training task-specific RNNs. For evaluation, we train a deep RNN to identify several patient phenotypes on time series from the MIMIC-III database, and then use the features extracted using that RNN to build classifiers for identifying previously unseen phenotypes, and also for the seemingly unrelated task of in-hospital mortality. We demonstrate that (i) models trained on features extracted using the pre-trained RNN outperform or, in the worst case, perform as well as task-specific RNNs; (ii) the models using features from pre-trained models are more robust to the size of labeled data than task-specific RNNs; and (iii) features extracted using the pre-trained RNN are generic enough to perform better than typical statistical hand-crafted features., Comment: Accepted at Machine Learning for Medicine and Healthcare Workshop at ACM KDD 2018 Conference
- Published
- 2018
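Finding (ii) above is a robustness-to-label-budget claim; a minimal Python harness for checking it, with synthetic features standing in for pre-trained-RNN embeddings:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Refit the linear model on shrinking labeled subsets and watch
    # held-out accuracy; a robust feature space degrades gracefully.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 64)); w = rng.normal(size=64)
    y = (X @ w > 0).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    for n in (700, 200, 50):
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
        print(n, round(clf.score(X_te, y_te), 3))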
45. Crop Planning using Stochastic Visual Optimization
- Author
-
Sehgal, Gunjan, Gupta, Bindu, Paneri, Kaushal, Singh, Karamjit, Sharma, Geetika, and Shroff, Gautam
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computers and Society - Abstract
As the world population increases and arable land decreases, it becomes vital to improve the productivity of the agricultural land available. Given the weather and soil properties, farmers need to take critical decisions such as which seed variety to plant and in what proportion, in order to maximize productivity. These decisions are irreversible, and any unusual behavior of external factors, such as weather, can have a catastrophic impact on the productivity of the crop. A variety which is highly desirable to a farmer might be unavailable or in short supply; it is therefore critical to evaluate which variety or varieties are more likely to be chosen by farmers from a growing region in order to meet demand. In this paper, we present our visual analytics tool, ViSeed, showcased on the data given in the Syngenta 2016 crop data challenge. This tool helps predict the optimal soybean seed variety, or mix of varieties in appropriate proportions, most likely to be chosen by farmers from a growing region. It also allows analysis of the solutions generated by our approach and supports the decision-making process by providing insightful visualizations., Comment: 5 pages
- Published
- 2017
46. Predicting Remaining Useful Life using Time Series Embeddings based on Recurrent Neural Networks
- Author
-
Gugulothu, Narendhar, TV, Vishnu, Malhotra, Pankaj, Vig, Lovekesh, Agarwal, Puneet, and Shroff, Gautam
- Subjects
Computer Science - Learning - Abstract
We consider the problem of estimating the remaining useful life (RUL) of a system or a machine from sensor data. Many approaches for RUL estimation based on sensor data make assumptions about how machines degrade. Additionally, sensor data from machines is noisy and often suffers from missing values in many practical settings. We propose Embed-RUL: a novel approach for RUL estimation from sensor data that does not rely on any degradation-trend assumptions, is robust to noise, and handles missing values. Embed-RUL utilizes a sequence-to-sequence model based on Recurrent Neural Networks (RNNs) to generate embeddings for multivariate time series subsequences. The embeddings for normal and degraded machines tend to be different, and are therefore found to be useful for RUL estimation. We show that the embeddings capture the overall pattern in the time series while filtering out the noise, so that the embeddings of two machines with similar operational behavior are close to each other, even when their sensor readings have significant and varying levels of noise content. We perform experiments on a publicly available turbofan engine dataset and a proprietary real-world dataset, and demonstrate that Embed-RUL outperforms the previously reported state-of-the-art on several metrics., Comment: Presented at 2nd ML for PHM Workshop at SIGKDD 2017, Halifax, Canada
- Published
- 2017
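A Python sketch of why separable embeddings help, per the record above: score a window by its embedding's distance from the mean of known-healthy embeddings. The synthetic vectors and this simple distance score are illustrative assumptions; Embed-RUL's actual estimator on top of the embeddings is richer:

    import numpy as np

    def health_index(z, normal_embeddings):
        # Distance of one window's embedding to the healthy-cluster mean.
        mu = normal_embeddings.mean(axis=0)
        return np.linalg.norm(z - mu)

    rng = np.random.default_rng(1)
    normal = rng.normal(0.0, 0.1, size=(50, 32))   # healthy-machine embeddings
    degraded = rng.normal(0.8, 0.1, size=32)       # drifted embedding
    print(health_index(normal[0], normal))         # small
    print(health_index(degraded, normal))          # large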
47. Comparative Benchmarking of Causal Discovery Techniques
- Author
-
Singh, Karamjit, Gupta, Garima, Tewari, Vartika, and Shroff, Gautam
- Subjects
Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
In this paper we present a comprehensive view of prominent causal discovery algorithms, categorized into two main categories: (1) those assuming acyclicity and no latent variables, and (2) those allowing both cycles and latent variables, along with experimental results comparing them from three perspectives: (a) structural accuracy, (b) standard predictive accuracy, and (c) accuracy of counterfactual inference. For (b) and (c) we train causal Bayesian networks with structures as predicted by each causal discovery technique to carry out counterfactual or standard predictive inference. We compare the causal algorithms on two publicly available datasets and one simulated dataset having different sample sizes: small, medium, and large. Experiments show that the structural accuracy of a technique does not necessarily correlate with higher accuracy on inference tasks. Further, the surveyed structure-learning algorithms do not perform well in terms of structural accuracy on datasets having a large number of variables., Comment: arXiv admin note: text overlap with arXiv:1506.07669, arXiv:1611.03977 by other authors
- Published
- 2017
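Structural accuracy, perspective (a) above, is typically measured by the structural Hamming distance between predicted and true graphs; a Python sketch of one standard definition (variants exist, and the paper's exact metric is not specified in this record):

    import numpy as np

    def structural_hamming_distance(A_true, A_pred):
        # Count edge insertions, deletions, and reversals between two
        # DAG adjacency matrices; a reversed edge counts as one error.
        diff = (A_true != A_pred)
        both = diff | diff.T
        return int(np.triu(both).sum())

    A_true = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
    A_pred = np.array([[0, 0, 0], [1, 0, 1], [0, 0, 0]])  # first edge reversed
    print(structural_hamming_distance(A_true, A_pred))    # 1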
48. TimeNet: Pre-trained deep recurrent neural network for time series classification
- Author
-
Malhotra, Pankaj, TV, Vishnu, Vig, Lovekesh, Agarwal, Puneet, and Shroff, Gautam
- Subjects
Computer Science - Learning - Abstract
Inspired by the tremendous success of deep Convolutional Neural Networks as generic feature extractors for images, we propose TimeNet: a deep recurrent neural network (RNN) trained on diverse time series in an unsupervised manner using sequence-to-sequence (seq2seq) models to extract features from time series. Rather than relying on data from the problem domain, TimeNet attempts to generalize time series representation across domains by ingesting time series from several domains simultaneously. Once trained, TimeNet can be used as a generic off-the-shelf feature extractor for time series. The representations or embeddings given by a pre-trained TimeNet are found to be useful for time series classification (TSC). For several publicly available datasets from the UCR TSC Archive, as well as industrial telematics sensor data from vehicles, we observe that a classifier learned over the TimeNet embeddings yields significantly better performance compared to (i) a classifier learned over the embeddings given by a domain-specific RNN, as well as (ii) a nearest neighbor classifier based on Dynamic Time Warping., Comment: 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2017, Bruges, Belgium
- Published
- 2017
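A minimal PyTorch sketch of the unsupervised seq2seq objective behind this kind of pre-training: an encoder GRU compresses a series, a decoder GRU reconstructs it (in reverse, as is common for seq2seq autoencoders), and the trained encoder's final state becomes the embedding. The sizes, data, and decoding scheme are simplified stand-ins for TimeNet's actual training setup:

    import torch
    import torch.nn as nn

    enc = nn.GRU(1, 32, batch_first=True)
    dec = nn.GRU(1, 32, batch_first=True)
    head = nn.Linear(32, 1)
    opt = torch.optim.Adam([*enc.parameters(), *dec.parameters(), *head.parameters()])

    x = torch.randn(16, 100, 1)              # a batch of univariate series
    _, h = enc(x)                            # h: (1, 16, 32) -> the embedding
    out, _ = dec(torch.zeros_like(x), h)     # reconstruct conditioned on h
    loss = nn.functional.mse_loss(head(out), torch.flip(x, dims=[1]))
    opt.zero_grad(); loss.backward(); opt.step()
    print(loss.item())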
49. A complex network analysis of ethnic conflicts and human rights violations
- Author
-
Sharma, Kiran, Sehgal, Gunjan, Gupta, Bindu, Sharma, Geetika, Chatterjee, Arnab, Chakraborti, Anirban, and Shroff, Gautam
- Subjects
Physics - Physics and Society ,Computer Science - Multiagent Systems ,Computer Science - Social and Information Networks - Abstract
News reports in the media contain records of a wide range of socio-economic and political events in time. Using a publicly available, large digital database of news records, and aggregating them over time, we study the network of ethnic conflicts and human rights violations. Complex network analyses of the events and the involved actors provide important insights into the engaged actors, groups, establishments and sometimes nations, pointing to their long-range effects over space and time. We find power-law decays in the distributions of actor mentions, co-actor mentions, and degrees, as well as the dominance of influential actors and groups. The most influential actors or groups form a giant connected component which grows in time, and is expected to encompass all actors globally in the long run. We demonstrate how targeted removal of actors may help stop the spread of unruly events. We study the cause-effect relation between types of events, and our quantitative analysis confirms that ethnic conflicts lead to human rights violations, while it does not support the converse., Comment: 16 pages; main paper + Supplementary Information file; accepted for publication in Scientific Reports
- Published
- 2017
- Full Text
- View/download PDF
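The targeted-removal experiment from the record above, in miniature with networkx: strip the highest-degree actors and watch the giant component shrink. A synthetic scale-free graph stands in for the news-derived co-actor network:

    import networkx as nx

    G = nx.barabasi_albert_graph(500, 2, seed=42)
    for k in (0, 10, 25, 50):
        H = G.copy()
        hubs = sorted(H.degree, key=lambda d: d[1], reverse=True)[:k]
        H.remove_nodes_from(n for n, _ in hubs)
        giant = max(nx.connected_components(H), key=len)
        print(f"removed {k:>2} hubs -> giant component {len(giant)} nodes")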
50. Interobserver variability among experienced electrocardiogram readers to diagnose acute thrombotic coronary occlusion in patients with out of hospital cardiac arrest: Impact of metabolic milieu and angiographic culprit
- Author
-
Sharma, Amit, Miranda, David F., Rodin, Holly, Bart, Bradley A., Smith, Stephen W., and Shroff, Gautam R.
- Published
- 2022
- Full Text
- View/download PDF