Author: "Prasanna, Viktor K." - Searchworks@Jio Institute Digital Library Search Results

Search for author "Prasanna, Viktor K.": 1,800 results in total

1. Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching

Author: Zhang, Pengmiao, Gupta, Neelesh, Kannan, Rajgopal, and Prasanna, Viktor K.
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Hardware Architecture, Computer Science - Machine Learning, Computer Science - Operating Systems
Abstract: Attention-based Neural Networks (NN) have demonstrated their effectiveness in accurate memory access prediction, an essential step in data prefetching. However, the substantial computational overheads associated with these models result in high inference latency, limiting their feasibility as practical prefetchers. To close the gap, we propose a new approach based on tabularization that significantly reduces model complexity and inference latency without sacrificing prediction accuracy. Our novel tabularization methodology takes as input a distilled, yet highly accurate attention-based model for memory access prediction and efficiently converts its expensive matrix multiplications into a hierarchy of fast table lookups. As an exemplar of the above approach, we develop DART, a prefetcher comprised of a simple hierarchy of tables. With a modest 0.09 drop in F1-score, DART reduces 99.99% of arithmetic operations from the large attention-based model and 91.83% from the distilled model. DART accelerates the large model inference by 170x and the distilled model by 9.4x. DART has comparable latency and storage costs as state-of-the-art rule-based prefetcher BO but surpasses it by 6.1% in IPC improvement. DART outperforms state-of-the-art NN-based prefetchers TransFetch by 33.1% and Voyager by 37.2% in terms of IPC improvement, primarily due to its low prefetching latency.
Published: 2023
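
As a rough illustration of the idea named in the abstract above (replacing a matrix multiplication with table lookups), the sketch below assumes a product-quantization-style scheme: the input is split into sub-vectors, each sub-vector is snapped to its nearest prototype, and the partial products for every prototype are precomputed. The prototypes, table layout, and function names are hypothetical and are not taken from DART; the nearest-prototype search here still uses arithmetic, which a practical design would also need to make cheap.

```python
# Hypothetical sketch: approximate y = W @ x with precomputed table lookups.

def build_tables(W, prototypes, sub_len):
    """tables[s][p][r] = dot(W[r][s*sub_len:(s+1)*sub_len], prototypes[s][p])."""
    tables = []
    for s, protos in enumerate(prototypes):
        lo = s * sub_len
        tables.append([
            [sum(w * v for w, v in zip(row[lo:lo + sub_len], proto)) for row in W]
            for proto in protos
        ])
    return tables

def nearest(protos, sub):
    """Index of the prototype closest to sub (squared Euclidean distance)."""
    return min(range(len(protos)),
               key=lambda p: sum((a - b) ** 2 for a, b in zip(protos[p], sub)))

def lookup_matvec(tables, prototypes, x, sub_len):
    """Sum one precomputed partial product per sub-vector instead of multiplying."""
    out = [0.0] * len(tables[0][0])
    for s, protos in enumerate(prototypes):
        sub = x[s * sub_len:(s + 1) * sub_len]
        partial = tables[s][nearest(protos, sub)]
        out = [o + p for o, p in zip(out, partial)]
    return out

# Toy 2x4 weight matrix and two prototypes per 2-element sub-vector (all assumed).
W = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0]]
prototypes = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
tables = build_tables(W, prototypes, sub_len=2)
approx = lookup_matvec(tables, prototypes, [0.9, 1.1, 0.1, -0.1], sub_len=2)
```

The accuracy of such an approximation depends on how well the prototypes cover the inputs, which is why a scheme like this would normally be fitted to the trained model's activations.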

2. Phases, Modalities, Temporal and Spatial Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics

Author: Zhang, Pengmiao, Kannan, Rajgopal, and Prasanna, Viktor K.
Subjects: Computer Science - Machine Learning, Computer Science - Hardware Architecture
Abstract: Memory performance is a bottleneck in graph analytics acceleration. Existing Machine Learning (ML) prefetchers struggle with phase transitions and irregular memory accesses in graph processing. We propose MPGraph, an ML-based Prefetcher for Graph analytics using domain specific models. MPGraph introduces three novel optimizations: soft detection for phase transitions, phase-specific multi-modality models for access delta and page predictions, and chain spatio-temporal prefetching (CSTP) for prefetch control. Our transition detector achieves 34.17-82.15% higher precision compared with Kolmogorov-Smirnov Windowing and decision tree. Our predictors achieve 6.80-16.02% higher F1-score for delta and 11.68-15.41% higher accuracy-at-10 for page prediction compared with LSTM and vanilla attention models. Using CSTP, MPGraph achieves 12.52-21.23% IPC improvement, outperforming state-of-the-art non-ML prefetcher BO by 7.58-12.03% and ML-based prefetchers Voyager and TransFetch by 3.27-4.58%. For practical implementation, we demonstrate MPGraph using compressed models with reduced latency shows significantly superior accuracy and coverage compared with BO, leading to 3.58% higher IPC improvement.
Published: 2022

3. Accelerating Graph Analytics Using Attention-Based Data Prefetcher

Author: Zhang, Pengmiao, Kannan, Rajgopal, Nori, Anant V., and Prasanna, Viktor K.
Published: 2024

4. TransforMAP: Transformer for Memory Access Prediction

Author: Zhang, Pengmiao, Srivastava, Ajitesh, Nori, Anant V., Kannan, Rajgopal, and Prasanna, Viktor K.
Subjects: Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: Data Prefetching is a technique that can hide memory latency by fetching data before it is needed by a program. Prefetching relies on accurate memory access prediction, to which task machine learning based methods are increasingly applied. Unlike previous approaches that learn from deltas or offsets and perform one access prediction, we develop TransforMAP, based on the powerful Transformer model, that can learn from the whole address space and perform multiple cache line predictions. We propose to use the binary of memory addresses as model input, which avoids information loss and saves a token table in hardware. We design a block index bitmap to collect unordered future page offsets under the current page address as learning labels. As a result, our model can learn temporal patterns as well as spatial patterns within a page. In a practical implementation, this approach has the potential to hide prediction latency because it prefetches multiple cache lines likely to be used in a long horizon. We show that our approach achieves 35.67% MPKI improvement and 20.55% IPC improvement in simulation, higher than state-of-the-art Best-Offset prefetcher and ISB prefetcher.
Published: 2022
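
The two encodings mentioned in the abstract above (binary address input and a block index bitmap label) can be sketched as follows. This is a minimal illustration, not the authors' code: the 4 KiB page size, 64-byte cache line size, and all function names are assumptions.

```python
PAGE_BITS = 12   # assumption: 4 KiB pages
BLOCK_BITS = 6   # assumption: 64-byte cache lines, so 64 blocks per page

def address_bits(addr, width=64):
    """Binary representation of an address, most significant bit first."""
    return [(addr >> i) & 1 for i in reversed(range(width))]

def block_index_bitmap(current_addr, future_addrs):
    """Set bit b if any future access touches block b of the current page.

    The result is an unordered set of page offsets: order and repetition of
    the future accesses are discarded, and accesses to other pages are ignored.
    """
    page = current_addr >> PAGE_BITS
    blocks_per_page = 1 << (PAGE_BITS - BLOCK_BITS)
    bitmap = 0
    for addr in future_addrs:
        if addr >> PAGE_BITS == page:
            block = (addr >> BLOCK_BITS) & (blocks_per_page - 1)
            bitmap |= 1 << block
    return bitmap

# Current access in page 1; future accesses hit blocks 1 and 3 of that page,
# plus one access to a different page that is dropped.
label = block_index_bitmap(0x1000, [0x1040, 0x10C0, 0x2000, 0x1040])
```

Because the label is a bitmap, a single prediction can name several cache lines at once, which matches the multiple-cache-line prediction the abstract describes.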

5. Fine-Grained Address Segmentation for Attention-Based Variable-Degree Prefetching

Author: Zhang, Pengmiao, Srivastava, Ajitesh, Nori, Anant V., Kannan, Rajgopal, and Prasanna, Viktor K.
Subjects: Computer Science - Hardware Architecture, Computer Science - Machine Learning
Abstract: Machine learning algorithms have shown potential to improve prefetching performance by accurately predicting future memory accesses. Existing approaches are based on the modeling of text prediction, considering prefetching as a classification problem for sequence prediction. However, the vast and sparse memory address space leads to large vocabulary, which makes this modeling impractical. The number and order of outputs for multiple cache line prefetching are also fundamentally different from text prediction. We propose TransFetch, a novel way to model prefetching. To reduce vocabulary size, we use fine-grained address segmentation as input. To predict unordered sets of future addresses, we use delta bitmaps for multiple outputs. We apply an attention-based network to learn the mapping between input and output. Prediction experiments demonstrate that address segmentation achieves 26% - 36% higher F1-score than delta inputs and 15% - 24% higher F1-score than page & offset inputs for SPEC 2006, SPEC 2017, and GAP benchmarks. Simulation results show that TransFetch achieves 38.75% IPC improvement compared with no prefetching, outperforming the best-performing rule-based prefetcher BOP by 10.44%, and ML-based prefetcher Voyager by 6.64%.
Published: 2022

6. Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations

Author: Zhang, Chi, Kuppannagari, Sanmukh Rao, and Prasanna, Viktor K
Subjects: Computer Science - Machine Learning, Electrical Engineering and Systems Science - Systems and Control
Abstract: Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games and health care. However, training RL agents is very time consuming. Current implementations exhibit poor performance due to challenges such as irregular memory accesses and thread-level synchronization overheads on CPU. In this work, we propose a framework for generating scalable reinforcement learning implementations on multi-core systems. Replay Buffer is a key component of RL algorithms which facilitates storage of samples obtained from environmental interactions and data sampling for the learning process. We define a new data structure for Prioritized Replay Buffer based on a K-ary sum tree that supports asynchronous parallel insertions, sampling, and priority updates. To address the challenge of irregular memory accesses, we propose a novel data layout to store the nodes of the sum tree that reduces the number of cache misses. Additionally, we propose a lazy writing mechanism to reduce thread-level synchronization overheads of the Replay Buffer operations. Our framework employs parallel actors to concurrently collect data via environmental interactions, and parallel learners to perform stochastic gradient descent using the collected data. Our framework supports a wide range of reinforcement learning algorithms including DQN, DDPG, etc. We demonstrate the effectiveness of our framework in accelerating RL algorithms by performing experiments on CPU + GPU platform using OpenAI benchmarks.
Comment: 10 pages. HiPC21
Published: 2021
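
The K-ary sum tree behind the prioritized replay buffer in the abstract above can be sketched as below. This is a single-threaded illustration only: the class and method names are hypothetical, and the asynchronous parallel operations, cache-aware node layout, and lazy writing described in the abstract are not shown.

```python
class KarySumTree:
    """Array-backed K-ary tree whose internal nodes store the sum of their children."""

    def __init__(self, capacity, k=4):
        self.k = k
        # Round the number of leaves up to a power of k so the tree is complete.
        self.leaves = 1
        while self.leaves < capacity:
            self.leaves *= k
        # A complete k-ary tree with k**d leaves has (k**d - 1) / (k - 1) internal nodes.
        self.first_leaf = (self.leaves - 1) // (k - 1)
        self.tree = [0.0] * (self.first_leaf + self.leaves)

    def update(self, index, priority):
        """Set the priority of one leaf and propagate the change up to the root."""
        node = self.first_leaf + index
        delta = priority - self.tree[node]
        while True:
            self.tree[node] += delta
            if node == 0:
                break
            node = (node - 1) // self.k

    def total(self):
        return self.tree[0]

    def find(self, value):
        """Return the leaf whose cumulative-priority interval contains value."""
        node = 0
        while node < self.first_leaf:
            first_child = node * self.k + 1
            last_child = first_child + self.k - 1
            for child in range(first_child, last_child + 1):
                # Descend into the first child whose subtree sum covers value;
                # the last child is a fallback against rounding at the boundary.
                if value < self.tree[child] or child == last_child:
                    node = child
                    break
                value -= self.tree[child]
        return node - self.first_leaf

tree = KarySumTree(capacity=4, k=2)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)
```

Sampling proportionally to priority then amounts to drawing a value uniformly from the range 0 to `tree.total()` and calling `tree.find` on it. A larger `k` gives a shallower tree at the cost of scanning more children per level.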

7. BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning

Author: Zhang, Chi, Kuppannagari, Sanmukh Rao, and Prasanna, Viktor K
Subjects: Computer Science - Machine Learning
Abstract: Online interactions with the environment to collect data samples for training a Reinforcement Learning (RL) agent are not always feasible due to economic and safety concerns. The goal of Offline Reinforcement Learning is to address this problem by learning effective policies using previously collected datasets. Standard off-policy RL algorithms are prone to overestimations of the values of out-of-distribution (less explored) actions and are hence unsuitable for Offline RL. Behavior regularization, which constrains the learned policy within the support set of the dataset, has been proposed to tackle the limitations of standard off-policy algorithms. In this paper, we improve the behavior regularized offline reinforcement learning and propose BRAC+. First, we propose quantification of the out-of-distribution actions and conduct comparisons between using Kullback-Leibler divergence versus using Maximum Mean Discrepancy as the regularization protocol. We propose an analytical upper bound on the KL divergence as the behavior regularizer to reduce variance associated with sample based estimations. Second, we mathematically show that the learned Q values can diverge even using behavior regularized policy update under mild assumptions. This leads to large overestimations of the Q values and performance deterioration of the learned policy. To mitigate this issue, we add a gradient penalty term to the policy evaluation objective. By doing so, the Q values are guaranteed to converge. On challenging offline RL benchmarks, BRAC+ outperforms the baseline behavior regularized approaches by 40%~87% and the state-of-the-art approach by 6%.
Comment: 16 pages. Accepted by ACML21
Published: 2021

8. A High Throughput Parallel Hash Table on FPGA using XOR-based Memory

Author: Zhang, Ruizhi, Wijeratne, Sasindu, Yang, Yang, Kuppannagari, Sanmukh R., and Prasanna, Viktor K.
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Hash table is a fundamental data structure for quick search and retrieval of data. It is a key component in complex graph analytics and AI/ML applications. State-of-the-art parallel hash table implementations either make some simplifying assumptions such as supporting only a subset of hash table operations or employ optimizations that lead to performance that is highly data dependent and in the worst case can be similar to a sequential implementation. In contrast, in this work we develop a dynamic hash table that supports all the hash table queries (search, insert, delete, update) while allowing us to support 'p' parallel queries (p>1) per clock cycle via p processing engines (PEs) in the worst case, i.e., the performance is data agnostic. We achieve this by implementing novel XOR based multi-ported block memories on FPGAs. Additionally, we develop a technique to optimize the memory requirement of the hash table if the ratio of search to insert/update/delete queries is known beforehand. We implement our design on state-of-the-art FPGA devices. Our design is scalable to 16 PEs and supports throughput up to 5926 MOPS. It matches the throughput of the state-of-the-art hash table design FASTHash, which only supports search and insert operations. Compared with the best FPGA design that supports the same set of operations, our hash table achieves up to 12.3x speedup.
Comment: 2020 IEEE High Performance Extreme Computing Conference (HPEC)
Published: 2021

9. The EpiBench Platform to Propel AI/ML-based Epidemic Forecasting: A Prototype Demonstration Reaching Human Expert-level Performance

Author: Srivastava, Ajitesh, Xu, Tianjian, and Prasanna, Viktor K.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: During the COVID-19 pandemic, a significant effort has gone into developing ML-driven epidemic forecasting techniques. However, benchmarks do not exist to claim if a new AI/ML technique is better than the existing ones. The "covid-forecast-hub" is a collection of more than 30 teams, including us, that submit their forecasts weekly to the CDC. It is not possible to declare whether one method is better than the other using those forecasts because each team's submission may correspond to different techniques over the period and involve human interventions as the teams are continuously changing/tuning their approach. Such forecasts may be considered "human-expert" forecasts and do not qualify as AI/ML approaches, although they can be used as an indicator of human expert performance. We are interested in supporting AI/ML research in epidemic forecasting which can lead to scalable forecasting without human intervention. Which modeling technique, learning strategy, and data pre-processing technique work well for epidemic forecasting is still an open problem. To help advance the state-of-the-art AI/ML applied to epidemiology, a benchmark with a collection of performance points is needed and the current "state-of-the-art" techniques need to be identified. We propose EpiBench, a platform consisting of community-driven benchmarks for AI/ML applied to epidemic forecasting to standardize the challenge with a uniform evaluation protocol. In this paper, we introduce a prototype of EpiBench which is currently running and accepting submissions for the task of forecasting COVID-19 cases and deaths in the US states. We demonstrate that we can utilize the prototype to develop an ensemble relying on fully automated epidemic forecasts (no human intervention) that reaches the level of the human-expert ensemble currently being used by the CDC.
Comment: 8 pages, 6 figures. Accepted at the 5th International Workshop on Health Intelligence in conjunction with the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
Published: 2021

10. Fast and Accurate Forecasting of COVID-19 Deaths Using the SIkJα Model

Author: Srivastava, Ajitesh, Xu, Tianjian, and Prasanna, Viktor K.
Subjects: Quantitative Biology - Populations and Evolution, Computer Science - Machine Learning, Physics - Physics and Society
Abstract: Forecasting the effect of COVID-19 is essential to design policies that may prepare us to handle the pandemic. Many methods have already been proposed, particularly, to forecast reported cases and deaths at country-level and state-level. Many of these methods are based on traditional epidemiological models which rely on simulations or Bayesian inference to simultaneously learn many parameters at a time. This makes them prone to over-fitting and slow execution. We propose an extension to our model SIkJα to forecast deaths and show that it can consider the effect of many complexities of the epidemic process and yet be simplified to a few parameters that are learned using fast linear regressions. We also present an evaluation of our method against seven approaches currently being used by the CDC, based on their two-week forecasts at various times during the pandemic. We demonstrate that our method achieves better root mean squared error compared to these seven approaches during the majority of the evaluation period. Further, on a 2 core desktop machine, our approach takes only 3.18s to tune hyper-parameters, learn parameters and generate 100 days of forecasts of reported cases and deaths for all the states in the US. The total execution time for 184 countries is 11.83s and for all the US counties (> 3000) is 101.03s.
Comment: Fixed a typo
Published: 2020

11. Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization without Compounding Errors

Author: Zhang, Chi, Kuppannagari, Sanmukh Rao, and Prasanna, Viktor K
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Model usage is the central challenge of model-based reinforcement learning. Although dynamics models based on deep neural networks provide good generalization for single step prediction, such ability is over exploited when it is used to predict long horizon trajectories due to compounding errors. In this work, we propose a Dyna-style model-based reinforcement learning algorithm, which we call Maximum Entropy Model Rollouts (MEMR). To eliminate the compounding errors, we only use our model to generate single-step rollouts. Furthermore, we propose to generate diverse model rollouts by non-uniform sampling of the environment states such that the entropy of the model rollouts is maximized. We mathematically derived the maximum entropy sampling criteria for one data case under Gaussian prior. To satisfy these criteria, we propose to utilize a prioritized experience replay. Our preliminary experiments in challenging locomotion benchmarks show that our approach achieves the same sample efficiency as the best model-based algorithms, matches the asymptotic performance of the best model-free algorithms, and significantly reduces the computation requirements of other model-based methods.
Comment: ICML BIG Workshop 2020, camera ready version
Published: 2020

12. Data-driven Identification of Number of Unreported Cases for COVID-19: Bounds and Limitations

Author: Srivastava, Ajitesh and Prasanna, Viktor K.
Subjects: Quantitative Biology - Populations and Evolution, Computer Science - Machine Learning
Abstract: Accurate forecasts for COVID-19 are necessary for better preparedness and resource management. Specifically, deciding the response over months or several months requires accurate long-term forecasts which is particularly challenging as the model errors accumulate with time. A critical factor that can hinder accurate long-term forecasts is the number of unreported/asymptomatic cases. While there have been early serology tests to estimate this number, more tests need to be conducted for more reliable results. To identify the number of unreported/asymptomatic cases, we take an epidemiology data-driven approach. We show that we can identify lower bounds on this ratio or upper bound on actual cases as a factor of reported cases. To do so, we propose an extension of our prior heterogeneous infection rate model, incorporating unreported/asymptomatic cases. We prove that the number of unreported cases can be reliably estimated only from a certain time period of the epidemic data. In doing so, we construct an algorithm called Fixed Infection Rate method, which identifies a reliable bound on the learned ratio. We also propose two heuristics to learn this ratio and show their effectiveness on simulated data. We use our approaches to identify the upper bounds on the ratio of actual to reported cases for New York City and several US states. Our results demonstrate with high confidence that the actual number of cases cannot be more than 35 times in New York, 40 times in Illinois, 38 times in Massachusetts and 29 times in New Jersey, than the reported cases.
Comment: Fixed a typo
Published: 2020

13. Learning to Forecast and Forecasting to Learn from the COVID-19 Pandemic

Author: Srivastava, Ajitesh and Prasanna, Viktor K.
Subjects: Quantitative Biology - Populations and Evolution, Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods, Statistics - Machine Learning
Abstract: Accurate forecasts of COVID-19 are central to resource management and building strategies to deal with the epidemic. We propose a heterogeneous infection rate model with human mobility for epidemic modeling, a preliminary version of which we have successfully used during DARPA Grand Challenge 2014. By linearizing the model and using weighted least squares, our model is able to quickly adapt to changing trends and provide extremely accurate predictions of confirmed cases at the level of countries and states of the United States. We show that during the earlier part of the epidemic, using travel data increases the predictions. Training the model to forecast also enables learning characteristics of the epidemic. In particular, we show that changes in model parameters over time can help us quantify how well a state or a country has responded to the epidemic. The variations in parameters also allow us to forecast different scenarios such as what would happen if we were to disregard social distancing suggestions.
Comment: 12 pages, 8 figures. Added a figure
Published: 2020

14. Towards High Performance, Portability, and Productivity: Lightweight Augmented Neural Networks for Performance Prediction

Author: Srivastava, Ajitesh, Zhang, Naifeng, Kannan, Rajgopal, and Prasanna, Viktor K.
Subjects: Computer Science - Performance, Computer Science - Machine Learning
Abstract: Writing high-performance code requires significant expertise in the programming language, compiler optimizations, and hardware knowledge. This often leads to poor productivity and portability and is inconvenient for a non-programmer domain-specialist such as a Physicist. More desirable is a high-level language where the domain-specialist simply specifies the workload in terms of high-level operations (e.g., matrix-multiply(A, B)), and the compiler identifies the best implementation fully utilizing the heterogeneous platform. For creating a compiler that supports productivity, portability, and performance simultaneously, it is crucial to predict the performance of various available implementations (variants) of the dominant operations (kernels) contained in the workload on various hardware to decide (a) which variant should be chosen for each kernel in the workload, and (b) on which hardware resource the variant should run. To enable the performance prediction, we propose lightweight augmented neural networks for arbitrary combinations of kernel-variant-hardware. A key innovation is utilizing the mathematical complexity of the kernels as a feature to achieve higher accuracy. These models are compact to reduce training time and fast inference during compile-time and run-time. Using models with less than 75 parameters, and only 250 training data instances, we are able to obtain a low MAPE of 3%, significantly outperforming traditional feed-forward neural networks on 48 kernel-variant-hardware combinations. We further demonstrate that our variant-selection approach can be used in Halide implementations to obtain up to 1.7x speedup over Halide's auto-scheduler.
Published: 2020

15. Building HVAC Scheduling Using Reinforcement Learning via Neural Network Based Model Approximation

Author: Zhang, Chi, Kuppannagari, Sanmukh R., Kannan, Rajgopal, and Prasanna, Viktor K.
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: The buildings sector is one of the major consumers of energy in the United States. The buildings' HVAC (Heating, Ventilation, and Air Conditioning) systems, whose functionality is to maintain thermal comfort and indoor air quality (IAQ), account for almost half of the energy consumed by the buildings. Thus, intelligent scheduling of the building HVAC system has the potential for tremendous energy and cost savings while ensuring that the control objectives (thermal comfort, air quality) are satisfied. Recently, several works have focused on model-free deep reinforcement learning based techniques such as Deep Q-Network (DQN). Such methods require extensive interactions with the environment. Thus, they are impractical to implement in real systems due to low sample efficiency. Safety-aware exploration is another challenge in real systems since certain actions at particular states may result in catastrophic outcomes. To address these issues and challenges, we propose a model-based reinforcement learning approach that learns the system dynamics using a neural network. Then, we adopt Model Predictive Control (MPC) using the learned system dynamics to perform control with random-sampling shooting method. To ensure safe exploration, we limit the actions within safe range and the maximum absolute change of actions according to prior knowledge. We evaluate our ideas through simulation using the widely adopted EnergyPlus tool on a case study consisting of a two zone data-center. Experiments show that the average deviation of the trajectories sampled from the learned dynamics and the ground truth is below 20%. Compared with baseline approaches, we reduce the total energy consumption by 17.1% to 21.8%. Compared with model-free reinforcement learning approach, we reduce the required number of training steps to converge by 10x.
Comment: 10 pages, 13 figures, to appear in ACM BuildSys '19, November 13-14, 2019, New York, NY, USA
Published: 2019

16. The EpiBench Platform to Propel AI/ML-Based Epidemic Forecasting: A Prototype Demonstration Reaching Human Expert-Level Performance

Author: Srivastava, Ajitesh, Xu, Tianjian, and Prasanna, Viktor K.
Editors: Kacprzyk, Janusz (Series Editor), Shaban-Nejad, Arash, Michalowski, Martin, and Bianco, Simone
Published: 2022

17. Socio-demographic Characteristics Prediction Using Soft Clustering of Load Consumption Data

Author: Cheung, Chung Ming, Kuppannagari, Sanmukh Rao, and Prasanna, Viktor K.
Editors: Kacprzyk, Janusz (Series Editor), Gomide, Fernando (Advisory Editor), Kaynak, Okyay (Advisory Editor), Liu, Derong (Advisory Editor), Pedrycz, Witold (Advisory Editor), Polycarpou, Marios M. (Advisory Editor), Rudas, Imre J. (Advisory Editor), Wang, Jun (Advisory Editor), and Arai, Kohei
Published: 2022

18. Not all Embeddings are created Equal: Extracting Entity-specific Substructures for RDF Graph Embedding

Author: Saeed, Muhammad Rizwan, Chelmis, Charalampos, and Prasanna, Viktor K.
Subjects: Computer Science - Artificial Intelligence
Abstract: Knowledge Graphs (KGs) are becoming essential to information systems that require access to structured data. Several approaches have been recently proposed for obtaining vector representations of KGs suitable for Machine Learning tasks, based on identifying and extracting relevant graph substructures using uniform and biased random walks. However, such approaches lead to representations comprising mostly "popular", instead of "relevant", entities in the KG. In KGs, in which different types of entities often exist (such as in Linked Open Data), a given target entity may have its own distinct set of most "relevant" nodes and edges. We propose specificity as an accurate measure of identifying most relevant, entity-specific, nodes and edges. We develop a scalable method based on bidirectional random walks to compute specificity. Our experimental evaluation results show that specificity-based biased random walks extract more "meaningful" (in terms of size and relevance) RDF substructures compared to the state-of-the-art, and that the graph embeddings learned from the extracted substructures outperform existing techniques in the task of entity recommendation in DBpedia.
Comment: 16 pages
Published: 2018

19. Optimal Net-Load Balancing in Smart Grids with High PV Penetration

Author: Kuppannagari, Sanmukh R., Kannan, Rajgopal, and Prasanna, Viktor K.
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Systems and Control
Abstract: Mitigating Supply-Demand mismatch is critical for smooth power grid operation. Traditionally, load curtailment techniques such as Demand Response (DR) have been used for this purpose. However, these cannot be the only component of a net-load balancing framework for Smart Grids with high PV penetration. These grids can sometimes exhibit supply surplus causing over-voltages. Supply curtailment techniques such as Volt-Var Optimizations are complex and computationally expensive. This increases the complexity of net-load balancing systems used by the grid operator and limits their scalability. Recently new technologies have been developed that enable the rapid and selective connection of PV modules of an installation to the grid. Taking advantage of these advancements, we develop a unified optimal net-load balancing framework which performs both load and solar curtailment. We show that when the available curtailment values are discrete, this problem is NP-hard and develop bounded approximation algorithms for minimizing the curtailment cost. Our algorithms produce fast solutions, given the tight timing constraints required for grid operation. We also incorporate the notion of fairness to ensure that curtailment is evenly distributed among all the nodes. Finally, we develop an online algorithm which performs net-load balancing using only data available for the current interval. Using both theoretical analysis and practical evaluations, we show that our net-load balancing algorithms provide solutions which are close to optimal in a small amount of time.
Comment: 11 pages. To be published in the 4th ACM International Conference on Systems for Energy-Efficient Built Environments (BuildSys 17). Changes from previous version: Fixed a bug in Algorithm 1 which was causing some min cost solutions to be missed
Published: 2017

20. FPGA Acceleration of Number Theoretic Transform

Author: Ye, Tian, Yang, Yang, Kuppannagari, Sanmukh R., Kannan, Rajgopal, and Prasanna, Viktor K.
Editors: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), Woeginger, Gerhard (Editorial Board Member), Yung, Moti (Editorial Board Member), Chamberlain, Bradford L., Varbanescu, Ana-Lucia, Ltaief, Hatem, and Luszczek, Piotr
Published: 2021

21. C-MemMAP: clustering-driven compact, adaptable, and generalizable meta-LSTM models for memory access prediction

Author: Zhang, Pengmiao, Srivastava, Ajitesh, Wang, Ta-Yang, De Rose, Cesar A. F., Kannan, Rajgopal, and Prasanna, Viktor K.
Published: 2022

22. Computational Models for Cascades in Massive Graphs

Author: Srivastava, Ajitesh, Chelmis, Charalampos, and Prasanna, Viktor K.
Published: 2022

23. FASTHash: FPGA-Based High Throughput Parallel Hash Table

Author: Yang, Yang, Kuppannagari, Sanmukh R., Srivastava, Ajitesh, Kannan, Rajgopal, and Prasanna, Viktor K.
Editors: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), Woeginger, Gerhard (Editorial Board Member), Yung, Moti (Editorial Board Member), Sadayappan, Ponnuswamy, Chamberlain, Bradford L., Juckeland, Guido, and Ltaief, Hatem
Published: 2020

24. MemMAP: Compact and Generalizable Meta-LSTM Models for Memory Access Prediction

Author: Srivastava, Ajitesh, Wang, Ta-Yang, Zhang, Pengmiao, De Rose, Cesar Augusto F., Kannan, Rajgopal, and Prasanna, Viktor K.
Editors: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), Woeginger, Gerhard (Editorial Board Member), Yung, Moti (Editorial Board Member), Lauw, Hady W., Wong, Raymond Chi-Wing, Ntoulas, Alexandros, Lim, Ee-Peng, Ng, See-Kiong, and Pan, Sinno Jialin
Published: 2020
Full Text: View/download PDF

25. The EpiBench Platform to Propel AI/ML-Based Epidemic Forecasting: A Prototype Demonstration Reaching Human Expert-Level Performance

Author: Srivastava, Ajitesh, primary, Xu, Tianjian, additional, and Prasanna, Viktor K., additional
Published: 2022
Full Text: View/download PDF

26. VisionAGILE: A Versatile Domain-Specific Accelerator for Computer Vision Tasks

Author: Zhang, Bingyi, Kannan, Rajgopal, Busart, Carl, and Prasanna, Viktor K.
Abstract: The emergence of diverse machine learning (ML) models has led to groundbreaking revolutions in computer vision (CV). These ML models include convolutional neural networks (CNNs), graph neural networks (GNNs), and vision transformers (ViTs). However, existing hardware accelerators designed for CV lack the versatility to support various ML models, potentially limiting their applicability to real-world scenarios. To address this limitation, we introduce VisionAGILE, a domain-specific accelerator designed to be versatile and capable of accommodating a range of ML models, including CNNs, GNNs, and ViTs. VisionAGILE comprises a compiler, a runtime system, and a hardware accelerator. For the hardware accelerator, we develop a novel unified architecture with a flexible data path and memory organization to support the computation primitives in various ML models. Regarding the compiler design, we develop a unified compilation workflow that maps various ML models to the proposed hardware accelerator. The runtime system executes dynamic sparsity exploitation to reduce inference latency and dynamic task scheduling for workload balance. The compiler, the runtime system, and the hardware accelerator work synergistically to support a variety of ML models in CV, enabling low-latency inference. We deploy the hardware accelerator on a state-of-the-art data center FPGA (Xilinx Alveo U250). We evaluate VisionAGILE on diverse ML models for CV, including CNNs, GNNs, hybrid models (comprising both CNN and GNN), and ViTs. The experimental results indicate that, compared with state-of-the-art CPU (GPU) implementations, VisionAGILE achieves a speedup of 81.7× (4.8×) in terms of latency. Evaluated on standalone CNNs, GNNs, and ViTs, VisionAGILE demonstrates comparable or higher performance with state-of-the-art CNN accelerators, GNN accelerators, and ViT accelerators, respectively.
Published: 2024
Full Text: View/download PDF

27. Socio-demographic Characteristics Prediction Using Soft Clustering of Load Consumption Data

Author: Cheung, Chung Ming, primary, Kuppannagari, Sanmukh Rao, additional, and Prasanna, Viktor K., additional
Published: 2021
Full Text: View/download PDF

28. Holistic Measures for Evaluating Prediction Models in Smart Grids

Author: Aman, Saima, Simmhan, Yogesh, and Prasanna, Viktor K.
Subjects: Computer Science - Machine Learning
Abstract: The performance of prediction models is often based on "abstract metrics" that estimate the model's ability to limit residual errors between the observed and predicted values. However, meaningful evaluation and selection of prediction models for end-user domains requires holistic and application-sensitive performance measures. Inspired by energy consumption prediction models used in the emerging "big data" domain of Smart Power Grids, we propose a suite of performance measures to rationally compare models along the dimensions of scale independence, reliability, volatility and cost. We include both application independent and dependent measures, the latter parameterized to allow customization by domain experts to fit their scenario. While our measures are generalizable to other domains, we offer an empirical analysis using real energy use data for three Smart Grid applications: planning, customer education and demand response, which are relevant for energy sustainability. Our results underscore the value of the proposed measures to offer a deeper insight into models' behavior and their impact on real applications, which benefit both data mining researchers and practitioners. Comment: 14 Pages, 8 figures, Accepted and to appear in IEEE Transactions on Knowledge and Data Engineering, 2014. Authors' final version. Copyright transferred to IEEE
Published: 2014
Full Text: View/download PDF

29. Bidirectional Pipelining for Scalable IP Lookup and Packet Classification

Author: Jiang, Weirong, Le, Hoang, and Prasanna, Viktor K.
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Data Structures and Algorithms, 68M10
Abstract: Both IP lookup and packet classification in IP routers can be implemented by some form of tree traversal. SRAM-based Pipelining can improve the throughput dramatically. However, previous pipelining schemes result in unbalanced memory allocation over the pipeline stages. This has been identified as a major challenge for scalable pipelined solutions. This paper proposes a flexible bidirectional linear pipeline architecture based on widely-used dual-port SRAMs. A search tree is partitioned, and then mapped onto pipeline stages by a bidirectional fine-grained mapping scheme. We introduce the notion of inversion factor and several heuristics to invert subtrees for memory balancing. Due to its linear structure, the architecture maintains packet input order, and supports non-blocking route updates. Our experiments show that, the architecture can achieve a perfectly balanced memory distribution over the pipeline stages, for both trie-based IP lookup and tree-based multi-dimensional packet classification. For IP lookup, it can store a full backbone routing table with 154419 entries using 2MB of memory, and sustain a high throughput of 1.87 billion packets per second (GPPS), i.e. 0.6 Tbps for the minimum size (40 bytes) packets. The throughput can be improved further to be 2.4 Tbps, by employing caching to exploit the Internet traffic locality. Comment: tech report
Published: 2011

30. Phases, Modalities, Spatial and Temporal Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics

Author: Zhang, Pengmiao, primary, Kannan, Rajgopal, additional, and Prasanna, Viktor K., additional
Published: 2023
Full Text: View/download PDF

31. Parallel Totally Induced Edge Sampling on FPGAs

Author: Goel, Akshit, primary, Kuppannagari, Sanmukh R., additional, Yang, Yang, additional, Srivastava, Ajitesh, additional, and Prasanna, Viktor K., additional
Published: 2020
Full Text: View/download PDF

32. MemMAP: Compact and Generalizable Meta-LSTM Models for Memory Access Prediction

Author: Srivastava, Ajitesh, primary, Wang, Ta-Yang, additional, Zhang, Pengmiao, additional, De Rose, Cesar Augusto F., additional, Kannan, Rajgopal, additional, and Prasanna, Viktor K., additional
Published: 2020
Full Text: View/download PDF

33. G-MAP: A Graph Neural Network-Based Framework for Memory Access Prediction

Author: Gorle, Abhiram Rao, primary, Zhang, Pengmiao, additional, Kannan, Rajgopal, additional, and Prasanna, Viktor K., additional
Published: 2023
Full Text: View/download PDF

34. GraphAGILE: An FPGA-Based Overlay Accelerator for Low-Latency GNN Inference

Author: Zhang, Bingyi, primary, Zeng, Hanqing, additional, and Prasanna, Viktor K., additional
Published: 2023
Full Text: View/download PDF

35. PRIMER – A Regression-Rule Learning System for Intervention Optimization

Author: Harris, Greg, Panangadan, Anand, Prasanna, Viktor K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Alferes, Jose Julio, editor, Bertossi, Leopoldo, editor, Governatori, Guido, editor, Fodor, Paul, editor, and Roman, Dumitru, editor
Published: 2016
Full Text: View/download PDF

36. FPGA-Based Acceleration of Pattern Matching in YARA

Author: Singapura, Shreyas G., Yang, Yi-Hua E., Panangadan, Anand, Nemeth, Tamas, Ng, Peter, Prasanna, Viktor K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Bonato, Vanderlei, editor, Bouganis, Christos, editor, and Gorgon, Marek, editor
Published: 2016
Full Text: View/download PDF

37. Event Extraction from Unstructured Text Data

Author: Shang, Chao, Panangadan, Anand, Prasanna, Viktor K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Chen, Qiming, editor, Hameurlain, Abdelkader, editor, Toumani, Farouk, editor, Wagner, Roland, editor, and Decker, Hendrik, editor
Published: 2015
Full Text: View/download PDF

38. UFOMQ: An Algorithm for Querying for Similar Individuals in Heterogeneous Ontologies

Author: Zhang, Yinuo, Panangadan, Anand, Prasanna, Viktor K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Madria, Sanjay, editor, and Hara, Takahiro, editor
Published: 2015
Full Text: View/download PDF

39. Optimal Dynamic Data Layouts for 2D FFT on 3D Memory Integrated FPGA

Author: Chen, Ren, Singapura, Shreyas G., Prasanna, Viktor K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Malyshkin, Victor, editor
Published: 2015
Full Text: View/download PDF

40. FP-CPNNQ: A Filter-Based Protocol for Continuous Probabilistic Nearest Neighbor Query

Author: Zhang, Yinuo, Panangadan, Anand, Prasanna, Viktor K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Renz, Matthias, editor, Shahabi, Cyrus, editor, Zhou, Xiaofang, editor, and Cheema, Muhammad Aamir, editor
Published: 2015
Full Text: View/download PDF

41. Learning of Performance Measures from Crowd-Sourced Data with Application to Ranking of Investments

Author: Harris, Greg, Panangadan, Anand, Prasanna, Viktor K., Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Cao, Tru, editor, Lim, Ee-Peng, editor, Zhou, Zhi-Hua, editor, Ho, Tu-Bao, editor, Cheung, David, editor, and Motoda, Hiroshi, editor
Published: 2015
Full Text: View/download PDF

42. DRAM Row Activation Energy Optimization for Stride Memory Access on FPGA-Based Systems

Author: Chen, Ren, Prasanna, Viktor K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Sano, Kentaro, editor, Soudris, Dimitrios, editor, Hübner, Michael, editor, and Diniz, Pedro C., editor
Published: 2015
Full Text: View/download PDF

43. Towards Performance Modeling of 3D Memory Integrated FPGA Architectures

Author: Singapura, Shreyas G., Panangadan, Anand, Prasanna, Viktor K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Sano, Kentaro, editor, Soudris, Dimitrios, editor, Hübner, Michael, editor, and Diniz, Pedro C., editor
Published: 2015
Full Text: View/download PDF

44. Packet Classification on Multi-core Platforms

Author: Qu, Yun R., Zhou, Shijie, Prasanna, Viktor K., Khan, Samee U., editor, and Zomaya, Albert Y., editor
Published: 2015
Full Text: View/download PDF

45. Network Virtualization in Data Centers: A Data Plane Perspective

Author: Jiang, Weirong, Prasanna, Viktor K., Khan, Samee U., editor, and Zomaya, Albert Y., editor
Published: 2015
Full Text: View/download PDF

46. Compact hash tables for decision-trees

Author: Qu, Yun R. and Prasanna, Viktor K.
Published: 2016
Full Text: View/download PDF

47. Sparse Causal Temporal Modeling to Inform Power System Defense

Author: Misyrlis, Michail, Kannan, Rajgopal, Chelmis, Charalampos, and Prasanna, Viktor K.
Published: 2016
Full Text: View/download PDF

48. HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform

Author: Lin, Yi-Chien, Zhang, Bingyi, and Prasanna, Viktor K.
Abstract: As the size of real-world graphs increases, training Graph Neural Networks (GNNs) has become time-consuming and requires acceleration. While previous works have demonstrated the potential of utilizing FPGA for accelerating GNN training, few works have been carried out to accelerate GNN training with multiple FPGAs due to the necessity of hardware expertise and substantial development effort. To this end, we propose HitGNN, a framework that enables users to effortlessly map GNN training workloads onto a CPU+Multi-FPGA platform for acceleration. In particular, HitGNN takes the user-defined synchronous GNN training algorithm, GNN model, and platform metadata as input, determines the design parameters based on the platform metadata, and performs hardware mapping onto the CPU+Multi-FPGA platform, automatically. HitGNN consists of the following building blocks: (1) high-level application programming interfaces (APIs) that allow users to specify various synchronous GNN training algorithms and GNN models with only a handful of lines of code; (2) a software generator that generates a host program that performs mini-batch sampling, manages CPU-FPGA communication, and handles workload balancing among the FPGAs; (3) an accelerator generator that generates GNN kernels with optimized datapath and memory organization. We show that existing synchronous GNN training algorithms such as DistDGL and PaGraph can be easily deployed on a CPU+Multi-FPGA platform using our framework, while achieving high training throughput. Compared with the state-of-the-art frameworks that accelerate synchronous GNN training on a multi-GPU platform, HitGNN achieves up to 27.21× bandwidth efficiency, and up to 4.26× speedup using much less compute power and memory bandwidth than GPUs. In addition, HitGNN demonstrates good scalability to 16 FPGAs on a CPU+Multi-FPGA platform.
Published: 2024
Full Text: View/download PDF

49. Optimal dynamic data layouts for 2D FFT on 3D memory integrated FPGA

Author: Chen, Ren, Singapura, Shreyas G., and Prasanna, Viktor K.
Published: 2017
Full Text: View/download PDF

50. Multi-core Implementation of Decomposition-Based Packet Classification Algorithms

Author: Zhou, Shijie, Qu, Yun R., Prasanna, Viktor K., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, and Malyshkin, Victor, editor
Published: 2013
Full Text: View/download PDF
