Author: "Willke, Theodore" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Willke, Theodore"' showing total 27 results

1. A structure-aware framework for learning device placements on computation graphs

Author: Duan, Shukai, Ping, Heng, Kanakaris, Nikos, Xiao, Xiongye, Zhang, Peiyu, Kyriakis, Panagiotis, Ahmed, Nesreen K., Ma, Guixiang, Capota, Mihai, Nazarian, Shahin, Willke, Theodore L., and Bogdan, Paul
Subjects: Computer Science - Machine Learning, Computer Science - Performance
Abstract: Existing approaches for device placement ignore the topological features of computation graphs and rely mostly on heuristic methods for graph partitioning. At the same time, they either follow a grouper-placer or an encoder-placer architecture, which requires understanding the interaction structure between code operations. To bridge the gap between encoder-placer and grouper-placer techniques, we propose a novel framework for the task of device placement, relying on smaller computation graphs extracted from the OpenVINO toolkit using reinforcement learning. The framework consists of five steps, including graph coarsening, node representation learning and policy optimization. It facilitates end-to-end training and takes into consideration the directed and acyclic nature of the computation graphs. We also propose a model variant, inspired by graph parsing networks and complex network analysis, enabling graph representation learning and personalized graph partitioning jointly, using an unspecified number of groups. To train the entire framework, we utilize reinforcement learning techniques by employing the execution time of the suggested device placements to formulate the reward. We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT. The robustness of the proposed framework is also highlighted through an ablation study. The suggested placements improve the inference speed for the benchmark models by up to $58.2\%$ over CPU execution and by up to $60.24\%$ compared to other commonly used baselines.
Published: 2024

2. Structure Guided Prompt: Instructing Large Language Model in Multi-Step Reasoning by Exploring Graph Structure of the Text

Author: Cheng, Kewei, Ahmed, Nesreen K., Willke, Theodore, and Sun, Yizhou
Subjects: Computer Science - Computation and Language
Abstract: Although Large Language Models (LLMs) excel at addressing straightforward reasoning tasks, they frequently struggle with difficulties when confronted by more complex multi-step reasoning due to a range of factors. Firstly, natural language often encompasses complex relationships among entities, making it challenging to maintain a clear reasoning chain over longer spans. Secondly, the abundance of linguistic diversity means that the same entities and relationships can be expressed using different terminologies and structures, complicating the task of identifying and establishing connections between multiple pieces of information. Graphs provide an effective solution to represent data rich in relational information and capture long-term dependencies among entities. To harness the potential of graphs, our paper introduces Structure Guided Prompt, an innovative three-stage task-agnostic prompting framework designed to improve the multi-step reasoning capabilities of LLMs in a zero-shot setting. This framework explicitly converts unstructured text into a graph via LLMs and instructs them to navigate this graph using task-specific strategies to formulate responses. By effectively organizing information and guiding navigation, it enables LLMs to provide more accurate and context-aware responses. Our experiments show that this framework significantly enhances the reasoning capabilities of LLMs, enabling them to excel in a broader spectrum of natural language scenarios.
Published: 2024

3. Locally-Adaptive Quantization for Streaming Vector Search

Author: Aguerrebere, Cecilia, Hildebrand, Mark, Bhati, Ishwar Singh, Willke, Theodore, and Tepper, Mariano
Subjects: Computer Science - Machine Learning, Computer Science - Information Retrieval
Abstract: Retrieving the most similar vector embeddings to a given query among a massive collection of vectors has long been a key component of countless real-world applications. The recently introduced Retrieval-Augmented Generation is one of the most prominent examples. For many of these applications, the database evolves over time by inserting new data and removing outdated data. In these cases, the retrieval problem is known as streaming similarity search. While Locally-Adaptive Vector Quantization (LVQ), a highly efficient vector compression method, yields state-of-the-art search performance for non-evolving databases, its usefulness in the streaming setting has not been yet established. In this work, we study LVQ in streaming similarity search. In support of our evaluation, we introduce two improvements of LVQ: Turbo LVQ and multi-means LVQ that boost its search performance by up to 28% and 27%, respectively. Our studies show that LVQ and its new variants enable blazing fast vector search, outperforming its closest competitor by up to 9.4x for identically distributed data and by up to 8.8x under the challenging scenario of data distribution shifts (i.e., where the statistical distribution of the data changes over time). We release our contributions as part of Scalable Vector Search, an open-source library for high-performance similarity search.
Published: 2024

4. The Landscape and Challenges of HPC Research and LLMs

Author: Chen, Le, Ahmed, Nesreen K., Dutta, Akash, Bhattacharjee, Arijit, Yu, Sixing, Mahmud, Quazi Ishtiaque, Abebe, Waqwoya, Phan, Hung, Sarkar, Aishwarya, Butler, Branden, Hasabnis, Niranjan, Oren, Gal, Vo, Vy A., Munoz, Juan Pablo, Willke, Theodore L., Mattson, Tim, and Jannesari, Ali
Subjects: Computer Science - Machine Learning
Abstract: Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing and code-based tasks. Over the past several years, many research labs and institutions have invested heavily in high-performance computing, approaching or breaching exascale performance levels. In this paper, we posit that adapting and utilizing such language model-based techniques for tasks in high-performance computing (HPC) would be very beneficial. This study presents our reasoning behind the aforementioned position and highlights how existing ideas can be improved and adapted for HPC tasks.
Published: 2024

5. Leveraging Reinforcement Learning and Large Language Models for Code Optimization

Author: Duan, Shukai, Kanakaris, Nikos, Xiao, Xiongye, Ping, Heng, Zhou, Chenyu, Ahmed, Nesreen K., Ma, Guixiang, Capota, Mihai, Willke, Theodore L., Nazarian, Shahin, and Bogdan, Paul
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Programming Languages, Computer Science - Software Engineering
Abstract: Code optimization is a daunting task that requires a significant level of expertise from experienced programmers. This level of expertise is not sufficient when compared to the rapid development of new hardware architectures. Towards advancing the whole code optimization process, recent approaches rely on machine learning and artificial intelligence techniques. This paper introduces a new framework to decrease the complexity of code optimization. The proposed framework builds on large language models (LLMs) and reinforcement learning (RL) and enables LLMs to receive feedback from their environment (i.e., unit tests) during the fine-tuning process. We compare our framework with existing state-of-the-art models and show that it is more efficient with respect to speed and computational usage, as a result of the decrement in training steps and its applicability to models with fewer parameters. Additionally, our framework reduces the possibility of logical and syntactical errors. Toward evaluating our approach, we run several experiments on the PIE dataset using a CodeT5 language model and RRHF, a new reinforcement learning algorithm. We adopt a variety of evaluation metrics with regards to optimization quality, and speedup. The evaluation results demonstrate that the proposed framework has similar results in comparison with existing models using shorter training times and smaller pre-trained models. In particular, we accomplish an increase of 5.6% and 2.2 over the baseline models concerning the %OPT and SP metrics.
Published: 2023

6. Memory-Augmented Graph Neural Networks: A Brain-Inspired Review

Author: Ma, Guixiang, Vo, Vy A., Willke, Theodore, and Ahmed, Nesreen K.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
Abstract: We provide a comprehensive review of the existing literature on memory-augmented GNNs. We review these works through the lens of psychology and neuroscience, which has several established theories on how multiple memory systems and mechanisms operate in biological brains. We propose a taxonomy of memory-augmented GNNs and a set of criteria for comparing their memory mechanisms. We also provide critical discussions on the limitations of these works. Finally, we discuss the challenges and future directions for this area.
Published: 2022

7. End-to-end Mapping in Heterogeneous Systems Using Graph Representation Learning

Author: Xiao, Yao, Ma, Guixiang, Ahmed, Nesreen K., Capota, Mihai, Willke, Theodore, Nazarian, Shahin, and Bogdan, Paul
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: To enable heterogeneous computing systems with autonomous programming and optimization capabilities, we propose a unified, end-to-end, programmable graph representation learning (PGL) framework that is capable of mining the complexity of high-level programs down to the universal intermediate representation, extracting the specific computational patterns and predicting which code segments would run best on a specific core in heterogeneous hardware platforms. The proposed framework extracts multi-fractal topological features from code graphs, utilizes graph autoencoders to learn how to partition the graph into computational kernels, and exploits graph neural networks (GNN) to predict the correct assignment to a processor type. In the evaluation, we validate the PGL framework and demonstrate a maximum speedup of 6.42x compared to the thread-based execution, and 2.02x compared to the state-of-the-art technique.
Published: 2022

8. A Vertex Cut based Framework for Load Balancing and Parallelism Optimization in Multi-core Systems

Author: Ma, Guixiang, Xiao, Yao, Willke, Theodore L., Ahmed, Nesreen K., Nazarian, Shahin, and Bogdan, Paul
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: High-level applications, such as machine learning, are evolving from simple models based on multilayer perceptrons for simple image recognition to much deeper and more complex neural networks for self-driving vehicle control systems. The rapid increase in the consumption of memory and computational resources by these models demands the use of multi-core parallel systems to scale the execution of the complex emerging applications that depend on them. However, parallel programs running on high-performance computers often suffer from data communication bottlenecks, limited memory bandwidth, and synchronization overhead due to irregular critical sections. In this paper, we propose a framework to reduce the data communication and improve the scalability and performance of these applications in multi-core systems. We design a vertex cut framework for partitioning LLVM IR graphs into clusters while taking into consideration the data communication and workload balance among clusters. First, we construct LLVM graphs by compiling high-level programs into LLVM IR, instrumenting code to obtain the execution order of basic blocks and the execution time for each memory operation, and analyze data dependencies in dynamic LLVM traces. Next, we formulate the problem as Weight Balanced $p$-way Vertex Cut, and propose a generic and flexible framework, wherein four different greedy algorithms are proposed for solving this problem. Lastly, we propose a memory-centric run-time mapping of the linear time complexity to map clusters generated from the vertex cut algorithms onto a multi-core platform. We conclude that our best algorithm, WB-Libra, provides performance improvements of 1.56x and 1.86x over existing state-of-the-art approaches for 8 and 1024 clusters running on a multi-core platform, respectively.
Published: 2020

9. Navigating the Trade-Off between Multi-Task Learning and Learning to Multitask in Deep Neural Networks

Author: Ravi, Sachin, Musslick, Sebastian, Hamin, Maia, Willke, Theodore L., and Cohen, Jonathan D.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: The terms multi-task learning and multitasking are easily confused. Multi-task learning refers to a paradigm in machine learning in which a network is trained on various related tasks to facilitate the acquisition of tasks. In contrast, multitasking is used to indicate, especially in the cognitive science literature, the ability to execute multiple tasks simultaneously. While multi-task learning exploits the discovery of common structure between tasks in the form of shared representations, multitasking is promoted by separating representations between tasks to avoid processing interference. Here, we build on previous work involving shallow networks and simple task settings suggesting that there is a trade-off between multi-task learning and multitasking, mediated by the use of shared versus separated representations. We show that the same tension arises in deep networks and discuss a meta-learning algorithm for an agent to manage this trade-off in an unfamiliar environment. We display through different experiments that the agent is able to successfully optimize its training strategy as a function of the environment.
Published: 2020

10. Deep Graph Similarity Learning: A Survey

Author: Ma, Guixiang, Ahmed, Nesreen K., Willke, Theodore L., and Yu, Philip S.
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In many domains where data are represented as graphs, learning a similarity metric among graphs is considered a key problem, which can further facilitate various learning tasks, such as classification, clustering, and similarity search. Recently, there has been an increasing interest in deep graph similarity learning, where the key idea is to learn a deep learning model that maps input graphs to a target space such that the distance in the target space approximates the structural distance in the input space. Here, we provide a comprehensive review of the existing literature of deep graph similarity learning. We propose a systematic taxonomy for the methods and applications. Finally, we discuss the challenges and future directions for this problem.
Published: 2019

11. Approximating Stacked and Bidirectional Recurrent Architectures with the Delayed Recurrent Neural Network

Author: Turek, Javier S., Jain, Shailee, Vo, Vy, Capota, Mihai, Huth, Alexander G., and Willke, Theodore L.
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning, 62M45, I.2.6, I.5.1
Abstract: Recent work has shown that topological enhancements to recurrent neural networks (RNNs) can increase their expressiveness and representational capacity. Two popular enhancements are stacked RNNs, which increases the capacity for learning non-linear functions, and bidirectional processing, which exploits acausal information in a sequence. In this work, we explore the delayed-RNN, which is a single-layer RNN that has a delay between the input and output. We prove that a weight-constrained version of the delayed-RNN is equivalent to a stacked-RNN. We also show that the delay gives rise to partial acausality, much like bidirectional networks. Synthetic experiments confirm that the delayed-RNN can mimic bidirectional networks, solving some acausal tasks similarly, and outperforming them in others. Moreover, we show similar performance to bidirectional networks in a real-world natural language processing task. These results suggest that delayed-RNNs can approximate topologies including stacked RNNs, bidirectional RNNs, and stacked bidirectional RNNs - but with equivalent or faster runtimes for the delayed-RNNs.
Comment: to be published in Proceedings of International Conference on Machine Learning 2020 (ICML)
Published: 2019

12. Clinically Deployed Distributed Magnetic Resonance Imaging Reconstruction: Application to Pediatric Knee Imaging

Author: Anderson, Michael J., Tamir, Jonathan I., Turek, Javier S., Alley, Marcus T., Willke, Theodore L., Vasanawala, Shreyas S., and Lustig, Michael
Subjects: Physics - Medical Physics, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Distributed, Parallel, and Cluster Computing, 68W15, 68U10
Abstract: Magnetic resonance imaging is capable of producing volumetric images without ionizing radiation. Nonetheless, long acquisitions lead to prohibitively long exams. Compressed sensing (CS) can enable faster scanning via sub-sampling with reduced artifacts. However, CS requires significantly higher reconstruction computation, limiting current clinical applications to 2D/3D or limited-resolution dynamic imaging. Here we analyze the practical limitations to T2 Shuffling, a four-dimensional CS-based acquisition, which provides sharp 3D-isotropic-resolution and multi-contrast images in a single scan. Our improvements to the pipeline on a single machine provide a 3x overall reconstruction speedup, which allowed us to add algorithmic changes improving image quality. Using four machines, we achieved additional 2.1x improvement through distributed parallelization. Our solution reduced the reconstruction time in the hospital to 90 seconds on a 4-node cluster, enabling its use clinically. To understand the implications of scaling this application, we simulated running our reconstructions with a multiple scanner setup typical in hospitals.
Published: 2018

13. Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers

Author: Vyas, Apoorv, Jammalamadaka, Nataraj, Zhu, Xia, Das, Dipankar, Kaul, Bharat, and Willke, Theodore L.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: As deep learning methods form a critical part in commercially important applications such as autonomous driving and medical diagnostics, it is important to reliably detect out-of-distribution (OOD) inputs while employing these algorithms. In this work, we propose an OOD detection algorithm which comprises of an ensemble of classifiers. We train each classifier in a self-supervised manner by leaving out a random subset of training data as OOD data and the rest as in-distribution (ID) data. We propose a novel margin-based loss over the softmax output which seeks to maintain at least a margin $m$ between the average entropy of the OOD and in-distribution samples. In conjunction with the standard cross-entropy loss, we minimize the novel loss to train an ensemble of classifiers. We also propose a novel method to combine the outputs of the ensemble of classifiers to obtain OOD detection score and class prediction. Overall, our method convincingly outperforms Hendrycks et al.[7] and the current state-of-the-art ODIN[13] on several OOD detection benchmarks.
Published: 2018

14. Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs

Author: Tang, Linpeng, Wang, Yida, Willke, Theodore L., and Li, Kai
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: For a deep learning model, efficient execution of its computation graph is key to achieving high performance. Previous work has focused on improving the performance for individual nodes of the computation graph, while ignoring the parallelization of the graph as a whole. However, we observe that running multiple operations simultaneously without interference is critical to efficiently perform parallelizable small operations. The attempt of executing the computation graph in parallel in deep learning frameworks usually involves much resource contention among concurrent operations, leading to inferior performance on manycore CPUs. To address these issues, in this paper, we propose Graphi, a generic and high-performance execution engine to efficiently execute a computation graph in parallel on manycore CPUs. Specifically, Graphi minimizes the interference on both software/hardware resources, discovers the best parallel setting with a profiler, and further optimizes graph execution with the critical-path first scheduling. Our experiments show that the parallel execution consistently outperforms the sequential one. The training times on four different neural networks with Graphi are 2.1x to 9.5x faster than those with TensorFlow on a 68-core Intel Xeon Phi processor.
Published: 2018

15. Temporo-Spatial Collaborative Filtering for Parameter Estimation in Noisy DCE-MRI Sequences: Application to Breast Cancer Chemotherapy Response

Author: Zhu, Xia, Sengupta, Dipanjan, Beers, Andrew, Kalpathy-Cramer, Jayashree, and Willke, Theodore L.
Subjects: Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a minimally invasive imaging technique which can be used for characterizing tumor biology and tumor response to radiotherapy. Pharmacokinetic (PK) estimation is widely used for DCE-MRI data analysis to extract quantitative parameters relating to microvasculature characteristics of the cancerous tissues. Unavoidable noise corruption during DCE-MRI data acquisition has a large effect on the accuracy of PK estimation. In this paper, we propose a general denoising paradigm called gather-noise attenuation and reduce (GNR) and a novel temporal-spatial collaborative filtering (TSCF) denoising technique for DCE-MRI data. TSCF takes advantage of temporal correlation in DCE-MRI, as well as anatomical spatial similarity to collaboratively filter noisy DCE-MRI data. The proposed TSCF denoising algorithm decreases the PK parameter normalized estimation error by 57% and improves the structural similarity of PK parameter estimation by 86% compared to baseline without denoising, while being an order of magnitude faster than state-of-the-art denoising methods. TSCF improves the univariate linear regression (ULR) c-statistic value for early prediction of pathologic response up to 18%, and shows complete separation of pathologic complete response (pCR) and non-pCR groups on a challenge dataset.
Published: 2018

16. Learning Role-based Graph Embeddings

Author: Ahmed, Nesreen K., Rossi, Ryan, Lee, John Boaz, Willke, Theodore L., Zhou, Rong, Kong, Xiangnan, and Eldardiry, Hoda
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Social and Information Networks, Statistics - Applications
Abstract: Random walks are at the heart of many existing network embedding methods. However, such algorithms have many limitations that arise from the use of random walks, e.g., the features resulting from these methods are unable to transfer to new nodes and graphs as they are tied to vertex identity. In this work, we introduce the Role2Vec framework which uses the flexible notion of attributed random walks, and serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many others that leverage random walks. Our proposed framework enables these methods to be more widely applicable for both transductive and inductive learning as well as for use on graphs with attributes (if available). This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is effective with an average AUC improvement of 16.55% while requiring on average 853x less space than existing methods on a variety of graphs.
Comment: StarAI workshop @ IJCAI 2018
Published: 2018

17. Segmenting Brain Tumors with Symmetry

Author: Zhang, Hejia, Zhu, Xia, and Willke, Theodore L.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We explore encoding brain symmetry into a neural network for a brain tumor segmentation task. A healthy human brain is symmetric at a high level of abstraction, and the high-level asymmetric parts are more likely to be tumor regions. Paying more attention to asymmetries has the potential to boost the performance in brain tumor segmentation. We propose a method to encode brain symmetry into existing neural networks and apply the method to a state-of-the-art neural network for medical imaging segmentation. We evaluate our symmetry-encoded network on the dataset from a brain tumor segmentation challenge and verify that the new model extracts information in the training images more efficiently than the original model.
Comment: NIPS ML4H Workshop 2017
Published: 2017

18. Inductive Representation Learning in Large Attributed Graphs

Author: Ahmed, Nesreen K., Rossi, Ryan A., Zhou, Rong, Lee, John Boaz, Kong, Xiangnan, Willke, Theodore L., and Eldardiry, Hoda
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Learning, Computer Science - Social and Information Networks
Abstract: Graphs (networks) are ubiquitous and allow us to model entities (nodes) and the dependencies (edges) between them. Learning a useful feature representation from graph data lies at the heart and success of many machine learning tasks such as classification, anomaly detection, link prediction, among many others. Many existing techniques use random walks as a basis for learning features or estimating the parameters of a graph model for a downstream prediction task. Examples include recent node embedding methods such as DeepWalk, node2vec, as well as graph-based deep learning algorithms. However, the simple random walk used by these methods is fundamentally tied to the identity of the node. This has three main disadvantages. First, these approaches are inherently transductive and do not generalize to unseen nodes and other graphs. Second, they are not space-efficient as a feature vector is learned for each node which is impractical for large graphs. Third, most of these approaches lack support for attributed graphs. To make these methods more generally applicable, we propose a framework for inductive network representation learning based on the notion of attributed random walk that is not tied to node identity and is instead based on learning a function $\Phi : \mathbf{x} \rightarrow w$ that maps a node attribute vector $\mathbf{x}$ to a type $w$. This framework serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many other previous methods that leverage traditional random walks.
Comment: NIPS WiML
Published: 2017

19. A Framework for Generalizing Graph-based Representation Learning Methods

Author: Ahmed, Nesreen K., Rossi, Ryan A., Zhou, Rong, Lee, John Boaz, Kong, Xiangnan, Willke, Theodore L., and Eldardiry, Hoda
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Learning, Computer Science - Social and Information Networks
Abstract: Random walks are at the heart of many existing deep learning algorithms for graph data. However, such algorithms have many limitations that arise from the use of random walks, e.g., the features resulting from these methods are unable to transfer to new nodes and graphs as they are tied to node identity. In this work, we introduce the notion of attributed random walks which serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many others that leverage random walks. Our proposed framework enables these methods to be more widely applicable for both transductive and inductive learning as well as for use on graphs with attributes (if available). This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is effective with an average AUC improvement of 16.1% while requiring on average 853 times less space than existing methods on a variety of graphs from several domains.
Published: 2017

20. Topological limits to parallel processing capability of network architectures

Author: Petri, Giovanni, Musslick, Sebastian, Dey, Biswadip, Ozcimder, Kayhan, Turner, David, Ahmed, Nesreen K., Willke, Theodore, and Cohen, Jonathan D.
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: The ability to learn new tasks and generalize performance to others is one of the most remarkable characteristics of the human brain and of recent AI systems. The ability to perform multiple tasks simultaneously is also a signature characteristic of large-scale parallel architectures, that is evident in the human brain, and has been exploited effectively in more traditional, massively parallel computational architectures. Here, we show that these two characteristics are in tension, reflecting a fundamental tradeoff between interactive parallelism that supports learning and generalization, and independent parallelism that supports processing efficiency through concurrent multitasking. We formally show that, while the maximum number of tasks that can be performed simultaneously grows linearly with network size, under realistic scenarios (e.g. in an unpredictable environment), the expected number that can be performed concurrently grows radically sub-linearly with network size. Hence, even modest reliance on shared representation strictly constrains the number of tasks that can be performed simultaneously, implying profound consequences for the development of artificial intelligence that optimally manages the tradeoff between learning and processing, and for understanding the human brain's remarkably puzzling mix of sequential and parallel capabilities.
Comment: version 4. Added SIs, 33 pages total, 4 figures + 14 figures in SI, major edits to text
Published: 2017

21. A Formal Approach to Modeling the Cost of Cognitive Control

Author: Ozcimder, Kayhan, Dey, Biswadip, Musslick, Sebastian, Petri, Giovanni, Ahmed, Nesreen K., Willke, Theodore L., and Cohen, Jonathan D.
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: This paper introduces a formal method to model the level of demand on control when executing cognitive processes. The cost of cognitive control is parsed into an intensity cost which encapsulates how much additional input information is required so as to get the specified response, and an interaction cost which encapsulates the level of interference between individual processes in a network. We develop a formal relationship between the probability of successful execution of desired processes and the control signals (additive control biases). This relationship is also used to specify optimal control policies to achieve a desired probability of activation for processes. We observe that there are boundary cases when finding such control policies which leads us to introduce the interaction cost. We show that the interaction cost is influenced by the relative strengths of individual processes, as well as the directionality of the underlying competition between processes.
Comment: 6 pages, 3 figures, Conference paper
Published: 2017

22. On Sampling from Massive Graph Streams

Author: Ahmed, Nesreen K., Duffield, Nick, Willke, Theodore, and Rossi, Ryan A.
Subjects: Computer Science - Social and Information Networks, Computer Science - Data Structures and Algorithms, Computer Science - Information Retrieval, Mathematics - Statistics Theory
Abstract: We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive streams of graph edges. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to accomplish various estimation goals of graph properties. In the context of subgraph counting, we show how edge sampling weights can be chosen so as to minimize the estimation variance of counts of specified sets of subgraphs. In contrast to many prior graph sampling schemes, GPS separates the functions of edge sampling and subgraph estimation. We propose two estimation frameworks: (1) Post-Stream estimation, to allow GPS to construct a reference sample of edges to support retrospective graph queries, and (2) In-Stream estimation, to allow GPS to obtain lower variance estimates by incrementally updating the subgraph count estimates during stream processing. Unbiasedness of subgraph estimators is established through a new Martingale formulation of graph stream order sampling, which shows that subgraph estimators, written as a product of constituent edge estimators, are unbiased, even when computed at different points in the stream. The separation of estimation and sampling enables significant resource savings relative to previous work. We illustrate our framework with applications to triangle and wedge counting. We perform a large-scale experimental study on real-world graphs from various domains and types. GPS achieves high accuracy with less than 1% error for triangle and wedge counting, while storing a small fraction of the graph with average update times of a few microseconds per edge. Notably, for a large Twitter graph with more than 260M edges, GPS accurately estimates triangle counts with less than 1% error, while storing only 40K edges.
Published: 2017

23. Revisiting Role Discovery in Networks: From Node to Edge Roles

Author: Ahmed, Nesreen K., Rossi, Ryan A., Willke, Theodore L., and Zhou, Rong
Subjects: Statistics - Machine Learning, Computer Science - Learning, Computer Science - Social and Information Networks
Abstract: Previous work in network analysis has focused on modeling the mixed-memberships of node roles in the graph, but not the roles of edges. We introduce the edge role discovery problem and present a generalizable framework for learning and extracting edge roles from arbitrary graphs automatically. Furthermore, while existing node-centric role models have mainly focused on simple degree and egonet features, this work also explores graphlet features for role discovery. In addition, we develop an approach for automatically learning and extracting important and useful edge features from an arbitrary graph. The experimental results demonstrate the utility of edge roles for network analysis tasks on a variety of graphs from various problem domains.
Published: 2016

24. A Searchlight Factor Model Approach for Locating Shared Information in Multi-Subject fMRI Analysis

Author: Zhang, Hejia, Chen, Po-Hsuan, Chen, Janice, Zhu, Xia, Turek, Javier S., Willke, Theodore L., Hasson, Uri, and Ramadge, Peter J.
Subjects: Statistics - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Quantitative Biology - Neurons and Cognition
Abstract: There is a growing interest in joint multi-subject fMRI analysis. The challenge of such analysis comes from inherent anatomical and functional variability across subjects. One approach to resolving this is a shared response factor model. This assumes a shared and time synchronized stimulus across subjects. Such a model can often identify shared information, but it may not be able to pinpoint with high resolution the spatial location of this information. In this work, we examine a searchlight based shared response model to identify shared information in small contiguous regions (searchlights) across the whole brain. Validation using classification tasks demonstrates that we can pinpoint informative local regions.
Published: 2016

25. A Convolutional Autoencoder for Multi-Subject fMRI Data Aggregation

Author: Chen, Po-Hsuan, Zhu, Xia, Zhang, Hejia, Turek, Javier S., Chen, Janice, Willke, Theodore L., Hasson, Uri, and Ramadge, Peter J.
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning
Abstract: Finding the most effective way to aggregate multi-subject fMRI data is a long-standing and challenging problem. It is of increasing interest in contemporary fMRI studies of human cognition due to the scarcity of data per subject and the variability of brain anatomy and functional response across subjects. Recent work on latent factor models shows promising results in this task but this approach does not preserve spatial locality in the brain. We examine two ways to combine the ideas of a factor model and a searchlight based analysis to aggregate multi-subject fMRI data while preserving spatial locality. We first do this directly by combining a recent factor method known as a shared response model with searchlight analysis. Then we design a multi-view convolutional autoencoder for the same task. Both approaches preserve spatial locality and have competitive or better performance compared with standard searchlight analysis and the shared response model applied across the whole brain. We also report a system design to handle the computational challenge of training the convolutional autoencoder.
Published: 2016

26. Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets

Author: Anderson, Michael J., Capotă, Mihai, Turek, Javier S., Zhu, Xia, Willke, Theodore L., Wang, Yida, Chen, Po-Hsuan, Manning, Jeremy R., Ramadge, Peter J., and Norman, Kenneth A.
Subjects: Statistics - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Learning, 68W15, I.2
Abstract: The scale of functional magnetic resonance image data is rapidly increasing as large multi-subject datasets are becoming widely available and high-resolution scanners are adopted. The inherent low-dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 1812x speedups on these two methods, and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x respectively with 20 nodes on real datasets. We also demonstrate weak scaling on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768 cores.
Published: 2016

27. Graphlet Decomposition: Framework, Algorithms, and Applications

Author: Ahmed, Nesreen K., Neville, Jennifer, Rossi, Ryan A., Duffield, Nick, and Willke, Theodore L.
Subjects: Computer Science - Social and Information Networks, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Information Retrieval, Statistics - Machine Learning
Abstract: From social science to biology, numerous applications often rely on graphlets for intuitive and meaningful characterization of networks at both the global macro-level as well as the local micro-level. While graphlets have witnessed a tremendous success and impact in a variety of domains, there has yet to be a fast and efficient approach for computing the frequencies of these subgraph patterns. However, existing methods are not scalable to large networks with millions of nodes and edges, which impedes the application of graphlets to new problems that require large-scale network analysis. To address these problems, we propose a fast, efficient, and parallel algorithm for counting graphlets of size k={3,4}-nodes that takes only a fraction of the time to compute when compared with the current methods used. The proposed graphlet counting algorithms leverage a number of proven combinatorial arguments for different graphlets. For each edge, we count a few graphlets, and with these counts along with the combinatorial arguments, we obtain the exact counts of others in constant time. On a large collection of 300+ networks from a variety of domains, our graphlet counting strategies are on average 460x faster than current methods. This brings new opportunities to investigate the use of graphlets on much larger networks and newer applications as we show in the experiments. To the best of our knowledge, this paper provides the largest graphlet computations to date as well as the largest systematic investigation on 300+ networks from a variety of domains.
Published: 2015
