Author: "Parnell, Thomas A" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Parnell, Thomas A"' showing total 290 results

1. Accelerating Production LLMs with Combined Token/Embedding Speculators

Author: Wertheimer, Davis, Rosenkranz, Joshua, Parnell, Thomas, Suneja, Sahil, Ranganathan, Pavithra, Ganti, Raghu, and Srivatsa, Mudhakar
Subjects: Computer Science - Computation and Language
Abstract: This technical report describes the design and training of novel speculative decoding draft models, for accelerating the inference speeds of large language models in a production environment. By conditioning draft predictions on both context vectors and sampled tokens, we can train our speculators to efficiently predict high-quality n-grams, which the base model then accepts or rejects. This allows us to effectively predict multiple tokens per inference forward pass, accelerating wall-clock inference speeds of highly optimized base model implementations by a factor of 2-3x. We explore these initial results and describe next steps for further improvements.
Comment: Original upload 4/29/24, updated 6/6/24 with additional references to concurrent work
Published: 2024
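
The accept/reject step mentioned in the abstract above can be illustrated with a minimal sketch of greedy draft verification. This is not the authors' implementation: the function names, the toy base model, and the per-position calls are assumptions made for clarity (a production system would score all draft positions in a single forward pass of the base model).

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    base_next_token: Callable[[Sequence[int]], int],
) -> List[int]:
    """Return the tokens emitted by one speculative step (greedy variant).

    `base_next_token` is a hypothetical stand-in for the base model: given a
    context, it returns the token the base model would pick next.
    """
    emitted: List[int] = []
    context = list(prefix)
    for token in draft:
        expected = base_next_token(context)
        if token != expected:
            # First mismatch: reject the rest of the draft and fall back
            # to the base model's own choice for this position.
            emitted.append(expected)
            return emitted
        emitted.append(token)
        context.append(token)
    # Every drafted token was accepted, so the base model adds one more.
    emitted.append(base_next_token(context))
    return emitted


def toy_base(context: Sequence[int]) -> int:
    """Toy base model (assumption): always predicts last token + 1."""
    return context[-1] + 1
```

Each step emits at least one token and at most `len(draft) + 1`, which is where the multiple-tokens-per-forward-pass speedup comes from when the draft is usually right.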

2. Search-based Methods for Multi-Cloud Configuration

Author: Łazuka, Małgorzata, Parnell, Thomas, Anghel, Andreea, and Pozidis, Haralampos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: Multi-cloud computing has become increasingly popular with enterprises looking to avoid vendor lock-in. While most cloud providers offer similar functionality, they may differ significantly in terms of performance and/or cost. A customer looking to benefit from such differences will naturally want to solve the multi-cloud configuration problem: given a workload, which cloud provider should be chosen and how should its nodes be configured in order to minimize runtime or cost? In this work, we consider solutions to this optimization problem. We develop and evaluate possible adaptations of state-of-the-art cloud configuration solutions to the multi-cloud domain. Furthermore, we identify an analogy between multi-cloud configuration and the selection-configuration problems commonly studied in the automated machine learning (AutoML) field. Inspired by this connection, we utilize popular optimizers from AutoML to solve multi-cloud configuration. Finally, we propose a new algorithm for solving multi-cloud configuration, CloudBandit (CB). It treats the outer problem of cloud provider selection as a best-arm identification problem, in which each arm pull corresponds to running an arbitrary black-box optimizer on the inner problem of node configuration. Our experiments indicate that (a) many state-of-the-art cloud configuration solutions can be adapted to multi-cloud, with best results obtained for adaptations which utilize the hierarchical structure of the multi-cloud configuration domain, (b) hierarchical methods from AutoML can be used for the multi-cloud configuration task and can outperform state-of-the-art cloud configuration solutions and (c) CB achieves competitive or lower regret relative to other tested algorithms, whilst also identifying configurations that have 65% lower median cost and 20% lower median time in production, compared to choosing a random provider and configuration.
Comment: Submitted to IEEE Cloud 2022
Published: 2022
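
The two-level structure described in the abstract above (provider selection as best-arm identification, with each arm pull running a black-box optimizer over node configurations) can be sketched roughly as follows. The abstract does not give the elimination rule or the inner optimizer, so this sketch assumes halving of the provider set and plain random search; the `Config` shape, the `cost` callback, and all names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Config = Tuple[str, int]  # (node type, node count): an assumed search space


def best_arm_search(
    space: Dict[str, List[Config]],
    cost: Callable[[str, Config], float],
    budget_per_pull: int = 4,
    seed: int = 0,
) -> Tuple[str, Optional[Config], float]:
    """Pick a provider and configuration by halving over providers."""
    rng = random.Random(seed)
    best: Dict[str, Tuple[float, Optional[Config]]] = {
        provider: (float("inf"), None) for provider in space
    }
    active = list(space)
    while True:
        for provider in active:
            # One arm pull: run the inner optimizer (random search here)
            # for a fixed budget and remember the best configuration seen.
            for _ in range(budget_per_pull):
                config = rng.choice(space[provider])
                observed = cost(provider, config)
                if observed < best[provider][0]:
                    best[provider] = (observed, config)
        if len(active) == 1:
            break
        # Keep the better half of the providers (assumed elimination rule).
        active.sort(key=lambda provider: best[provider][0])
        active = active[: max(1, len(active) // 2)]
    winner = active[0]
    return winner, best[winner][1], best[winner][0]
```

Any optimizer that improves a per-provider incumbent could replace the inner loop, which matches the "arbitrary black-box optimizer" wording in the abstract.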

3. Towards a General Framework for ML-based Self-tuning Databases

Author: Schmied, Thomas, Didona, Diego, Döring, Andreas, Parnell, Thomas, and Ioannou, Nikolas
Subjects: Computer Science - Databases, Computer Science - Machine Learning
Abstract: Machine learning (ML) methods have recently emerged as an effective way to perform automated parameter tuning of databases. State-of-the-art approaches include Bayesian optimization (BO) and reinforcement learning (RL). In this work, we describe our experience when applying these methods to a database not yet studied in this context: FoundationDB. Firstly, we describe the challenges we faced, such as unknown valid ranges of configuration parameters and combinations of parameter values that result in invalid runs, and how we mitigated them. While these issues are typically overlooked, we argue that they are a crucial barrier to the adoption of ML self-tuning techniques in databases, and thus deserve more attention from the research community. Secondly, we present experimental results obtained when tuning FoundationDB using ML methods. Unlike prior work in this domain, we also compare with the simplest of baselines: random search. Our results show that, while BO and RL methods can improve the throughput of FoundationDB by up to 38%, random search is a highly competitive baseline, finding a configuration that is only 4% worse than the vastly more complex ML methods. We conclude that future work in this area may want to focus more on randomized, model-free optimization algorithms.
Published: 2020

4. SnapBoost: A Heterogeneous Boosting Machine

Author: Parnell, Thomas, Anghel, Andreea, Lazuka, Malgorzata, Ioannou, Nikolas, Kurella, Sebastian, Agarwal, Peshal, Papandreou, Nikolaos, and Pozidis, Haralampos
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Modern gradient boosting software frameworks, such as XGBoost and LightGBM, implement Newton descent in a functional space. At each boosting iteration, their goal is to find the base hypothesis, selected from some base hypothesis class, that is closest to the Newton descent direction in a Euclidean sense. Typically, the base hypothesis class is fixed to be all binary decision trees up to a given depth. In this work, we study a Heterogeneous Newton Boosting Machine (HNBM) in which the base hypothesis class may vary across boosting iterations. Specifically, at each boosting iteration, the base hypothesis class is chosen, from a fixed set of subclasses, by sampling from a probability distribution. We derive a global linear convergence rate for the HNBM under certain assumptions, and show that it agrees with existing rates for Newton's method when the Newton direction can be perfectly fitted by the base hypothesis at each boosting iteration. We then describe a particular realization of a HNBM, SnapBoost, that, at each boosting iteration, randomly selects between either a decision tree of variable depth or a linear regressor with random Fourier features. We describe how SnapBoost is implemented, with a focus on the training complexity. Finally, we present experimental results, using OpenML and Kaggle datasets, that show that SnapBoost is able to achieve better generalization loss than competing boosting frameworks, without taking significantly longer to tune.
Published: 2020

5. Differentially Private Stochastic Coordinate Descent

Author: Damaskinos, Georgios, Mendler-Dünner, Celestine, Guerraoui, Rachid, Papandreou, Nikolaos, and Parnell, Thomas
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Statistics - Machine Learning
Abstract: In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private. Compared to the classical gradient descent algorithm where updates operate on a single model vector and controlled noise addition to this vector suffices to hide critical information about individuals, stochastic coordinate descent crucially relies on keeping auxiliary information in memory during training. This auxiliary information provides an additional privacy leak and poses the major challenge addressed in this work. Driven by the insight that under independent noise addition, the consistency of the auxiliary information holds in expectation, we present DP-SCD, the first differentially private stochastic coordinate descent algorithm. We analyze our new method theoretically and argue that decoupling and parallelizing coordinate updates is essential for its utility. On the empirical side we demonstrate competitive performance against the popular stochastic gradient descent alternative (DP-SGD) while requiring significantly less tuning.
Published: 2020

6. SySCD: A System-Aware Parallel Coordinate Descent Algorithm

Author: Ioannou, Nikolas, Mendler-Dünner, Celestine, and Parnell, Thomas
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Statistics - Machine Learning
Abstract: In this paper we propose a novel parallel stochastic coordinate descent (SCD) algorithm with convergence guarantees that exhibits strong scalability. We start by studying a state-of-the-art parallel implementation of SCD and identify scalability as well as system-level performance bottlenecks of the respective implementation. We then take a principled approach to develop a new SCD variant which is designed to avoid the identified system bottlenecks, such as limited scaling due to coherence traffic of model sharing across threads, and inefficient CPU cache accesses. Our proposed system-aware parallel coordinate descent algorithm (SySCD) scales to many cores and across NUMA nodes, and offers a consistent bottom-line speedup in training time of up to 12x compared to an optimized asynchronous parallel SCD algorithm and up to 42x compared to state-of-the-art GLM solvers (scikit-learn, Vowpal Wabbit, and H2O) on a range of datasets and multi-core CPU architectures.
Comment: accepted as a spotlight at NeurIPS 2019, Vancouver, Canada
Published: 2019

7. Breadth-first, Depth-next Training of Random Forests

Author: Anghel, Andreea, Ioannou, Nikolas, Parnell, Thomas, Papandreou, Nikolaos, Mendler-Dünner, Celestine, and Pozidis, Haris
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper we analyze, evaluate, and improve the performance of training Random Forest (RF) models on modern CPU architectures. An exact, state-of-the-art binary decision tree building algorithm is used as the basis of this study. Firstly, we investigate the trade-offs between using different tree building algorithms, namely breadth-first-search (BFS) and depth-first-search (DFS). We design a novel, dynamic, hybrid BFS-DFS algorithm and demonstrate that it performs better than both BFS and DFS, and is more robust in the presence of workloads with different characteristics. Secondly, we identify CPU performance bottlenecks when generating trees using this approach, and propose optimizations to alleviate them. The proposed hybrid tree building algorithm for RF is implemented in the Snap Machine Learning framework, and speeds up the training of RFs by 7.8x on average when compared to state-of-the-art RF solvers (sklearn, H2O, and xgboost) on a range of datasets, RF configurations, and multi-core CPU architectures.
Published: 2019

8. Weighted Sampling for Combined Model Selection and Hyperparameter Tuning

Author: Sarigiannis, Dimitrios, Parnell, Thomas, and Pozidis, Haris
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The combined algorithm selection and hyperparameter tuning (CASH) problem is characterized by large hierarchical hyperparameter spaces. Model-free hyperparameter tuning methods can explore such large spaces efficiently since they are highly parallelizable across multiple machines. When no prior knowledge or meta-data exists to boost their performance, these methods commonly sample random configurations following a uniform distribution. In this work, we propose a novel sampling distribution as an alternative to uniform sampling and prove theoretically that it has a better chance of finding the best configuration in a worst-case setting. In order to compare competing methods rigorously in an experimental setting, one must perform statistical hypothesis testing. We show that there is little-to-no agreement in the automated machine learning literature regarding which methods should be used. We contrast this disparity with the methods recommended by the broader statistics literature, and identify a suitable approach. We then select three popular model-free solutions to CASH and evaluate their performance, with uniform sampling as well as the proposed sampling scheme, across 67 datasets from the OpenML platform. We investigate the trade-off between exploration and exploitation across the three algorithms, and verify empirically that the proposed sampling distribution improves performance in all cases.
Comment: Accepted for presentation at The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)
Published: 2019

9. Learning to Tune XGBoost with XGBoost

Author: Sommer, Johanna, Sarigiannis, Dimitrios, and Parnell, Thomas
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this short paper we investigate whether meta-learning techniques can be used to more effectively tune the hyperparameters of machine learning models using successive halving (SH). We propose a novel variant of the SH algorithm (MeSH) that uses meta-regressors to determine which candidate configurations should be eliminated at each round. We apply MeSH to the problem of tuning the hyperparameters of a gradient-boosted decision tree model. By training and tuning our meta-regressors using existing tuning jobs from 95 datasets, we demonstrate that MeSH can often find a superior solution to both SH and random search.
Comment: Accepted for presentation at The 3rd Workshop on Meta-Learning (Meta-Learn 2019), Vancouver, Canada
Published: 2019
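
For readers unfamiliar with successive halving (SH), the baseline that the abstract above builds on, a minimal sketch follows. It shows plain SH only, ranking candidates by their observed loss; according to the abstract, MeSH instead uses meta-regressors to decide which candidates to eliminate, and that part is not reproduced here. The `evaluate` callback and the parameter names are assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")


def successive_halving(
    configs: Sequence[C],
    evaluate: Callable[[C, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> C:
    """Return the surviving configuration (lower loss is better).

    `evaluate(config, budget)` is a hypothetical callback that trains the
    configuration with the given budget and returns a validation loss.
    """
    survivors: List[C] = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Rank the current candidates at the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        # Keep the best 1/eta fraction and give them eta times the budget.
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]
```

A variant in the spirit of MeSH would replace the `sorted(...)` ranking with predictions from a learned model, leaving the halving schedule unchanged.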

10. Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle

Author: Kaufmann, Michael, Kourtis, Kornilios, Mendler-Dünner, Celestine, Schüpbach, Adrian, and Parnell, Thomas
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, Statistics - Machine Learning
Abstract: Distributed machine learning training is one of the most common and important workloads running on data centers today, but it is rarely executed alone. Instead, to reduce costs, computing resources are consolidated and shared by different applications. In this scenario, elasticity and proper load balancing are vital to maximize efficiency, fairness, and utilization. Currently, most distributed training frameworks do not support the aforementioned properties. A few exceptions that do support elasticity imitate generic distributed frameworks and use micro-tasks. In this paper we illustrate that micro-tasks are problematic for machine learning applications, because they require a high degree of parallelism which hinders the convergence of distributed training at a pure algorithmic level (i.e., ignoring overheads and scalability limitations). To address this, we propose Chicle, a new elastic distributed training framework which exploits the nature of machine learning algorithms to implement elasticity and load balancing without micro-tasks. We use Chicle to train deep neural networks as well as generalized linear models, and show that Chicle achieves performance competitive with state-of-the-art rigid frameworks, while efficiently enabling elastic execution and dynamic load balancing.
Published: 2019

11. 5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory

Author: Dazzi, Martino, Sebastian, Abu, Francese, Pier Andrea, Parnell, Thomas, Benini, Luca, and Eleftheriou, Evangelos
Subjects: Computer Science - Machine Learning
Abstract: In-memory computing is an emerging computing paradigm that could enable deep learning inference at significantly higher energy efficiency and reduced latency. The essential idea is to map the synaptic weights corresponding to each layer to one or more computational memory (CM) cores. During inference, these cores perform the associated matrix-vector multiply operations in place with O(1) time complexity, thus obviating the need to move the synaptic weights to an additional processing unit. Moreover, this architecture could enable the execution of these networks in a highly pipelined fashion. However, a key challenge is to design an efficient communication fabric for the CM cores. Here, we present one such communication fabric based on a graph topology that is well suited for the widely successful convolutional neural networks (CNNs). We show that this communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between one graph representation of these networks and the proposed graph topology. We then present a quantitative comparison with established communication topologies and show that our proposed topology achieves the lowest bandwidth requirements per communication channel. Finally, we present a concrete example of mapping ResNet-32 onto an array of CM cores.
Published: 2019

12. Sampling Acquisition Functions for Batch Bayesian Optimization

Author: De Palma, Alessandro, Mendler-Dünner, Celestine, Parnell, Thomas, Anghel, Andreea, and Pozidis, Haralampos
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We present Acquisition Thompson Sampling (ATS), a novel technique for batch Bayesian Optimization (BO) based on the idea of sampling multiple acquisition functions from a stochastic process. We define this process through the dependency of the acquisition functions on a set of model hyper-parameters. ATS is conceptually simple, straightforward to implement and, unlike other batch BO methods, it can be employed to parallelize any sequential acquisition function or to make existing parallel methods scale further. We present experiments on a variety of benchmark functions and on the hyper-parameter optimization of a popular gradient boosting tree algorithm. These demonstrate the advantages of ATS with respect to classical parallel Thompson Sampling for BO, its competitiveness with two state-of-the-art batch BO methods, and its effectiveness if applied to existing parallel BO algorithms.
Comment: Presented at BNP@NeurIPS 2018
Published: 2019

13. Elastic CoCoA: Scaling In to Improve Convergence

Author: Kaufmann, Michael, Parnell, Thomas, and Kourtis, Kornilios
Subjects: Computer Science - Machine Learning, Computer Science - Human-Computer Interaction, Statistics - Machine Learning
Abstract: In this paper we experimentally analyze the convergence behavior of CoCoA and show that the number of workers required to achieve the highest convergence rate at any point in time changes over the course of the training. Based on this observation, we build Chicle, an elastic framework that dynamically adjusts the number of workers based on feedback from the training algorithm, in order to select the number of workers that results in the highest convergence rate. In our evaluation of 6 datasets, we show that Chicle is able to accelerate the time-to-accuracy by a factor of up to 5.96x compared to the best static setting, while being robust enough to find an optimal or near-optimal setting automatically in most cases.
Published: 2018

14. Parallel training of linear models without compromising convergence

Author: Ioannou, Nikolas, Dünner, Celestine, Kourtis, Kornilios, and Parnell, Thomas
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks, and apply optimizations that improve data parallelism, cache line locality, and cache line prefetching of the algorithm. These modifications reduce the per-epoch run-time significantly, but take a toll on algorithm convergence in terms of the required number of epochs. To alleviate these shortcomings of our systems-optimized version, we propose a novel, dynamic data partitioning scheme across threads which allows us to approach the convergence of the sequential version. The combined set of optimizations results in a consistent bottom-line speedup in convergence of up to 12x compared to the initial asynchronous parallel training algorithm and up to 42x compared to state-of-the-art implementations (scikit-learn and H2O) on a range of multi-core CPU architectures.
Comment: Presented at the Workshop on Systems for ML and Open Source Software at NeurIPS 2018
Published: 2018

15. Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms

Author: Anghel, Andreea, Papandreou, Nikolaos, Parnell, Thomas, De Palma, Alessandro, and Pozidis, Haralampos
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Gradient boosting decision trees (GBDTs) have seen widespread adoption in academia, industry and competitive data science due to their state-of-the-art performance in many machine learning tasks. One relative downside to these models is the large number of hyper-parameters that they expose to the end-user. To maximize the predictive power of GBDT models, one must either manually tune the hyper-parameters, or utilize automated techniques such as those based on Bayesian optimization. Both of these approaches are time-consuming since they involve repeatedly training the model for different sets of hyper-parameters. A number of software GBDT packages have started to offer GPU acceleration which can help to alleviate this problem. In this paper, we consider three such packages: XGBoost, LightGBM and Catboost. Firstly, we evaluate the performance of the GPU acceleration provided by these packages using large-scale datasets with varying shapes, sparsities and learning tasks. Then, we compare the packages in the context of hyper-parameter optimization, both in terms of how quickly each package converges to a good validation score, and in terms of generalization performance.
Comment: Workshop on Systems for ML and Open Source Software at NeurIPS 2018, Montreal, Canada
Published: 2018

16. Snap ML: A Hierarchical Framework for Machine Learning

Author: Dünner, Celestine, Parnell, Thomas, Sarigiannis, Dimitrios, Ioannou, Nikolas, Anghel, Andreea, Ravi, Gummadi, Kandasamy, Madhusudanan, and Pozidis, Haralampos
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: We describe a new software framework for fast training of generalized linear models. The framework, named Snap Machine Learning (Snap ML), combines recent advances in machine learning systems and algorithms in a nested manner to reflect the hierarchical architecture of modern computing systems. We prove theoretically that such a hierarchical system can accelerate training in distributed environments where intra-node communication is cheaper than inter-node communication. Additionally, we provide a review of the implementation of Snap ML in terms of GPU acceleration, pipelining, communication patterns and software architecture, highlighting aspects that were critical for achieving high performance. We evaluate the performance of Snap ML in both single-node and multi-node environments, quantifying the benefit of the hierarchical scheme and the data streaming functionality, and comparing with other widely-used machine learning software frameworks. Finally, we present a logistic regression benchmark on the Criteo Terabyte Click Logs dataset and show that Snap ML achieves the same test loss an order of magnitude faster than any of the previously reported results, including those obtained using TensorFlow and scikit-learn.
Comment: in Proceedings of the Thirty-Second Conference on Neural Information Processing Systems (NeurIPS 2018)
Published: 2018

17. Linear-Complexity Relaxed Word Mover's Distance with GPU Acceleration

Author: Atasu, Kubilay, Parnell, Thomas, Dünner, Celestine, Sifalakis, Manolis, Pozidis, Haralampos, Vasileiadis, Vasileios, Vlachos, Michail, Berrospi, Cesar, and Labbi, Abdel
Subjects: Computer Science - Information Retrieval, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Data Structures and Algorithms
Abstract: The amount of unstructured text-based data is growing every day. Querying, clustering, and classifying this big data requires similarity computations across large sets of documents. Whereas low-complexity similarity metrics are available, attention has been shifting towards more complex methods that achieve a higher accuracy. In particular, the Word Mover's Distance (WMD) method proposed by Kusner et al. is a promising new approach, but its time complexity grows cubically with the number of unique words in the documents. The Relaxed Word Mover's Distance (RWMD) method, again proposed by Kusner et al., reduces the time complexity from cubic to quadratic and results in a limited loss in accuracy compared with WMD. Our work contributes a low-complexity implementation of the RWMD that reduces the average time complexity to linear when operating on large sets of documents. Our linear-complexity RWMD implementation, henceforth referred to as LC-RWMD, maps well onto GPUs and can be efficiently distributed across a cluster of GPUs. Our experiments on real-life datasets demonstrate 1) a performance improvement of two orders of magnitude with respect to our GPU-based distributed implementation of the quadratic RWMD, and 2) a performance improvement of three to four orders of magnitude with respect to our distributed WMD implementation that uses GPU-based RWMD for pruning.
Comment: To appear in the 2017 IEEE International Conference on Big Data (Big Data 2017) http://cci.drexel.edu/bigdata/bigdata2017/ December 11-14, 2017, Boston, MA, USA
Published: 2017
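
As background for the abstract above, the quadratic Relaxed Word Mover's Distance can be sketched in a few lines: each word's weight is moved entirely to its nearest word in the other document, and the larger of the two one-sided costs is taken. This is the baseline RWMD, not the linear-complexity LC-RWMD that the paper contributes; the dictionary-based document representation and the function names are assumptions.

```python
import math
from typing import Dict, Sequence

Embedding = Dict[str, Sequence[float]]  # word -> vector (assumed layout)
Doc = Dict[str, float]                  # word -> normalized weight


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_sided_cost(src: Doc, dst: Doc, emb: Embedding) -> float:
    """Move each source word's weight to its closest destination word."""
    return sum(
        weight * min(euclidean(emb[word], emb[other]) for other in dst)
        for word, weight in src.items()
    )


def rwmd(doc_a: Doc, doc_b: Doc, emb: Embedding) -> float:
    """Relaxed WMD: the larger of the two one-sided relaxations."""
    return max(one_sided_cost(doc_a, doc_b, emb), one_sided_cost(doc_b, doc_a, emb))
```

The nested loop over word pairs is what makes this formulation quadratic in the number of unique words per document pair.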

18. Neuromorphic computing with multi-memristive synapses

Author: Boybat, Irem, Le Gallo, Manuel, Nandakumar, S. R., Moraitis, Timoleon, Parnell, Thomas, Tuma, Tomas, Rajendran, Bipin, Leblebici, Yusuf, Sebastian, Abu, and Eleftheriou, Evangelos
Subjects: Computer Science - Emerging Technologies
Abstract: Neuromorphic computing has emerged as a promising avenue towards building the next generation of intelligent computing systems. It has been proposed that memristive devices, which exhibit history-dependent conductivity modulation, could efficiently represent the synaptic weights in artificial neural networks. However, precise modulation of the device conductance over a wide dynamic range, necessary to maintain high network accuracy, is proving to be challenging. To address this, we present a multi-memristive synaptic architecture with an efficient global counter-based arbitration scheme. We focus on phase change memory devices, develop a comprehensive model and demonstrate via simulations the effectiveness of the concept for both spiking and non-spiking neural networks. Moreover, we present experimental results involving over a million phase change memory devices for unsupervised learning of temporal correlations using a spiking neural network. The work presents a significant step towards the realization of large-scale and energy-efficient neuromorphic computing systems.
Published: 2017

19. Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

Author: Dünner, Celestine, Parnell, Thomas, and Jaggi, Martin
Subjects: Computer Science - Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Mathematics - Optimization and Control, Statistics - Machine Learning, 90C25, 68W15, 68W10, G.1.6, C.1.4
Abstract: We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows compute accelerators such as GPUs and FPGAs to be employed efficiently for the training of large-scale machine learning models when the training data exceeds their memory capacity. Also, it provides adaptivity to any system's memory hierarchy in terms of size and processing speed. Our technique is built upon novel theoretical insights regarding primal-dual coordinate methods, and uses duality gap information to dynamically decide which part of the data should be made available for fast processing. To illustrate the power of our approach we demonstrate its performance for training of generalized linear models on a large-scale dataset exceeding the memory size of a modern GPU, showing an order-of-magnitude speedup over existing approaches.
Published: 2017

20. Temporal correlation detection using computational phase-change memory

Author: Sebastian, Abu, Tuma, Tomas, Papandreou, Nikolaos, Le Gallo, Manuel, Kull, Lukas, Parnell, Thomas, and Eleftheriou, Evangelos
Subjects: Computer Science - Emerging Technologies
Abstract: For decades, conventional computers based on the von Neumann architecture have performed computation by repeatedly transferring data between their processing and their memory units, which are physically separated. As computation becomes increasingly data-centric and as the scalability limits in terms of performance and power are being reached, alternative computing paradigms are searched for in which computation and storage are collocated. A fascinating new approach is that of computational memory where the physics of nanoscale memory devices are used to perform certain computational tasks within the memory unit in a non-von Neumann manner. Here we present a large-scale experimental demonstration using one million phase-change memory devices organized to perform a high-level computational primitive by exploiting the crystallization dynamics. Also presented is an application of such a computational memory to process real-world data-sets. The results show that this co-existence of computation and storage at the nanometer scale could be the enabler for new, ultra-dense, low power, and massively parallel computing systems.
Published: 2017

21. Large-Scale Stochastic Learning using GPUs

Author: Parnell, Thomas, Dünner, Celestine, Atasu, Kubilay, Sifalakis, Manolis, and Pozidis, Haris
Subjects: Computer Science - Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: In this work we propose an accelerated stochastic learning system for very large-scale applications. Acceleration is achieved by mapping the training algorithm onto massively parallel processors: we demonstrate a parallel, asynchronous GPU implementation of the widely used stochastic coordinate descent/ascent algorithm that can provide up to 35x speed-up over a sequential CPU implementation. In order to train on very large datasets that do not fit inside the memory of a single GPU, we then consider techniques for distributed stochastic learning. We propose a novel method for optimally aggregating model updates from worker nodes when the training data is distributed either by example or by feature. Using this technique, we demonstrate that one can scale out stochastic learning across up to 8 worker nodes without any significant loss of training time. Finally, we combine GPU acceleration with the optimized distributed method to train on a dataset consisting of 200 million training examples and 75 million features. We show that by scaling out across 4 GPUs, one can attain a high degree of training accuracy in around 4 seconds: a 20x speed-up in training time compared to a multi-threaded, distributed implementation across 4 CPUs.
Comment: Accepted for publication in ParLearning 2017: The 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics, Orlando, Florida, May 2017
Published: 2017
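
Stochastic coordinate descent, which the abstract above (and several other records in this list) builds on, can be illustrated with a small sequential sketch for ridge regression. It is not the parallel, asynchronous GPU implementation the paper describes; the objective 0.5*||Xw - y||^2 + 0.5*lam*||w||^2, the list-of-lists data layout, and all names are assumptions chosen for illustration.

```python
import random
from typing import List


def scd_ridge(
    X: List[List[float]],
    y: List[float],
    lam: float = 0.1,
    epochs: int = 50,
    seed: int = 0,
) -> List[float]:
    """Sequential SCD for 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    # Shared residual y - Xw, kept up to date after every coordinate step.
    residual = list(y)
    for _ in range(epochs):
        order = list(range(d))
        rng.shuffle(order)  # visit coordinates in random order each epoch
        for j in order:
            column = [X[i][j] for i in range(n)]
            # Exact minimizer along coordinate j with all others fixed.
            numerator = sum(c * (r + c * w[j]) for c, r in zip(column, residual))
            new_wj = numerator / (sum(c * c for c in column) + lam)
            delta = new_wj - w[j]
            for i in range(n):
                residual[i] -= column[i] * delta
            w[j] = new_wj
    return w
```

The shared `residual` vector is the state that every coordinate update reads and writes, which is why parallelizing these updates across threads or GPUs takes care.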

22. Understanding and Optimizing the Performance of Distributed Machine Learning Applications on Apache Spark

Author: Dünner, Celestine, Parnell, Thomas, Atasu, Kubilay, Sifalakis, Manolis, and Pozidis, Haralampos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Learning
Abstract: In this paper we explore the performance limits of Apache Spark for machine learning applications. We begin by analyzing the characteristics of a state-of-the-art distributed machine learning algorithm implemented in Spark and compare it to an equivalent reference implementation using the high performance computing framework MPI. We identify critical bottlenecks of the Spark framework and carefully study their implications on the performance of the algorithm. In order to improve Spark performance we then propose a number of practical techniques to alleviate some of its overheads. However, optimizing computational efficiency and framework related overheads is not the only key to performance -- we demonstrate that in order to get the best performance out of any implementation it is necessary to carefully tune the algorithm to the respective trade-off between computation time and communication latency. The optimal trade-off depends on both the properties of the distributed algorithm as well as infrastructure and framework-related characteristics. Finally, we apply these technical and algorithmic optimizations to three different distributed linear machine learning algorithms that have been implemented in Spark. We present results using five large datasets and demonstrate that by using the proposed optimizations, we can achieve a reduction in the performance difference between Spark and MPI from 20x to 2x., Comment: To appear in the 2017 IEEE International Conference on Big Data (Big Data 2017), December 11-14, 2017, Boston, MA, USA
Published: 2016
Full Text: View/download PDF

23. Scalable and interpretable product recommendations via overlapping co-clustering

Author: Heckel, Reinhard, Vlachos, Michail, Parnell, Thomas, and Dünner, Celestine
Subjects: Computer Science - Information Retrieval
Abstract: We consider the problem of generating interpretable recommendations by identifying overlapping co-clusters of clients and products, based only on positive or implicit feedback. Our approach is applicable to very large datasets because it exhibits almost linear complexity in the input examples and the number of co-clusters. We show, both on real industrial data and on publicly available datasets, that the recommendation accuracy of our algorithm is competitive with that of state-of-the-art matrix factorization techniques. In addition, our technique has the advantage of offering recommendations that are textually and visually interpretable. Finally, we examine how to implement our technique efficiently on Graphical Processing Units (GPUs)., Comment: In IEEE International Conference on Data Engineering (ICDE) 2017
Published: 2016

24. Information Storage and Retrieval for Probe Storage using Optical Diffraction Patterns

Author: van Honschoten, Joost, de Jong, Henri, Koelmans, Wabe W., Parnell, Thomas P., and Zaboronski, Oleg V.
Subjects: Computer Science - Information Theory, Computer Science - Information Retrieval, Physics - Optics
Abstract: A novel method for fast information retrieval from a probe storage device is considered. It is shown that information can be stored and retrieved using the optical diffraction patterns obtained by the illumination of a large array of cantilevers by a monochromatic light source. In thermo-mechanical probe storage, the information is stored as a sequence of indentations on the polymer medium. To retrieve the information, the array of probes is actuated by applying a bending force to the cantilevers. Probes positioned over indentations experience deflection by the depth of the indentation, probes over the flat media remain un-deflected. Thus the array of actuated probes can be viewed as an irregular optical grating, which creates a data-dependent diffraction pattern when illuminated by laser light. We develop a low complexity modulation scheme, which allows the extraction of information stored in the pattern of indentations on the media from Fourier coefficients of the intensity of the diffraction pattern. We then derive a low-complexity maximum likelihood sequence detection algorithm for retrieving the user information from the Fourier coefficients. The derivation of both the modulation and the detection schemes is based on the Fraunhofer formula for data-dependent diffraction patterns. We show that for as long as the Fresnel number F<0.1, the optimal channel detector derived from Fraunhofer diffraction theory does not suffer any significant performance degradation., Comment: 14 pages, 11 figures. Version 2: minor misprints corrected, experimental section expanded
Published: 2011
Full Text: View/download PDF

25. Information theory of massively parallel probe storage channels

Author: Hambrey, Oliver, Parnell, Thomas, and Zaboronski, Oleg
Subjects: Computer Science - Information Theory, Computer Science - Information Retrieval, 68P20, 68P30
Abstract: Motivated by the concept of probe storage, we study the problem of information retrieval using a large array of N nano-mechanical probes, N ~ 4000. At the nanometer scale it is impossible to avoid errors in the positioning of the array, thus all signals retrieved by the probes of the array at a given sampling moment are affected by the same amount of random position jitter. Therefore a massively parallel probe storage device is an example of a noisy communication channel with long range correlations between channel outputs due to the global positioning errors. We find that these correlations have a profound effect on the channel's properties. For example, it turns out that the channel's information capacity does approach 1 bit per probe in the limit of high signal-to-noise ratio, but the rate of the approach is only polynomial in the channel noise strength. Moreover, any error correction code with block size N >> 1 such that codewords correspond to the instantaneous outputs of all the probes in the array exhibits an error floor independently of the code rate. We illustrate this phenomenon explicitly using Reed-Solomon codes, the performance of which is easy to simulate numerically. We also discuss capacity-achieving error correction codes for the global jitter channel and their complexity., Comment: 16 pages, 10 figures
Published: 2011

26. Detection and decoding algorithms for nanoscale data storage

Author: Parnell, Thomas P.
Subjects: 510, QA Mathematics, TK Electrical engineering. Electronics Nuclear engineering
Abstract: Scanning probe technology can be used for the modification of surfaces on the nanoscale and therefore has potential applications for data storage: data can be stored as a sequence of indentations in a polymer medium for example. In order to achieve the throughput requirements of a modern storage device the proposed probe storage systems consist of large arrays of probes reading/writing/erasing data in parallel. One of the most important tasks when designing a commercial storage device is to ensure that data can always be retrieved with a very low probability of error. The small scales offered by probe storage can potentially allow very high areal densities of information storage (larger than 1Tbit/in2) but there is a price to pay: many distortions arise when trying to retrieve this data (positioning errors for example) that make it harder to determine the correct information originally stored by the user. This thesis is concerned with signal processing for probe storage. Firstly channel models are developed for the read-back signal from a probe storage device that take into account the various distortions that occur. These models are then used for the design of probabilistic data detection algorithms and error-correcting codes that ensure the probability of error associated with data retrieval is sufficiently low. These intensively mathematical algorithms are designed with their complexity in mind to ensure they allow an implementation that satisfies the silicon area, power and timing constraints of a highly parallelized probe storage device. Making use of the tools provided by such fields as information theory, probability theory and asymptotic analysis the performance of these signal processing algorithms is studied theoretically and fundamental limits concerning the performance of a probe storage device are computed. The system-level implications of these results are carefully considered.
Published: 2010

27. Acceleration of Decision-Tree Ensemble Models on the IBM Telum Processor

Author: Papandreou, Nikolaos, primary, van Lunteren, Jan, additional, Anghel, Andreea, additional, Parnell, Thomas, additional, Petermann, Martin, additional, Stanisavljevic, Milos, additional, Lichtenau, Cedric, additional, Sica, Andrew, additional, Röhm, Dominic, additional, Tzortzatos, Elpida, additional, and Pozidis, Haralampos, additional
Published: 2023
Full Text: View/download PDF

28. Chapter 1. Swift and His World

Author: Swift, Jonathan, primary, Smedley, Jonathan, additional, Winstanley, John, additional, Parnell, Thomas, additional, Concanen, Matthew, additional, Pilkington, Matthew, additional, Barber, Mary, additional, Grierson, Constantia, additional, Pilkington, Laetitia, additional, and Du Bois, Dorothea, additional
Published: 2019
Full Text: View/download PDF

29. 17 Pre-operative patient characteristics predict outpatient opioid use

Author: Spirtos, Alexandra, primary, Werner, Bethany, additional, Barth, Jackson, additional, Parnell, Thomas, additional, Street, Austin, additional, LoCoco, Salvatore, additional, Carlson, Matthew, additional, Miller, David S., additional, and Lea, Jayanthi, additional
Published: 2022
Full Text: View/download PDF

30. 40 Pre-operative non-narcotic analgesia decreases the use of postoperative narcotics

Author: Spirtos, Alexandra, primary, Parnell, Thomas, additional, Barth, Jackson, additional, Huang, Weijiao, additional, Street, Austin, additional, and Lea, Jayanthi, additional
Published: 2022
Full Text: View/download PDF

31. Search-based Methods for Multi-Cloud Configuration

Author: Lazuka, Malgorzata, primary, Parnell, Thomas, additional, Anghel, Andreea, additional, and Pozidis, Haralampos, additional
Published: 2022
Full Text: View/download PDF

32. Efficient Pipelined Execution of CNNs Based on In-Memory Computing and Graph Homomorphism Verification

Author: Dazzi, Martino, primary, Sebastian, Abu, additional, Parnell, Thomas, additional, Francese, Pier Andrea, additional, Benini, Luca, additional, and Eleftheriou, Evangelos, additional
Published: 2021
Full Text: View/download PDF

33. Differentially Private Stochastic Coordinate Descent

Author: Damaskinos, Georgios, primary, Mendler-Dünner, Celestine, additional, Guerraoui, Rachid, additional, Papandreou, Nikolaos, additional, and Parnell, Thomas, additional
Published: 2021
Full Text: View/download PDF

34. Peripheral administration of poly I:C leads to increased hippocampal amyloid-beta and cognitive deficits in a non-transgenic mouse

Author: Weintraub, Marielle K., Kranjac, Dinko, Eimerbrink, Micah J., Pearson, Scott J., Vinson, Ben T., Patel, Jigna, Summers, Whitney M., Parnell, Thomas B., Boehm, Gary W., and Chumley, Michael J.
Published: 2014
Full Text: View/download PDF

35. Towards a General Framework for ML-based Self-tuning Databases

Author: Schmied, Thomas, primary, Didona, Diego, additional, Döring, Andreas, additional, Parnell, Thomas, additional, and Ioannou, Nikolas, additional
Published: 2021
Full Text: View/download PDF

36. Tera-scale coordinate descent on GPUs

Author: Parnell, Thomas, primary, Dünner, Celestine, additional, Atasu, Kubilay, additional, Sifalakis, Manolis, additional, and Pozidis, Haralampos, additional
Published: 2020
Full Text: View/download PDF

37. Weighted Sampling for Combined Model Selection and Hyperparameter Tuning

Author: Sarigiannis, Dimitrios, primary, Parnell, Thomas, additional, and Pozidis, Haralampos, additional
Published: 2020
Full Text: View/download PDF

38. Open Block Characterization and Read Voltage Calibration of 3D QLC NAND Flash

Author: Papandreou, Nikolaos, primary, Pozidis, Haralampos, additional, Ioannou, Nikolas, additional, Parnell, Thomas, additional, Pletka, Roman, additional, Stanisavljevic, Milos, additional, Stoica, Radu, additional, Tomic, Sasa, additional, Breen, Patrick, additional, Tressler, Gary, additional, Fry, Aaron, additional, Fisher, Timothy, additional, and Walls, Andrew, additional
Published: 2020
Full Text: View/download PDF

39. A Radiation Dosimeter Concept for the Lunar Surface Environment

Author: Adams, James H, Christl, Mark J, Watts, John, Kuznetsov, Eugeny N, Parnell, Thomas A, and Pendleton, Geoff N
Subjects: Solar Physics
Abstract: A novel silicon detector configuration for radiation dose measurements in an environment where solar energetic particles are of most concern is described. The dosimeter would also measure the dose from galactic cosmic rays. In the lunar environment a large range in particle flux and ionization density must be measured and converted to dose equivalent. This could be accomplished with a thick (e.g. 2mm) silicon detector segmented into cubic volume elements ("voxels") followed by a second, thin monolithic silicon detector. The electronics needed to implement this detector concept include analog signal processors (ASIC) and a field programmable gate array (FPGA) for data accumulation and conversion to linear energy transfer (LET) spectra and to dose-equivalent (Sievert). Currently available commercial ASICs and FPGAs are suitable for implementing the analog and digital systems.
Published: 2007

40. Snap ML: A Hierarchical Framework for Machine Learning

Author: Dünner, Celestine, Parnell, Thomas, Sarigiannis, Dimitrios, Ioannou, Nikolas, Anghel, Andreea, Ravi, Gummadi, Kandasamy, Madhusudanan, and Pozidis, Haralampos
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Distributed, Parallel, and Cluster Computing (cs.DC), Machine Learning (cs.LG)
Abstract: We describe a new software framework for fast training of generalized linear models. The framework, named Snap Machine Learning (Snap ML), combines recent advances in machine learning systems and algorithms in a nested manner to reflect the hierarchical architecture of modern computing systems. We prove theoretically that such a hierarchical system can accelerate training in distributed environments where intra-node communication is cheaper than inter-node communication. Additionally, we provide a review of the implementation of Snap ML in terms of GPU acceleration, pipelining, communication patterns and software architecture, highlighting aspects that were critical for achieving high performance. We evaluate the performance of Snap ML in both single-node and multi-node environments, quantifying the benefit of the hierarchical scheme and the data streaming functionality, and comparing with other widely-used machine learning software frameworks. Finally, we present a logistic regression benchmark on the Criteo Terabyte Click Logs dataset and show that Snap ML achieves the same test loss an order of magnitude faster than any of the previously reported results, including those obtained using TensorFlow and scikit-learn., in Proceedings of the Thirty-Second Conference on Neural Information Processing Systems (NeurIPS 2018)
Published: 2018

41. Reliability of 3D NAND flash memory with a focus on read voltage calibration from a system aspect

Author: Papandreou, Nikolaos, primary, Ioannou, Nikolas, additional, Parnell, Thomas, additional, Pletka, Roman, additional, Stanisavljevic, Milos, additional, Stoica, Radu, additional, Tomic, Sasa, additional, and Pozidis, Haralampos, additional
Published: 2019
Full Text: View/download PDF

42. Addressing Interpretability and Cold-Start in Matrix Factorization for Recommender Systems

Author: Vlachos, Michail, primary, Dunner, Celestine, additional, Heckel, Reinhard, additional, Vassiliadis, Vassilios G., additional, Parnell, Thomas, additional, and Atasu, Kubilay, additional
Published: 2019
Full Text: View/download PDF

43. Characterization and Analysis of Bit Errors in 3D TLC NAND Flash Memory

Author: Papandreou, Nikolaos, primary, Pozidis, Haralampos, additional, Parnell, Thomas, additional, Ioannou, Nikolas, additional, Pletka, Roman, additional, Tomic, Sasa, additional, Breen, Patrick, additional, Tressler, Gary, additional, Fry, Aaron, additional, and Fisher, Timothy, additional
Published: 2019
Full Text: View/download PDF

44. Management of Next-Generation NAND Flash to Achieve Enterprise-Level Endurance and Latency Targets

Author: Pletka, Roman, primary, Koltsidas, Ioannis, additional, Ioannou, Nikolas, additional, Tomić, Saša, additional, Papandreou, Nikolaos, additional, Parnell, Thomas, additional, Pozidis, Haralampos, additional, Fry, Aaron, additional, and Fisher, Tim, additional
Published: 2018
Full Text: View/download PDF

45. Radiation Effects and Protection for Moon and Mars Missions

Author: Parnell, Thomas A, Watts, John W., Jr, and Armstrong, Tony W
Subjects: Lunar And Planetary Exploration
Abstract: Manned and robotic missions to the Earth's moon and Mars are exposed to a continuous flux of Galactic Cosmic Rays (GCR) and occasional, but intense, fluxes of Solar Energetic Particles (SEP). These natural radiations impose hazards to manned exploration, but also present some constraints to the design of robotic missions. The hazards to interplanetary flight crews and their uncertainties have been studied recently by a National Research Council Committee (Space Studies Board 1996). Considering the present uncertainty estimates, thick spacecraft shielding would be needed for manned missions, some of which could be accomplished with onboard equipment and expendables. For manned and robotic missions, the effects of radiation on electronics, sensors, and controls require special consideration in spacecraft design. This paper describes the GCR and SEP particle fluxes, secondary particles behind shielding, uncertainties in radiobiological effects and their impact on manned spacecraft design, as well as the major effects on spacecraft equipment. The principal calculational tools and considerations to mitigate the radiation effects are discussed, and work in progress to reduce uncertainties is included.
Published: 1998

46. LDEF contributions to cosmic ray and radiation environments research

Author: Parnell, Thomas A
Subjects: Space Radiation
Abstract: LDEF-1 carried three experiments which are producing significant advances in our knowledge of ultra heavy and anomalous cosmic rays, solar flare particles, and heavy nuclei in the trapped belts. Nine other experiments made measurements on the radiation environments or performed dosimetric monitoring. Data from those experiments, and from measurements of induced radioactivity in LDEF components, have significantly improved our knowledge of the LEO radiation environment. Measurements at various locations and shielding depths of radiation absorbed dose, linear energy transfer spectra, proton, neutron and heavy ion fluences, and induced radioactivity have been made, and many of these results have been compared to models. This has allowed the assessment of accuracy, and the potential for improvement, of the models. Serendipitous results from the radiation measurements include the discovery of atmospheric Be-7 plated on the front surface of LDEF, which has motivated a series of new investigations. A sample of measurements and modeling results will be presented, as well as the status of archiving the measurements and models.
Published: 1995

47. Status of LDEF activation measurements and archive

Author: Harmon, B. Alan, Parnell, Thomas A, and Laird, Christopher E
Subjects: Atomic And Molecular Physics
Abstract: We review the status of induced radioactivity measurements for the LDEF spacecraft which includes studies of the nuclide, target, directional and depth dependences of the activation. Analysis of the data has focused on extraction of the specific activities for many materials to develop a global picture of the low Earth orbital environment to which the LDEF was subjected. Preliminary comparisons of data in a previous review showed that it was possible to make meaningful intercomparisons between results obtained at different facilities. Generally these comparisons were good and gave results to within 10-20 percent, although some analysis remains. These results clearly provide constraints for recent calculations being performed of the radiation environment of the LDEF. We are now anticipating a period of production of final activation results. An archive is being prepared jointly between NASA/Marshall and Eastern Kentucky University which will include gamma ray spectra and other intermediate results.
Published: 1995

48. Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

Author: Dünner, Celestine, Parnell, Thomas, and Jaggi, Martin
Subjects: FOS: Computer and information sciences, G.1.6, C.1.4, Machine Learning (stat.ML), Machine Learning (cs.LG), Computer Science - Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Optimization and Control (math.OC), Statistics - Machine Learning, FOS: Mathematics, 90C25, 68W15, 68W10, Distributed, Parallel, and Cluster Computing (cs.DC), Mathematics - Optimization and Control
Abstract: We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows compute accelerators such as GPUs and FPGAs to be employed efficiently for the training of large-scale machine learning models when the training data exceeds their memory capacity. It also adapts to any system's memory hierarchy in terms of size and processing speed. Our technique is built upon novel theoretical insights regarding primal-dual coordinate methods, and uses duality gap information to dynamically decide which part of the data should be made available for fast processing. To illustrate the power of our approach we demonstrate its performance for training of generalized linear models on a large-scale dataset exceeding the memory size of a modern GPU, showing an order-of-magnitude speedup over existing approaches.
Published: 2017

49. Improving Endurance in 3D-NAND Flash

Author: Pletka, Roman, Koltsidas, Ioannis, Ioannou, Nikolas, Tomic, Sasa, Papandreou, Nikolaos, Parnell, Thomas, Pozidis, Haralampos, Fry, Aaron, and Fisher, Tim
Published: 2017
Full Text: View/download PDF

50. Status of LDEF ionizing radiation measurements and analysis

Author: Parnell, Thomas A
Subjects: Space Radiation
Abstract: At this symposium significant new data and analyses were reported in cosmic ray research, radiation dosimetry, induced radioactivity, and radiation environment modeling. Measurements of induced radioactivity and absorbed dose are nearly complete, but much analysis and modeling remains. Measurements and analyses of passive nuclear track detectors (PNTD), used to derive the cosmic ray composition and spectra, and linear energy transfer (LET) spectra, are only a few percent complete, but important results have already emerged. As one might expect at this stage of the research, some of the new information has produced questions rather than answers. Low-energy heavy nuclei detected by two experiments are not compatible with known solar or cosmic components. Various data sets on absorbed dose are not consistent, and a new trapped proton environment model does not match the absorbed dose data. A search for cosmogenic nuclei other than Be-7 on Long Duration Exposure Facility (LDEF) surfaces has produced an unexpected result, and some activation data relating to neutrons are not yet understood. Most of these issues will be resolved by the analysis of further experiment data, calibrations, or the application of the large LDEF data set that offers alternate data or analysis techniques bearing on the same problem. The scope of the papers at this symposium defies a compact technical summary. I have attempted to group the new information that I noted into the following groups: induced radioactivity; absorbed dose measurements; LET spectra and heavy ion dosimetry; environment modeling and three dimensional shielding effects; cosmogenic nuclei; and cosmic rays and other heavy ions. The papers generally are expository and have excellent illustrations, and I refer to their figures rather than reproduce them here. The general program and objectives of ionizing radiation measurements and analyses on LDEF have been described previously.
Published: 1993
