Author: "Merrett, Geoff" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

1. Exploration of Decision Sub-Network Architectures for FPGA-based Dynamic DNNs

Author: Dimitriou, Anastasios, Hu, Mingyu, Hare, Jonathon, and Merrett, Geoff
Abstract: Dynamic Deep Neural Networks (DNNs) can achieve faster execution and less computationally intensive inference by spending fewer resources on easy-to-recognise or less informative parts of an input. They make data-dependent decisions, which strategically deactivate a model’s components, e.g. layers, channels or sub-networks. However, dynamic DNNs have only been explored and applied on conventional computing systems (CPU+GPU) and programmed with libraries designed for static networks, limiting their effects. In this paper, we propose and explore two approaches for efficiently realising the sub-networks that make these decisions on FPGAs. A pipeline approach targets the use of the existing hardware to execute the sub-network, while a parallel approach uses dedicated circuitry for it. We explore the performance of each using the BranchyNet early exit approach on LeNet-5, and evaluate on a Xilinx ZCU106. The pipeline approach is 36% faster than a desktop CPU. It consumes 0.51 mJ per inference, 16x lower than a non-dynamic network on the same platform and 8x lower than an Nvidia Jetson Xavier NX. The parallel approach executes 17% faster than the pipeline approach when no early exits are taken during dynamic inference, but incurs an increase in energy consumption of 28%.
Published: 2023

2. Energy-efficient memory tracing for state retention in transient computing systems

Author: Verykios, Theodoros D., Balsamo, Domenico, and Merrett, Geoff
Abstract: Transient computing systems, also known as intermittent computing systems, are batteryless systems powered by energy harvesting (EH) sources that do not require large energy storage for system operations. Instead, they rely on retaining their state, i.e. a snapshot, in non-volatile memory (NVM) in the event of a power outage and restoring it when the power recovers. In this paper, we first discuss the limitations of state-of-the-art techniques that attempt to minimize the amount of system state saved to NVM. Therefore, we propose a novel energy-efficient system-level approach for state retention through memory tracing based on a custom hardware module named MeTra that traces changes in the main (volatile) memory between power outages. MeTra allows the voltage threshold that activates the state retention process to be dynamically adjusted according to the energy requirement of each snapshot. Thus, a great proportion of the energy harvested can be spent on useful operations. Experimental results show that the system’s active time can be extended up to 17x for Flash-based systems and 92.2% for FRAM-based systems, compared to saving the entire system state, with an area overhead of as little as 2.48%.
Published: 2023

3. TinyOps: ImageNet Scale Deep Learning on Microcontrollers

Author: Sadiq, Sulaiman, Hare, Jonathon, Maji, Partha, Craske, Simon, and Merrett, Geoff
Abstract: Deep Learning on microcontroller (MCU) based IoT devices is extremely challenging due to memory constraints. Prior approaches focus on using either internal or external memory exclusively, which limits either accuracy or latency. We find that a hybrid method using internal and external MCU memories outperforms both approaches in accuracy and latency. We develop TinyOps, an inference engine which accelerates inference of models in slow external memory, using a partitioning and overlaying scheme via the available Direct Memory Access (DMA) peripheral to combine the advantages of external memory (size) and internal memory (speed). Experimental results show that architectures deployed with TinyOps significantly outperform models designed for internal memory, with up to 6% higher accuracy and, importantly, 1.3-2.2x faster inference latency, setting the state-of-the-art in TinyML ImageNet classification. Our work shows that the TinyOps space is more efficient compared to the internal or external memory design spaces and should be explored further for TinyML applications.
Published: 2022

4. Dynamic DNNs meet runtime resource management on mobile and embedded platforms

Author: Xun, Lei, Al-Hashimi, Bashir, Hare, Jonathon, and Merrett, Geoff
Abstract: Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to low latency and better privacy. However, efficient deployment on these platforms is challenging due to the intensive computation and memory access. We propose a holistic system design for DNN performance and energy optimisation, combining the trade-off opportunities in both algorithms and hardware. The system can be viewed as three abstract layers: the device layer contains heterogeneous computing resources; the application layer has multiple concurrent workloads; and the runtime resource management layer monitors the dynamically changing algorithms' performance targets as well as hardware resources and constraints, and tries to meet them by tuning the algorithm and hardware at the same time. Moreover, we illustrate the runtime approach through a dynamic version of 'once-for-all network' (namely Dynamic-OFA), which can scale the ConvNet architecture to fit heterogeneous computing resources efficiently and has good generalisation for different model architectures such as Transformer. Compared to the state-of-the-art Dynamic DNNs, our experimental results using ImageNet on a Jetson Xavier NX show that the Dynamic-OFA is up to 3.5x (CPU), 2.4x (GPU) faster for similar ImageNet Top-1 accuracy, or 3.8% (CPU), 5.1% (GPU) higher accuracy at similar latency. Furthermore, compared with Linux governors (e.g. performance, schedutil), our runtime approach reduces the energy consumption by 16.5% at similar latency.
Published: 2022

5. Dynamic DNNs Meet Runtime Resource Management on Mobile and Embedded Platforms

Author: Xun, Lei, Al-Hashimi, Bashir M., Hare, Jonathon, and Merrett, Geoff V.
Subjects: FOS: Computer and information sciences, Hardware Architecture (cs.AR), Computer Science - Hardware Architecture
Abstract: Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to low latency and better privacy. However, efficient deployment on these platforms is challenging due to the intensive computation and memory access. We propose a holistic system design for DNN performance and energy optimisation, combining the trade-off opportunities in both algorithms and hardware. The system can be viewed as three abstract layers: the device layer contains heterogeneous computing resources; the application layer has multiple concurrent workloads; and the runtime resource management layer monitors the dynamically changing algorithms' performance targets as well as hardware resources and constraints, and tries to meet them by tuning the algorithm and hardware at the same time. Moreover, we illustrate the runtime approach through a dynamic version of 'once-for-all network' (namely Dynamic-OFA), which can scale the ConvNet architecture to fit heterogeneous computing resources efficiently and has good generalisation for different model architectures such as Transformer. Compared to the state-of-the-art Dynamic DNNs, our experimental results using ImageNet on a Jetson Xavier NX show that the Dynamic-OFA is up to 3.5x (CPU), 2.4x (GPU) faster for similar ImageNet Top-1 accuracy, or 3.8% (CPU), 5.1% (GPU) higher accuracy at similar latency. Furthermore, compared with Linux governors (e.g. performance, schedutil), our runtime approach reduces the energy consumption by 16.5% at similar latency. Accepted as a presentation at the Fourth UK Mobile, Wearable and Ubiquitous Systems Research Symposium (MobiUK 2022).
Published: 2022

6. Intermittent Opportunistic Routing Components for the INET Framework

Author: Longman, Edward, El-Hajjar, Mohammed, and Merrett, Geoff V.
Subjects: Networking and Internet Architecture (cs.NI), Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Networking and Internet Architecture, Computer Science - Performance, ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS
Abstract: Intermittently-powered wireless sensor networks (WSNs) use energy harvesting and small energy storage to remove the need for battery replacement and to extend the operational lifetime. However, an intermittently-powered forwarder regularly turns on or off, which requires alternative networking solutions. Opportunistic routing (OR) is a potential cross-layer solution for this novel application, but due to the interaction with the energy storage, the operation of these protocols is highly dynamic. To compare protocols and components in like-for-like scenarios we propose module interfaces for MAC, routing and discovery protocols, that enable clear separation of concerns and good interchangeability. We also suggest some candidates for each of the protocols based on our own implementation and research. Published in: M. Marek, G. Nardini, V. Vesely (Eds.), Proceedings of the 8th OMNeT++ Community Summit, Virtual Summit, September 8-10, 2021
Published: 2021

7. GhostShiftAddNet: More Features from Energy-Efficient Operations

Author: Bi, Jia, Hare, Jonathon, and Merrett, Geoff V.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: Deep convolutional neural networks (CNNs) are computationally and memory intensive. In CNNs, intensive multiplication can have resource implications that may challenge the ability for effective deployment of inference on resource-constrained edge devices. This paper proposes GhostShiftAddNet, where the motivation is to implement a hardware-efficient deep network: a multiplication-free CNN with fewer redundant features. We introduce a new bottleneck block, GhostSA, that converts all multiplications in the block to cheap operations. The bottleneck uses an appropriate number of bit-shift filters to process intrinsic feature maps, then applies a series of transformations that consist of bit-wise shifts with addition operations to generate more feature maps that fully learn to capture information underlying intrinsic features. We schedule the number of bit-shift and addition operations for different hardware platforms. We conduct extensive experiments and ablation studies with desktop and embedded (Jetson Nano) devices for implementation and measurements. We demonstrate that the proposed GhostSA block can replace bottleneck blocks in the backbone of state-of-the-art network architectures and gives improved performance on image classification benchmarks. Further, our GhostShiftAddNet can achieve higher classification accuracy with fewer FLOPs and parameters (reduced by up to 3x) than GhostNet. When compared to GhostNet, inference latency on the Jetson Nano is improved by 1.3x and 2x on the GPU and CPU respectively. Code is available open-source at https://github.com/JIABI/GhostShiftAddNet.
Published: 2021

8. Runtime DNN performance scaling through resource management on heterogeneous embedded platforms

Author: Xun, Lei, Al-Hashimi, Bashir, Hare, Jonathon, and Merrett, Geoff
Abstract: DNN inference is increasingly being executed locally on embedded platforms, due to the clear advantages in latency, privacy and connectivity. Because modern SoCs typically execute a combination of different and dynamic workloads concurrently, it is challenging to consistently meet latency/energy budgets: the local computing resources available to the DNN vary considerably. In this poster, we show how resource management can be applied to optimise the performance of DNN workloads by monitoring and tuning both software and hardware constantly at runtime. This work shows how dynamic DNNs trade off accuracy with latency/energy/power on a heterogeneous embedded CPU-GPU platform.
Published: 2021

9. DEff-ARTS: differentiable efficient ARchiTecture search

Author: Sadiq, Sulaiman, Maji, Partha, Hare, Jonathon, and Merrett, Geoff
Abstract: Manual design of efficient Deep Neural Networks (DNNs) for mobile and edge devices is an involved process which requires expert human knowledge to improve efficiency in different dimensions. In this paper, we present DEff-ARTS, a differentiable efficient architecture search method for automatically deriving CNN architectures for resource constrained devices. We frame the search as a multi-objective optimisation problem where we minimise the classification loss and the computational complexity of performing inference on the target hardware. Our formulation allows for easy trading-off between the sub-objectives depending on user requirements. Experimental results on CIFAR-10 classification showed that our approach achieved a highly competitive test error rate of 3.24% with 30% fewer parameters and multiply and accumulate (MAC) operations compared to Differentiable ARchiTecture Search (DARTS).
Published: 2020

10. Energy-driven systems and compute: Towards self-powered embedded computing systems

Author: Merrett, Geoff
Abstract: An energy harvester is a small part of a larger embedded system. Historically, such systems have typically been designed in the same way as their battery-powered counterparts, often adding significant complexity to make the harvester 'appear' to the load as if it were in fact a battery. In this talk, I will propose an alternative approach, that of energy-driven computing, where the design of applications and systems is rethought such that the energy environment is a key factor in the design process. I will illustrate this through two approaches: intermittent computing and power-neutral computing, highlighting the challenges and opportunities that they bring.
Published: 2020

11. Energy-driven occupant behaviour sensing

Author: Wong, Samuel Chang Bing, Gauthier, Stephanie, and Merrett, Geoff
Published: 2020

12. Mitigating interactive performance degradation from mobile device thermal throttling

Author: Bantock, James Robert Benjamin, Al-Hashimi, Bashir, and Merrett, Geoff
Abstract: Mobile devices are limited in mass and volume, reducing the viability of active device cooling implementations; this requires the use of less effective passive techniques to maintain device skin temperature levels. Application performance demands on a modern mobile device are driven by sustained performance workloads, such as 3D games, Virtual and Augmented Reality. Mobile System-on-Chips have corresponding increases in performance through both architectural changes and frequency of operation increases, which has resulted in the peak power consumption exceeding the sustainable thermal envelope defined by device skin temperature requirements. Existing thermal throttling techniques mitigate this by capping the frequency of operation of the System-on-Chip. Through experimentation with a modern smartphone platform using sequences from real-world applications, we demonstrate in this paper that Frequency Capping can have a significant effect on the performance of interactive applications, increasing the number of frame rate defects by up to 146%. We propose Task Utilization Scaling, a new lever for thermal throttling, which scales performance for critical interactive periods by the same factor as non-critical periods. Experiments demonstrate that the proposed approach can result in a decrease in frame rate defects of up to 18% compared with Frequency Capping or a skin temperature reduction of up to 2°C.
Published: 2020

13. Efficient deployment of UAV-powered sensors for optimal coverage and connectivity

Author: Cetinkaya, Oktay and Merrett, Geoff
Abstract: The Internet of Things (IoT) digitizes the physical world with wireless devices sensing their surroundings and delivering periodic notifications of parameters they are monitoring. However, this operation is bound by finite-capacity batteries, in which replenishment is practically infeasible due to the envisioned size of the IoT networks. By also considering the autonomous and self-sufficient service vision of the IoT paradigm, the need for novel approaches overcoming the energy constraints is evident. Here, unmanned aerial vehicles (UAVs) come into prominence. The UAVs can remotely energize wireless devices, via wireless power transfer (WPT), and thus guarantee reliable sensing coverage as well as longevity in the IoT domain. However, this can be only achieved by the precise alignment of both UAVs and wireless devices. Thus, this paper presents an efficient deployment strategy based on the circle packing problem, in which a lower-bound for the required number of wireless devices achieving optimal coverage is derived. The analysis, based on empirical measurements, reveals the design considerations for an energy harvesting (EH)-aided UAV scenario with regard to Federal Communications Commission (FCC) regulations, power consumption of wireless devices, and reporting frequency requirements of the IoT applications. Our results elaborate on a number of trade-offs, based on UAV, device, and medium characteristics, and provide realistic guidelines, achieving optimal coverage while meeting application requirements.
Published: 2020

14. Incremental training and group convolution pruning for runtime DNN performance scaling on heterogeneous embedded platforms

Author: Xun, Lei, Tran-Thanh, Long, Al-Hashimi, Bashir, and Merrett, Geoff
Abstract: Inference for Deep Neural Networks is increasingly being executed locally on mobile and embedded platforms due to its advantages in latency, privacy and connectivity. Since modern System on Chips typically execute a combination of different and dynamic workloads concurrently, it is challenging to consistently meet inference time/energy budgets at runtime because the local computing resources available to the DNNs vary considerably. To address this challenge, a variety of dynamic DNNs were proposed. However, these works have significant memory overhead, limited runtime recoverable compression rate and narrow dynamic ranges of performance scaling. In this paper, we present a dynamic DNN using incremental training and group convolution pruning. The channels of the DNN convolution layer are divided into groups, which are then trained incrementally. At runtime, later groups can be pruned for inference time/energy reduction or added back for accuracy recovery without model retraining. In addition, we combine task mapping and Dynamic Voltage Frequency Scaling (DVFS) with our dynamic DNN to deliver a finer trade-off between accuracy and time/power/energy over a wider dynamic range. We illustrate the approach by modifying AlexNet for the CIFAR10 image dataset and evaluate our work on two heterogeneous hardware platforms: Odroid XU3 (ARM big.LITTLE CPUs) and Nvidia Jetson Nano (CPU and GPU). Compared to the existing works, our approach can provide up to 2.36x (energy) and 2.73x (time) wider dynamic range with a 2.4x smaller memory footprint at the same compression rate. It achieved 10.6x (energy) and 41.6x (time) wider dynamic range by combining with task mapping and DVFS.
Published: 2020

15. Managing power in heterogeneous multicore systems

Author: Merrett, Geoff
Abstract: Power- and energy-efficiency continues to be a primary concern in the design and management of computing systems, through from mobile devices (battery life and temperature) to HPC (electricity bills and temperature). Managing this is an increasingly complex task, as systems shift from having a single processing element to multi- and many-core computing platforms with numerous cores of differing types. In this talk I will present our research into the runtime management (RTM) of such systems that has come out of the PRiME (www.prime-project.org) research project. I will present a range of different approaches that we have developed and experimentally validated, and the key findings that we have made along the way. These encompass 1) exploring RTM on both novel and heterogeneous/homogeneous COTS multi-core platforms, 2) the impact of core scaling on RTMs, 3) issues and approaches for managing concurrently executing workloads on shared resources, and 4) comparing the impact of offline vs online characterisation approaches. I will also present a range of open-source tools that we have developed and released through these projects, spanning simulation and runtime power models for multi-core CPUs, to a framework for researchers to incorporate multi-core runtime management into their system and enable level comparison with the SoA.
Published: 2019

16. Energy harvesting meets IoT: Fuelling adoption of transient computing in embedded systems

Author: Balsamo, Domenico, Magno, Michele, Kubara, Kacper, Lazarescu, Bogdan, Merrett, Geoff, and Cetinkaya, Oktay
Subjects: Class (computer programming), business.industry, Computer science, 020208 electrical & electronic engineering, 02 engineering and technology, Flash memory, Energy harvesting, Transient computing, Internet of Things, Arm mbed programming framework, 020202 computer hardware & architecture, Software, Embedded system, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), Transient (computer programming), State (computer science), business
Abstract: The emerging class of transient computing systems enables computation to be sustained despite power outages due to the variable nature of energy harvesting. However, existing approaches are largely designed for specific architectures, and hence are not broadly applicable across different IoT devices. Emerging platforms based on portable, hardware-independent software should rely on lightweight operating systems (OSs) designed specifically for embedded IoT applications, such as Arm mbed OS and Contiki OS. To enable the widespread use of transient computing, transient approaches need to be integrated into these operating systems. In this paper, we discuss the challenges of providing software primitives for transient computing to facilitate hardware-independent implementation using standard OS APIs, and present the integration of a state-of-the-art transient approach, Hibernus, into mbed OS. This OS is chosen due to the large community of developers and the open-source IoT code availability. Transient computing is offered through a modular and layered structure that uses the available mbed OS APIs, including different strategies for retaining the system state designed for different types of flash memory. To illustrate the applicability of the proposed design, we implemented Hibernus on two mbed platforms with different flash memories, which respectively require 4.7mF and 4.9mF of additional storage.
Published: 2019

17. Enabling intermittent computing on high-performance out-of-order processors

Author: Sliper, Sivert Tvedt, Balsamo, Domenico, Weddell, Alexander, and Merrett, Geoff
Abstract: Intermittent computing is a new paradigm enabling battery-less computing devices to be powered directly from energy harvesting, enabling IoT devices that are free from the cost, size and lifetime constraints of batteries. To cope with frequent power interruptions, intermittent computing systems save computational progress before power is lost, and restore it when power returns. Recent research in power-neutral operation of multiprocessor system-on-chips (MPSoCs), where performance scaling is used to instantaneously match power consumption with supply, motivates the need for intermittent computing on high-performance systems. Existing works provide solutions for microcontrollers, but with the increased complexity of high-performance SoCs, new challenges such as hierarchical memory and dependence on large existing libraries emerge. In this paper, we provide a taxonomy of published intermittent computing methods and identify the most suitable method for high-performance SoCs. The chosen method is then implemented and experimentally validated on an Arm A9 out-of-order application processor. Results show that state can be saved/restored correctly in 8.6 ms for a minimal bare-metal application, which is an order of magnitude faster than the platform’s hardware boot time.
Published: 2018

18. Run-time power and energy management of multi- and many-core systems

Author: Merrett, Geoff
Published: 2018

19. Run-time power and energy management of many-core systems

Author: Merrett, Geoff
Abstract: Energy-efficiency is important at all scales of computing system, from microcontrollers through to HPC. Established mechanisms like DPM and DVFS provide controls to affect power consumption, but careful management is required for effective use. In this talk, I provide an overview of the different run-time power management (RTM) approaches that we have developed, explored and practically validated through the EPSRC PRiME programme grant (www.prime-project.org), and discuss key findings and lessons learnt. I also refer to a range of open-source tools that we have released as a result of the project, from multi-core power modelling to a cross-platform framework for RTM.
Published: 2018

20. Application control and monitoring in heterogeneous multiprocessor systems

Author: Leech, Charles R., Bragg, Graeme McLachlan, Balsamo, Domenico, Weber Wachter, Eduardo, Merrett, Geoff, and Al-Hashimi, Bashir
Abstract: Multiprocessor systems provide both high-performance and energy-efficient execution of applications on mobile and embedded systems under dynamic workload requirements, and can provide increased lifetime for devices in energy-constrained environments. However, their increasing complexity means that management at runtime has become a non-trivial task, especially in heterogeneous multiprocessor systems. In addition, there is no standardised mechanism to expose and manage the sources of control and monitoring from within applications and hardware resources at runtime. This paper presents an analysis of applications, platforms and runtime management approaches to motivate the need for a standardised framework that enables fully application- and platform-agnostic runtime management. The exposure of application controls and requirements through the presented framework is demonstrated with a stereo matching algorithm, including runtime management of multi-threading and frequency scaling on the 61-core Xeon Phi platform. In addition, the trading of application parameters, such as throughput and accuracy, is demonstrated within the framework using a runtime controller on the Odroid-XU3 platform. An open-source implementation of this framework has been released.
Published: 2018

21. Power-neutral performance scaling for self-powered multicore computing systems

Author: Balsamo, Domenico, Fletcher, Benjamin James, and Merrett, Geoff
Published: 2018

22. An Application- and Platform-agnostic Control and Monitoring Framework for Multicore Systems

Author: Bragg, Graeme McLachlan, Leech, Charles R., Balsamo, Domenico, Davis, James J., Weber Wachter, Eduardo, Merrett, Geoff, Constantinides, George A., Al-Hashimi, Bashir, and Engineering & Physical Science Research Council (E
Abstract: Heterogeneous multiprocessor systems have increased in complexity to provide both high performance and energy efficiency for a diverse range of applications. This motivates the need for a standard framework that enables the management, at runtime, of software applications executing on these processors. This paper proposes the first fully application- and platform-agnostic framework for runtime management approaches that control and optimise software applications and hardware resources. This is achieved by separating the system into three distinct layers connected by an API and cross-layer constructs called knobs and monitors. The proposed framework also supports the management of applications that are executing concurrently on heterogeneous platforms. The operation of the proposed framework is experimentally validated using a basic runtime controller and two heterogeneous platforms, to show how it is application- and platform-agnostic and easy to use. Furthermore, the management of concurrently executing applications through the framework is demonstrated. Finally, two recently reported runtime management approaches are implemented to demonstrate how the framework enables their operation and comparison. The energy and latency overheads introduced by the framework have been quantified and an open-source implementation has been released.
Published: 2018

23. Memory and thread synchronization contention-aware DVFS for HPC systems

Author: Basireddy, Karunakar Reddy, Weber Wachter, Eduardo, Al-Hashimi, Bashir, and Merrett, Geoff
Abstract: Due to the operating costs and failure rates of computing platforms, energy efficiency has become a major concern for modern and future many-core systems. In the quest for high performance, the power consumption growth rate must slow down while delivering more performance per unit of power. To improve the energy efficiency of such systems, processors are equipped with low-power techniques such as dynamic voltage and frequency scaling (DVFS) and power capping. These techniques must be controlled carefully as per the workload; otherwise, it may result in significant performance loss and/or power consumption due to system overheads (e.g. DVFS transition latency). Existing approaches [1], [2] are not effective in adapting to workload variations as they do not consider the combined effect of application compute-/memory-intensity, thread synchronization contention, and non-uniform memory accesses (NUMAs) owing to the underlying processor architecture. This poster discusses a workload-aware runtime energy management technique that takes the aforementioned factors into account for efficient V-f control.
Published: 2018

24. Run-time power management of multi- and many-core systems

Author: Merrett, Geoff
Abstract: Power- and energy-efficiency continues to be a primary concern in the design and management of computing systems, through from mobile devices (battery life and temperature) to HPC (electricity bills and temperature). In this talk I will give a summary of our research into the runtime management (RTM) of multi- and many-core computing systems that has come out of the PRiME (www.prime-project.org) and Graceful research projects. I will present a range of different approaches that we have developed and experimentally validated, and the key findings that we have made along the way. These encompass 1) exploring RTM on both novel and heterogeneous/homogeneous COTS multi-core platforms, 2) the impact of core scaling on RTMs, 3) issues and approaches for managing concurrently executing workloads on shared resources, and 4) comparing the impact of offline vs online characterisation approaches. I will also present a range of open-source tools that we have developed and released through these projects, spanning simulation and runtime power models for multi-core CPUs, to a framework for researchers to incorporate multi-core runtime management into their system and enable level comparison with the SoA.
Published: 2018

25. Adaptation in heterogeneous multi-core SoCs

Author: Singh, Amit Kumar, Merrett, Geoff, and Al-Hashimi, Bashir
Published: 2018

26. Accurate and stable empirical CPU power modelling for multi- and many-core systems

Author: Walker, Matthew, Diestelhorst, Stephan, Merrett, Geoff, and Al-Hashimi, Bashir
Abstract: Modern processors must provide an increasing level of performance, and are therefore including higher numbers of Heterogeneous Multi-Processing (HMP) elements. Intelligent run-time control of performance and power consumption is required to extend battery-life in mobile systems, reduce energy and cooling costs in data centres, and increase peak performance while respecting thermal and power constraints. Accurate online power estimation is essential in guiding run-time power management mechanisms and energy-aware scheduling decisions. We present a statistically-rigorous methodology for developing accurate and stable run-time power models and we experimentally demonstrate their ability to perform more accurately across a wider range of workloads. We highlight significant shortcomings in existing techniques and present an improved model formulation that also accounts for thermal effects. Moreover, we present the Powmon software tools that automate our methodology, allowing power models to be developed for other platforms. Accurate performance and power modelling is also essential in full-system simulation. We present the GemStone open-source software tool, which automates the process of characterising hardware platforms; identifying sources of error in gem5 performance models using machine learning techniques; applying the empirical power models to simulation data; and quantifying the effect of simulation errors on the performance, power and energy estimations, including their scaling across Dynamic Voltage-Frequency Scaling (DVFS) levels and HMP core types. The presented work enables the development and implementation of smart run-time power management and energy-aware scheduling algorithms, as well as hardware-validated performance, power and energy simulation for design-space exploration and optimisation of future systems.
Published: 2018

27. The PRiME Framework: Application- & platform-agnostic system management

Author: Bragg, Graeme McLachlan, Balsamo, Domenico, Leech, Charles R, and Merrett, Geoff
Abstract: Multi-core and heterogeneous processors in modern embedded platforms have increased in complexity to provide both high-performance and energy-efficient execution of applications. As a result, the runtime management and control of these platforms has become a non-trivial process with many different approaches being reported in the literature. In addition, applications have become increasingly dynamic to exploit these processors runtime adjustable parameters that can be tuned to optimise and influence their behaviour. These two challenges motivate the need for a consistent approach to runtime management that is cross-platform and generic in the support of applications. This abstract presents the PRiME Framework, a cross-layer framework that enables application- and platform-agnostic runtime management by separating a system into three distinct layers connected by an API and cross-layer constructs called knobs and monitors. The motivation for the framework’s underlying concepts is discussed and its use is demonstrated with a range of platforms and applications. An open-source implementation of this framework has been released.
Published: 2018

28. Online concurrent workload classification for multi-core energy management

Author: Basireddy, Karunakar Reddy, Singh, Amit, Merrett, Geoff, and Al-Hashimi, Bashir
Abstract: Modern embedded multi-core processors are organized as clusters of cores, where all cores in each cluster operate at a common Voltage-frequency (V-f). Such processors often need to execute applications concurrently, exhibiting varying and mixed workloads (e.g. compute- and memory-intensive) depending on the instruction mix and resource sharing. Runtime adaptation is key to achieving energy savings without trading-off application performance with such workload variabilities. In this paper, we propose an online energy management technique that performs concurrent workload classification using the metric Memory Reads Per Instruction (MRPI) and pro-actively selects an appropriate V-f setting through workload prediction. Subsequently, it monitors the workload prediction error and performance loss, quantified by Instructions Per Second (IPS) at runtime and adjusts the chosen V-f to compensate. We validate the proposed technique on an Odroid-XU3 with various combinations of benchmark applications. Results show an improvement in energy efficiency of up to 69% compared to existing approaches.
Published: 2018
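
The classify-then-compensate loop described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the MRPI threshold, the 5% tolerance, the frequency levels and all function names are assumptions introduced here, and the rule "memory-intensive runs at the lowest level" is a deliberate simplification.

```python
from typing import Sequence


def memory_reads_per_instruction(memory_reads: int, instructions: int) -> float:
    """MRPI for one sampling window: memory reads divided by instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return memory_reads / instructions


def classify_workload(mrpi: float, threshold: float = 0.05) -> str:
    """Label a window by its MRPI. The threshold value is hypothetical."""
    return "memory-intensive" if mrpi >= threshold else "compute-intensive"


def select_frequency(workload: str, frequencies_mhz: Sequence[int]) -> int:
    """Simplified policy (assumption): memory-bound work tolerates a low
    clock, compute-bound work gets the highest available level."""
    ordered = sorted(frequencies_mhz)
    return ordered[0] if workload == "memory-intensive" else ordered[-1]


def compensate(current_mhz: int, frequencies_mhz: Sequence[int],
               target_ips: float, measured_ips: float,
               tolerance: float = 0.05) -> int:
    """Step up one level when the measured IPS falls more than `tolerance`
    (a hypothetical fraction) below the target; otherwise keep the level."""
    ordered = sorted(frequencies_mhz)
    loss = (target_ips - measured_ips) / target_ips
    index = ordered.index(current_mhz)
    if loss > tolerance and index < len(ordered) - 1:
        return ordered[index + 1]
    return current_mhz
```

For example, with hypothetical levels `[600, 1000, 1400]`, a window with 10 memory reads per 100 instructions would be classed as memory-intensive and start at 600 MHz; a 20% IPS shortfall would then raise it to 1000 MHz.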

29. Application- and platform-agnostic runtime power management of heterogeneous embedded systems

Author: Balsamo, Domenico, Bragg, Graeme McLachlan, Leech, Charles, and Merrett, Geoff
Abstract: Increasing energy efficiency and reliability at runtime is a key challenge of heterogeneous many-core systems. We demonstrate how contributions from the PRiME project integrate to enable application- and platform-agnostic runtime management that respects application performance targets. We consider opportunities to enable runtime management across the system stack and we enable cross-layer interactions to trade-off power and reliability with performance and accuracy. We consider a system as three distinct layers, with abstracted communication between them, which enables the direct comparison of different approaches, without requiring specific application or platform knowledge. Application-agnostic runtime management is demonstrated with a selection of runtime managers from PRiME, including linear regression modelling and predictive thermal management, operating across multiple applications. Platform-independent runtime management is demonstrated using two heterogeneous platforms.
Published: 2018

30. Hibernus++: A Self-Calibrating and Adaptive System for Transiently-Powered Embedded Devices

Author: Balsamo, Domenico, Weddell, Alex S., Das, Anup, Rodriguez Arreola, Alberto, Brunelli, Davide, Al-Hashimi, Bashir M., Merrett, Geoff V., and Benini, Luca
Subjects: energy harvesting, Embedded systems, intermittent supply, Electrical and Electronic Engineering, low-power design, Embedded system, transient computing, Computer Graphics and Computer-Aided Design, Software
Abstract: Energy harvesters are being used to power autonomous systems, but their output power is variable and intermittent. To sustain computation, these systems integrate batteries or supercapacitors to smooth out rapid changes in harvester output. Energy storage devices require time for charging and increase the size, mass, and cost of systems. The field of transient computing moves away from this approach, by powering the system directly from the harvester output. To prevent an application from having to restart computation after a power outage, approaches such as Hibernus allow these systems to hibernate when supply failure is imminent. When the supply reaches the operating threshold, the last saved state is restored and the operation is continued from the point it was interrupted. This paper proposes Hibernus++ to intelligently adapt the hibernate and restore thresholds in response to source dynamics and system load properties. Specifically, capabilities are built into the system to autonomously characterize the hardware platform and its performance during hibernation in order to set the hibernation threshold at a point which minimizes wasted energy and maximizes computation time. Similarly, the system auto-calibrates the restore threshold depending on the balance of energy supply and consumption in order to maximize computation time. Hibernus++ is validated both theoretically and experimentally on microcontroller hardware using both synthesized and real energy harvesters. Results show that Hibernus++ provides an average 16% reduction in energy consumption and an improvement of 17% in application execution time over state-of-the-art approaches.
Published: 2016
Full Text: View/download PDF
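
One simple way to reason about where a hibernation threshold might sit is to require that the energy stored in the supply capacitance between the threshold and the minimum operating voltage covers the cost of saving a snapshot. The sketch below assumes that capacitor-energy model; it is not confirmed by the abstract as the calibration Hibernus++ performs, and the function and parameter names are hypothetical.

```python
import math


def hibernate_threshold(snapshot_energy_j: float,
                        capacitance_f: float,
                        v_min: float) -> float:
    """Lowest supply voltage at which a snapshot can still complete.

    Assumed model: the energy available while the supply falls from V_h
    to v_min is E = 0.5 * C * (V_h**2 - v_min**2). Solving for V_h gives
    V_h = sqrt(2 * E / C + v_min**2).
    """
    if capacitance_f <= 0:
        raise ValueError("capacitance must be positive")
    if snapshot_energy_j < 0:
        raise ValueError("snapshot energy must be non-negative")
    return math.sqrt(2.0 * snapshot_energy_j / capacitance_f + v_min ** 2)
```

Under this model, setting the threshold higher than the computed value hibernates earlier than necessary and wastes harvested energy, while setting it lower risks an incomplete snapshot.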

31. High-speed low-complexity guided image filtering-based disparity estimation

Author: Vala, Charan Kumar, Immadisetty, Koushik, Acharyya, Amit, Leech, Charles, Balagopal, Vibishna, Merrett, Geoff V., and Al-Hashimi, Bashir
Subjects: ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Abstract: Stereo vision is a methodology to obtain depth in a scene based on the stereo image pair. In this paper, we introduce a discrete wavelet transform (DWT)-based methodology for a state-of-the-art disparity estimation algorithm that resulted in significant performance improvement in terms of speed and computational complexity. In the initial stage of the proposed algorithm, we apply DWT to the input images, reducing the number of samples to be processed in subsequent stages by 50%, thereby decreasing computational complexity and improving processing speed. Subsequently, the architecture has been designed based on this proposed methodology and prototyped on a Xilinx Virtex-7 FPGA. The performance of the proposed methodology has been evaluated against four standard Middlebury Benchmark image pairs viz. Tsukuba, Venus, Teddy, and Cones. The proposed methodology results in the improvement of about 44.4% cycles per frame, 52% frames/s, and 61.5% and 59.6% LUT and register utilization, respectively, compared with state-of-the-art designs.
Published: 2018

32. PRiME: Power-efficient Reliable Many-core Embedded systems

Author: Merrett, Geoff
Published: 2018

33. ARM mbed support for transient computing in energy harvesting IoT systems

Author: Lazarescu, Bogdan, Balsamo, Domenico, and Merrett, Geoff
Abstract: Energy harvesters offer the possibility for embedded IoT computing systems to operate without batteries. However, their output power is usually unpredictable and highly variable. To mitigate the effect of this variability, systems incorporate large energy buffers, increasing their size, mass and cost. The emerging class of transient computing systems differs from this approach, operating directly from the energy harvesting source and minimizing or removing additional energy storage. Existing transient approaches are largely designed for specific applications and architectures. Hence, they suffer from not being broadly applicable across multiple embedded IoT platforms. To address this challenge, transient approaches need to be integrated within a general IoT programming framework such as ARM’s mbed IoT Device Platform. This support is offered through libraries and application programming interfaces (APIs) which enable transient computing to be implemented as a service on top of IoT application protocols.
Published: 2017

34. Accurate and Stable Run-Time Power Modeling for Mobile and Embedded CPUs

Author: Walker, Matthew, Diestelhorst, Stephan, Hansson, Andreas, Das, Anup, Yang, Sheng, Al-Hashimi, Bashir M., and Merrett, Geoff V.
Subjects: Embedded systems, performance monitoring counters (PMCs), PMC event selection, power modeling and estimation
Abstract: Modern mobile and embedded devices are required to be increasingly energy-efficient while running more sophisticated tasks, causing the CPU design to become more complex and employ more energy-saving techniques. This has created a greater need for fast and accurate power estimation frameworks for both run-time CPU energy management and design-space exploration. We present a statistically rigorous and novel methodology for building accurate run-time power models using performance monitoring counters (PMCs) for mobile and embedded devices, and demonstrate how our models make more efficient use of limited training data and better adapt to unseen scenarios by uniquely considering stability. Our robust model formulation reduces multicollinearity, allows separation of static and dynamic power, and allows a 100× reduction in experiment time while sacrificing only 0.6% accuracy. We present a statistically detailed evaluation of our model, highlighting and addressing the problem of heteroscedasticity in power modeling. We present software implementing our methodology and build power models for ARM Cortex-A7 and Cortex-A15 CPUs, with 3.8% and 2.8% average error, respectively. We model the behavior of the nonideal CPU voltage regulator under dynamic CPU activity to improve modeling accuracy by up to 5.5% in situations where the voltage cannot be measured. To address the lack of research utilizing PMC data from real mobile devices, we also present our data acquisition method and experimental platform software. We support this paper with online resources including software tools, documentation, raw data and further results.
Published: 2017
Full Text: View/download PDF
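
The general idea of an empirical PMC-based power model can be illustrated with a single-predictor least-squares fit. This is a minimal sketch under stated assumptions, not the paper's methodology: it uses one hypothetical event rate rather than a selected set of PMC events, and reading the intercept as static power and the slope term as dynamic power is an interpretation made here for illustration.

```python
from typing import Sequence, Tuple


def fit_linear_power_model(event_rates: Sequence[float],
                           power_w: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for power = intercept + slope * event_rate.

    Returns (intercept, slope).
    """
    n = len(event_rates)
    if n != len(power_w) or n < 2:
        raise ValueError("need at least two matching samples")
    mean_x = sum(event_rates) / n
    mean_y = sum(power_w) / n
    sxx = sum((x - mean_x) ** 2 for x in event_rates)
    if sxx == 0:
        raise ValueError("event rates must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(event_rates, power_w))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_absolute_percentage_error(intercept: float, slope: float,
                                   event_rates: Sequence[float],
                                   power_w: Sequence[float]) -> float:
    """Average of |predicted - measured| / measured, as a percentage."""
    errors = [abs((intercept + slope * x) - y) / y
              for x, y in zip(event_rates, power_w)]
    return 100.0 * sum(errors) / len(errors)
```

A model fitted this way would normally be evaluated on workloads that were not used for fitting, since an error figure measured on the training samples alone says little about how the model behaves in unseen scenarios.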

35. Software-defined PMC for Runtime Power Management of a Many-core Neuromorphic Platform

Author: Sugiarto, Indar, Shang, Delong, Singh, Amit Kumar, Ouni, Bassem, Merrett, Geoff, Al-Hashimi, Bashir, and Furber, Stephen
Subjects: SpiNNaker, RTM, Neuromorphic, PMC, Many-core
Abstract: This paper presents an approach to provide a Run-time Management (RTM) system for a many-core neuromorphic platform. RTM frameworks are commonly used to achieve an energy saving while satisfying application performance requirements. In commodity processors, the RTM can be implemented by utilizing the output of Performance Monitoring Counters (PMCs) to control the frequency of the processor's clock. However, many neuromorphic platforms such as SpiNNaker do not have PMC units; thus, we propose a software-defined PMC that can be implemented using standard programming tool-chains in such platforms. In this paper, we evaluate several control strategies for RTM in SpiNNaker. These control programs are equivalent with governors in standard operating systems such as Linux. For evaluation, we use the RTM with several image processing applications. The results show that our proposed method, called Improved-Conservative, produces the lowest thermal risk and energy consumption while achieving the same performance as other adaptive governors.
Published: 2017

36. Dataset supporting the article entitled 'The Slowdown or Race-to-idle Question: Workload-Aware Energy Optimization of SMT Multicore Platforms under Process Variation'

Author: Das, Anup, Merrett, Geoff V., and Al-Hashimi, Bashir
Subjects: 010302 applied physics, 0103 physical sciences, Faculty of Physical Sciences and Engineering > Electronics and Computer Science, 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology, 01 natural sciences, 020202 computer hardware & architecture
Abstract: This dataset supports the article entitled "The Slowdown or Race-to-idle Question: Workload-Aware Energy Optimization of SMT Multicore Platforms under Process Variation" accepted for publication in DATE conference 2016.
Published: 2017
Full Text: View/download PDF

37. Exploring energy efficient state retention in transiently-powered computing systems

Author: Verykios, Theodoros D., Balsamo, Domenico, and Merrett, Geoff V.
Abstract: Batteries have traditionally been used to power embedded electronic devices. However, requirements such as a long lifetime, low cost, and weight, pose significant challenges to battery-powered systems. Energy harvesting offers the potential for embedded systems to operate without batteries. Nonetheless, harvesting has been traditionally coupled with large energy buffers such as supercapacitors to tackle the instability of the source. Transiently-powered computing systems enable computation to be sustained despite the source variability, without the need for additional energy storage. To make this feasible, the system state (e.g. registers and RAM) needs to be saved to Non-Volatile Memory (NVM) before a power outage, and restored once power is available again. Existing transient systems save the entire state of the system upon power failure and do not consider the properties of different NVM technologies, leading to a sub-optimal state retention process. As a consequence, the time and energy spent towards useful computation are decreased significantly, affecting the forward progress that the system can achieve. The aim of this research is to introduce novel methods to reduce the time and energy overhead of the state retention process, exploring solutions both in the software and hardware domain.
Published: 2017

38. Selective policies for efficient state retention in transiently-powered systems

Author: Verykios, Theodoros D., Balsamo, Domenico, and Merrett, Geoff
Abstract: Energy harvesting offers the potential for embedded systems to operate without batteries. However, harvesting has been traditionally coupled with large energy buffers such as supercapacitors to mitigate the effect of the source variability. An emerging class of transiently-powered sensing systems enable computation to be sustained during intermittent supply, without using any additional energy storage. To deal with the intermittent nature of the input source, the system state (e.g. registers and RAM) is saved to Non-Volatile Memory (NVM) before a power failure, and restored when the power supply recovers. Existing approaches save the entire state of the system upon power failure, but this is energy and time consuming. In this poster, novel selective policies for efficiently retaining state are explored, which exploit properties of different NVM technologies.
Published: 2017

39. ITMD: run-time management of concurrent multi-threaded applications on heterogeneous multi-cores

Author: Basireddy, Karunakar Reddy, Singh, Amit, Merrett, Geoff V., and Al-Hashimi, Bashir M.
Abstract: Heterogeneous multi-cores often deal with multiple applications having different performance requirements concurrently, which generate varying and mixed workloads. Runtime management is required for adapting to such performance requirements and workload variabilities, and to achieve energy efficiency. It is challenging to efficiently exploit different types of cores simultaneously and DVFS potential of cores. We present a run-time management approach that first selects thread-to-core mapping based on the performance requirements and resource availability. Then, it applies online adaptation by adjusting the voltage-frequency (V-f) levels to achieve energy optimization. We demonstrate the proposed run-time management approach on the Odroid-XU3, with various combinations of multi-threaded applications from PARSEC and SPLASH benchmarks. Results show an average improvement in energy efficiency up to 33% compared to existing approaches.
Published: 2017

40. Energy-driven computing for energy-harvesting embedded systems

Author: Merrett, Geoff V.
Abstract: There has been increasing interest over the last decade in the powering of embedded systems from ‘harvested’ energy, and this has been further fuelled by the promise and vision of IoT. Energy harvesting systems present numerous challenges, although some of these are also posed by their battery-powered counterparts: e.g. ultra-low power consumption. However, a significant challenge not witnessed in battery-powered systems is a requirement to manage the combination of a highly unpredictable and variable (spatially and temporally) power supply with a highly dynamic (across many orders of magnitude) and often event-driven system power consumption. This problem is typically rectified through the addition of energy storage (e.g. a supercapacitor) to provide energy buffering to smooth out the dynamics of supply and consumption. This has the significant advantage of making the system ‘look like’ a battery-powered system, yet usually adds volume, mass and cost to the resultant system – something that is counterproductive in future flexible, wearable and implantable IoT systems. Such systems can, alternatively, include only a very small amount (or even zero) energy-storage. Now, instead of the system’s operation being dictated solely by the application, operation starts to become ‘energy-driven’, with execution being highly intertwined with power and energy availability. In this presentation, I will first introduce the landscape of energy-harvesting computing systems, and articulate how energy-driven computing presents a different class of computing to conventional approaches. A significant issue in the successful operation of these systems is their ability to operate from an intermittent, constrained and variable supply, and I will show how transient operation and power-neutrality can be used to achieve the vision for these systems, and hence enable the proliferation of tiny self-powered systems that will underpin much of the IoT.
Published: 2016

41. Learning transfer-based adaptive energy minimization in embedded systems

Author: Shafik, Rishad Ahmed, Yang, Sheng, Das, Anup K., Maeda-Nunez, Luis Alfonso, Merrett, Geoff V., and Al-Hashimi, Bashir
Abstract: Embedded systems execute applications with different performance requirements. These applications exercise the hardware differently depending on the types of computation being carried out, generating varying workloads with time. We will demonstrate that energy minimization with such workload and performance variations within (intra) and across (inter) applications is particularly challenging. To address this challenge we propose an online energy minimization approach, capable of minimizing energy through adaptation to these variations. At the core of the approach is an initial learning through reinforcement learning algorithm that suitably selects the appropriate voltage/frequency scalings (VFS) based on workload predictions to meet the applications’ performance requirements. The adaptation is then facilitated and expedited through learning transfer, which uses the interaction between the system application, runtime and hardware layers to adjust the power control levers. The proposed approach is implemented as a power governor in Linux and validated on an ARM Cortex-A8 running different benchmark applications. We show that with intra- and inter-application variations, our proposed approach can effectively minimize energy consumption by up to 33% compared to existing approaches. Scaling the approach further to multi-core systems, we also show that it can minimize energy by up to 18% with 2X reduction in the learning time when compared with a recently reported approach.
Published: 2016

42. Transient and power-neutral computing: a paradigm shift for embedded systems?

Author: Merrett, Geoff V.
Abstract: Embedded systems powered from time-varying energy harvesting power sources, for example solar PV or mechanical vibration, have traditionally operated using the principles of energy-neutral computing. That is, over a sensible period of time (e.g. 24 hours), the energy consumed is equal to the energy that was harvested. This has the advantage of making the system ‘look like’ a battery-powered system, yet typically results in large, complex and expensive power conversion circuitry and introduces challenges such as fast and reliable cold-start. In recent years, the concept of transient computing has emerged to challenge this, whereby low-power embedded systems can be designed to operate and perform useful computation when energy is available, and carefully ‘hibernate’ when the power disappears such that it can continue where it left off when supply is regained. In this talk I will explain this shift towards transient computing and the different approaches that have been proposed, and the new challenges that are raised as a result. I will also discuss a complementary approach to the powering of transient systems, named power-neutral computing. Instead of equating energy consumption to energy supply, as is the case in energy-neutral systems, power-neutral systems attempt to match instantaneous power consumption to the instantaneous power supplied. This fine-grained control permits better use of available resources while overcoming the disadvantages of energy-neutral computing; furthermore, it can work alongside aforementioned transient computing techniques if supply disappears altogether.
Published: 2016
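
The distinction this abstract draws between energy-neutral and power-neutral operation can be made concrete with a small sketch. The names, the list of operating points and the choice of "highest feasible point" as the control rule are assumptions made here for illustration; the abstract does not prescribe a specific controller.

```python
from typing import Optional, Sequence


def is_energy_neutral(harvested_j: Sequence[float],
                      consumed_j: Sequence[float],
                      tolerance_j: float = 0.0) -> bool:
    """Energy-neutral: over the whole period, total energy consumed
    equals total energy harvested (within a tolerance)."""
    return abs(sum(harvested_j) - sum(consumed_j)) <= tolerance_j


def power_neutral_operating_point(
        harvested_power_w: float,
        operating_points_w: Sequence[float]) -> Optional[float]:
    """Power-neutral: at each instant, choose the most demanding
    operating point whose power draw does not exceed the harvested
    power. Returns None when no point is feasible, which is where a
    transient (hibernate and restore) scheme would take over."""
    feasible = [p for p in operating_points_w if p <= harvested_power_w]
    return max(feasible) if feasible else None
```

The first function only checks a balance over a period, which is why an energy buffer is needed to cover the instants where consumption exceeds supply; the second makes a decision per instant, so it needs no such buffer.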

43. Hibernus++: A Self-Calibrating and Adaptive System for Transiently-Powered Embedded Devices

Author: Balsamo, Domenico, Weddell, Alex, Das, Anup, Arreola, Alberto, Brunelli, Davide, Al Hashimi, Bashir, Merrett, Geoff, and Benini, Luca
Subjects: Transient analysis, Circuit faults, Checkpointing, Batteries, Microcontrollers, Nonvolatile memory
Published: 2016

44. Poster Abstract: Enspect—Simplifying the Design of Energy Harvesting Systems

Author: Tinsley, Nick F., Witts, Stuart T., Ansell, Jacob M. R., Barnes, Emily, Jenkins, Simeon M., Raveendran, Dhanushan, Merrett, Geoff V., and Weddell, Alex S.
Abstract: The design of sensing systems powered from energy harvesting can be complex. Design decisions are required concerning the properties and parameters of energy harvesting, conversion, and storage devices. The quantity and properties of environmental energy are typically both temporally and spatially variant, while the current consumption of the load electronics also changes dynamically. In this paper we describe Enspect, an open-source hardware/software tool which simplifies the design of energy harvesting sensing systems by assisting in the specification of harvesting and storage devices. It does this by enabling the long-term collection of data on energy availability, and modeling and simulating the performance of a complete system.
Published: 2015

45. Data-driven low-complexity nitrate loss model utilizing sensor information – towards collaborative farm management with wireless sensor networks

Author: Zia, Huma, Harris, Nick, and Merrett, Geoff V.
Published: 2015

46. A model-based trace testing approach for validation of formal co-simulation models

Author: Intana, Adisak, Poppleton, Michael R., and Merrett, Geoff V.
Abstract: This paper presents a model-based trace testing (MBTT) approach to strengthen verification and validation techniques for formal co-simulation based wireless sensor network development (FoCoSim-WSN). This framework enables the functionality and protocol algorithms to be encoded in the controller model in the formal Event-B language. Use of proof tools can guarantee safety properties of this formal model. Also, network reliability and performance analysis is performed by MiXiM simulation including e.g. the network load distribution and the network latency. However, this framework lacks focus in validation coverage since test scenarios for the controller model are generated randomly from the simulation environment. Consequently, the MBTT technique is applied to validate the formal Event-B controller in co-models. This technique enables us to create test scenarios from the sequence of events in our co-simulation master algorithm. We use event trace diagrams, fault injection and recovery testing to specify functional, failing and recovery test scenarios. We define MiXiM co-simulation runs to generate long running test scenarios meeting our test requirements. The result shows how failing test scenarios in these runs (“killer traces”) enable model debugging in terms of absent or erroneous constraints and events.
Published: 2015

47. Thermal-aware adaptive energy minimization of OpenMP parallel applications

Author: Shafik, Rishad Ahmed, Das, Anup K., Yang, Sheng, Merrett, Geoff V., and Al-Hashimi, Bashir
Abstract: Energy minimization of parallel applications considering thermal distributions among the processor cores is an emerging challenge for current and future generations of many-core computing systems. This paper proposes an adaptive energy minimization approach that hierarchically applies dynamic voltage/frequency scaling (DVFS), thread-to-core affinity and dynamic concurrency controls (DCT) to address this challenge. The aim is to minimize the energy consumption and achieve balanced thermal distributions among cores, thereby improving the lifetime reliability of the system, while meeting a specified power budget requirement. Fundamental to this approach is an iterative learning-based control algorithm that adapts the VFS and core allocations dynamically based on the CPU workloads and thermal distributions of the processor cores, guided by the CPU performance counters at regular intervals. The adaptation is facilitated through modified OpenMP library-based power budget annotations. The proposed approach is extensively validated on an Intel Xeon E5-2630 platform with up to 12 CPUs running NAS parallel benchmark applications.
Published: 2015

48. Multimedia Data Processing and Delivery in Wireless Sensor Networks

Author: Molina Cantero, Francisco Javier, Mora-Merchán, Javier María, Barbancho Concejero, Julio, León de Mora, Carlos, Kheng, Tan Yen (Coordinator), Merrett, Geoff (Coordinator), and Universidad de Sevilla. Departamento de Tecnología Electrónica
Published: 2010

49. PoGo: an application-specific adaptive energy minimisation approach for embedded systems

Author: Maeda-Nunez, Luis Alfonso, Das, Anup K., Shafik, Rishad A., Merrett, Geoff V., and Al-Hashimi, Bashir
Published: 2015

50. Run-time power estimation for mobile and embedded asymmetric multi-core CPUs

Author: Walker, Matthew J., Das, Anup K., Merrett, Geoff V., and Hashimi, B.M.
Published: 2015

73 results on '"Merrett, Geoff"'
