10,809 results
Search Results
2. Colony: A Privileged Trusted Execution Environment With Extensibility.
- Author
- Xia, Yubin, Hua, Zhichao, Yu, Yang, Gu, Jinyu, Chen, Haibo, Zang, Binyu, and Guan, Haibing
- Subjects
- SYSTEMS software, COLONIES, BURGLARY protection, CONFERENCE papers, SEMANTICS, BEE colonies
- Abstract
The code base of system software is growing fast, which results in a large number of vulnerabilities: for example, 296 CVEs have been found in Xen hypervisor and 2195 CVEs in Linux kernel. To reduce the reliance on the trust of system software, many researchers try to provide trusted execution environments (TEEs), which can be categorized into two types: non-privileged TEEs and privileged TEEs. Non-privileged TEEs (e.g., Intel SGX) are extensible, but cannot protect security services like virtual machine introspection (VMI) due to the lack of system-level semantics. On the contrary, privileged TEEs (e.g., the secure world of ARM TrustZone) have system-level semantics, but any additional service implemented in the privileged TEE directly increases the TCB of the entire system. In this article, we propose a new design of TEE to support system-level security services and achieve better extensibility with a small TCB. Each TEE instance of the proposed design is named a Colony. Specifically, we introduce a secure monitor for isolation and capability management. Each Colony is assigned capabilities to access only necessary system-level semantics. We use the new TEE to build four security services, including secure device accessing, VMI tools, a system call tracer, and a much more complex service to virtualize ARM TrustZone with multiple Colonies. We have implemented the system on ARMv7 and ARMv8 platforms, in Xen hypervisor and Linux kernel, and perform a detailed evaluation to show its efficiency. This paper is an extended version of the conference paper published in USENIX Security’17: vTZ: Virtualizing ARM TrustZone. A brief summary of differences is in Section. [ABSTRACT FROM AUTHOR]
- Published
- 2022
3. Fast Generation of RSA Keys Using Smooth Integers.
- Author
- Dimitrov, Vassil, Vigneri, Luigi, and Attias, Vidal
- Abstract
Primality generation is the cornerstone of several essential cryptographic systems. The problem has been a subject of deep investigations, but there is still a substantial room for improvements. Typically, the algorithms used have two parts – trial divisions aimed at eliminating numbers with small prime factors and primality tests based on an easy-to-compute statement that is valid for primes and invalid for composites. In this paper, we will showcase a technique that will eliminate the first phase of the primality testing algorithms. The computational simulations show a reduction of the primality generation time by about 30 percent in the case of 1024-bit RSA key pairs. This can be particularly beneficial in the case of decentralized environments for shared RSA keys as the initial trial division part of the key generation algorithms can be avoided at no cost. This also significantly reduces the communication complexity. Another essential contribution of the paper is the introduction of a new one-way function that is computationally simpler than the existing ones used in public-key cryptography. This function can be used to create new random number generators, and it also could be potentially used for designing entirely new public-key encryption systems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
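For context on entry 3, the two-part structure its abstract describes (trial division by small primes, followed by a primality test) can be sketched as below. This is a minimal illustration of the conventional approach only, not the authors' smooth-integer technique. The small-prime table, the round count, and the names `passes_trial_division`, `miller_rabin`, and `generate_prime` are assumptions made for this sketch.

```python
import secrets

# Assumed small-prime table for the trial-division phase (illustrative only).
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def passes_trial_division(n):
    """Phase 1: reject n if any small prime divides it (n must exceed 47)."""
    return all(n % p != 0 for p in SMALL_PRIMES)


def miller_rabin(n, rounds=20):
    """Phase 2: Miller-Rabin probabilistic primality test for odd n > 3."""
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(n - 3)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # the base a proves that n is composite
    return True  # n is probably prime


def generate_prime(bits):
    """Draw random odd candidates of exactly `bits` bits (bits >= 8)."""
    while True:
        # Force the top bit (exact length) and the low bit (odd).
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if passes_trial_division(candidate) and miller_rabin(candidate):
            return candidate
```

Calling `generate_prime(512)` twice would give the two factors of a 1024-bit RSA modulus in this conventional scheme. The trial-division phase is the part the abstract says its technique eliminates.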
4. A Low Complexity and Long Period Digital Random Sequence Generator Based on Residue Number System and Permutation Polynomial.
- Author
- Chen, Shilin, Ma, Shang, Qin, Zhuo, Zhu, Bixin, Xiao, Ziqian, and Liu, Meiqing
- Subjects
- DIGITAL communications, NUMBER systems, CHINESE remainder theorem, PERMUTATIONS, FIELD programmable gate arrays, IMAGE encryption, CATHODE ray tubes, RANDOM numbers
- Abstract
Long period digital random sequence plays an important role in reliable communications and high security scenarios. This paper improved the method of generating long period digital random sequences based on the Residue Number System (RNS) and the Chinese Remainder Theorem (CRT), and a sequence mapping method after CRT extension. This paper proves that the period of sequence after mapping will not degenerate if the modulus used in the mapping stage is coprime with the period of the original sequence. By using the parallelism of RNS, the proposed method can generate sequences at high speed with fewer hardware resources. The NIST test results show that the pass rate of each test item is above 98.40%, which meets the NIST test confidence requirements, confirming the randomness of the generated sequences. An image encryption test is given as one of the example applications of the generated sequences. On the theoretical basis, by jointly optimizing the sequence mapping and iteration procedure, a hardware implementation architecture is also presented in this paper. The implementation is based on Xilinx XC7Z020CLG484-3 FPGA and compared with the implementations of classical chaotic maps. The results show that the proposed architecture has longer sequence period with less hardware resource consumption and higher generation speed and is more general. Meanwhile, the proposed architecture has fast phase switching ability, which is about 10 clock periods. This is one of the key attributes when the sequence is used in communication systems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
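Entry 4 builds on the Residue Number System (RNS) and the Chinese Remainder Theorem (CRT). As background, the sketch below shows these two standard building blocks: converting an integer to residues, operating on each residue channel independently, and reconstructing the result with the CRT. It does not reproduce the paper's sequence generator or mapping method. The moduli and the function names `to_rns` and `from_rns` are chosen here for illustration.

```python
from math import prod


def to_rns(x, moduli):
    """Represent x by its residues modulo pairwise-coprime moduli."""
    return [x % m for m in moduli]


def from_rns(residues, moduli):
    """Reconstruct x modulo prod(moduli) using the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def rns_mul(xs, ys, moduli):
    """Multiply channel by channel; no carries cross between channels."""
    return [(x * y) % m for x, y, m in zip(xs, ys, moduli)]
```

Because each channel works independently with small operands, the channels can run in parallel, which is the property the abstract refers to as the parallelism of RNS.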
5. Curvature and Creases: A Primer on Paper.
- Author
- Huffman
- Published
- 1976
6. A Data Layout With Good Data Locality for Single-Machine Based Graph Engines.
- Author
- Jo, Yong-Yeon, Jang, Myung-Hwan, Kim, Sang-Wook, and Park, Sunju
- Subjects
- GRAPH algorithms, ENGINES
- Abstract
Graph engines have been used in many applications to handle big graphs efficiently. The majority of the research to improve their performance has focused primarily on the design of efficient graph processing. This paper claims, however, the focus should be given also to graph storage design. This is because good storage design can improve both CPU performance and I/O performance of graph engines. In this paper, we propose an efficient data layout for single-machine based graph engines. We identify the common node access pattern of the graph algorithms running on single-machine based graph engines. Based on this finding, we propose the breadth-first (BF) data layout which places the nodes processed together in the same or adjacent storage space so that they can be accessed together as much as possible. The experimental results show that the BF data layout improves both CPU and I/O performances significantly in all single-machine based graph engines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
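The idea in entry 6 of placing nodes that are processed together in the same or adjacent storage space can be illustrated by renumbering nodes in breadth-first order. This is only one plausible reading of the abstract, not the authors' actual layout. The adjacency-dictionary representation and the names `bfs_order` and `bfs_relabel` are assumptions for this sketch.

```python
from collections import deque


def bfs_order(adj, start):
    """Return nodes in the order a breadth-first traversal visits them."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


def bfs_relabel(adj, start):
    """Assign consecutive IDs in BFS order so neighbours get nearby IDs."""
    order = bfs_order(adj, start)
    reached = set(order)
    order += [u for u in adj if u not in reached]  # unreachable nodes go last
    new_id = {old: new for new, old in enumerate(order)}
    new_adj = {new_id[u]: sorted(new_id[v] for v in adj[u]) for u in order}
    return new_id, new_adj
```

If node records are then stored in ID order, the neighbours of a node tend to sit in nearby positions, which is the locality property the abstract aims for.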
7. Improving Interference Analysis for Real-Time DAG Tasks Under Partitioned Scheduling.
- Author
- Wu, Yulong, Zhang, Weizhe, Guan, Nan, and Tang, Yue
- Abstract
Real-time systems with strict timing constraints have been widely applied in many fields. The Directed acyclic graph (DAG) task model has been widely studied and applied to model real-time systems with partial parallelism and precedence constraints in each task. Our paper focuses on the worst-case response time (WCRT) analysis of DAG tasks under partitioned scheduling on multiprocessors. We investigate a parallel structure named $Str$, which helps obtain more accurate analysis results, and propose a new offline scheduling analysis algorithm named reducing repetitive calculation (RRC). Experiments with synthetic workload are conducted to compare the results calculated by RRC and the state-of-the-art, as well as the observed average response time on a real embedded system. Results show that RRC has better performance in terms of analysis accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2022
8. Regularity-Based Virtualization Under the ARINC 653 Standard for Embedded Systems.
- Author
- Dai, Guangli, Paluri, Pavan Kumar, Cheng, Albert Mo Kim, and Liu, Bozheng
- Subjects
- VIRTUAL machine systems, CENTRAL processing units, TASK performance
- Abstract
In embedded real-time virtualized systems (ERTVS), the ARINC 653 standard specifies a cyclic scheduling policy to guarantee the real-time performance of tasks in multiple Virtual Machines (VMs) residing on shared hardware. Based on this policy, the Regularity-based Resource Partitioning (RRP) model defines an efficient interface specification to hierarchically partition and assign resource slices among VMs. Although this model has received plenty of attention recently, three major pieces remain missing for applying this model in ERTVS. (1) Embedded systems are more sensitive to resource utilization efficiency since this may drastically affect their deployment cost for including additional cores. Therefore, this paper proposes an optimal and an approximate RRP resource scheduler for multi-core platforms. (2) A resource reconfiguration is required when an embedded system has to switch between operating modes, resulting in the current cyclic schedule being replaced by another pre-configured and verified cyclic schedule. This paper formalizes a new One-Hop Reconfiguration (OHR) problem tailored for mode-switch-capable embedded systems and introduces a corresponding optimal solution. (3) No RRP-based toolset is currently available for embedded systems. This paper thus presents an optimized RRP toolset tailored for embedded systems. Numerous experiments are conducted to evaluate the efficacy of this toolset. [ABSTRACT FROM AUTHOR]
- Published
- 2022
9. Call for Papers for new Transactions on Emerging Topics in Computing.
- Subjects
- *MANUSCRIPTS, *MULTIPROGRAMMING (Electronic computers), *ELECTRONIC data processing, *MULTIPROCESSORS
- Abstract
Prospective authors are requested to submit new, unpublished manuscripts for inclusion in the upcoming event described in this call for papers. [ABSTRACT FROM AUTHOR]
- Published
- 2013
10. Detecting Spectre Attacks Using Hardware Performance Counters.
- Author
- Li, Congmiao and Gaudiot, Jean-Luc
- Subjects
- HARDWARE, DATA security failures, SOFTWARE upgrades
- Abstract
Spectre attacks can be catastrophic and widespread because they exploit common design flaws caused by the speculative capabilities in modern processors to leak sensitive data through side channels. Completely fixing the problem would require a redesign of the architecture for transient execution or the implementation of a new design on re-configurable hardware. However, such fixes cannot be backported to old machines with fixed hardware design. Completely replacing those machines will take a long time. Moreover, existing software patches may cause significant performance overhead. This paper proposes to detect Spectre by monitoring deviations in microarchitectural events using hardware performance counters with promising accuracy above 90 percent under a variety of workload conditions. However, the attacker may attempt to evade detection by slowing down the attack or mimicking benign programs. This paper thus compares different evasion strategies quantitatively and demonstrates that it is possible for the attacker to avoid detection when operating the attacks at a lower speed while maintaining a reasonable attack success rate. Then, we show that, in order to resist evasion, the original detector must be enhanced by randomly switching between a set of detectors using different features and sampling periods so we can keep the detection accuracy above 80 percent. [ABSTRACT FROM AUTHOR]
- Published
- 2022
11. Constructing Completely Independent Spanning Trees in a Family of Line-Graph-Based Data Center Networks.
- Author
- Wang, Yifeng, Cheng, Baolei, Qian, Yu, and Wang, Dajin
- Subjects
- SPANNING trees, COMPLETE graphs, GENEALOGY, ARCHITECTURAL models, BIPARTITE graphs, DATA warehousing, SERVER farms (Computer network management)
- Abstract
The past decade has seen growing importance being attached to the Completely Independent Spanning Trees (CISTs). The CISTs can facilitate many network functionalities, and the existence and construction schemes of CISTs in various networks can be an indicator of the network's robustness. In this paper, we establish the number of CISTs that can be constructed in the line graph of the complete graph $K_n$ (denoted $L(K_n)$, for $n \geq 4$), and present an algorithm to construct the optimal (i.e., maximal) number of CISTs in $L(K_n)$. The $L(K_n)$ is a special class of SWCube [13], an architectural model proposed for data center networks. Our construction algorithm is also implemented to verify its validity. [ABSTRACT FROM AUTHOR]
- Published
- 2022
12. Call for Papers.
- Published
- 1981
13. Handling Transients of Dynamic Real-Time Workload Under EDF Scheduling.
- Author
- Casini, Daniel, Biondi, Alessandro, and Buttazzo, Giorgio
- Subjects
- SCHEDULING, TRANSIENT analysis
- Abstract
Real-time dynamic workload consists of tasks that can arbitrarily join and leave the system at run-time. To avoid incurring deadline misses, tasks that request to join the system must pass an admission test, which has to cope with potential scheduling transients originated by the residual effect of the tasks that previously left the system. This phenomenon may require some tasks to suffer an admission delay before being accepted for execution. This paper focuses on uniprocessor earliest-deadline first (EDF) scheduling with constrained deadlines and explicitly considers methods for handling scheduling transients in the presence of dynamic real-time workload. A generalized analysis framework is first presented to overcome several limitations of the existing approaches (including the support for overlapping transients), and is then used to derive methods for computing bounds on the admission delays incurred by tasks. Building on such results, an on-line protocol is proposed to handle the admission control of a dynamic workload, which also comes with a variant that can execute in polynomial time to favor its practical application. Furthermore, the paper shows how the presented analysis can be used off-line for analyzing mode-changes among static task sets. Experimental results are finally presented to evaluate the proposed algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2019
14. B70-6 Computing Methods in Optimization Problems.
- Author
- Tabak, D.
- Published
- 1970
15. Call for Papers.
- Published
- 1980
16. Call For Papers: Networks-on-Chip.
- Subjects
- *MULTICORE processors, *ENERGY consumption, *SEMICONDUCTORS, *INTEGRATED circuit interconnections, *MANUSCRIPTS, *SWITCHING circuits
- Published
- 2011
17. Call For Papers: System-Level Design and Validation of Heterogeneous Chip Multiprocessors.
- Subjects
- *SYSTEMS design, *HETEROGENEOUS computing, *INTEGRATED circuits, *MULTIPROCESSORS, *MULTICORE processors, *GRAPHICS processing units, *ADAPTIVE computing systems
- Published
- 2011
18. Call For Papers: Energy Efficient Computing.
- Subjects
- *ENERGY consumption, *USER-centered system design, *PERFORMANCE evaluation, *CALORIC expenditure, *BIOTIC communities, *MANUSCRIPTS, *ENERGY management
- Published
- 2011
19. Call for Papers: Special Section on Computer Arithmetic.
- Subjects
- *COMPUTER science periodicals, *PERIODICAL publishing, *COMPUTER arithmetic, *ALGORITHMS, *DIGITAL signal processing, *CRYPTOGRAPHY, *MULTIMEDIA communications
- Published
- 2011
20. Technical Paper Referees.
- Published
- 1971
21. Call For Papers.
- Published
- 1984
22. Hardware-Assisted Malware Detection and Localization Using Explainable Machine Learning.
- Author
- Pan, Zhixin, Sheldon, Jennifer, and Mishra, Prabhat
- Subjects
- RECURRENT neural networks, MACHINE learning, MALWARE, ANTIVIRUS software, DECISION trees, COMPUTER systems
- Abstract
Malicious software, popularly known as malware, is widely acknowledged as a serious threat to modern computing systems. Software-based solutions, such as anti-virus software (AVS), are not effective since they rely on matching patterns that can be easily fooled by carefully crafted malware with obfuscation or other deviation capabilities. While recent malware detection methods provide promising results through an effective utilization of hardware features, the detection results cannot be interpreted in a meaningful way. In this paper, we propose a hardware-assisted malware detection framework using explainable machine learning. This paper makes three important contributions. First, we theoretically establish that our proposed method can provide an interpretable explanation of classification results to address the challenge of transparency. Next, we show that the explainable outcome through effective utilization of hardware performance counters and embedded trace buffer can lead to accurate localization of malicious behavior. Finally, we have performed efficiency versus accuracy trade-off analysis using decision tree and recurrent neural networks. Extensive evaluation using a wide variety of real-world malware dataset demonstrates that our framework can produce accurate and human-understandable malware detection results with provable guarantees. [ABSTRACT FROM AUTHOR]
- Published
- 2022
23. Spiking Generative Adversarial Networks With a Neural Network Discriminator: Local Training, Bayesian Models, and Continual Meta-Learning.
- Author
- Rosenfeld, Bleema, Simeone, Osvaldo, and Rajendran, Bipin
- Subjects
- GENERATIVE adversarial networks, ACTION potentials, ARTIFICIAL neural networks, ITERATIVE learning control, STIMULUS & response (Psychology), PSYCHOLOGICAL feedback, HYBRID power systems, LEARNING strategies
- Abstract
Neuromorphic data carries information in spatio-temporal patterns encoded by spikes. Accordingly, a central problem in neuromorphic computing is training spiking neural networks (SNNs) to reproduce spatio-temporal spiking patterns in response to given spiking stimuli. Most existing approaches model the input-output behavior of an SNN in a deterministic fashion by assigning each input to a specific desired output spiking sequence. In contrast, in order to fully leverage the time-encoding capacity of spikes, this work proposes to train SNNs so as to match distributions of spiking signals rather than individual spiking signals. To this end, the paper introduces a novel hybrid architecture comprising a conditional generator, implemented via an SNN, and a discriminator, implemented by a conventional artificial neural network (ANN). The role of the ANN is to provide feedback during training to the SNN within an adversarial iterative learning strategy that follows the principle of generative adversarial network (GANs). In order to better capture multi-modal spatio-temporal distribution, the proposed approach – termed SpikeGAN – is further extended to support Bayesian learning of the generator's weight. Finally, settings with time-varying statistics are addressed by proposing an online meta-learning variant of SpikeGAN. Experiments bring insights into the merits of the proposed approach as compared to existing solutions based on (static) belief networks and maximum likelihood (or empirical risk minimization). In our experiments, handwritten digit images generated by SpikeGAN are observed to train an ANN classifier with 20% higher accuracy than a comparable belief network. Our experiments also demonstrate the use of SpikeGAN to generate neuromorphic data sets from handwritten digits. It is shown that these data can be used to train an SNN classifier that achieves an accuracy level approaching the baseline accuracy of an SNN classifier trained on rate-encoded real data. [ABSTRACT FROM AUTHOR]
- Published
- 2022
24. Accelerating Address Translation for Virtualization by Leveraging Hardware Mode.
- Author
- Sha, Sai, Zhang, Yi, Luo, Yingwei, Wang, Xiaolin, and Wang, Zhenlin
- Subjects
- WALKING speed, HARDWARE, VIRTUAL machine systems
- Abstract
The overhead of memory virtualization remains nontrivial. The traditional shadow paging (TSP) resorts to a shadow page table (SPT) to achieve the native page walk speed, but page table updates require hypervisor interventions. Alternatively, nested paging enables low-overhead page table updates, but utilizes the hardware MMU to perform a long-latency two-dimensional page walk. This paper proposes new memory virtualization solutions based on hardware (machine) mode—the highest CPU privilege level in some architectures like Sunway and RISC-V. A programming interface, running in hardware mode, enables software-implementation of hardware support functions. We first propose Software-based Nested Paging (SNP), which extends the software MMU to perform a two-dimensional page walk in hardware mode. Second, we present Swift Shadow Paging (SSP), which accomplishes page table synchronization by intercepting TLB flushing in hardware mode. Finally we propose Accelerated Shadow Paging (ASP) combining SSP and SNP. ASP handles the last-level SPT page faults by walking two-dimensional page tables in hardware mode, which eliminates most hypervisor interventions. This paper systematically compares multiple memory virtualization models by analyzing their designs and evaluating their performance both on a real system and a simulator. The experiments show that the virtualization overhead of ASP is less than 4.5% for all workloads. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. Multiplicative Complexity of XOR Based Regular Functions.
- Author
- Bernasconi, Anna, Cimato, Stelvio, Ciriani, Valentina, and Molteni, Maria Chiara
- Subjects
- BOOLEAN functions, TECHNOLOGICAL innovations, LOGIC design, LOGIC circuits
- Abstract
XOR-AND Graphs (XAGs) are an enrichment of the classical AND-Inverter Graphs (AIGs) with XOR nodes. In particular, XAGs are networks composed by ANDs, XORs, and inverters. Besides several emerging technologies applications, XAGs are often exploited in cryptography-related applications based on the multiplicative complexity of a Boolean function. The multiplicative complexity of a function is the minimum number of AND gates (i.e., multiplications) that are sufficient to represent the function over the basis {AND, XOR, NOT}. In fact, the minimization of the number of AND gates is important for high-level cryptography protocols such as secure multiparty computation, where processing AND gates is more expensive than processing XOR gates. Moreover, it is an indicator of the degree of vulnerability of the circuit, as a small number of AND gates corresponds to a high vulnerability to algebraic attacks. In this paper we study the multiplicative complexity of Boolean functions characterized by two particular regularities, called autosymmetry and D-reducibility. Moreover, we exploit these regularities for decreasing the number of AND nodes in XAGs. The experimental results validate the proposed approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
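As a small illustration of the definition in entry 25 (the minimum number of AND gates needed over the basis {AND, XOR, NOT}), the three-input majority function can be written with three AND gates or with one. The identity is a standard example picked for illustration and is not taken from the paper. The function names are hypothetical.

```python
from itertools import product


def maj_three_ands(a, b, c):
    """Majority as ab XOR ac XOR bc: three AND gates."""
    return (a & b) ^ (a & c) ^ (b & c)


def maj_one_and(a, b, c):
    """Majority as ((a XOR b) AND (a XOR c)) XOR a: a single AND gate."""
    return ((a ^ b) & (a ^ c)) ^ a


def equivalent(f, g, arity):
    """Check two Boolean functions agree on every input combination."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=arity))
```

Expanding the product gives a XOR ab XOR ac XOR bc (since a AND a = a), and the final XOR with a cancels the leading term, so both forms compute the same function while the second uses two fewer AND gates.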
26. A Reputation-Based Mechanism for Transaction Processing in Blockchain Systems.
- Author
- Zhang, Jiarui, Cheng, Yukun, Deng, Xiaotie, Wang, Bo, Xie, Jan, Yang, Yuanyuan, and Zhang, Mengqian
- Subjects
- BLOCKCHAINS, SPAM email, PEER-to-peer architecture (Computer networks), DENIAL of service attacks, REPUTATION, MULTICASTING (Computer networks)
- Abstract
Blockchain protocols require nodes to verify all received transactions before forwarding them. However, massive spam transactions cause the participants in blockchain systems to consume many resources in verifying and propagating transactions. This paper proposes a reputation-based mechanism to increase the efficiency of processing transactions by considering the reputations of the sending nodes. Reputations are in turn adjusted based on the quality of transaction processing. Our proposed reputation-based mechanism offers three main contributions. First, we modify the verification strategy so that nodes set a probability of verifying a received transaction considering the likelihood of it being spam: transactions from a node with a low reputation have a high probability of being verified. Second, we optimize the transaction forwarding protocol to reduce propagation delay by prioritizing forwarding transactions to reputable receivers. Third, we design a data request protocol that provides alternative data exchange methods for nodes with different reputations. A series of simulations demonstrate the performance of our reputation-based mechanism. [ABSTRACT FROM AUTHOR]
- Published
- 2022
27. Call for Papers.
- Published
- 1982
28. Call for Papers Special Issue on Supersystems.
- Published
- 1981
29. Call for Papers.
- Published
- 1980
30. Call for Papers.
- Published
- 1980
31. Evaluation of Cache Attacks on Arm Processors and Secure Caches.
- Author
- Deng, Shuwen, Matyunin, Nikolay, Xiong, Wenjie, Katzenbeisser, Stefan, and Szefer, Jakub
- Subjects
- ARM microprocessors, CACHE memory, RADIO frequency
- Abstract
Timing-based side and covert channels in processor caches continue to be a threat to modern computers. This work shows for the first time, a systematic, large-scale analysis of Arm devices and the detailed results of attacks the processors are vulnerable to. Compared to x86, Arm uses different architectures, microarchitectural implementations, cache replacement policies, etc., which affects how attacks can be launched, and how security testing for the vulnerabilities should be done. To evaluate security, this paper presents security benchmarks specifically developed for testing Arm processors and their caches. The benchmarks are evaluated with sensitivity tests, which examine how sensitive the benchmarks are to having a correct configuration in the testing phase. Further, to evaluate a large number of devices, this work leverages a novel approach of using a cloud-based Arm device testbed for architectural and security research on timing channels and runs the benchmarks on 34 different physical devices. In parallel, there has been much interest in secure caches to defend the various attacks. Consequently, this paper also investigates secure cache architectures using proposed benchmarks. Especially, this paper implements and evaluates secure PL and RF caches, showing the security of PL and RF caches, but also uncovers new weaknesses. [ABSTRACT FROM AUTHOR]
- Published
- 2022
32. Memory-Aware Denial-of-Service Attacks on Shared Cache in Multicore Real-Time Systems.
- Author
- Bechtel, Michael and Yun, Heechul
- Subjects
- DENIAL of service attacks, SHARED workspaces, RANDOM access memory, MULTICORE processors, MICROELECTROMECHANICAL systems
- Abstract
In this paper, we identify that memory performance plays a crucial role in the feasibility and effectiveness for performing denial-of-service attacks on shared cache. Based on this insight, we introduce new cache DoS attacks, which can be mounted from the user-space and can cause extreme worst-case execution time (WCET) impacts to cross-core victims—even if the shared cache is partitioned—by taking advantage of the platform’s memory address mapping information and HugePage support. We deploy these enhanced attacks on two popular embedded out-of-order multicore platforms using both synthetic and real-world benchmarks. The proposed DoS attacks achieve up to 111X WCET increases on the tested platforms. [ABSTRACT FROM AUTHOR]
- Published
- 2022
33. MC-FLEX: Flexible Mixed-Criticality Real-Time Scheduling by Task-Level Mode Switch.
- Author
- Lee, Jaewoo and Lee, Jinkyu
- Subjects
- SCHEDULING, AUTOMOTIVE engineering, TASK analysis
- Abstract
Mixed-criticality (MC) scheduling becomes popular in real-time systems as it supports different criticality levels in a resource-efficient manner. Although it has been well established (i) how to guarantee MC schedulability offline, existing studies have paid less attention to achieve (ii) how to minimize deadline misses of low-criticality tasks at runtime; in addition, it has not matured yet how to address (ii) without compromising (i). In this paper, we propose MC-FLEX, which employs a task-level mode transition mechanism (as opposed to system-level one). MC-FLEX not only determines time instants at which each high-criticality task enters and exits the critical mode in a task level, but also selects time instants and target low-criticality task(s) to be dropped and resumed for each task-level mode switch of individual high-criticality tasks, yielding the achievement of both (i) and (ii). Via simulation results, we demonstrate that the proposed framework reduces the job deadline miss ratio of low-criticality tasks at runtime (by over 54.8% compared to the existing work), without compromising offline MC schedulability. [ABSTRACT FROM AUTHOR]
- Published
- 2022
34. Learned FBF: Learning-Based Functional Bloom Filter for Key–Value Storage.
- Author
- Byun, Hayoung and Lim, Hyesook
- Subjects
- CONVOLUTIONAL neural networks, CRANES (Birds), DATA structures
- Abstract
As a challenging attempt to replace a traditional data structure with a learned model, this paper proposes a learned functional Bloom filter (L-FBF) for a key–value storage. The learned model in the proposed L-FBF learns the characteristics and the distribution of given data and classifies each input. It is shown through theoretical analysis that the L-FBF provides a lower search failure rate than a single FBF in the same memory size, while providing the same semantic guarantees. For model training, character-level neural networks are used with pretrained embeddings. In experiments, four types of different character-level neural networks are trained: a single gated recurrent unit (GRU), two GRUs, a single long short-term memory (LSTM), and a single one-dimensional convolutional neural network (1D-CNN). Experimental results prove the validity of theoretical results, and show that the L-FBF reduces the search failures by 82.8% to 83.9% when compared with a single FBF under the same amount of memory used. [ABSTRACT FROM AUTHOR]
- Published
- 2022
35. Efficient, Flexible, and Constant-Time Gaussian Sampling Hardware for Lattice Cryptography.
- Author
- Karabulut, Emre, Alkim, Erdem, and Aysu, Aydin
- Subjects
- RSA algorithm, GAUSSIAN distribution, DIGITAL signatures, SEARCH algorithms, HARDWARE, SAMPLING (Process), CRYPTOGRAPHY
- Abstract
This paper proposes a discrete Gaussian sampling hardware design that can flexibly support different sampling parameters, that is more efficient (in area-delay product) compared to the majority of earlier proposals, and that has constant execution time. The proposed design implements a Cumulative Distribution Table (CDT) approach, reduces the table size with Gaussian convolutions, and adopts an innovative fusion tree search algorithm to achieve a compact and fast sampling technique—to our best knowledge, this is the first hardware implementation of fusion tree search algorithm. The proposed hardware can support all the discrete Gaussian distributions used in post-quantum digital signatures and key encapsulation algorithms (FALCON, qTESLA, and FrodoKEM), the homomorphic encryption library of SEAL, and other algorithms such BLISS digital signature and LP public-key encryption. Our proposed hardware can be configured at design-time to optimize a single configuration or at run-time to support multiple Gaussian distribution parameters. Our design, furthermore, has constant-time behavior by design, eliminating timing side-channel attacks—this is achieved by reading all table contents at the same time to also reduce the latency. The results on a Xilinx Virtex-7 FPGA show that our solution can outperform all prior proposals in area-delay product by 1.67–235.88×, only falling short to those designed for the LP encryption scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Real-Time Task Scheduling for Machine Perception in Intelligent Cyber-Physical Systems.
- Author
-
Liu, Shengzhong, Yao, Shuochao, Fu, Xinzhe, Shao, Huajie, Tabish, Rohan, Yu, Simon, Bansal, Ayoosh, Yun, Heechul, Sha, Lui, and Abdelzaher, Tarek
- Subjects
CYBER physical systems ,ARTIFICIAL intelligence ,ARTIFICIAL neural networks ,PARTITIONS (Building) ,RESOURCE allocation - Abstract
This paper explores criticality-based real-time scheduling of neural-network-based machine inference pipelines in cyber-physical systems (CPS) to mitigate the effect of algorithmic priority inversion. We specifically focus on the perception subsystem, an important subsystem feeding other components (e.g., planning and control). In general, priority inversion occurs in real-time systems when computations that are of lower priority are performed together with or ahead of those that are of higher priority. In current machine perception software, significant priority inversion occurs because resource allocation to the underlying neural network models does not differentiate between critical and less critical data within a scene. To remedy this problem, in recent work, we proposed an architecture to partition the input data into regions of different criticality, then formulated a utility-based optimization problem to batch and schedule their processing in a manner that maximizes confidence in perception results, subject to criticality-based time constraints. This journal extension matures the work in several directions: (i) We extend confidence maximization to a generalized utility optimization formulation that accounts for criticality in the utility function itself, offering finer-grained control over resource allocation within the perception pipeline; (ii) we further instantiate and compare two different criticality metrics (distance-based and relative velocity-based) to understand their relative advantages; and (iii) we explore the limitations of the approach, specifically how inaccuracies in criticality-based attention cueing affect performance. All experiments are conducted on the NVIDIA Jetson AGX Xavier platform with a real-world driving dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
37. A Data Layout and Fast Failure Recovery Scheme for Distributed Storage Systems With Mixed Erasure Codes.
- Author
-
Xu, Liangliang, Lyu, Min, Li, Zhipeng, Li, Cheng, and Xu, Yinlong
- Subjects
DATA recovery ,STORAGE ,FAULT tolerance (Engineering) ,LINEAR network coding - Abstract
Erasure coding becomes increasingly popular in distributed storage systems (DSSes) for providing high reliability with low storage overhead. However, traditional random data placement induces massive cross-rack traffic and severely imbalanced load during failure recovery, which degrades the recovery performance significantly. In addition, various erasure codes coexisting in a DSS exacerbates the above problems. In this paper, we propose PDL, a PBD-based Data Layout, to optimize failure recovery performance in DSSes. PDL is constructed based on Pairwise Balanced Design, a combinatorial design scheme with uniform mathematical properties, and thus presents a uniform data layout for mixed erasure codes. Then we propose rPDL, a failure recovery scheme based on PDL. rPDL reduces cross-rack traffic effectively and provides nearly balanced cross-rack traffic distribution by uniformly choosing replacement nodes and retrieving determined available blocks to recover the lost blocks. We implemented PDL and rPDL in Hadoop 3.1.1. Compared with the existing data layout and recovery scheme in HDFS, experimental results show that rPDL achieves much higher recovery throughput, 6.27× on average for single-node failures, 5.14× for multi-node failures and 1.48× for single-rack failures, respectively. It also reduces degraded read latency by an average of 62.83 percent, and provides better support to front-end applications in case of component failures. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. The Security of ARM TrustZone in a FPGA-Based SoC.
- Author
-
Benhani, E. M., Bossuet, L., and Aubert, A.
- Subjects
SYSTEMS on a chip ,SMART devices ,FIELD programmable gate arrays ,SOFTWARE architecture - Abstract
Cybersecurity of embedded systems has become a major challenge for the development of the Internet of Things, of Cloud computing and other trendy applications without devoting a significant part of the design budget to industrial players. Technologies like TrustZone, provided by ARM, support a Trusted Execution Environment (TEE) software architecture and are inexpensive integrated solutions. While this technology allows isolation and secure execution of critical software applications (e.g., banking), recent preliminary works highlighted some security breaches or limitations when the ARM processors are embedded in a FPGA-based heterogeneous SoCs such as the Xilinx Zynq or Intel SoC FPGA devices. This paper highlights the security issue of such complex SoCs and details six efficient attacks on the ARM TrustZone extension in the SoC. A prototype system design on a Xilinx Zynq SoC is the target of the attacks presented in this paper but they could be adapted to other SoCs. This paper also includes recommendations and security solutions to design a trustworthy embedded system with a FPGA-based heterogeneous SoC. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
39. Efficient Mitchell’s Approximate Log Multipliers for Convolutional Neural Networks.
- Author
-
Kim, Min Soo, Barrio, Alberto A. Del, Oliveira, Leonardo Tavares, Hermida, Roman, and Bagherzadeh, Nader
- Subjects
CONVOLUTIONAL neural networks ,ARTIFICIAL neural networks ,ENERGY consumption ,DESIGN techniques ,COMPUTER vision - Abstract
This paper proposes energy-efficient approximate multipliers based on the Mitchell's log multiplication, optimized for performing inferences on convolutional neural networks (CNN). Various design techniques are applied to the log multiplier, including a fully-parallel LOD, efficient shift amount calculation, and exact zero computation. Additionally, the truncation of the operands is studied to create the customizable log multiplier that further reduces energy consumption. The paper also proposes using the one's complements to handle negative numbers, as an approximation of the two's complements that had been used in the prior works. The viability of the proposed designs is supported by the detailed formal analysis as well as the experimental results on CNNs. The experiments also provide insights into the effect of approximate multiplication in CNNs, identifying the importance of minimizing the range of error. The proposed customizable design at $w = 8$ saves up to 88 percent energy compared to the exact fixed-point multiplier at 32 bits with just a performance degradation of 0.2 percent for the ImageNet ILSVRC2012 dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
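Note on item 39: the abstract builds on Mitchell's log multiplication. The snippet below is only a minimal sketch of the classic textbook form of that approximation for unsigned integers, not the paper's hardware design; it omits the operand truncation and one's-complement handling the abstract mentions. The function name `mitchell_multiply` is a hypothetical name chosen for illustration.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers with Mitchell's method.

    Each operand is written as 2**k * (1 + x) with 0 <= x < 1, and
    log2(operand) is approximated by k + x. The two approximate logs are
    added and the antilog is approximated the same way.
    """
    if a == 0 or b == 0:
        return 0  # zero is handled exactly, since log2(0) is undefined

    # Position of the leading one (the role a leading-one detector plays).
    k1 = a.bit_length() - 1
    k2 = b.bit_length() - 1

    # Fractional parts x1, x2, kept as integers scaled by 2**k1 and 2**k2.
    f1 = a - (1 << k1)
    f2 = b - (1 << k2)

    # (x1 + x2) scaled by 2**(k1 + k2), computed exactly with shifts.
    frac_sum = (f1 << k2) + (f2 << k1)
    scale = 1 << (k1 + k2)

    if frac_sum < scale:
        # x1 + x2 < 1: result is 2**(k1 + k2) * (1 + x1 + x2)
        return scale + frac_sum
    # x1 + x2 >= 1: result is 2**(k1 + k2 + 1) * (x1 + x2)
    return 2 * frac_sum


print(mitchell_multiply(5, 6))  # approximation of 30
print(mitchell_multiply(4, 8))  # powers of two are multiplied exactly
```

In this form the approximation never overestimates the true product, and its relative error is at most 1/9 (about 11.1 percent), reached when both fractional parts equal 0.5, for example 3 × 3.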
40. Predicting the Effect of Memory Contention in Multi-Core Computers Using Analytic Performance Models.
- Author
-
Bardhan, Shouvik and Menasce, Daniel A.
- Subjects
MULTICORE processors ,COMPUTER storage devices ,PARAMETERS (Statistics) ,PREDICTION models ,COMPUTER performance ,QUEUING theory - Abstract
Analyzing and predicting the performance of applications that run on multi-core computers is essential. This paper demonstrates experimentally that memory contention resulting from multiple cores accessing shared memory resources can become a significant component (i.e., over 50 percent) of an application’s execution time. The paper develops single- and multi-class analytic performance models for predicting the effect of memory contention on a job’s execution time. The models consider local and remote memory as in NUMA architectures. Model validation was done using a micro-benchmark and programs from HBench, UnixBench, and SPECCPU2006 running on machines with 4, 12, and 16 cores. The paper shows how to derive the model parameters and demonstrates that there is a significant difference in predicted values when memory contention is ignored. For example, a model that ignores memory contention predicts an average execution time about four times smaller than the experimental value for a concurrency level of 18 while the model with memory contention predicts a value that is 90 percent of the experimental value for the same concurrency level. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
41. Dynamic Voltage Scaling in Multitier Web Servers with End-to-End Delay Control.
- Author
-
Horvath, Tibor, Abdelzaher, Tarek, Skadron, Kevin, and Xue Liu
- Subjects
ELECTRIC potential ,INTERNET servers ,DATABASES ,DATA pipelining ,WEB services ,HTTP (Computer network protocol) ,COMPUTER architecture ,COMPUTER algorithms ,REAL-time computing - Abstract
The energy and cooling costs of Web server farms are among their main financial expenditures. This paper explores the benefits of dynamic voltage scaling (DVS) for power management in server farms. Unlike previous work, which addressed DVS on individual servers and on load-balanced server replicas, this paper addresses DVS in multistage service pipelines. Contemporary Web server installations typically adopt a three-tier architecture in which the first tier presents a Web interface, the second executes scripts that implement business logic, and the third serves database accesses. From a user's perspective, only the end-to-end response across the entire pipeline is relevant. This paper presents a rigorous optimization methodology and an algorithm for minimizing the total energy expenditure of the multistage pipeline subject to soft end-to-end response-time constraints. A distributed power management service is designed and evaluated on a real three-tier server prototype for coordinating DVS settings in a way that minimizes global energy consumption while meeting end-to-end delay constraints. The service is shown to consume as much as 30 percent less energy compared to the default (Linux) energy saving policy. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
42. Schedulability Analysis for Real-Time Task Set on Resource with Performance Degradation and Dual-Level Periodic Rejuvenations.
- Author
-
Hua, Xiayu, Guo, Chunhui, Wu, Hao, Lautner, Douglas, and Ren, Shangping
- Subjects
COMPUTER systems ,REAL-time computing ,REAL-time programming ,ELECTRONIC data processing ,MONOTONIC functions - Abstract
Researches in real-time scheduling often assume that the performance of a computing resource does not change over time. However, as system softwares and system architectures become increasingly complex, resource performance degradation over time becomes more evident. In this paper, we study the schedulability of a hard real-time task set on a resource which has performance degradation over time with a known pattern and use both cold and warm periodic rejuvenations as countermeasures. Such resource model is referred to as $P^2D$-resource model for performance degradation and periodic rejuvenation with dual-levels. In this paper, we study (1) the formal specification of the $P^2D$
- Published
- 2017
- Full Text
- View/download PDF
43. Reducing the Memory Bandwidth Overheads of Hardware Security Support for Multi-Core Processors.
- Author
-
Lee, Junghoon, Kim, Taehoon, and Huh, Jaehyuk
- Subjects
BANDWIDTH allocation ,OVERHEAD costs ,COMPUTER security ,COMPUTER performance ,DATA integrity - Abstract
To prevent physical attacks on systems, secure processors have been proposed to reduce trusted computing base to the processor itself. In a secure processor, all off-chip data are encrypted and their integrity is protected. This paper investigates how the limited memory bandwidth of multi-core processors affects the design of secure processors. Although the performance of a single-core secure processor has improved significantly with the counter-mode encryption combined with Bonsai Merkle Tree, our results indicate that multi-core secure processors can suffer from significant performance degradation due to the limited memory bandwidth. To mitigate the performance overheads, this paper proposes three techniques for the multi-core design of secure processors. First, the paper advocates to use a combined cache for all normal and security-supporting data. Second, the paper proposes memory scheduling and mapping schemes for secure processors. Finally, the paper investigates a type-aware cache insertion scheme considering the distinct characteristics of normal and security-supporting data. Our simulation results show that the combined techniques reduce the performance degradation for supporting full confidentiality and integrity, from 25-34 percent to less than 8-14 percent in 8-core and 16-core secure processors, with minimal extra hardware costs. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
44. Decimal Multiformat Online Addition.
- Author
-
Garcia-Vega, Carlos, Gonzalez-Navarro, Sonia, Chica, Pedro Balboa-La, and Villalba-Moreno, Julio
- Subjects
ADDERS (Digital electronics) ,GRAPHICS processing units ,COMPUTER architecture ,REDUNDANCY in engineering ,NUMERICAL calculations ,CODE converters - Abstract
This paper presents and analyzes two different strategies for designing multiformat online decimal adders (olDFA$_{Mformat}$). The first strategy uses a code conversion stage plus an online Decimal Full Adder (olDFA); the second one involves designing specific adders by modifying the architecture of the olDFA. These strategies are applied in the design of specific architectures to deal with financial analysis calculations. We use synthesis results to verify the theoretical aspects of the designs and to analyze the robustness and lacks of the strategies. The guidelines presented in the paper are valuable to designers of online multiformat-based solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
45. Malware Analysis by Combining Multiple Detectors and Observation Windows.
- Subjects
DETECTORS ,FEATURE extraction - Abstract
Malware developers continually attempt to modify the execution pattern of malicious code hiding it inside apparent normal applications, which makes its detection and classification challenging. This article proposes an ensemble detector, which exploits the capabilities of the main analysis algorithms proposed in the literature designed to offer greater resilience to specific evasion techniques. In particular, the article presents different methods to optimally combine both generic and specialized detectors during the analysis process, which can be used to increase the unpredictability of the detection strategy, as well as improve the detection rate in presence of unknown malware families and provide better detection performance in the absence of a constant re-training of detector needed to cope with the evolution of malware. The paper also presents an alpha-count mechanism that explores how the length of the observation time window can affect the detection accuracy and speed of different combinations of detectors during the malware analysis. An extended experimental campaign has been conducted on both an open-source sandbox and an Android smartphone with different malware datasets. A trade-off among performance, training time, and mean-time-to-detect is presented. Finally, a comparison with other ensemble detectors is also presented. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
46. Polynomial Computation Using Unipolar Stochastic Logic and Correlation Technique.
- Author
-
Chu, Shao-I, Wu, Chi-Long, Nguyen, Tu N., and Liu, Bing-Hong
- Subjects
DISTRIBUTION (Probability theory) ,POLYNOMIALS ,RANDOM numbers ,LOGIC ,LOGIC circuits ,BINARY sequences - Abstract
This article addresses polynomial computation with unipolar stochastic logic by exploiting correlation between the bit-streams. The AND-OR, double-NAND, OR-AND and double-NOR circuits are presented for polynomials with all positive coefficients whose sum is less than or equal to one by mathematically analyzing the joint probability distribution of coefficient bit-streams. The NAND-AND expansion is also developed for polynomials with alternatively positive and negative coefficients whose absolute values are decreasing by applying the same idea. Unlike the original methods with multiple uncorrelated random number sources (RNSs) for coefficient bit-stream generation, the presented methods only require a single RNS. Since the RNSs take up huge hardware resource in stochastic circuits, the proposed RNS-sharing techniques for polynomial computation result in a significant reduction of hardware complexity. For the factorization technique in the general polynomials, this paper enhances the original stochastic designs for the second-order polynomial and further presents the simple correlation-dependent circuits. Results show that the proposed architectures are superior to the previous ones by reducing the total number of RNSs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
47. Multi-Bank On-Chip Memory Management Techniques for CNN Accelerators.
- Author
-
Kang, Duseok, Kang, Donghyun, and Ha, Soonhoi
- Subjects
CONVOLUTIONAL neural networks ,MEMORY ,DATABASES - Abstract
Since off-chip DRAM access affects both performance and power consumption significantly, convolutional neural network (CNN) accelerators commonly aim to maximize data reuse in on-chip memory. By organizing the on-chip memory to multiple banks, we may hide off-chip DRAM access delay by prefetching data to unused banks during computation. When and where to prefetch data and how to reuse the feature map data between layers define the multi-bank on-chip memory management (MOMM) problem. In this paper, we propose compiler techniques to solve the MOMM problem with two different objectives: one is to minimize the off-chip memory access volume, and the other is to minimize the processing delay caused by unhidden DRAM accesses. By running CNN benchmarks on a cycle-level NPU simulator, we demonstrate the trade-off relation between two objectives. Compared with the baseline approach that does not reuse the feature map between layers, we could reduce the DRAM access volume and the processing delay up to 55.0 and 79.4 percent, respectively. Moreover, we extend the proposed techniques to consider layer fusion that aims to reuse feature maps between layers. Experiment results confirm the superiority of the proposed hybrid fusion technique to the per-layer processing technique and the pure fusion technique. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
48. New Results on Non-Normalized Floating-Point Formats.
- Author
-
Gonzalez-Navarro, Sonia and Hormigo, Javier
- Subjects
FIELD programmable gate arrays - Abstract
Compulsory normalization of the represented numbers is a key requirement of the floating-point standard. This requirement contributes to fundamental characteristics of the standard, such as taking the most of the precision available, reproducibility and facilitation of comparison and other operations. However, it also imposes a high restriction in effectiveness of basic arithmetic operation implementation. In many embedded applications may be worth to sacrifice the benefits of normalization for gaining in implementation metrics. This paper analyzes and measures the effect of removing the normalization requirement in terms of precision and implementation savings for embedded applications. We propose several adder and multiplier architectures to deal with non-normalized floating-point numbers, and quantify the accuracy loss and the improvements in hardware implementation. Our experiments show that it is possible to reduce the area and power consumption up to 78 percent in ASIC and 50 percent in FPGA implementations with a reasonable accuracy loss. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
49. Sharing Logic for Built-In Generation of Functional Broadside Tests.
- Author
-
Pomeranz, Irith
- Subjects
INTEGRATED circuit fault tolerance ,LOGIC circuits ,INFORMATION sharing ,FAULT location (Engineering) ,ENERGY dissipation - Abstract
When built-in test generation is used for a design that can be partitioned into logic blocks, it is advantageous to identify groups of blocks whose tests have similar characteristics, and use the same built-in test generation logic for the blocks in each group. This paper studies this issue for a built-in test generation method that produces functional broadside tests. Functional broadside tests are important for addressing overtesting of delay faults as well as avoiding excessive power dissipation during test application. The paper discusses the design of the test generation logic for a group of logic blocks, and the selection of the groups. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
50. Guest Editors' Introduction - Special Issue on Network-on-Chip.
- Author
-
Ginosar, Ran and Chatha, Karam S.
- Subjects
PERIODICAL editors ,SPECIAL issues of periodicals ,NETWORKS on a chip ,SYSTEMS on a chip ,QUALITY of service - Published
- 2014
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library