214 results for "Memory pool"
Search Results
2. An Empirical Study of Memory Pool Based Allocation and Reuse in CUDA Graph
- Author
-
Qian, Ruyi, Gao, Mengjuan, Shi, Qinwen, Xu, Yuanchao, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Tari, Zahir, editor, Li, Keqiu, editor, and Wu, Hongyi, editor
- Published
- 2024
- Full Text
- View/download PDF
3. A memory-based simulated annealing algorithm and a new auxiliary function for the fixed-outline floorplanning with soft blocks.
- Author
-
Zou, Dexuan, Wang, Gai-Ge, Sangaiah, Arun K., and Kong, Xiangyong
- Abstract
A memory-based simulated annealing (MSA) algorithm is proposed for the fixed-outline floorplanning with soft blocks. MSA constructs a memory pool to store some historical best solutions. Moreover, it adopts a real-time monitoring strategy to check whether a solution has been trapped in a local optimum. In case a solution encounters this predicament, it will be replaced by the one from the memory pool, and the current temperature will be regenerated by continuously perturbing the new solution several times. To meet the fixed-outline requirements, a new auxiliary function is formulated based on the geometric structure of the current floorplan, and it is very helpful in driving MSA to search towards potential solution space. Concretely, the area information of all violated blocks is utilized to construct an auxiliary function. Moreover, the excessive area of a violated block can be weighted by three different coefficients, which depend on the relative position of the block and the fixed-outline. Additionally, due to its simple topology and strong applicability, B⋆-tree representation is employed to perturb a solution in each generation. The efficiency of the proposed method is demonstrated on six GSRC floorplan benchmark examples with various white space and aspect ratios. Two groups of Matlab simulations show that our approach can achieve better floorplanning results and satisfy both the fixed-outline and non-overlapping constraints while optimizing circuit performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
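The memory-pool mechanism in entry 3 is, at its core, a bounded archive of elite solutions that the annealer falls back on once it detects stagnation. A minimal sketch of that idea follows; it is not the authors' code, and the solution type, cost field, and stall threshold are placeholders (a B⋆-tree floorplan would take the place of `Solution` in the paper's setting).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical solution type; in the paper a B*-tree floorplan would sit here.
struct Solution { double cost; /* ... layout data ... */ };

class MemoryPool {
public:
    explicit MemoryPool(std::size_t capacity) : capacity_(capacity) {}

    // Keep only the historical best solutions, bounded by capacity.
    void store(const Solution& s) {
        pool_.push_back(s);
        std::sort(pool_.begin(), pool_.end(),
                  [](const Solution& a, const Solution& b) { return a.cost < b.cost; });
        if (pool_.size() > capacity_) pool_.resize(capacity_);
    }

    // Draw a random elite solution to replace one trapped in a local optimum.
    const Solution& sample() const { return pool_[std::rand() % pool_.size()]; }
    bool empty() const { return pool_.empty(); }

private:
    std::size_t capacity_;
    std::vector<Solution> pool_;
};

// Sketch of the restart logic: if the current solution has not improved for
// `stall_limit` iterations, swap it for an elite solution from the pool.
void maybe_restart(Solution& current, int& stalled, int stall_limit,
                   const MemoryPool& pool) {
    if (stalled >= stall_limit && !pool.empty()) {
        current = pool.sample();   // escape the local optimum
        stalled = 0;               // the temperature would be re-estimated here
    }
}
```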
4. Real-Time Operating Systems
- Author
-
Ünsalan, Cem, Gürhan, Hüseyin Deniz, and Yücel, Mehmet Erkin
- Published
- 2022
- Full Text
- View/download PDF
5. Gen-Z memory pool system implementation and performance measurement
- Author
-
Won-ok Kwon, Song-Woo Sok, Chan-ho Park, Myeong-Hoon Oh, and Seokbin Hong
- Subjects
cxl, gen-z, memory centric computing, memory pool, pmdk, Telecommunication, TK5101-6720, Electronics, TK7800-8360 - Abstract
The Gen-Z protocol is a memory semantic protocol between the memory and CPU used in computer architectures with large memory pools. This study presents the implementation of the Gen-Z hardware system configured using Gen-Z specification 1.0 and reports its performance. A hardware prototype of a DDR4 Gen-Z memory pool with an optimized character, a block device driver, and a file system for the Gen-Z hardware was designed. The Gen-Z IP was targeted to the FPGA, and a 512 GB Gen-Z memory pool was configured on an ×86 server. In the experiments, the latency and throughput of the Gen-Z memory were measured and compared with those of the local memory, SATA SSD, and NVMe using character or block device interfaces. The Gen-Z hardware exhibited superior throughput and latency performance compared with SATA SSD and NVMe at block sizes under 4 kB. The MySQL and File IO benchmark of Gen-Z showed good write performance in all block sizes and threads. Besides, it showed low latency in RocksDB's fillseq dbbench using the ext4 direct access filesystem.
- Published
- 2022
- Full Text
- View/download PDF
6. Preventing DDoS Attacks on Bitcoin Memory Pool by the Dynamic Fee Threshold Mechanism
- Author
-
Luo, Shunchao, Sang, Yingpeng, Song, Mingyang, Zeng, Yuying, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Zhang, Yong, editor, Xu, Yicheng, editor, and Tian, Hui, editor
- Published
- 2021
- Full Text
- View/download PDF
7. Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network
- Author
-
Linyong Huang, Zhe Zhang, Shuangchen Li, Dimin Niu, Yijin Guan, Hongzhong Zheng, and Yuan Xie
- Subjects
Graph neural network, large-scale graph processing, memory pool, near data processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971 - Abstract
Graph Neural Networks have drawn tremendous attention in the past few years due to their convincing performance and high interpretability in various graph-based tasks like link prediction and node classification. With the ever-growing graph size in the real world, especially for industrial graphs at a billion-level, the storage of graphs can easily consume Terabytes, so the processing of GNNs has to be carried out in a distributed manner. As a result, the execution could be inefficient due to the expensive cross-node communication and irregular memory access. Various GNN accelerators have been proposed for efficient GNN processing. They, however, mainly focused on small and medium-size graphs, which is not applicable to large-scale distributed graphs. In this paper, we present a practical Near-Data-Processing architecture based on a memory-pool system for large-scale distributed GNNs. We propose a customized memory fabric interface to construct the memory pool for low-latency and high throughput cross-node communication, which can provide flexible memory allocation and strong scalability. A practical Near-Data-Processing design is proposed for efficient work offloading and bandwidth utilization improvement. Moreover, we also introduce a partition and scheduling scheme to further improve performance and achieve workload balance. Comprehensive evaluations demonstrate that the proposed architecture can achieve up to 27× and 8× higher training speed compared to two state-of-the-art distributed GNN frameworks: Deep Graph Library and P³, respectively.
- Published
- 2022
- Full Text
- View/download PDF
8. Modeling Patient-Specific CAR-T Cell Dynamics: Multiphasic Kinetics via Phenotypic Differentiation.
- Author
-
Paixão, Emanuelle A., Barros, Luciana R. C., Fassoni, Artur C., and Almeida, Regina C.
- Subjects
*DYNAMICS, *TREATMENT effectiveness, *HEMATOLOGIC malignancies, *CELL proliferation, *T cells, *CELL lines, *IMMUNOTHERAPY, *PHENOTYPES - Abstract
Simple Summary: We present the first mathematical model to describe the multiphasic dynamical treatment response in CAR-T cell kinetics through the differentiation of functional (distributed and effector), memory, and exhausted phenotypes, integrated with the dynamics of cancer cells. The CAR-T cell kinetics are evaluated for various hematological cancers and therapy outcomes, providing insights into promising parameters for long-term therapy investigation. Chimeric Antigen Receptor (CAR)-T cell immunotherapy revolutionized cancer treatment and consists of the genetic modification of T lymphocytes with a CAR gene, aiming to increase their ability to recognize and kill antigen-specific tumor cells. The dynamics of CAR-T cell responses in patients present multiphasic kinetics with distribution, expansion, contraction, and persistence phases. The characteristics and duration of each phase depend on the tumor type, the infused product, and patient-specific characteristics. We present a mathematical model that describes the multiphasic CAR-T cell dynamics resulting from the interplay between CAR-T and tumor cells, considering patient and product heterogeneities. The CAR-T cell population is divided into functional (distributed and effector), memory, and exhausted CAR-T cell phenotypes. The model is able to describe the diversity of CAR-T cell dynamical behaviors in different patients and hematological cancers as well as their therapy outcomes. Our results indicate that the joint assessment of the area under the concentration-time curve in the first 28 days and the corresponding fraction of non-exhausted CAR-T cells may be considered a potential marker to classify therapy responses. Overall, the analysis of different CAR-T cell phenotypes can be a key aspect for a better understanding of the whole CAR-T cell dynamics. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
9. Gen‐Z memory pool system implementation and performance measurement.
- Author
-
Kwon, Won‐ok, Sok, Song‐Woo, Park, Chan‐ho, Oh, Myeong‐Hoon, and Hong, Seokbin
- Subjects
SEMANTIC memory, MEMORY, COMPUTER architecture - Abstract
The Gen‐Z protocol is a memory semantic protocol between the memory and CPU used in computer architectures with large memory pools. This study presents the implementation of the Gen‐Z hardware system configured using Gen‐Z specification 1.0 and reports its performance. A hardware prototype of a DDR4 Gen‐Z memory pool with an optimized character, a block device driver, and a file system for the Gen‐Z hardware was designed. The Gen‐Z IP was targeted to the FPGA, and a 512 GB Gen‐Z memory pool was configured on an ×86 server. In the experiments, the latency and throughput of the Gen‐Z memory were measured and compared with those of the local memory, SATA SSD, and NVMe using character or block device interfaces. The Gen‐Z hardware exhibited superior throughput and latency performance compared with SATA SSD and NVMe at block sizes under 4 kB. The MySQL and File IO benchmark of Gen‐Z showed good write performance in all block sizes and threads. Besides, it showed low latency in RocksDB's fillseq dbbench using the ext4 direct access filesystem. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
10. Enhanced Bitcoin Protocol with Effective Block Creation and Verification by Trusted Miners
- Author
-
R. Bala and R. Manoharan
- Subjects
anonymity, decentralization, double spending, block withholding, 51% attack, transaction rate, memory pool, block, Management information systems, T58.6-58.62 - Abstract
The distributed nature of Bitcoin introduces security issues that necessitate security-specific enhancements to the Bitcoin protocol. We therefore propose a method that incorporates a criteria check for miners to participate in the mining process and a verification process for them to join the mining pool. The proposed idea mitigates double spending, block withholding, and 51% attacks. In addition, the increasing number of Bitcoin users necessitates performance improvement. Hence, we propose an effective approach that refines the existing block creation and verification strategy to improve the transaction rate without compromising security.
- Published
- 2020
- Full Text
- View/download PDF
11. Enhanced Bitcoin Protocol with Effective Block Creation and Verification by Trusted Miners.
- Author
-
Bala, R. and Manoharan, R.
- Subjects
BITCOIN, MINERAL industries, INTERNET security, TRANSACTION costs, IDENTIFICATION - Abstract
The distributed nature of Bitcoin introduces security issues that necessitate security-specific enhancements to the Bitcoin protocol. We therefore propose a method that incorporates a criteria check for miners to participate in the mining process and a verification process for them to join the mining pool. The proposed idea mitigates double spending, block withholding, and 51% attacks. In addition, the increasing number of Bitcoin users necessitates performance improvement. Hence, we propose an effective approach that refines the existing block creation and verification strategy to improve the transaction rate without compromising security. [ABSTRACT FROM AUTHOR]
- Published
- 2020
12. Scalasca v2: Back to the Future
- Author
-
Zhukov, Ilya, Feld, Christian, Geimer, Markus, Knobloch, Michael, Mohr, Bernd, Saviankou, Pavel, Niethammer, Christoph, editor, Gracia, José, editor, Knüpfer, Andreas, editor, Resch, Michael M., editor, and Nagel, Wolfgang E., editor
- Published
- 2015
- Full Text
- View/download PDF
13. Exploring Data Analytics Without Decompression on Embedded GPU Systems
- Author
-
Feng Zhang, Onur Mutlu, Xiaoyong Du, Zaifeng Pan, Yanliang Zhou, Xipeng Shen, and Jidong Zhai
- Subjects
Lossless compression, Speedup, Computer science, business.industry, Memory pool, Instruction set, Computational Theory and Mathematics, Parallel processing (DSP implementation), Hardware and Architecture, Embedded system, Signal Processing, Synchronization (computer science), Data analysis, business, Efficient energy use - Abstract
With the development of computer architecture, even for embedded systems, GPU devices can be integrated, providing outstanding performance and energy efficiency to meet the requirements of different industries, applications, and deployment environments. Data analytics is an important application scenario for embedded systems. Unfortunately, due to the limitation of the capacity of the embedded device, the scale of problems handled by the embedded system is limited. In this paper, we propose a novel data analytics method, called G-TADOC, for efficient text analytics directly on compression on embedded GPU systems. A large amount of data can be compressed and stored in embedded systems, and can be processed directly in the compressed state, which greatly enhances the processing capabilities of the systems. Particularly, G-TADOC has three innovations. First, a novel fine-grained thread-level workload scheduling strategy for GPU threads has been developed, which partitions heavily-dependent loads adaptively in a fine-grained manner. Second, a GPU thread-safe memory pool has been developed to handle inconsistency with low synchronization overheads. Third, a sequence-support strategy is provided to maintain high GPU parallelism while ensuring sequence information for lossless compression. Moreover, G-TADOC involves special optimizations for embedded GPUs, such as utilizing the CPU-GPU shared unified memory. Experiments show that G-TADOC provides 13.2× average speedup compared to the state-of-the-art TADOC. G-TADOC also improves performance-per-cost by 2.6× and energy efficiency by 32.5× over TADOC.
- Published
- 2022
14. Gen‐Z memory pool system implementation and performance measurement
- Author
-
Chanho Park, Song-Woo Sok, Wonok Kwon, Seokbin Hong, and Myeong-Hoon Oh
- Subjects
General Computer Science, Computer science, business.industry, Memory pool, Performance measurement, Electrical and Electronic Engineering, business, Implementation, Computer hardware, Electronic, Optical and Magnetic Materials - Published
- 2021
15. Evolutionary shuffled frog leaping with memory pool for parameter optimization
- Author
-
Huiling Chen, Chao Ma, Xuehua Zhao, Ali Asghar Heidari, Hamza Turabieh, Xiaojia Ye, Yun Liu, Chen Chi, and Rongrong Le
- Subjects
Mathematical optimization, Computer science, 020209 energy, Population, Crossover, Swarm intelligence, Parameter extraction, 02 engineering and technology, 020401 chemical engineering, Convergence (routing), 0202 electrical engineering, electronic engineering, information engineering, Local search (optimization), 0204 chemical engineering, education, education.field_of_study, business.industry, Solar cell, Inheritance (genetic algorithm), Process (computing), Memory pool, Solver, TK1-9971, General Energy, Photovoltaic models, Electrical engineering. Electronics. Nuclear engineering, business - Abstract
According to the manufacturer’s I-V data, we need to obtain the best parameters for assessing the photovoltaic systems. Although much work has been done in this area, it is still challenging to extract model parameters accurately. An efficient solver called SFLBS is developed to deal with this problem, in which an inheritance mechanism based on crossover and mutation is introduced. Specifically, the memory pool for storing historical population information is designed. During the sub-population evolution, the historical population will cross and mutate with the contemporary population with a certain probability, ultimately inheriting information about the dimensions that perform well. This mechanism ensures the population’s quality during the evolution process and effectively improves the local search ability of traditional SFLA. The proposed SFLBS is applied to extract unknown parameters from the single diode model, double diode model, three diode model, and photovoltaic module model. Based on the experimental results, we found that SFLBS has considerable accuracy in extracting the unknown parameters of the PV system problem, and its convergence speed is satisfactory. Moreover, SFLBS is used to evaluate three commercial PV modules under different irradiance and temperature conditions. The experimental results demonstrate that the performance of SFLBS is outstanding compared to some state-of-the-art competing algorithms. Moreover, SFLBS is still a reliable optimization tool despite the complex external environment. This research is supported by an online service for any question or needs to supplementary materials at https://aliasgharheidari.com .
- Published
- 2021
16. Boost C++ Libraries
- Author
-
Koranne, Sandeep
- Published
- 2011
- Full Text
- View/download PDF
17. Modeling Patient-Specific CAR-T Cell Dynamics: Multiphasic Kinetics via Phenotypic Differentiation
- Author
-
Emanuelle Paixão, Artur Fassoni, Luciana Barros, and Regina Almeida
- Subjects
Cancer Research, applied_mathematics, Oncology, hematological malignancies, treatment outcomes, CAR-T cell exhaustion, memory pool, functional CAR-T cells, antigen dependent CAR-T expansion - Abstract
Chimeric Antigen Receptor (CAR)-T cell immunotherapy revolutionized cancer treatment and consists of the genetic modification of T lymphocytes with a CAR gene, aiming to increase their ability to recognize and kill antigen-specific tumor cells. The dynamics of CAR-T cell responses in patients presents a multiphasic kinetics with distribution, expansion, contraction, and persistence phases. The characteristics and duration of each phase depend on the tumor type, the infused product, and on patient-specific characteristics. We present a mathematical model which describes the multiphasic CAR-T cell dynamics resulting from the interplay between CAR-T and tumor cells, considering patient and product heterogeneities. The CAR-T cell population is divided into functional (distributed and effector), memory, and exhausted CAR-T cell phenotypes. The model is able to describe the diversity of CAR-T cell dynamic behaviors in different patients and hematological cancers as well as their therapy outcomes. Our results indicate that the joint assessment of the area under the concentration-time curve in the first 28 days and the corresponding fraction of non-exhausted CAR-T cells may be considered as potential markers to classify therapy responses. Overall, the analysis of different CAR-T cell phenotypes can be a key aspect for a better understanding of the whole CAR-T cell dynamics.
- Published
- 2022
- Full Text
- View/download PDF
18. Solving the dynamic economic dispatch by a memory-based global differential evolution and a repair technique of constraint handling.
- Author
-
Zou, Dexuan, Li, Steven, Kong, Xiangyong, Ouyang, Haibin, and Li, Zongyan
- Subjects
*ENERGY economics, *DIFFERENTIAL evolution, *ALGORITHMS, *ELECTRIC generators, *ENERGY consumption - Abstract
In this paper, we propose a memory-based global differential evolution (MGDE) algorithm and a repair technique of constraint handling for the dynamic economic dispatch problems. On the one hand, MGDE modifies the mutation of DE/best/1, and uses a memory pool to provide more candidate solutions for this operation. Moreover, it adopts a randomly generated scale factor in the modified mutation to enhance its exploration capacity. In the crossover, a dynamical crossover rate is introduced to balance MGDE's global and local search capacities. On the other hand, a repair technique is designed for handling three kinds of constraints associated with generator capacity, power balance and generating unit ramp-rate. Moreover, a commonly used penalty function method is subsequently employed to handle the possible constraint violations associated with power balance and prohibited operation zones (POZs). To judge the performance of MGDE and the efficiency of the repair technique, we have solved six well-known DED problems taken from different sources. According to the experimental results, MGDE shows a superior performance in comparison with other improved DEs which also solve these problems. In the mean time, the repair technique of constraint handling has a high efficiency in eliminating or reducing the constraint violations. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
19. The Case for Replication-Aware Memory-Error Protection in Disaggregated Memory
- Author
-
Haris Volos
- Subjects
Non-volatile memory, Hardware_MEMORYSTRUCTURES, Memory management, Memory errors, Hardware and Architecture, Computer science, Encoding (memory), Distributed computing, Collective protection, Memory pool, Storage efficiency, Replication (computing) - Abstract
Disaggregated memory leverages recent technology advances in high-density, byte-addressable non-volatile memory and high-performance interconnects to provide a large memory pool shared across multiple compute nodes. Due to higher memory density, memory errors may become more frequent. Unfortunately, tolerating memory errors through existing memory-error protection techniques becomes impractical due to increasing storage cost. This letter proposes replication-aware memory-error protection to improve storage efficiency of protection in data-centric applications that already rely on memory replication for performance and availability. It lets such applications lower protection storage cost by weakening the protection of each individual replica, but still realize a strong protection target by relying on the collective protection conferred by multiple replicas.
- Published
- 2021
20. Investigating Orphan Transactions in the Bitcoin Network
- Author
-
Ari Trachtenberg, Muhammad Anas Imtiaz, and David Starobinski
- Subjects
Computer Networks and Communications, Computer science, Node (networking), Byte, Memory pool, Joins, 020206 networking & telecommunications, 02 engineering and technology, Discount points, Computer security, computer.software_genre, Overhead (business), 0202 electrical engineering, electronic engineering, information engineering, Network overhead, Electrical and Electronic Engineering, Database transaction, computer - Abstract
Orphan transactions are those whose parental income sources are missing at the time that they are processed. These transactions typically languish in a local buffer until they are evicted or all their parents are discovered, at which point they may be propagated further. To date, there has been little work in the literature on characterizing the nature and impact of such orphans, and yet it is intuitive that they should affect the performance of the Bitcoin network. This work thus seeks to methodically research such effects through a measurement campaign on live Bitcoin nodes. Our data show that about 45% of orphan transactions end up being included in the blockchain. Surprisingly, orphan transactions tend to have fewer parents on average than non-orphan transactions, and their missing parents have a lower fee, larger size, and lower transaction fee per byte than all other received transactions. Moreover, the network overhead incurred by these orphan transactions can be significant, exceeding 17% when using the default orphan memory pool size (i.e., 100 transactions), although this overhead can be made negligible, without significant computational or memory demands, if the pool size is simply increased to 1000 transactions. Finally, we show that when a node with an empty mempool first joins the network, 25% of the transactions that it receives become orphan, whereas in steady-state this quantity drops to about 1%.
- Published
- 2021
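As background for entry 20, a node's orphan handling can be pictured as a small bounded pool keyed by transaction id, with the oldest entry evicted once the configured limit (100 by default, 1000 in the paper's experiments) is reached. The sketch below illustrates that bookkeeping only; it is not Bitcoin Core's actual data structure, and the types and method names are invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative types; real Bitcoin transactions carry much more state.
using TxId = std::string;
struct Tx { TxId id; std::vector<TxId> parents; };

// A bounded orphan pool: transactions wait here until a missing parent
// arrives; when the pool is full, the oldest orphan is evicted.
class OrphanPool {
public:
    explicit OrphanPool(std::size_t limit) : limit_(limit) {}

    void add(const Tx& tx) {
        // Evict oldest entries until there is room; stale ids (already
        // released) are skipped because erasing a missing key is a no-op.
        while (orphans_.size() >= limit_ && !order_.empty()) {
            orphans_.erase(order_.front());
            order_.pop_front();
        }
        orphans_[tx.id] = tx;
        order_.push_back(tx.id);
    }

    // Called when a new transaction arrives: collect and remove any orphans
    // that listed it as a parent, so the caller can re-validate them.
    std::vector<Tx> release_children_of(const TxId& parent) {
        std::vector<Tx> ready;
        for (auto it = orphans_.begin(); it != orphans_.end();) {
            const auto& parents = it->second.parents;
            bool depends = std::find(parents.begin(), parents.end(), parent) != parents.end();
            if (depends) { ready.push_back(it->second); it = orphans_.erase(it); }
            else ++it;
        }
        return ready;
    }

    std::size_t size() const { return orphans_.size(); }

private:
    std::size_t limit_;
    std::unordered_map<TxId, Tx> orphans_;
    std::deque<TxId> order_;
};
```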
21. E²bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services
- Author
-
Quan Chen, Minyi Guo, Xiaoxin Tang, Weihao Cui, Han Zhao, and Mengze Wei
- Subjects
Computational Theory and Mathematics, Artificial neural network, Hardware and Architecture, Computer science, Distributed computing, Quality of service, Signal Processing, Memory pool, Latency (engineering), Throughput (business), Host (network) - Abstract
We aim to tackle existing problems about deep learning serving on GPUs in the view of the system. GPUs have been widely adopted to serve online deep learning-based services that have stringent QoS (Quality-of-Service) requirements. However, emerging deep learning serving systems often result in poor responsiveness and low throughput of the inferences that damage user experience and increase the number of GPUs required to host an online service. Our investigation shows that the poor batching operation and the lack of data transfer-computation overlap are the root causes of the poor responsiveness and low throughput. To this end, we propose E²bird, a deep learning serving system that is comprised of a GPU-resident memory pool, a multi-granularity inference engine, and an elastic batch scheduler. The memory pool eliminates the unnecessary waiting of the batching operation and enables data transfer-computation overlap. The inference engine enables concurrent execution of different batches, improving the GPU resource utilization. The batch scheduler organizes inferences elastically to guarantee the QoS. Our experimental results on an Nvidia Titan RTX GPU show that E²bird reduces the response latency of inferences by up to 82.4 percent and improves the throughput by up to 62.8 percent while guaranteeing the QoS target compared with TensorFlow Serving.
- Published
- 2021
22. Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network
- Author
-
Huang, Linyong, Zhang, Zhe, Li, Shuangchen, Niu, Dimin, Guan, Yijin, Zheng, Hongzhong, and Xie, Yuan
- Abstract
Graph Neural Networks have drawn tremendous attention in the past few years due to their convincing performance and high interpretability in various graph-based tasks like link prediction and node classification. With the ever-growing graph size in the real world, especially for industrial graphs at a billion-level, the storage of graphs can easily consume Terabytes, so the processing of GNNs has to be carried out in a distributed manner. As a result, the execution could be inefficient due to the expensive cross-node communication and irregular memory access. Various GNN accelerators have been proposed for efficient GNN processing. They, however, mainly focused on small and medium-size graphs, which is not applicable to large-scale distributed graphs. In this paper, we present a practical Near-Data-Processing architecture based on a memory-pool system for large-scale distributed GNNs. We propose a customized memory fabric interface to construct the memory pool for low-latency and high throughput cross-node communication, which can provide flexible memory allocation and strong scalability. A practical Near-Data-Processing design is proposed for efficient work offloading and bandwidth utilization improvement. Moreover, we also introduce a partition and scheduling scheme to further improve performance and achieve workload balance. Comprehensive evaluations demonstrate that the proposed architecture can achieve up to 27× and 8× higher training speed compared to two state-of-the-art distributed GNN frameworks: Deep Graph Library and P³, respectively. © 2013 IEEE.
- Published
- 2022
23. The Lightweight Runtime Engine of the Wireless Internet Platform for Mobile Devices
- Author
-
You, Yong-Duck, Park, Choong-Bum, Choi, Hoon, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Rangan, C. Pandu, editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Lee, Yann-Hang, editor, Kim, Heung-Nam, editor, Kim, Jong, editor, Park, Yongwan, editor, Yang, Laurence T., editor, and Kim, Sung Won, editor
- Published
- 2007
- Full Text
- View/download PDF
24. Application-Level Checkpointing Techniques for Parallel Programs
- Author
-
Walters, John Paul, Chaudhary, Vipin, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Dough, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Madria, Sanjay K., editor, Claypool, Kajal T., editor, Kannan, Rajgopal, editor, Uppuluri, Prem, editor, and Gore, Manoj Madhava, editor
- Published
- 2006
- Full Text
- View/download PDF
25. Open MPI: A Flexible High Performance MPI
- Author
-
Graham, Richard L., Woodall, Timothy S., Squyres, Jeffrey M., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Dough, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Wyrzykowski, Roman, editor, Dongarra, Jack, editor, Meyer, Norbert, editor, and Waśniewski, Jerzy, editor
- Published
- 2006
- Full Text
- View/download PDF
26. The Privacy of T Cell Memory to Viruses
- Author
-
Welsh, R. M., Kim, S. K., Cornberg, M., Clute, S. C., Selin, L. K., Naumov, Y. N., Compans, R. W., editor, Cooper, M. D., editor, Honjo, T., editor, Koprowski, H., editor, Melchers, F., editor, Oldstone, M. B. A., editor, Olsnes, S., editor, Svanborg, C., editor, Vogt, P. K., editor, Wagner, H., editor, Pulendran, Bali, editor, and Ahmed, Rafi, editor
- Published
- 2006
- Full Text
- View/download PDF
27. Predicting confirmation times of Bitcoin transactions
- Author
-
Jacques Resing, David Koops, Martijn Gijsbers, Rowel Gundlach, Probability, Mathematics and Computer Science, and Stochastic Operations Research
- Subjects
Computer Networks and Communications, Computer science, 05 social sciences, bitcoin, Process (computing), Memory pool, Heavy traffic approximation, 01 natural sciences, Inverse Gaussian distribution, 010104 statistics & probability, symbols.namesake, Hardware and Architecture, 0502 economics and business, Econometrics, symbols, corrected diffusion approximation, cramer-lundberg model, 0101 mathematics, Heavy traffic, confirmation times, Database transaction, 050203 business & management, Software - Abstract
We study the distribution of confirmation times of Bitcoin transactions, conditional on the size of the current memory pool. We argue that the time until a Bitcoin transaction is confirmed resembles the time to ruin in a corresponding Cramér-Lundberg process. This well-studied model gives mathematical insights into the mempool behaviour over time. Specifically, for situations where one chooses a fee such that the total size of incoming transactions with higher fee is close to the total size of transactions leaving the mempool (heavy traffic), a diffusion approximation leads to an inverse Gaussian distribution for the confirmation times. The results of this paper are particularly interesting for users that want to make a Bitcoin transaction during heavy-traffic situations, as evaluation of the well-known inverse Gaussian distribution is computationally straightforward.
- Published
- 2021
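The heavy-traffic result summarized in entry 27 has a compact closed form: if the mempool content that must clear before a given transaction is modeled as a Cramér-Lundberg-type risk process, the confirmation time is a first-passage (ruin) time, and the diffusion approximation makes it approximately inverse Gaussian. A sketch in generic notation (x₀ is the initial backlog ahead of the transaction, μ the net drain rate, σ² the variance rate; the paper's own parameterization may differ):

```latex
% Diffusion (heavy-traffic) approximation of the backlog ahead of the transaction:
%   X(t) \approx x_0 - \mu t + \sigma W(t), confirmation at the first time X(t) hits 0.
% The first-passage time of Brownian motion with drift \mu > 0 through level x_0
% is inverse Gaussian:
T \sim \mathrm{IG}\!\left(\frac{x_0}{\mu},\, \frac{x_0^{2}}{\sigma^{2}}\right),
\qquad
f_T(t) = \frac{x_0}{\sigma\sqrt{2\pi t^{3}}}
         \exp\!\left(-\frac{(x_0-\mu t)^{2}}{2\sigma^{2} t}\right), \quad t > 0.
```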
28. Real power loss reduction by percheron optimization algorithm
- Author
-
Lenin Kanagasabai
- Subjects
Power loss, Computer Networks and Communications, Computer science, Applied Mathematics, Process (computing), Memory pool, Value (computer science), 020206 networking & telecommunications, 02 engineering and technology, Computer Science Applications, Reduction (complexity), Computational Theory and Mathematics, Ranking, Artificial Intelligence, Control theory, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Minification, Electrical and Electronic Engineering, Information Systems, Voltage - Abstract
In this paper, a Percheron optimization algorithm (POA) is designed for voltage stability enhancement and power loss reduction. Percherons are draft horses used in agriculture and for carrying heavy goods. Each group is led by a stallion that controls the group's activities, and food and other resources are accessed according to the pecking order. The Percherons are ranked with reference to their fitness values, the search space is modeled by arrays of zeros and ones, and memory updating of the Percherons is done by a Percheron Memory Pool (PMP). During the process, a Percheron is left out if it has a poor fitness value. POA is first verified on the IEEE 30-bus system considering the L-index (voltage stability index) and then appraised on the 30-bus test system without the L-index. POA reduces the power loss efficiently while enhancing voltage stability and minimizing voltage deviation.
- Published
- 2021
29. A Patterns Catalog for RTSJ Software Designs
- Author
-
Benowitz, Edward G., Niessner, Albert F., Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Meersman, Robert, editor, and Tari, Zahir, editor
- Published
- 2003
- Full Text
- View/download PDF
30. Homeostatic Proliferation But not The Generation of Virus Specific Memory CD8 T Cells is Impaired in the Absence of IL-15 or IL-15Rα
- Author
-
Wherry, E. John, Becker, Todd C., Boone, David, Kaja, Murali-Krishna, Ma, Averil, Ahmed., Rafi, Gupta, Sudhir, editor, Butcher, Eugene, editor, and Paul, William, editor
- Published
- 2002
- Full Text
- View/download PDF
31. Generation and Characterization of Memory Cd4 T Cells
- Author
-
Ben-Sasson, S. Z., Zukovsky, Irena, Biton, Aliza, Vogel, Ron, Foucras, Gilles, Hayashi, Nobuki, Paul, William E., Gupta, Sudhir, editor, Butcher, Eugene, editor, and Paul, William, editor
- Published
- 2002
- Full Text
- View/download PDF
32. Object-Orientation and Operating Systems
- Author
-
Gal, Andreas, Spinczyk, Olaf, Alvarez, Dario, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Hernández, Juan, editor, and Moreira, Ana, editor
- Published
- 2002
- Full Text
- View/download PDF
33. MOSIQS: Persistent Memory Object Storage With Metadata Indexing and Querying for Scientific Computing
- Author
-
Youngjae Kim, Hyogi Sim, Sudharshan S. Vazhkudai, and Awais Khan
- Subjects
010302 applied physics, General Computer Science, Computer science, Search engine indexing, General Engineering, Memory pool, 020206 networking & telecommunications, 02 engineering and technology, Data structure, persistent memory storage, 01 natural sciences, Computational science, TK1-9971, Object storage, Metadata, Memory management, Shared memory, 0103 physical sciences, Scalability, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, PM index data structures, Electrical engineering. Electronics. Nuclear engineering, Memory-centric computing and HPC, scientific metadata indexing and search - Abstract
Scientific applications often require high-bandwidth shared storage to perform joint simulations and collaborative data analytics. Shared memory pools provide a chance to satisfy such needs. Recently, a high-speed network such as Gen-Z utilizing persistent memory (PM) offers an opportunity to create a shared memory pool connected to compute nodes. However, there are several challenges to use scientific applications on the shared memory pool directly such as scalability, failure-atomicity, and lack of scientific metadata-based search and query. In this paper, we propose MOSIQS, a persistent memory object storage framework with metadata indexing and querying for scientific computing. We design MOSIQS based on the key idea that memory objects on PM pool can live beyond the application lifetime and can become the sharing currency for applications and scientists. MOSIQS provides an aggregate memory pool atop an array of persistent memory devices to store and access memory objects to accelerate scientific computing. MOSIQS uses a lightweight persistent memory key-value store to manage the metadata of memory objects, which enables memory object sharing. To facilitate metadata search and query over millions of memory objects resident on memory pool, we introduce Group Split and Merge (GSM), a novel persistent index data structure designed primarily for scientific datasets. GSM splits and merges dynamically to minimize the query search space and maintains low query processing time while overcoming the index storage overhead. MOSIQS is implemented on top of PMDK. We evaluate the proposed approach on many-core server with an array of real PM devices. Experimental results show that MOSIQS gains a 100% write performance improvement and executes multi-attribute queries efficiently with 2.7× less index storage overhead, offering significant potential to speed up scientific computing applications.
- Published
- 2021
34. Exploring Means to Enhance the Efficiency of GPU Bitmap Index Query Processing
- Author
-
Brennan Schaffner, Brandon Tran, David Chiu, Jason Sawin, and Joseph M. Myre
- Subjects
Speedup, Computer science, Computational Mechanics, Memory pool, computer.file_format, Parallel computing, Data structure, Computer Science Applications, Metadata, Overhead (computing), Bitmap index, Bitmap, computer, Word (computer architecture) - Abstract
Once exotic, computational accelerators are now commonly available in many computing systems. Graphics processing units (GPUs) are perhaps the most frequently encountered computational accelerators. Recent work has shown that GPUs are beneficial when analyzing massive data sets. Specifically related to this study, it has been demonstrated that GPUs can significantly reduce the query processing time of database bitmap index queries. Bitmap indices are typically used for large, read-only data sets and are often compressed using some form of hybrid run-length compression. In this paper, we present three GPU algorithm enhancement strategies for executing queries of bitmap indices compressed using word aligned hybrid compression: (1) data structure reuse, (2) metadata creation with various type alignment, and (3) a preallocated memory pool. The data structure reuse greatly reduces the number of costly memory system calls. The use of metadata exploits the immutable nature of bitmaps to pre-calculate and store necessary intermediate processing results. This metadata reduces the number of required query-time processing steps. Preallocating a memory pool can reduce or entirely remove the overhead of memory operations during query processing. Our empirical study showed that performing a combination of these strategies can achieve 32.4× to 98.7× speedup over the current state-of-the-art implementation. Our study also showed that by using our enhancements, a common gaming GPU can achieve a 15.0× speedup over a more expensive high-end CPU.
- Published
- 2020
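The third strategy in entry 34, the preallocated memory pool, amounts to reserving one large buffer before query processing starts and handing out pieces of it with a bump pointer, so no allocator calls occur on the query path. A minimal host-side sketch of that pattern follows; it is illustrative only (the paper's pool lives in GPU memory, which would be allocated with the CUDA runtime rather than a std::vector).

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Bump-pointer pool: one upfront allocation, O(1) sub-allocations, and a
// single reset between queries instead of many free() calls.
class PreallocatedPool {
public:
    explicit PreallocatedPool(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (offset_ + align - 1) / align * align;   // align up
        if (start + bytes > buffer_.size())
            throw std::bad_alloc();
        offset_ = start + bytes;
        return buffer_.data() + start;
    }

    void reset() { offset_ = 0; }   // reuse the whole pool for the next query

private:
    std::vector<std::uint8_t> buffer_;
    std::size_t offset_;
};

int main() {
    PreallocatedPool pool(64 * 1024 * 1024);          // 64 MiB reserved once
    auto* words = static_cast<std::uint64_t*>(
        pool.allocate(1024 * sizeof(std::uint64_t)));
    words[0] = 0xFFFF;                                // decompressed bitmap words go here
    pool.reset();                                     // ready for the next query
    return 0;
}
```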
35. Memory Mechanisms for Discriminative Visual Tracking Algorithms With Deep Neural Networks
- Author
-
Lei Zhang, Lituan Wang, Jianyong Wang, and Zhang Yi
- Subjects
Artificial neural network, Computer science, business.industry, Reading (computer), Reliability (computer networking), Feature extraction, Memory pool, Pattern recognition, 02 engineering and technology, 010501 environmental sciences, Object (computer science), 01 natural sciences, Discriminative model, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Eye tracking, 020201 artificial intelligence & image processing, Artificial intelligence, business, Software, 0105 earth and related environmental sciences - Abstract
Deep-neural-networks-based online visual tracking methods have achieved state-of-the-art results. One of the core components of these methods is the memory pool, in which a number of samples consisting of image patches and the corresponding labels are stored to update the online tracking network. Hence, the mechanism of updating the stored samples determines the performance of the tracking method. In this paper, a novel memory mechanism is proposed to control the writing and reading accesses of the memory pool using credit assignment network H, which learns features of the target object. This memory mechanism comprises the writing and reading mechanisms. In the writing mechanism, network H produces credits for the current tracked object and the samples in the memory pool. This ensures that the reliable samples are written into the memory pool and the unreliable samples are replaced if the memory pool is full. In the reading mechanism, network H assigns an importance score to each sample selected to update the online tracking network. The state-of-the-art tracking methods with and without the proposed memory mechanism are evaluated on the CVPR2013 and OTB100 benchmarks. The experimental results demonstrated that the proposed memory mechanism improves tracking performance significantly.
- Published
- 2020
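Entry 35's writing and reading mechanisms can be read as a fixed-capacity sample pool in which a learned credit score gates what is kept and weights what is read back for the online update. The sketch below is one plausible interpretation of that description, with the credit network abstracted to a callable; it is not the authors' implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Sample { /* image patch + label */ int id; };

// Fixed-size sample memory with credit-gated writes and credit-weighted reads.
class TrackingMemory {
public:
    TrackingMemory(std::size_t capacity, std::function<double(const Sample&)> credit)
        : capacity_(capacity), credit_(std::move(credit)) {}

    // Writing mechanism: insert reliable samples; if full, replace the
    // lowest-credit sample only when the newcomer scores higher.
    void write(const Sample& s) {
        double c = credit_(s);
        if (pool_.size() < capacity_) { pool_.push_back({s, c}); return; }
        auto worst = std::min_element(pool_.begin(), pool_.end(),
            [](const Entry& a, const Entry& b) { return a.credit < b.credit; });
        if (c > worst->credit) *worst = {s, c};
    }

    // Reading mechanism: return (sample, importance) pairs used to weight the
    // online update of the tracking network.
    std::vector<std::pair<Sample, double>> read() const {
        std::vector<std::pair<Sample, double>> batch;
        for (const Entry& e : pool_) batch.emplace_back(e.sample, e.credit);
        return batch;
    }

private:
    struct Entry { Sample sample; double credit; };
    std::size_t capacity_;
    std::function<double(const Sample&)> credit_;
    std::vector<Entry> pool_;
};
```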
36. Accelerating GPU Message Communication for Autonomous Navigation Systems
- Author
-
Jiangming Jin, Hao Wu, Jidong Zhai, Yifan Gong, and Liu Wei
- Subjects
Data processing, Software, Speedup, Computer science, business.industry, Autonomous Navigation System, Distributed computing, Memory pool, Throughput, Latency (engineering), business, Object detection - Abstract
Autonomous navigation systems consist of multiple software modules, such as sensing, object detection, and planning, to achieve traffic perception and fast decision making. Such a system generates a large amount of data and requires data processing and communication in real-time. Although accelerators, such as GPUs, have been exploited to speed up data processing, communicating GPU messages between modules is still lacking support, leading to high communication latency and resource contention. For such a latency-sensitive and resource-limited autonomous navigation system, high performance and lightweight message communication are crucial and demanding. To obtain both high performance and low resource usages, we first propose a novel pub-centric memory pool and an on-the-fly offset conversion algorithm to avoid unnecessary data movement. Secondly, we combine these two techniques and propose an efficient message communication on a single GPU. Finally, we extend this approach to multi-GPU and design a framework that natively supports GPU message communication for Inter-Process Communication. With comprehensive evaluation, results show our approach is able to reduce communication latency by 53.7% for PointCloud and Image messages compared to the state-of-the-art approach. Moreover, in the real autonomous navigation scenario, our approach reduces the end-to-end latency by 29.2% and decreases resource usage up to 58.9%.
- Published
- 2021
37. Extractive single document summarization using multi-objective modified cat swarm optimization approach: ESDS-MCSO
- Author
-
Partha Pakray, Ranjita Das, and Dipanwita Debnath
- Subjects
education.field_of_study, Optimization problem, Computer science, Population, Memory pool, Swarm behaviour, Cohesion (computer science), computer.software_genre, Automatic summarization, Constraint (information theory), Artificial Intelligence, Limit (mathematics), Data mining, education, computer, Software - Abstract
As the world progresses ever faster, the need for proficient computing technology to keep up with demand has increased, resulting in huge volumes of data. Consequently, the extraction of relevant information from such a massive volume of data in a short time becomes challenging. Hence, automatic text summarization (TS) has emerged as an efficient solution to this problem. In the current study, the automatic TS problem is formulated as a multi-objective optimization problem, and to mitigate this problem, the modified cat swarm optimization (MCSO) strategy is employed. In this work, the population is represented as a collection of feasible individuals where the summary length limit is considered as a constraint that determines the feasibility of an individual. Here, each individual is shaped by randomly selecting some of the sentences encoded in the binary form. Furthermore, two objective functions, namely “coverage and informativeness” and “anti-redundancy,” are used to evaluate each individual’s fitness. Also, to update the position of an individual, genetic and bit manipulating operators and the best cat memory pool have been incorporated into the system. Finally, from the generated non-dominated optimal solutions, the best solution is selected based on the ROUGE score for the summary generation process. The system’s performance is evaluated using ROUGE-1 and ROUGE-2 measures on two standard summarization datasets, namely DUC-2001 and DUC-2002, which revealed that the proposed approach achieved a noticeable improvement in ROUGE scores compared to many state-of-the-art methods mentioned in this paper. The system is also evaluated using the generational distance, CPU processing time, and cohesion, reflecting that the obtained summaries are readable, concise, and relevant, and that the method converges quickly.
- Published
- 2021
38. Gengar: An RDMA-based Distributed Hybrid Memory Pool
- Author
-
Xiaofei Liao, Haodi Lu, Haikun Liu, Hai Jin, Yu Zhang, Zhuohui Duan, and Bingsheng He
- Subjects
Non-volatile memory, Hardware_MEMORYSTRUCTURES, Remote direct memory access, Computer science, business.industry, Embedded system, Memory pool, Cache, DIMM, business, Communications protocol, Dram, Bottleneck - Abstract
Byte-addressable Non-volatile Memory (NVM) technologies promise higher density and lower cost than DRAM. They have been increasingly employed for data center applications. Despite many previous studies on using NVM in a single machine, there remain challenges to best utilize it in a distributed data center environment. This paper presents Gengar, an RDMA-enabled Distributed Shared Hybrid Memory (DSHM) pool with simple programming APIs for viewing remote NVM and DRAM in a global memory space. We propose to exploit semantics of RDMA primitives to identify frequently-accessed data in the hybrid memory pool, and cache it in distributed DRAM buffers. We redesign RDMA communication protocols to reduce the bottleneck of RDMA write latency by leveraging a proxy mechanism. Gengar also supports memory sharing among multiple users with data consistency guarantees. We evaluate Gengar in a real testbed equipped with Intel Optane DC Persistent DIMMs. Experimental results show that Gengar significantly improves the performance of public benchmarks such as MapReduce and YCSB by up to 70% compared with state-of-the-art DSHM systems.
- Published
- 2021
39. Object Scanning of Windows Kernel Driver Based on Pool Tag Quick Scanning
- Author
-
Hailu Yang, Jiqiang Zhai, Yajun Xiao, and Jian Wang
- Subjects
genetic structures, Computer science, business.industry, General Engineering, Memory pool, 020207 software engineering, TL1-4050, 02 engineering and technology, pool tag, Memory forensics, Constant false alarm rate, memory forensics, Kernel (image processing), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, pool tag quick scanning, Artificial intelligence, business, human activities, kernel driver object, Motor vehicles. Aeronautics. Astronautics - Abstract
In memory forensics, pool tag scanning based on the memory pool tag requires a detailed search of physical memory when scanning for kernel driver objects, which is very inefficient. This paper proposes object scanning of Windows kernel drivers based on pool tag quick scanning. The method uses quick pool tag scanning to reduce the memory range of the scan, and then quickly scans for driver objects according to the characteristics of the kernel driver object, helping the investigator determine whether a driver is normal. Experimental results show that quick pool tag scanning greatly improves the efficiency of kernel driver object scanning and reduces the time spent in the scanning step while keeping the false alarm rate the same.
- Published
- 2019
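Pool tag scanning, as used in entry 39, looks for the 4-byte tag that the Windows kernel allocator records for each pool allocation (driver objects are conventionally tagged 'Driv'), and the quick-scanning step narrows the byte range that has to be walked. The snippet below only illustrates the tag-matching step over a raw memory image; offsets, alignment, and header validation are deliberately simplified compared with a real scanner.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Return offsets in a raw memory image where the given 4-byte pool tag occurs.
// A real scanner would first restrict `image` to the pool ranges found by the
// quick-scan pass, then validate the surrounding _POOL_HEADER and object body.
std::vector<std::size_t> find_pool_tag(const std::vector<std::uint8_t>& image,
                                       const char tag[4]) {
    std::vector<std::size_t> hits;
    if (image.size() < 4) return hits;
    for (std::size_t off = 0; off + 4 <= image.size(); off += 4) {  // simplified 4-byte stride
        if (std::memcmp(image.data() + off, tag, 4) == 0)
            hits.push_back(off);
    }
    return hits;
}

int main() {
    std::vector<std::uint8_t> image(1 << 20, 0);      // stand-in for a memory dump
    const char driver_tag[4] = {'D', 'r', 'i', 'v'};  // illustrative driver-object tag
    std::memcpy(image.data() + 4096, driver_tag, 4);  // plant one tag for the demo
    auto hits = find_pool_tag(image, driver_tag);     // hits == {4096}
    return hits.empty() ? 1 : 0;
}
```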
40. Exploiting stack-based buffer overflow using modern day techniques
- Author
-
Ștefan Nicula and Razvan Daniel Zota
- Subjects
Address space layout randomization, Exploit, Computer science, Memory pool, 020206 networking & telecommunications, 02 engineering and technology, computer.file_format, computer.software_genre, Memory leak, Data Execution Prevention, 0202 electrical engineering, electronic engineering, information engineering, Operating system, General Earth and Planetary Sciences, Stack buffer overflow, 020201 artificial intelligence & image processing, Executable, computer, General Environmental Science, Buffer overflow, Heap (data structure) - Abstract
One of the most commonly known vulnerabilities that can affect a binary executable is the stack-based buffer overflow. The buffer overflow occurs when a program, while writing data to a buffer, overruns the buffer’s boundary and overwrites adjacent memory locations. Nowadays, due to multiple protection mechanisms enforced by the operating system and on the executable level, the buffer overflow has become harder to exploit. Multiple bypassing techniques are often required to be used in order to successfully exploit the vulnerability and control the execution flow of the studied executable. One of the security features designed as protection mechanisms is Data Execution Prevention (DEP) which helps prevent code execution from the stack, heap or memory pool pages by marking all memory locations in a process as non-executable unless the location explicitly contains executable code. Another protection mechanism targeted is the Address Space Layout Randomization (ASLR), which is often used in conjunction with DEP. This security feature randomizes the location where the system executables are loaded into memory. By default, modern day operating systems have these security features implemented. However, on the executable level, they have to be explicitly enabled. Most of the protection mechanisms, like the ones mentioned above, require certain techniques in order to bypass them, and many of these techniques use some form of memory address leakage in order to leverage an exploit. By successfully exploiting a buffer overflow, the adversary can potentially obtain code execution on the affected operating system which runs the vulnerable executable. The level of privilege granted to the adversary is highly dependent on the level of privilege that the binary is executed with. As such, an adversary may gain elevated privileges inside the system. Most of the time, this type of vulnerability is used for privilege escalation attacks or for gaining remote code execution on the system.
- Published
- 2019
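For orientation, the bug class discussed in entry 40 fits in a few lines: a fixed-size stack buffer written with an unbounded copy. The toy program below (not taken from the article) overflows `buf` whenever the input exceeds the buffer, clobbering adjacent stack memory such as the saved return address; DEP and ASLR change how such a condition can be exploited, not whether the overflow happens.

```cpp
#include <cstdio>
#include <cstring>

// Classic stack-based buffer overflow: strcpy() performs no bounds check,
// so argv[1] longer than the buffer overwrites adjacent stack memory.
static void vulnerable(const char* input) {
    char buf[64];
    std::strcpy(buf, input);          // unsafe: no length check
    std::printf("copied: %s\n", buf);
}

// A bounded copy removes the overflow (one of several safe alternatives).
static void fixed(const char* input) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%s", input);
    std::printf("copied: %s\n", buf);
}

int main(int argc, char** argv) {
    if (argc > 1) {
        vulnerable(argv[1]);          // undefined behaviour for long inputs
        fixed(argv[1]);
    }
    return 0;
}
```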
41. An Embedded Inference Framework for Convolutional Neural Network Applications
- Author
-
Yingjie Zhang, Huaqing Min, Min Dong, and Sheng Bi
- Subjects
General Computer Science, Artificial neural network, Computer science, business.industry, mobile computing, General Engineering, Process (computing), Memory pool, Inference, 020206 networking & telecommunications, Deep learning, 02 engineering and technology, Convolutional neural network, TK1-9971, embedded system, 0202 electrical engineering, electronic engineering, information engineering, Optimization methods, 020201 artificial intelligence & image processing, General Materials Science, Artificial intelligence, Electrical engineering. Electronics. Nuclear engineering, business, mobile sensing - Abstract
With the rapid development of deep convolutional neural networks, more and more computer vision tasks have been well resolved. These convolutional neural network solutions rely heavily on the performance of the hardware. However, due to privacy issues or the network instability, we need to run convolutional neural networks on embedded platforms. Critical challenges will be raised by limited hardware resources on the embedded platform. In this paper, we design and implement an embedded inference framework to accelerate the inference of the convolutional neural network on the embedded platform. For this, we first analyzed the time-consuming layers in the inference process of the network, and then we design optimization methods for these layers. Also, we design a memory pool specifically for neural networks. Our experimental results show that our embedded inference framework can run a classification model MobileNet in 80ms and a detection model MobileNet-SSD in 155ms on Firefly-RK3399 development board.
- Published
- 2019
42. A Pick-and-Throw Method for Enhancing Robotic Sorting Ability via Deep Reinforcement Learning
- Author
-
Jun Li, Zihan Fang, and Yanxu Hou
- Subjects
business.industry, Computer science, Sorting, Memory pool, Robot end effector, Automation, law.invention, law, Convergence (routing), Reinforcement learning, Robot, Artificial intelligence, business, Throwing - Abstract
To promote the work capacity and efficiency in a weakly structured logistics sorting scene, a pick-and-throw method based on reinforcement learning is proposed. First, a DQN-based learning algorithm is used to obtain a picking feasibility distribution map for guiding an optimal picking action. Second, the Deep Deterministic Policy Gradient (DDPG) algorithm is utilized to train a throwing policy. The throwing policy outputs a throwing velocity of the end-effector of a robot to throw the picked object to a target area. In addition, a memory pool optimization algorithm is also proposed to enhance the convergence of the throwing policy. Experiments are conducted in both simulation and real scenarios, and results demonstrate that the proposed method can improve the sorting efficiency and expand the sorting space significantly.
- Published
- 2021
43. A convolutional fully connected structure with hard sample memory pool for land use classification
- Author
-
Arthur C. Depoian, Colleen P. Bailey, Dong Xie, and Lorenzo E. Jaques
- Subjects
Land use, Artificial neural network, Contextual image classification, business.industry, Computer science, Deep learning, Classifier (linguistics), Memory pool, Pattern recognition, Artificial intelligence, Land cover, business, Convolutional neural network - Abstract
As one of the classic fields of computer vision, image classification has been booming with the improvement of chip performance and algorithm efficiency. With the rapid progress of deep learning in recent decades, remote sensing land cover and land use image classification has ushered in a golden period of development. This paper presents a new deep learning classifier to classify remote sensing land cover and land use images. The approach first uses multi-layer convolutional neural networks to extract the image features, which are then passed through a fully-connected neural network to generate the sample loss. Then, a hard sample memory pool is created to collect the samples with large losses during the training. A batch of hard samples is randomly extracted from the memory pool to participate in the training of the convolutional fully connected model so that the model becomes more robust. Our method is validated on a classic remote sensing land cover and land use dataset. Compared with previous popular classification algorithms, our algorithm can classify images more accurately with fewer training iterations.
- Published
- 2021
44. DNN-Based Resource Allocation for Cooperative CR Networks with Energy Harvesting
- Author
-
Han Hu, Dingguo Wu, Cen Yang, and Rose Qingyang Hu
- Subjects
Mathematical optimization ,Computer science ,Wireless network ,Memory pool ,020206 networking & telecommunications ,02 engineering and technology ,Transmitter power output ,Power budget ,Base station ,Cognitive radio ,0202 electrical engineering, electronic engineering, information engineering ,Resource allocation ,020201 artificial intelligence & image processing ,Resource management - Abstract
Cognitive radio (CR) and energy harvesting (EH) have been deemed two promising technologies for spectrum-scarce and energy-limited wireless networks. In this paper, a cooperative cognitive radio network (CRN) with EH is considered, where a secondary user (SU) close to the secondary base station (SBS) employs power splitting for EH and assists in relaying data for another SU far away from the SBS. An SU sum-rate maximization problem is formulated under the constraints of the power budget at the SBS, the interference threshold of the primary network, and the SU QoS requirements. To tackle this problem, a resource allocation algorithm based on an improved deep neural network (DNN) is proposed. To accelerate the convergence of the DNN loss function, transfer learning is exploited to initialize the DNN weights. The loss between the DNN output and the optimal transmit power obtained by the conventional solution is stored in a memory pool, and the samples with large losses are used to train the DNN. Simulation results show the efficiency of the proposed DNN-based resource allocation scheme, which outperforms both a standard DNN-based scheme and the conventional resource allocation scheme in terms of computation time.
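A hedged sketch of the sample pool described above, under assumed details: the label for each channel realization is the transmit power returned by the conventional solution, the loss between the DNN output and that label is recorded, and the highest-loss samples are retained for further DNN training. Both dnn_predict and conventional_solver below are hypothetical placeholders, not components defined by the paper.

def conventional_solver(channel_gains):           # placeholder for the baseline optimizer
    return [min(1.0, g) for g in channel_gains]   # toy power rule, for illustration only

def dnn_predict(channel_gains):                   # placeholder for the DNN forward pass
    return [0.5 for _ in channel_gains]

memory_pool = []
for gains in [[0.2, 0.9], [0.7, 0.4]]:            # toy channel realizations
    target = conventional_solver(gains)
    output = dnn_predict(gains)
    loss = sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)
    memory_pool.append((loss, gains, target))

memory_pool.sort(reverse=True)                    # highest-loss samples first
hard_training_set = memory_pool[: len(memory_pool) // 2]  # used for the next DNN update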
- Published
- 2021
45. G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression
- Author
-
Onur Mutlu, Xiaoyong Du, Xipeng Shen, Zaifeng Pan, Jidong Zhai, Yanliang Zhou, and Feng Zhang
- Subjects
FOS: Computer and information sciences ,Lossless compression ,Speedup ,Computer science ,business.industry ,Big data ,Memory pool ,Databases (cs.DB) ,Parallel computing ,Data structure ,Instruction set ,Computer Science - Databases ,Hardware Architecture (cs.AR) ,Synchronization (computer science) ,Computer Science - Hardware Architecture ,business ,Massively parallel - Abstract
Text analytics directly on compression (TADOC) has proven to be a promising technology for big data analytics. GPUs are extremely popular accelerators for data analytics systems. Unfortunately, no prior work shows how to utilize GPUs to accelerate TADOC. We describe G-TADOC, the first framework that provides GPU-based text analytics directly on compression, effectively enabling efficient text analytics on GPUs without decompressing the input data. G-TADOC solves three major challenges. First, TADOC involves a large amount of dependencies, which makes it difficult to exploit massive parallelism on a GPU. We develop a novel fine-grained thread-level workload scheduling strategy for GPU threads, which partitions heavily dependent loads adaptively in a fine-grained manner. Second, in developing G-TADOC, thousands of GPU threads writing to the same result buffer leads to inconsistency, while directly using locks and atomic operations leads to large synchronization overheads. We develop a memory pool with thread-safe data structures on GPUs to handle such difficulties. Third, maintaining the sequence information among words is essential for lossless compression. We design a sequence-support strategy, which maintains high GPU parallelism while ensuring sequence information. Our experimental evaluations show that G-TADOC provides a 31.1x average speedup compared to state-of-the-art TADOC. Comment: 37th IEEE International Conference on Data Engineering (ICDE 2021).
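The abstract does not specify G-TADOC's thread-safe pool data structures, so the Python sketch below only illustrates one common way a GPU-side memory pool can avoid locks: each thread is assigned a private region of a single preallocated result buffer, with offsets computed by an exclusive prefix sum over per-thread output bounds.

from itertools import accumulate

def thread_regions(max_outputs_per_thread):
    # Exclusive prefix sum: thread i writes only to [offset[i], offset[i] + bound[i]).
    offsets = [0] + list(accumulate(max_outputs_per_thread))[:-1]
    return list(zip(offsets, max_outputs_per_thread))

bounds = [4, 2, 7, 3]               # per-thread upper bounds on output size
regions = thread_regions(bounds)    # [(0, 4), (4, 2), (6, 7), (13, 3)]
buffer_size = sum(bounds)           # one preallocated buffer covers all threads

Because no two regions overlap, threads can write their results without locks or atomics; on a real GPU the prefix sum itself would be computed in parallel.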
- Published
- 2021
46. Virus-specific NK cell memory
- Author
-
Joseph C. Sun and Sam Sheppard
- Subjects
Infectious disease and host defense ,medicine.medical_treatment ,Immunology ,Cell ,Population ,Cytomegalovirus ,Review ,Biology ,Virus ,Mice ,medicine ,Animals ,Humans ,Immunology and Allergy ,Receptor ,Cytotoxicity ,education ,education.field_of_study ,Effector ,Memory pool ,Cell biology ,Killer Cells, Natural ,medicine.anatomical_structure ,Cytokine ,Cytomegalovirus Infections ,Immune Memory Focus ,Immunologic Memory - Abstract
Natural killer cells are critical effectors of antiviral host defense. This review summarizes the recent literature, current paradigms, and molecular mechanisms underlying the ability of these innate lymphocytes to mount adaptive responses against viral infection in humans and mice. NK cells express a limited number of germline-encoded receptors that identify infected or transformed cells, eliciting cytotoxicity, effector cytokine production, and in some circumstances clonal proliferation and memory. To maximize the functional diversity of NK cells, the array and expression level of surface receptors vary between individual NK cell “clones” in mice and humans. Cytomegalovirus infection in both species can expand a population of NK cells expressing receptors critical to the clearance of infected cells and generate a long-lived memory pool capable of targeting future infection with greater efficacy. Here, we discuss the pathways and factors that regulate the generation and maintenance of effector and memory NK cells and propose how this understanding may be harnessed therapeutically.
- Published
- 2021
47. Replicative history marks transcriptional and functional disparity in the CD8+ T cell memory pool
- Author
-
Kaspar Bresser, King L, Swain A, Leïla Perié, Kok L, Scheeren F, Tom S. Weber, Ton N. Schumacher, Ken R. Duffy, Jacobs L, and Boer Rd
- Subjects
Text mining ,business.industry ,Memory pool ,Cytotoxic T cell ,Computational biology ,Biology ,business - Abstract
Clonal expansion is a core aspect of T cell immunity. However, little is known about the relationship between replicative history and the formation of distinct CD8+ memory T cell subgroups. To address this issue, we developed a genetic-tracing approach, termed the DivisionRecorder, that reports the extent of past proliferation of cell pools in vivo. Using this system to genetically ‘record’ the replicative history of different CD8+ T cell populations throughout a pathogen-specific immune response, we demonstrate that the central memory T cell (TCM) pool is marked by a higher number of prior divisions than the effector memory T cell pool, owing to the combination of strong proliferative activity during the acute immune response and selective proliferative activity after pathogen clearance. Furthermore, by combining DivisionRecorder analysis with single-cell transcriptomics and functional experiments, we show that replicative history identifies distinct cell pools within the TCM compartment. Specifically, we demonstrate that lowly divided TCM display enriched expression of stem-cell-associated genes, and that such lowly divided cells are superior in eliciting a proliferative recall response. The latter data provide the first evidence that a stem-cell-like memory T cell pool, which reconstitutes the CD8+ T cell effector pool upon reinfection, is marked by prior quiescence.
- Published
- 2021
48. Memory Pool Publisher Algorithm for Preventing Malicious Fork in the Bitcoin Environment
- Author
-
Hatem Abdelkader, Ahmed H. Madkour, and Asmaa H Ali
- Subjects
Cryptocurrency ,Blockchain ,Computer science ,Fork (system call) ,Memory pool ,Linked list ,Algorithm ,Database transaction ,Rollback ,Block (data storage) - Abstract
Blockchain technology is used by most Bitcoin systems to store all historical transaction information. A blockchain is a chain of blocks similar to a linked list, and it can develop a fork structure; forks are of two types, useful forks and intentional forks. A useful fork may appear when the rules of the Bitcoin system are updated. An intentional fork, on the other hand, may appear when a miner with supercomputer-level resources generates a set of blocks as a private branch and does not publish this branch to the blockchain until its length exceeds that of the main branch. When an intentional fork occurs in the Bitcoin system, a set of blockchain transactions is rolled back, user waiting times increase, and miner rewards increase illegitimately. A memory pool publisher algorithm is proposed in this paper to avoid these fork issues in the Bitcoin system, namely the intentional fork, the rollback problem, and increased user waiting times. The proposed algorithm makes the memory pool the system's single publisher and divides block handling into two phases: in the first phase, a miner constructs a block and sends it to the memory pool; in the second phase, the memory pool sends the constructed block to the blockchain. The findings indicate that the proposed algorithm has strong potential to avoid the blockchain's intentional fork problem and thus minimize the user waiting times caused by rollbacks.
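A simplified sketch of the two-phase idea, under stated assumptions (the abstract does not give concrete message formats or validation rules): miners only submit constructed blocks to the memory pool, and the memory pool publisher alone appends blocks that extend the public chain tip, so a privately mined branch cannot displace the main chain.

class MemoryPoolPublisher:
    def __init__(self):
        self.pending = []     # phase 1: blocks received from miners
        self.blockchain = []  # phase 2: blocks published, in order

    def receive_block(self, block, prev_hash):
        self.pending.append((prev_hash, block))

    def publish(self):
        tip = self.blockchain[-1]["hash"] if self.blockchain else "GENESIS"
        for prev_hash, block in list(self.pending):
            if prev_hash == tip:                 # only blocks extending the public tip
                self.blockchain.append(block)    # are published; private forks are not
                tip = block["hash"]
                self.pending.remove((prev_hash, block))

publisher = MemoryPoolPublisher()
publisher.receive_block({"hash": "b1", "txs": []}, prev_hash="GENESIS")
publisher.publish()   # the chain now contains b1; a block citing a private parent stays unpublished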
- Published
- 2021
49. Performance Evaluation of Fabric-Attached Memory Pool for AI Applications
- Author
-
Myeong-Hoon Oh and Young Woo Kim
- Subjects
Computer architecture ,Computer science ,business.industry ,NVM Express ,Big data ,Systems architecture ,Benchmark (computing) ,Device file ,Memory pool ,Applications of artificial intelligence ,Unconventional computing ,business - Abstract
Recently, traditional compute-oriented system architecture has been changing and diversifying. The increasing demands of big data and artificial intelligence accelerate the need for new and alternative computing system architectures. One emerging area for alternative computing architecture is memory-centric, or disaggregated, computing, which can satisfy system-wide requirements for huge memory. In this paper, we present preliminary performance evaluation results of a fabric-attached memory system using industry-standard-based memory pool prototype hardware. The memory pool prototype hardware is configured as a block device for benchmarking. For evaluation, the well-known benchmarks sysbench and ResNet are used. The preliminary benchmark results show that the overall access performance of the fabric-attached prototype hardware is comparable to that of an SSD, and that deep learning performance is close to that of NVMe.
- Published
- 2021
50. PCIe Bridge Hardware for Gen-Z Memory System
- Author
-
Chanho Park, Wonok Kwon, Myeong-Hoon Oh, and Cheol-Hoon Lee
- Subjects
Interconnection ,Upgrade ,Adapter (computing) ,business.industry ,Computer science ,Memory pool ,business ,Protocol (object-oriented programming) ,Bridge (interpersonal) ,Computer hardware ,PCI Express ,Block (data storage) - Abstract
As applications requiring large amounts of memory grow, memory-centric computing systems with large memory pools are drawing attention, and the high-speed interconnect used to access the memory pools in such systems is a critical component. In this paper, bridge hardware that connects the PCI Express bus, widely used in general systems, to a Gen-Z protocol adapter, one of the new high-speed interconnects, is presented, and its maximum performance is measured. The hardware supports up to 720 MB/s for memory writes and 3.5 GB/s for block writes. This bridge hardware is expected to improve the performance of existing systems and to expand the application field of the Gen-Z memory system.
- Published
- 2021