158 results for "Weichen Liu"
Search Results
2. SurgeNAS: A Comprehensive Surgery on Hardware-Aware Differentiable Neural Architecture Search
- Author
-
Xiangzhong Luo, Di Liu, Hao Kong, Shuo Huai, Hui Chen, Weichen Liu, School of Computer Science and Engineering, and HP-NTU Digital Manufacturing Corporate Lab
- Subjects
Computational Theory and Mathematics ,Hardware and Architecture ,Neural Architecture Search ,Computer science and engineering [Engineering] ,Hardware Performance Prediction ,Software ,Theoretical Computer Science - Abstract
Differentiable neural architecture search (NAS) is an emerging paradigm to automate the design of top-performing convolutional neural networks (CNNs). However, previous differentiable NAS methods suffer from several crucial weaknesses, including inaccurate gradient estimation, high memory consumption, and search unfairness. More importantly, previous differentiable NAS works are mostly hardware-agnostic since they only search for CNNs in terms of accuracy, ignoring other critical performance metrics like latency. In this work, we introduce a novel hardware-aware differentiable NAS framework, namely SurgeNAS, in which we leverage one-level optimization to avoid inaccuracy in gradient estimation. To this end, we propose an effective identity mapping regularization to alleviate the over-selecting issue. Besides, to mitigate the memory bottleneck, we propose an ordered differentiable sampling approach, which significantly reduces the search memory consumption to the single-path level, thereby allowing direct search on target tasks instead of small proxy tasks while guaranteeing strict search fairness. Moreover, we introduce a graph neural network (GNN)-based predictor to approximate on-device latency, which is further integrated into SurgeNAS to enable latency-aware architecture search. Finally, we analyze the resource-underutilization issue and propose to scale up the searched SurgeNets within the "Comfort Zone" to balance computation and memory access, which brings considerable accuracy improvement without deteriorating execution efficiency. Extensive experiments are conducted on ImageNet with diverse hardware platforms, which clearly show the effectiveness of SurgeNAS in terms of accuracy, latency, and search efficiency.
Ministry of Education (MOE) Nanyang Technological University Submitted/Accepted version This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282) and SUG (M4082087).
- Published
- 2023
- Full Text
- View/download PDF
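The latency-aware search objective described in the abstract can be sketched numerically: a softmax over architecture parameters makes the expected latency of a layer differentiable, so it can be added to the task loss. The operator latencies, alpha values, task loss, and weighting below are hypothetical stand-ins, and the real SurgeNAS approximates latency with a GNN predictor rather than a per-operator lookup table.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical per-operator latencies (ms) for one layer's candidates,
# e.g. skip connection, 3x3 conv, 5x5 conv.
op_latency = np.array([0.8, 1.5, 3.2])
alpha = np.array([0.1, 0.9, 0.4])   # architecture parameters (learnable)

p = softmax(alpha)
expected_latency = float(p @ op_latency)   # differentiable w.r.t. alpha

# Latency-aware objective: task loss plus a weighted latency penalty.
task_loss = 0.42   # stand-in for cross-entropy on a batch
lam = 0.1          # latency trade-off weight (hypothetical)
total_loss = task_loss + lam * expected_latency
```

Raising `lam` pushes the search toward faster operators; the expected latency always lies between the fastest and slowest candidate.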
3. The Impact of Federal Reserve’s Interest Rate Policy on US Semiconductor Industry
- Author
-
Weichen Liu
- Abstract
The Federal Reserve raised interest rates by 75 basis points in June. In the meantime, the American government has enacted laws and orders to promote the growth of the semiconductor sector [4]. The purpose of this essay is to assess how the Federal Reserve's interest rate policy has affected the US semiconductor industry. To determine whether the policy has a beneficial or negative impact on the industry, this research uses the VAR model. This study projects the yield and volatility of semiconductors using the ARMA-GARCH model. The findings demonstrate that raising interest rates encourages greater investment in the stock market, which is advantageous for the semiconductor sector, but over time the yield's volatility rises as well. This paper attributes the result to the US chip sector's long-standing dominant position: chip demand is not price elastic, and the strengthening of the currency is insufficient to reduce chip demand. The outcome is dominated by net capital inflows.
- Published
- 2023
- Full Text
- View/download PDF
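The ARMA-GARCH model family used in the abstract can be illustrated with a small simulation: returns follow ARMA(1,1) dynamics while their conditional variance follows GARCH(1,1). All parameter values below are hypothetical; the paper fits such a model to actual semiconductor return data rather than simulating the process.

```python
import numpy as np

def simulate_arma_garch(n, c=0.0, phi=0.3, theta=0.2,
                        omega=0.05, alpha=0.1, beta=0.85, seed=0):
    """Simulate an ARMA(1,1) return series with GARCH(1,1) volatility:
    r_t = c + phi*r_{t-1} + eps_t + theta*eps_{t-1},
    eps_t = sigma_t * z_t,  sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2.
    Requires alpha + beta < 1 for a stationary variance process."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n)
    eps = np.zeros(n)
    # Start at the unconditional variance omega / (1 - alpha - beta).
    sig2 = np.full(n, omega / (1 - alpha - beta))
    for t in range(1, n):
        sig2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sig2[t - 1]
        eps[t] = np.sqrt(sig2[t]) * rng.standard_normal()
        r[t] = c + phi * r[t - 1] + eps[t] + theta * eps[t - 1]
    return r, np.sqrt(sig2)
```

Volatility clustering (the rising volatility the abstract reports) emerges from the GARCH recursion: large shocks raise the next period's conditional variance.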
4. FAT: An In-Memory Accelerator With Fast Addition for Ternary Weight Neural Networks
- Author
-
Shien Zhu, Luan H. K. Duong, Hui Chen, Di Liu, Weichen Liu, School of Computer Science and Engineering, Parallel and Distributed Computing Centre, and HP-NTU Digital Manufacturing Corporate Lab
- Subjects
FOS: Computer and information sciences ,Computer Science - Artificial Intelligence ,Computer science and engineering::Computing methodologies::Artificial intelligence [Engineering] ,Spin-Transfer Torque Magnetic Random-Access Memory ,Convolutional Neural Network ,Computer Graphics and Computer-Aided Design ,Computer science and engineering::Hardware::Arithmetic and logic structures [Engineering] ,Artificial Intelligence (cs.AI) ,Hardware Architecture (cs.AR) ,Electrical and Electronic Engineering ,Computer Science - Hardware Architecture ,Computer science and engineering::Computer systems organization::Special-purpose and application-based systems [Engineering] ,Software ,Ternary Weight Neural Network ,In-Memory Computing - Abstract
Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization. They replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, though TWNs have higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited research attention. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized, and the addition operation is not efficient. In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing back the carry to memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and increase the parallelism across memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area efficiency compared with a State-Of-The-Art IMC accelerator ParaPIM. FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on networks with 80% average sparsity.
- Published
- 2023
- Full Text
- View/download PDF
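The core trick FAT exploits, replacing multiplications with additions and skipping zero weights entirely, can be sketched in a few lines. This is a software analogy only; FAT performs these operations inside memory arrays via the Sense Amplifier, not in a Python loop.

```python
def ternary_dot(acts, weights):
    """Multiplication-free dot product for ternary weights in {-1, 0, +1}.

    Zero weights are skipped entirely (the sparsity a Sparse Addition
    Control Unit exploits); +1 / -1 weights reduce to additions and
    subtractions of the activation values."""
    acc = 0
    for a, w in zip(acts, weights):
        if w == 0:
            continue          # null operation skipped, no work done
        acc += a if w == 1 else -a
    return acc
```

With 80% of weights at zero, four out of five terms are skipped, which is where the reported speedup on sparse networks comes from.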
5. Has China’s Young Thousand Talents program been successful in recruiting and nurturing top-caliber scientists?
- Author
-
Dongbo Shi, Weichen Liu, and Yanbo Wang
- Subjects
China ,Multidisciplinary ,United States - Abstract
In this study, we examined China’s Young Thousand Talents (YTT) program and evaluated its effectiveness in recruiting elite expatriate scientists and in nurturing the returnee scientists’ productivity. We find that YTT scientists are generally of high caliber in research but, as a group, fall below the top category in pre-return productivity. We further find that YTT scientists are associated with a post-return publication gain across journal-quality tiers. However, this gain mainly takes place in last-authored publications and for high-caliber (albeit not top-caliber) recruits and can be explained by YTT scientists’ access to greater funding and larger research teams. This paper has policy implications for the mobility of scientific talent, especially as early-career scientists face growing challenges in accessing research funding in the United States and European Union.
- Published
- 2023
- Full Text
- View/download PDF
6. Toward Minimum WCRT Bound for DAG Tasks Under Prioritized List Scheduling Algorithms
- Author
-
Shuangshuang Chang, Ran Bi, Jinghao Sun, Weichen Liu, Qi Yu, Qingxu Deng, and Zonghua Gu
- Subjects
Electrical and Electronic Engineering ,Computer Graphics and Computer-Aided Design ,Software - Published
- 2022
- Full Text
- View/download PDF
7. Fast and Low Overhead Metadata Operations for NVM-Based File System Using Slotted Paging
- Author
-
Fangzhu Lin, Chunhua Xiao, Weichen Liu, Lin Wu, Chen Shi, and Kun Ning
- Subjects
Electrical and Electronic Engineering ,Computer Graphics and Computer-Aided Design ,Software - Published
- 2022
- Full Text
- View/download PDF
8. Locking Protocols for Parallel Real-Time Tasks With Semaphores Under Federated Scheduling
- Author
-
Weichen Liu, Yue Tang, Xu Jiang, Nan Guan, and Yang Wang
- Subjects
Computer science ,Operating system ,Electrical and Electronic Engineering ,Semaphore ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,computer ,Software ,Scheduling (computing) - Published
- 2022
- Full Text
- View/download PDF
9. CARTAD: Compiler-Assisted Reinforcement Learning for Thermal-Aware Task Scheduling and DVFS on Multicores
- Author
-
Mingxiong Zhao, Zhenli He, Di Liu, Shi-Gui Yang, and Weichen Liu
- Subjects
Computer science ,business.industry ,media_common.quotation_subject ,computer.software_genre ,Computer Graphics and Computer-Aided Design ,Scheduling (computing) ,Task (project management) ,Embedded system ,Reinforcement learning ,Compiler ,Minification ,Electrical and Electronic Engineering ,Latency (engineering) ,business ,Function (engineering) ,Frequency scaling ,computer ,Software ,media_common - Abstract
As the power density of modern CPUs is gradually increasing, thermal management has become one of the primary concerns for multicore systems, where task scheduling and dynamic voltage/frequency scaling (DVFS) play a pivotal role in effectively managing the system temperature. In this paper, we propose CARTAD, a new reinforcement learning (RL) based task scheduling and DVFS method for temperature minimization and latency guarantees on multicore systems. The novelty of the CARTAD framework is that we exploit machine learning techniques to analyze the applications’ intermediate representations (IRs) generated by the compiler and identify a feature that is critical for predicting an application’s performance. With the newly explored feature, we construct an RL-based scheduler with a more effective state representation and reward function such that the system temperature can be minimized while guaranteeing applications’ latency. We implement and evaluate CARTAD on real platforms in comparison with the state-of-the-art approaches. Experimental results show CARTAD can reduce the maximum temperature by up to 16 °C and the average temperature by up to 10 °C.
- Published
- 2022
- Full Text
- View/download PDF
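The temperature/latency trade-off CARTAD optimizes can be illustrated with a toy shaped reward of the kind an RL scheduler might use: minimize temperature, but make deadline misses dominate. The weights and penalty value are illustrative assumptions; the paper's actual state features (derived from compiler IRs) and reward are not reproduced here.

```python
def thermal_reward(max_temp_c, latency_ms, deadline_ms,
                   temp_weight=1.0, miss_penalty=100.0):
    """Hypothetical shaped reward for a thermal-aware RL scheduler:
    lower temperature is better, and a hard penalty on deadline
    misses keeps the latency guarantee dominant."""
    reward = -temp_weight * max_temp_c
    if latency_ms > deadline_ms:
        reward -= miss_penalty   # deadline violation outweighs thermal gains
    return reward
```

Under this shaping, an action that cools the chip but misses the deadline scores worse than a hotter but timely one, which is the intended ordering for a latency guarantee.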
10. Designing Efficient DNNs via Hardware-Aware Neural Architecture Search and Beyond
- Author
-
Hui Chen, Shuo Huai, Xiangzhong Luo, Hao Kong, Weichen Liu, Di Liu, School of Computer Science and Engineering, and HP-NTU Digital Manufacturing Corporate Lab
- Subjects
Deep Neural Networks ,business.industry ,Computer science ,Computation ,Neural Architecture Search ,Latency (audio) ,Normalization (image processing) ,Evolutionary algorithm ,Computer Graphics and Computer-Aided Design ,Operator (computer programming) ,Computer science and engineering [Engineering] ,Deep neural networks ,Electrical and Electronic Engineering ,Architecture ,business ,Software ,Computer hardware ,Communication channel - Abstract
Hardware systems integrated with deep neural networks (DNNs) are deemed to pave the way for future artificial intelligence (AI). However, manually designing efficient DNNs consumes non-trivial computational resources, since significant trial and error is required to finalize the network configuration. To this end, we, in this paper, introduce a novel hardware-aware neural architecture search (NAS) framework, namely GoldenNAS, to automate the design of efficient DNNs. To begin with, we present a novel technique, called dynamic channel scaling, to enable channel-level search, since the number of channels has non-negligible impacts on both accuracy and efficiency. Besides, we introduce an efficient progressive space shrinking method to raise the awareness of the search space towards target hardware and alleviate the search overheads as well. Moreover, we propose an effective hardware performance modeling method to approximate the runtime latency of DNNs upon target hardware, which is further integrated into GoldenNAS to avoid tedious on-device measurements. Then, we employ an evolutionary algorithm (EA) to search for the optimal operator/channel configurations of DNNs, denoted as GoldenNets. Finally, to enable the depthwise adaptiveness of GoldenNets under dynamic environments, we propose the adaptive batch normalization (ABN) technique, followed by the self-knowledge distillation (SKD) approach to improve the accuracy of adaptive sub-networks. We conduct extensive experiments directly on ImageNet, which clearly demonstrate the advantages of GoldenNAS over existing state-of-the-art approaches. Ministry of Education (MOE) Nanyang Technological University Submitted/Accepted version This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282) and SUG (M4082087).
- Published
- 2022
- Full Text
- View/download PDF
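The adaptive batch normalization (ABN) idea, re-estimating BN running statistics from the selected sub-network's own activations instead of inheriting the super-network's, can be sketched as a streaming recalibration. The momentum value and batch setup below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def recalibrate_bn(batches, momentum=0.1):
    """Re-estimate BN running mean/variance by streaming a few batches
    of activations through the chosen sub-network (the core of ABN).

    batches: list of (batch_size, channels) activation arrays."""
    mean = np.zeros(batches[0].shape[1])   # stale running stats
    var = np.ones(batches[0].shape[1])
    for x in batches:
        # Exponential moving average, as in standard BN training.
        mean = (1 - momentum) * mean + momentum * x.mean(axis=0)
        var = (1 - momentum) * var + momentum * x.var(axis=0)
    return mean, var
```

After enough batches the stale statistics are washed out and the running estimates match the sub-network's actual activation distribution, which is what restores its accuracy without retraining the weights.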
11. ArSMART: An Improved SMART NoC Design Supporting Arbitrary-Turn Transmission
- Author
-
Weichen Liu, Luan H. K. Duong, Hui Chen, Peng Chen, Jun Zhou, and School of Computer Science and Engineering
- Subjects
FOS: Computer and information sciences ,Network on-Chip ,Computer science ,business.industry ,Electrical engineering ,ArSMART ,Computer Graphics and Computer-Aided Design ,Transmission (telecommunications) ,Hardware Architecture (cs.AR) ,Turn (geometry) ,Computer science and engineering [Engineering] ,Electrical and Electronic Engineering ,Computer Science - Hardware Architecture ,business ,Software - Abstract
SMART NoC, which transmits unconflicted flits to distant processing elements (PEs) in one cycle through an express bypass, is a recently proposed high-performance NoC design. However, if contention occurs, flits with low priority are not only buffered but also unable to fully utilize the bypass. Although several routing algorithms exist that decrease contention by routing around busy routers and links, they are not directly applicable to SMART since it lacks support for arbitrary-turn (i.e., the number and direction of turns are free of constraints) routing. Thus, in this article, to minimize contention and further utilize the bypass, we propose an improved SMART NoC, called ArSMART, in which arbitrary-turn transmission is enabled. Specifically, ArSMART divides the whole NoC into multiple clusters where route computation is conducted by the cluster controller and data forwarding is performed by the bufferless reconfigurable router. Since long-range transmission in SMART NoC needs to bypass the intermediate arbitration, to enable this feature, we directly configure the input and output port connections rather than apply hop-by-hop table-based arbitration. To further explore higher communication capabilities, effective adaptive routing algorithms that are compatible with ArSMART are proposed. The route computation overhead, one of the main concerns for adaptive routing algorithms, is hidden by our carefully designed control mechanism. Compared with the state-of-the-art SMART NoC, the experimental results demonstrate an average reduction of 40.7% in application schedule length and 29.7% in energy consumption. Ministry of Education (MOE) Nanyang Technological University This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MoE2019-T2-1-071) and Tier 1 (MoE2019-T1-001-072), and Nanyang Technological University, Singapore, under its NAP (M4082282) and SUG (M4082087).
- Published
- 2022
- Full Text
- View/download PDF
12. Bringing AI to edge: From deep learning’s perspective
- Author
-
Weichen Liu, Xiangzhong Luo, Di Liu, Hao Kong, Ravi Subramaniam, School of Computer Science and Engineering, and HP-NTU Digital Manufacturing Corporate Lab
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,business.industry ,Computer science ,Cognitive Neuroscience ,Deep learning ,Perspective (graphical) ,Model Optimization ,Data science ,Bridge (nautical) ,Machine Learning (cs.LG) ,Computer Science Applications ,Computer Science - Learning ,Artificial Intelligence (cs.AI) ,Deep Learning ,Artificial Intelligence ,Model compression ,Optimization methods ,Enhanced Data Rates for GSM Evolution ,Artificial intelligence ,Architecture ,business ,Edge computing - Abstract
Edge computing and artificial intelligence (AI), especially deep learning nowadays, are gradually intersecting to build a novel system, called edge intelligence. However, the development of edge intelligence systems encounters several challenges, one of which is the computational gap between computation-intensive deep learning algorithms and less-capable edge systems. Due to this gap, many edge intelligence systems cannot meet the expected performance requirements. To bridge the gap, a plethora of deep learning techniques and optimization methods have been proposed in the past years: light-weight deep learning models, network compression, and efficient neural architecture search. Although some reviews or surveys have partially covered this large body of literature, we lack a systematic and comprehensive review discussing all aspects of these deep learning techniques which are critical for edge intelligence implementation. As various and diverse methods applicable to edge systems are being proposed intensively, a holistic review would enable edge computing engineers and the community to know the state-of-the-art deep learning techniques instrumental for edge intelligence and to facilitate the development of edge intelligence systems. This paper surveys the representative and latest deep learning techniques that are useful for edge intelligence systems, including hand-crafted models, model compression, hardware-aware neural architecture search, and adaptive deep learning models. Finally, based on observations and simple experiments we conducted, we discuss some future directions. Submitted/Accepted version
- Published
- 2022
- Full Text
- View/download PDF
13. Reduced Modular Segregation of White Matter Brain Networks in Attention Deficit Hyperactivity Disorder
- Author
-
Wenbo He, Weichen Liu, Min Mao, Xiaohong Cui, Ting Yan, Jie Xiang, Bin Wang, and Dandan Li
- Subjects
Brain Mapping ,Clinical Psychology ,Attention Deficit Disorder with Hyperactivity ,mental disorders ,Developmental and Educational Psychology ,Brain ,Humans ,Magnetic Resonance Imaging ,White Matter ,Algorithms - Abstract
Objective: Despite studies reporting alterations in the brain networks of patients with ADHD, alterations in the modularity of white matter (WM) networks are still unclear. Method: Based on the results of module division by the generalized Louvain algorithm, the modularity of ADHD patients' WM networks was evaluated, and the correlation between the modular changes and clinical characteristics was analyzed. Results: The participation coefficient and the connectivity between modules increased in ADHD, and the modularity coefficient decreased. Provincial hubs did not change, while the number of connector hubs increased. All results showed that the modular segregation of WM networks in ADHD decreased. Modules with reduced modular segregation are mainly responsible for language and motor functions. Moreover, modularity showed a clear correlation with the symptoms of ADHD. Conclusion: The modularity changes in the WM network provide novel insight into the understanding of brain cognitive alterations in ADHD.
- Published
- 2022
- Full Text
- View/download PDF
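The participation coefficient reported in the study is a standard graph metric (Guimera-Amaral): P_i = 1 - sum_s (k_is / k_i)^2, where k_is is node i's connection strength into module s and k_i its total strength. A minimal implementation on a weighted connectivity matrix, assuming a precomputed module assignment, might look like this:

```python
import numpy as np

def participation_coefficient(W, modules):
    """P_i = 1 - sum_s (k_is / k_i)^2 for each node i of a weighted
    graph W, given a module label per node. P_i near 0 means all of a
    node's edges stay inside one module; P_i near 1 means its edges
    spread evenly across modules (a 'connector hub')."""
    W = np.asarray(W, dtype=float)
    modules = np.asarray(modules)
    k = W.sum(axis=1)                      # total node strength
    ratio_sq = np.zeros(len(W))
    for s in np.unique(modules):
        k_is = W[:, modules == s].sum(axis=1)   # strength into module s
        ratio_sq += np.divide(k_is, k,
                              out=np.zeros_like(k_is), where=k > 0) ** 2
    # Isolated nodes (k = 0) get P = 0 by convention.
    return np.where(k > 0, 1.0 - ratio_sq, 0.0)
```

A node whose edges are split equally across two modules gets P = 1 - 2*(1/2)^2 = 0.5, while a node connected only within its own module gets P = 0; the study's finding of increased participation coefficients corresponds to more between-module connectivity.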
14. O-Star: An Optical Switching Architecture Featuring Mode and Wavelength-Division Multiplexing for On-Chip Many-Core Systems
- Author
-
Lei Guo, Hui Chen, Weichen Liu, Weigang Hou, Pengxing Guo, and Xu Zhang
- Subjects
Computer science ,business.industry ,Mode (statistics) ,Topology (electrical circuits) ,Optical switch ,Atomic and Molecular Physics, and Optics ,Computer Science::Hardware Architecture ,Proof of concept ,Wavelength-division multiplexing ,Scalability ,Electronic engineering ,Data center ,business ,Communication channel - Abstract
In this paper, we propose O-Star, a scalable optical switching architecture for on-chip many-core systems, employing hybrid mode- and wavelength-division multiplexing technology. O-Star uses the Benes topology as the core switching module to realize non-blocking switching. Besides, we design a wavelength and mode allocation module to enable each processor to transmit data in parallel. To quantitatively analyze the minimum hardware cost required, we establish a mathematical model for different numbers of processors. As a proof of concept, a 64-core optical switching architecture featuring an 8-port Benes topology, 2 mode channels, and 4 wavelength channels is simulated with a single-channel data rate of 25 Gbps. O-Star scales flexibly in both the number of supported processors and the switching capacity, and holds promise for realizing large-scale optical switching networks to address the incoming challenges of high-performance computing (HPC) systems and data center networks (DCNs).
- Published
- 2022
- Full Text
- View/download PDF
15. The evolution of regional spatial structure influenced by passenger rail service: A case study of the Yangtze River Delta
- Author
-
Weichen Liu, Jiaying Guo, Wei Wu, and Youhui Cao
- Subjects
Global and Planetary Change - Published
- 2021
- Full Text
- View/download PDF
16. Southern Himalayas rainfall as a key driver of interannual variation of pre-monsoon aerosols over the Tibetan Plateau
- Author
-
Chun Zhao, Weichen Liu, Mingyue Xu, Jiawang Feng, Qiuyan Du, Jun Gu, L. Leung, and William Lau
- Abstract
The Tibetan Plateau (TP) is one of the most climate-sensitive regions in the world. Aerosols imported from adjacent regions reach their peak during the pre-monsoon season and play a vital role in the TP environment. However, the strong interannual variation in aerosols transported to the TP has not been fully understood. Here, we show that the interannual variability of pre-monsoon aerosols transported to the TP is influenced more by rainfall over the southern Himalayas than by near-surface wind. Rainfall modulates fire events and biomass burning emissions and reduces aerosols over the TP by wet scavenging. Contrary to the role of wind in increasing aerosol transport, the positive correlation between wind and aerosols in the TP reported in previous studies arises from the negative interannual correlations between wind and rainfall and between rainfall and fire events over the southern Himalayas. This study highlights the co-variability of wind and rainfall and their confounding impacts on aerosols in the southern Himalayas and over the TP. With pre-monsoon rainfall projected to increase in regions adjacent to the southern TP, aerosol transport to the TP may be mitigated in the future.
- Published
- 2022
- Full Text
- View/download PDF
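The confounding argument above, a positive raw wind-aerosol correlation that vanishes once rainfall is controlled for, can be reproduced on synthetic data with a simple partial correlation. The data-generating assumptions below (both wind and aerosol linearly anti-correlated with rainfall plus independent noise) are purely illustrative, not the paper's analysis.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing out
    the confounder z from both."""
    A = np.column_stack([np.ones_like(z), z])
    def resid(a):
        coef, *_ = np.linalg.lstsq(A, a, rcond=None)
        return a - A @ coef
    return float(np.corrcoef(resid(x), resid(y))[0, 1])

rng = np.random.default_rng(1)
rain = rng.standard_normal(500)
wind = -rain + 0.3 * rng.standard_normal(500)     # wind anti-correlates with rain
aerosol = -rain + 0.3 * rng.standard_normal(500)  # rain scavenges aerosol, damps fires

raw = float(np.corrcoef(wind, aerosol)[0, 1])     # strongly positive
ctrl = partial_corr(wind, aerosol, rain)          # near zero
```

The raw wind-aerosol correlation is strongly positive even though neither variable directly drives the other; controlling for rainfall removes it, which is exactly the confounding structure the abstract describes.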
17. MARCO: A High-performance Task Mapping and Routing Co-optimization Framework for Point-to-Point NoC-based Heterogeneous Computing Systems
- Author
-
Xiangzhong Luo, Hui Chen, Zihao Zhang, Peng Chen, Shiqing Li, and Weichen Liu
- Subjects
Point-to-point ,Schedule ,Hardware and Architecture ,Computer science ,Distributed computing ,Scalability ,Overhead (computing) ,Symmetric multiprocessor system ,Routing (electronic design automation) ,Performance improvement ,Software ,Tabu search - Abstract
Heterogeneous computing systems (HCSs), which consist of various processing elements (PEs) that vary in their processing ability, are usually facilitated by a network-on-chip (NoC) to interconnect their components. Emerging point-to-point NoCs, which support single-cycle multi-hop transmission, reduce or eliminate the latency dependence on distance, addressing the scalability concern raised by high latency for long-distance transmission and enlarging the design space of the routing algorithm to include non-shortest paths. For such point-to-point NoC-based HCSs, resource management strategies managed by compilers, schedulers, or controllers, e.g., mapping and routing, are complicated for the following reasons: (i) Due to the heterogeneity, mapping and routing need to optimize computation and communication concurrently (for homogeneous computing systems, only communication). (ii) Conducting mapping and routing consecutively cannot minimize the schedule length in most cases, since the PEs with high processing ability may be located in a crowded area and suffer from high resource contention overhead. (iii) Since changing the mapping selection of one task reconstructs the whole routing design space, exploring the mapping and routing design space is challenging. Therefore, in this work, we propose MARCO, a mapping and routing co-optimization framework, to decrease the schedule length of applications on point-to-point NoC-based HCSs. Specifically, we revise tabu search to explore the design space and evaluate the quality of mapping and routing. An advanced reinforcement learning (RL) algorithm, i.e., advantage actor-critic, is adopted to efficiently compute paths.
We perform extensive experiments on various real applications, which demonstrate that MARCO achieves a remarkable performance improvement in terms of schedule length (44.94%–50.18%) when compared with the state-of-the-art mapping and routing co-optimization algorithm for homogeneous computing systems. We also compare MARCO with different combinations of state-of-the-art mapping and routing approaches.
- Published
- 2021
- Full Text
- View/download PDF
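A minimal sketch of the tabu-search component: the cost function here is a toy load-imbalance measure rather than MARCO's schedule-length model (which couples mapping with RL-computed routes), and the neighborhood only moves one task at a time, so this illustrates the search skeleton only.

```python
import random

def tabu_search_mapping(num_tasks, num_pes, cost_fn,
                        iters=50, tenure=8, seed=0):
    """Minimal tabu search over task -> PE mappings.

    Keeps a tabu list of recently undone (task, pe) assignments to
    escape local minima; an aspiration criterion still accepts a tabu
    move if it beats the best mapping found so far."""
    rnd = random.Random(seed)
    cur = [rnd.randrange(num_pes) for _ in range(num_tasks)]
    best, best_cost = list(cur), cost_fn(cur)
    tabu = {}   # (task, pe) -> iteration until which the move is forbidden
    for it in range(iters):
        moves = []
        for t in range(num_tasks):          # neighborhood: move one task
            for pe in range(num_pes):
                if pe == cur[t]:
                    continue
                nxt = list(cur)
                nxt[t] = pe
                moves.append((cost_fn(nxt), t, pe, nxt))
        moves.sort(key=lambda m: m[0])      # best-improvement ordering
        for c, t, pe, nxt in moves:
            if tabu.get((t, pe), -1) <= it or c < best_cost:  # aspiration
                tabu[(t, cur[t])] = it + tenure  # forbid undoing this move
                cur = nxt
                if c < best_cost:
                    best, best_cost = list(nxt), c
                break
    return best, best_cost
```

With a balanced-load toy objective the search quickly reaches a perfectly balanced mapping; MARCO's version instead evaluates candidate mappings by the schedule length produced together with the routing step.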
18. What drives attitude towards remote palliative care among tumor patients amid the COVID-19 pandemic? A survey
- Author
-
Ming Li, Weichen Liu, Siqin Lian, Xijie Hou, Guolian Chen, and Ying Ling
- Abstract
Background: To understand the status of cancer patients’ awareness of and demand for remote palliative care services, and to provide a reference for promoting the formulation of remote palliative care policy. Methods: A small cross-sectional design involved a sample of 148 adults with cancer in the First Affiliated Hospital of Guangxi Medical University, surveyed with a self-designed questionnaire via Questionnaire Star. The questionnaire contained basic information (14 items) and attitudes towards remote palliative care (6 items). Results: The proportion of patients who supported remote palliative care was 41.2%. Support was related to age, education level, occupation, family income, insurance type, self-care ability, smartphone and communication app use, and usage time (p<0.05). Most patients hoped to have access to remote palliative care 24 hours a day. Almost half of the patients preferred to be charged per item and hoped the price of remote palliative care would be low. Most patients did not care much about the working years or title of the care provider, but half trusted Grade 3, Class A hospitals more. The top three demands for remote palliative care were pain reduction, nutrition counseling, and nursing instruction. Conclusion: Remote palliative care has good application prospects for cancer patients, but some challenges remain that need further exploration and development, including appropriate infrastructure, reasonable pricing, and sufficient personnel to provide remote palliative care services.
- Published
- 2022
- Full Text
- View/download PDF
19. Nanophotonic reservoir computing for COVID-19 pandemic forecasting
- Author
-
Bocheng Liu, Yiyuan Xie, Weichen Liu, Xiao Jiang, Yichen Ye, Tingting Song, Junxiong Chai, Manying Feng, and Haodong Yuan
- Subjects
Control and Systems Engineering ,Applied Mathematics ,Mechanical Engineering ,Aerospace Engineering ,Ocean Engineering ,Electrical and Electronic Engineering - Abstract
The coronavirus disease 2019 (COVID-19) has spread worldwide at unprecedented speed, and its diverse negative impacts have seriously endangered human society. Accurately forecasting the number of COVID-19 cases can help governments and public health organizations develop the right prevention strategies in advance to contain outbreaks. In this work, a long-term 6-month COVID-19 pandemic forecast for the second half of 2021 and a short-term 30-day daily-ahead COVID-19 forecast for December 2021 are successfully implemented via a novel nanophotonic reservoir computing system based on silicon optomechanical oscillators with photonic crystal cavities, benefitting from its simpler learning algorithm, abundant nonlinear characteristics, and unique advantages such as CMOS compatibility, fabrication cost, and monolithic integration. In essence, the nonlinear time series related to COVID-19 are mapped to a high-dimensional nonlinear space by the optical nonlinear properties of the nanophotonic reservoir. The testing-dataset forecast results of new cases, new deaths, cumulative cases, and cumulative deaths for six countries demonstrate that the forecast curves closely track the real curves with very small forecast errors. Moreover, the forecast results clearly reflect the variations of the actual case data, revealing the different epidemic transmission laws in developed and developing countries. More importantly, the daily-ahead forecast results during December 2021 of four kinds of cases for six countries illustrate that the daily forecasted values are highly coincident with the real values, while the relevant forecast errors are small enough to verify good forecasting competence for the pandemic phase dominated by the Omicron strain. Therefore, the implemented nanophotonic reservoir computing can provide some foreknowledge for prevention strategy and healthcare management during the COVID-19 pandemic.
- Published
- 2022
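A software echo state network captures the reservoir-computing principle used above: fixed random nonlinear dynamics project the input into a high-dimensional space, and only a linear readout is trained. This is a conceptual stand-in for the optomechanical hardware; the reservoir size, spectral radius, and ridge parameter below are conventional defaults, not the paper's values.

```python
import numpy as np

def esn_forecast(series, n_res=100, rho=0.9, washout=20, ridge=1e-6, seed=0):
    """Minimal echo state network for one-step-ahead forecasting.

    The reservoir (Win, W) is random and never trained; only the linear
    readout Wout is fitted by ridge regression on collected states."""
    rng = np.random.default_rng(seed)
    Win = rng.uniform(-0.5, 0.5, n_res)              # fixed input weights
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))       # fixed recurrent weights
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # echo-state property
    x = np.zeros(n_res)
    states = []
    for u in series[:-1]:                            # drive the reservoir
        x = np.tanh(Win * u + W @ x)
        states.append(x.copy())
    S = np.array(states[washout:])                   # drop transient states
    y = np.asarray(series)[washout + 1:]             # one-step-ahead targets
    Wout = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)
    return S @ Wout, y
```

On a smooth signal such as a sine wave the one-step forecast error is tiny; the paper's contribution is realizing the reservoir's nonlinearity optically rather than in software.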
20. Institutions and Innovation in China
- Author
-
Christopher Marquis, Qian Wang, Mike W. Peng, Yuen Yuen Ang, Meitong Dong, Kenneth Guang-Lih Huang, Nan Jia, Weichen Liu, Dongbo Shi, Yanbo Wang, Bo Yang, and Kevin Zheng Zhou
- Subjects
General Medicine - Published
- 2022
- Full Text
- View/download PDF
21. On the Analysis of Parallel Real-Time Tasks With Spin Locks
- Author
-
He Du, Wang Yi, Weichen Liu, Nan Guan, and Xu Jiang
- Subjects
FOS: Computer and information sciences ,Multi-core processor ,Computer science ,Distributed computing ,Workload ,02 engineering and technology ,Blocking (computing) ,020202 computer hardware & architecture ,Theoretical Computer Science ,Task (computing) ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computational Theory and Mathematics ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,Distributed, Parallel, and Cluster Computing (cs.DC) ,Resource management (computing) ,Protocol (object-oriented programming) ,Software - Abstract
Locking protocols are an essential component of resource management in real-time systems, coordinating mutually exclusive accesses to shared resources from different tasks. Although the design and analysis of locking protocols have been intensively studied for sequential real-time tasks, there has been little work on this topic for parallel real-time tasks. In this article, we study the analysis of parallel real-time tasks using spin locks to protect accesses to shared resources in three commonly used request serving orders (unordered, FIFO-order, and priority-order). A notable feature making our analysis method more accurate is that we systematically analyze the blocking time which may delay a task's finishing time, jointly considering its impact on the total workload and on the longest path length, rather than analyzing them separately and counting all blocking time as workload that delays a task's finishing time, as commonly assumed in the state-of-the-art.
- Published
- 2021
- Full Text
- View/download PDF
22. Effects of high-speed rail on the spatial agglomeration of producer services: A case study of the Yangtze River Delta urban agglomeration
- Author
-
Weichen Liu, Wei Wu, Xiaoli Li, and Zhaopei Tang
- Subjects
Delta ,Ecology ,Urban agglomeration ,Environmental protection ,Economies of agglomeration ,Geography, Planning and Development ,Earth and Planetary Sciences (miscellaneous) ,Yangtze river ,Environmental science ,Nature and Landscape Conservation - Published
- 2021
- Full Text
- View/download PDF
23. Scope-Aware Useful Cache Block Calculation for Cache-Related Pre-Emption Delay Analysis With Set-Associative Data Caches
- Author
-
Zhiping Jia, Yue Tang, Wei Zhang, Weichen Liu, Nan Guan, and Lei Ju
- Subjects
Hardware_MEMORYSTRUCTURES ,Computer science ,Delay analysis ,Static timing analysis ,02 engineering and technology ,Parallel computing ,Calculation technique ,Computer Graphics and Computer-Aided Design ,020202 computer hardware & architecture ,Scheduling (computing) ,0202 electrical engineering, electronic engineering, information engineering ,Cache ,Electrical and Electronic Engineering ,Software ,Associative property - Abstract
Timing analysis of real-time systems must consider cache-related pre-emption delay (CRPD) costs when pre-emptive scheduling is used. While most previous work on CRPD analysis only considers instruction caches, the CRPD incurred on data caches is actually more significant. The state-of-the-art CRPD analysis methods are based on useful cache block (UCB) calculation. Unfortunately, as shown in this article, directly extending the existing UCB calculation techniques from instruction caches to data caches leads to both unsoundness and significant imprecision. To solve these problems, we develop a new UCB calculation technique for data caches, which redefines the analysis unit (to address the unsoundness of the existing method) and precisely captures dynamic cache access behavior by taking the temporal scopes of memory blocks into consideration. The experimental results show that our new technique yields substantially tighter CRPD estimations compared with the state-of-the-art.
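The core UCB notion can be illustrated with a toy sketch: a block is "useful" at a program point if it may already be cached there and may be reused afterwards. Function names and the flat access trace below are hypothetical; the paper's set-associative, temporal-scope-aware data-cache analysis is far more involved.

```python
# Toy sketch of the useful-cache-block (UCB) idea behind CRPD analysis.
# A block is a UCB at a point if it is "reaching" (may be in the cache)
# and "live" (may be reused later). The paper's contribution -- temporal
# scopes for set-associative data caches -- is NOT modeled here.

def ucbs_at_points(accesses):
    """accesses: memory blocks touched in program order.
    Returns, for each point between accesses, the sorted list of UCBs."""
    n = len(accesses)
    ucbs = []
    for point in range(1, n):
        reaching = set(accesses[:point])   # may be cached at this point
        live = set(accesses[point:])       # may be reused afterwards
        ucbs.append(sorted(reaching & live))
    return ucbs

# 'a' is useful at the first two interior points, 'b' around its reuse.
print(ucbs_at_points(["a", "b", "a", "b"]))
# [['a'], ['a', 'b'], ['b']]
```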
- Published
- 2020
- Full Text
- View/download PDF
24. Thermal-Aware Design and Simulation Approach for Optical NoCs
- Author
-
Wenfei Zhang, Yaoyao Ye, and Weichen Liu
- Subjects
Interconnection ,Silicon photonics ,Computer science ,Bandwidth (signal processing) ,Optical power ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Chip ,Computer Graphics and Computer-Aided Design ,Optical switch ,020202 computer hardware & architecture ,Computer Science::Hardware Architecture ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,System on a chip ,Electrical and Electronic Engineering ,Electrical efficiency ,Software - Abstract
For chip multiprocessors, one major challenge is to bridge the increasing gap between processor speed and global on-chip interconnect delay. By integrating optical interconnects in network-on-chip (NoC) architectures, optical NoCs can overcome the power and bandwidth bottlenecks of traditional electrical on-chip networks. However, considering the thermal sensitivity of the silicon photonic devices used in optical NoCs, optical interconnects may lose their power-efficiency advantage over their electrical counterparts. To tackle this problem, in this article, we propose a thermal-aware design and simulation approach for optical NoCs. Key techniques include thermal-sensitive optical power loss models from the device level to the network level, a thermal-aware adaptive routing mechanism, and a thermal-aware simulation platform. The simulation platform enables optical NoC simulation together with on-chip temperature simulation and optical thermal-effect modeling. With the proposed platform, we conducted a case study of an 8×8 mesh-based optical NoC under a set of synthetic traffic patterns as well as real applications at typical temperature scenarios. By comparing and analyzing different temperature distributions, we conclude that the approach achieves a better optimization effect for temperature distributions in which the hot spots are scattered across the chip.
- Published
- 2020
- Full Text
- View/download PDF
25. Local deformation behaviour of saturated silica sand during undrained cyclic torsional shear tests using image analysis
- Author
-
Chuang Zhao, Weichen Liu, and Junichi Koseki
- Subjects
021110 strategic, defence & security studies ,Materials science ,Shear (geology) ,0211 other engineering and technologies ,Earth and Planetary Sciences (miscellaneous) ,Liquefaction ,Geotechnical engineering ,02 engineering and technology ,Geotechnical Engineering and Engineering Geology ,021101 geological & geomatics engineering - Abstract
A series of undrained cyclic torsional shear tests was conducted to investigate the development of local deformations of silica sand specimens directly using an image-based technique and a transparent membrane. The results for saturated sand specimens with varying densities showed that vertical slippage of sand particles relative to the membrane was initiated at the liquefaction stage. Additionally, the amount of vertical slippage of dense specimens was lower than that of loose specimens tested under the same conditions. Moreover, large accumulated movement measured on the surface of the specimen during cyclic shearing significantly increased the potential for inducing vertical slippage. A comparison of local deformation among the different tests revealed that the threshold of shear strain needed to generate non-uniform local deformations decreased with an increase in the relative density of the specimen. A lower-liquefaction-resistance layer, which formed at the upper part of the specimen after initial liquefaction, would accelerate the concentration of local shear strain; this was especially distinct when the specimen was completely liquefied.
- Published
- 2020
- Full Text
- View/download PDF
26. A procedure to reach high liquefaction resistance in laboratory testing on reconstituted sand specimens using hollow cylindrical torsional shear apparatus
- Author
-
Weichen Liu and Junichi Koseki
- Subjects
Shear (sheet metal) ,Materials science ,Geotechnical engineering ,Laboratory testing ,Liquefaction resistance - Published
- 2020
- Full Text
- View/download PDF
27. Fault-Tolerant Routing Mechanism in 3D Optical Network-on-Chip Based on Node Reuse
- Author
-
Lei Guo, Sun Wei, Pengxing Guo, Luan H. K. Duong, Hainan Bao, Weigang Hou, Weichen Liu, Chuang Liu, and School of Computer Science and Engineering
- Subjects
Interconnection ,business.industry ,Network packet ,Computer science ,Fault-Tolerant Routing Mechanism ,Throughput ,Fault tolerance ,Energy consumption ,Reuse ,Network on a chip ,Computational Theory and Mathematics ,Hardware and Architecture ,3D Optical Network-On-Chip ,Signal Processing ,Redundancy (engineering) ,Computer science and engineering [Engineering] ,Photonics ,business ,Computer network - Abstract
Three-dimensional Networks-on-Chip (3D NoCs) have become a mature multi-core interconnection architecture in recent years. However, traditional electrical lines have very limited bandwidth and high energy consumption, making photonic interconnection promising for future 3D Optical NoCs (ONoCs). Since existing solutions cannot adequately guarantee the fault tolerance of 3D ONoCs, in this paper, we propose a reliable optical router (OR) structure that sacrifices little redundancy to obtain more restore paths. Moreover, with our fault-tolerant routing algorithm, a restore path can be found inside the disabled OR under the deadlock-free condition, i.e., fault-node reuse. Experimental results show that the proposed approach outperforms previous related works in throughput by up to 81.1 percent, and by 33.0 percent on average, under different synthetic and real traffic patterns. It improves the system's average optical signal-to-noise ratio (OSNR) by up to 26.92 percent, and by 12.57 percent on average, and it improves average energy consumption by 0.3 to 15.2 percent under different topology types/sizes, failure rates, OR structures, and payload packet sizes.
- Published
- 2020
- Full Text
- View/download PDF
28. The Virtual-Augmented Reality Simulator: Evaluating OST-HMD AR calibration algorithms in VR
- Author
-
Danilo Gasques, Weichen Liu, and Nadir Weibel
- Published
- 2022
- Full Text
- View/download PDF
29. Organization of river-sea container transportation in the Yangtze River: Processes and mechanisms
- Author
-
Weichen Liu, Youhui Cao, Jianglong Chen, Jiaying Guo, and Shuangbo Liang
- Subjects
Geography, Planning and Development ,Transportation ,General Environmental Science - Published
- 2023
- Full Text
- View/download PDF
30. Aggravated chemical production of aerosols by regional transport and basin terrain in a heavy PM2.5 pollution episode over central China
- Author
-
Weiyang Hu, Yu Zhao, Tianliang Zhao, Yongqing Bai, Chun Zhao, Shaofei Kong, Lei Chen, Qiuyan Du, Huang Zheng, Wen Lu, Weichen Liu, and Xiaoyun Sun
- Subjects
Atmospheric Science ,General Environmental Science - Published
- 2023
- Full Text
- View/download PDF
31. A Cryptocurrency Price Prediction Model Based on Twitter Sentiment Indicators
- Author
-
Zi Ye, Weichen Liu, Qiang Qu, Qingshan Jiang, and Yi Pan
- Published
- 2022
- Full Text
- View/download PDF
32. Can the Brain Drain of Elite Scientists be Reversed? Evidence from China’s Young Thousand Talents Program
- Author
-
Dongbo Shi, Weichen Liu, and Yanbo Wang
- Published
- 2022
- Full Text
- View/download PDF
33. Nanophotonic Reservoir Computing for COVID-19 Pandemic Forecasting
- Author
-
Bocheng Liu, Yiyuan Xie, Weichen Liu, Xiao Jiang, Yichen Ye, Tingting Song, Junxiong Chai, Qianfeng Tang, Manying Feng, and Haodong Yuan
- Subjects
History ,Polymers and Plastics ,Business and International Management ,Industrial and Manufacturing Engineering - Published
- 2022
- Full Text
- View/download PDF
34. A DirectX-Based DICOM Viewer for Multi-user Surgical Planning in Augmented Reality
- Author
-
Menghe Zhang, Weichen Liu, Nadir Weibel, and Jürgen P. Schulze
- Published
- 2022
- Full Text
- View/download PDF
35. ZeroBN: Learning Compact Neural Networks For Latency-Critical Edge Systems
- Author
-
Lei Zhang, Ravi Subramaniam, Shuo Huai, Weichen Liu, Di Liu, School of Computer Science and Engineering, 2021 58th ACM/IEEE Design Automation Conference (DAC), and HP-NTU Digital Manufacturing Corporate Lab
- Subjects
Artificial neural network ,Edge device ,Computer science ,business.industry ,Computer science and engineering::Computing methodologies::Artificial intelligence [Engineering] ,Deep learning ,Process (computing) ,Compact Learning ,ZeroBN ,Constraint (information theory) ,Computer engineering ,Electronic design automation ,Artificial intelligence ,Enhanced Data Rates for GSM Evolution ,Latency (engineering) ,business - Abstract
Edge devices have been widely adopted to bring deep learning applications onto low-power embedded systems, mitigating the privacy and latency issues of accessing cloud servers. The increasing computational demand of complex neural network models leads to large latency on edge devices with limited resources. Many application scenarios are real-time and have strict latency constraints, while conventional neural network compression methods are not latency-oriented. In this work, we propose a novel compact neural network training method to reduce model latency on latency-critical edge systems. A latency predictor is also introduced to guide and optimize this procedure. Coupled with the latency predictor, our method can guarantee the latency of a compact model with only one training process. The experimental results show that, compared to state-of-the-art model compression methods, our approach can meet 'hard' latency constraints by significantly reducing the latency with a mild accuracy drop. To satisfy a 34 ms latency constraint, we compact ResNet-50 with only a 0.82% accuracy drop; for GoogLeNet, we even increase the accuracy by 0.3%.
National Research Foundation (NRF) Submitted/Accepted version This research was conducted in collaboration with HP Inc. and supported by National Research Foundation (NRF) Singapore and the Singapore Government through the Industry Alignment Fund - Industry Collaboration Projects Grant (I1801E0028).
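The abstract does not spell out the pruning rule, but the name "ZeroBN" suggests channel pruning driven by batch-norm scale factors (channels whose learned gamma is near zero are removed), with a latency predictor deciding how far to prune. The sketch below is an assumption along those lines; every name, the threshold sweep, and the toy latency model are hypothetical.

```python
# Assumed sketch (not the paper's algorithm): prune channels whose batch-norm
# gamma is below a threshold, raising the threshold until a hypothetical
# latency predictor reports that the target latency is met.

def select_channels(gammas, keep_threshold):
    """Keep the channels whose |gamma| exceeds the threshold."""
    return [c for c, g in enumerate(gammas) if abs(g) > keep_threshold]

def prune_until_latency(gammas, latency_of, target_ms):
    """Sweep thresholds (ascending |gamma|) until the predicted latency fits."""
    for thr in sorted(abs(g) for g in gammas):
        kept = select_channels(gammas, thr)
        if latency_of(kept) <= target_ms:
            return kept
    return []

# Toy latency model: 2 ms per kept channel; a 6 ms target keeps 3 channels.
gammas = [0.9, 0.01, 0.5, 0.02, 0.7]
print(prune_until_latency(gammas, lambda kept: 2 * len(kept), 6))  # [0, 2, 4]
```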
- Published
- 2021
- Full Text
- View/download PDF
36. Efficient FPGA-based Sparse Matrix-Vector Multiplication with Data Reuse-aware Compression
- Author
-
Shiqing Li, Di Liu, Weichen Liu, and School of Computer Science and Engineering
- Subjects
Computer science and engineering::Hardware [Engineering] ,SpMV ,Computer science and engineering [Engineering] ,Data Reuse ,Electrical and Electronic Engineering ,Throughput ,Computer Graphics and Computer-Aided Design ,FPGA ,Software - Abstract
Sparse matrix-vector multiplication (SpMV) on FPGAs has gained much attention. The performance of SpMV is mainly determined by the number of multiplications between non-zero matrix elements and the corresponding vector values per cycle. On the one hand, the off-chip memory bandwidth limits the number of non-zero matrix elements transferred from off-chip DDR to the FPGA chip per cycle. On the other hand, the irregular vector access pattern poses challenges to fetching the corresponding vector values. Besides, the read-after-write (RAW) dependency in the accumulation process must be resolved to enable a fully pipelined design. In this work, we propose an efficient FPGA-based sparse matrix-vector multiplication accelerator with data reuse-aware compression. The key observation is that repeated accesses to a vector value can be omitted by reusing the fetched data. Based on this observation, we propose a reordering algorithm to explicitly exploit the data reuse of fetched vector values. Further, we propose a novel compressed format called data reuse-aware compressed (DRC) to take full advantage of the data reuse, and a fast format conversion algorithm to shorten the preprocessing time. Meanwhile, we propose an HLS-friendly accumulator to resolve the RAW dependency. Finally, we implement and evaluate our design on the Xilinx Zynq-UltraScale ZCU106 platform with a set of sparse matrices from the SuiteSparse matrix collection. Our design achieves an average 1.18x performance speedup without the DRC format and an average 1.57x performance speedup with the DRC format w.r.t. the state-of-the-art work, respectively.
Ministry of Education (MOE) Nanyang Technological University Submitted/Accepted version This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071), and Nanyang Technological University, Singapore, under its NAP (M4082282/04INS000515C130).
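The key observation (a fetched vector value can be reused when consecutive nonzeros share a column index) can be modeled in a few lines. This is a minimal software sketch of the reuse idea over a CSR matrix, not the DRC format or the RTL design; names are illustrative.

```python
# Sketch of the data-reuse observation behind the accelerator: if consecutive
# nonzeros (in processing order) share a column index, the vector value fetched
# for the first can be reused instead of re-read. Toy model, not the hardware.

def spmv_with_reuse_count(indptr, indices, data, x):
    """CSR SpMV that also counts vector fetches saved by reusing the last value."""
    y = [0.0] * (len(indptr) - 1)
    last_col, last_val, saved = None, None, 0
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            if j == last_col:
                saved += 1              # reuse the previously fetched x[j]
            else:
                last_col, last_val = j, x[j]
            y[i] += data[p] * last_val
    return y, saved

# Column 1 is hit by both rows back to back, so one vector fetch is saved.
indptr, indices, data = [0, 1, 2], [1, 1], [2.0, 3.0]
print(spmv_with_reuse_count(indptr, indices, data, [10.0, 5.0]))  # ([10.0, 15.0], 1)
```

The paper's reordering algorithm can be read as maximizing how often this reuse case fires, subject to the constraints of the hardware pipeline.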
- Published
- 2023
- Full Text
- View/download PDF
37. An Efficient Gustavson-based Sparse Matrix-matrix Multiplication Accelerator on Embedded FPGAs
- Author
-
Shiqing Li, Shuo Huai, Weichen Liu, and School of Computer Science and Engineering
- Subjects
Gustavson ,Computer science and engineering::Hardware [Engineering] ,Dataflow ,Computer science and engineering [Engineering] ,Electrical and Electronic Engineering ,Computer Graphics and Computer-Aided Design ,FPGA ,Software ,SpMM - Abstract
Sparse matrix-matrix multiplication (SpMM) is an important kernel in multiple areas, e.g., data analytics and machine learning. Due to its low on-chip memory requirement, consistent data format, and simplified control logic, Gustavson's algorithm is a promising backbone algorithm for SpMM on hardware accelerators. However, off-chip memory traffic still limits the performance of the algorithm, especially on embedded FPGAs. Previous researchers optimized Gustavson's algorithm for high-bandwidth-memory-based architectures, and their solutions cannot be directly applied to embedded FPGAs with traditional DDR. In this work, we propose an efficient Gustavson-based sparse matrix-matrix multiplication accelerator on embedded FPGAs. The proposed design fully considers the characteristics of off-chip memory access on embedded FPGAs and the dataflow of Gustavson's algorithm. First, we analyze the parallelism of the algorithm and propose to perform it with element-wise parallelism, which reduces the idle time of processing elements caused by synchronization. Further, we show a counter-intuitive example in which a traditional cache leads to worse performance. We then propose a novel access-pattern-aware cache scheme called SpCache, which provides quick responses to reduce bank conflicts caused by irregular memory accesses and combines streaming and caching to handle requests that access ordered elements of unpredictable length. Moreover, we propose to merge only part of the partial results, which removes some redundant merges in the naive implementation and has little post-processing overhead. Finally, we conduct experiments on the Xilinx Zynq-UltraScale ZCU106 platform with a set of benchmarks from the SuiteSparse matrix collection. The experimental results show that the proposed design achieves an average 1.75x performance speedup compared to the baseline.
Ministry of Education (MOE) Nanyang Technological University Submitted/Accepted version This work is partially supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071), and Nanyang Technological University, Singapore, under its NAP (M4082282/04INS000515C130).
- Published
- 2023
- Full Text
- View/download PDF
38. EdgeCompress: Coupling Multi-Dimensional Model Compression and Dynamic Inference for EdgeAI
- Author
-
Hao Kong, Di Liu, Shuo Huai, Xiangzhong Luo, Ravi Subramaniam, Christian Makaya, Qian Lin, and Weichen Liu
- Subjects
Electrical and Electronic Engineering ,Computer Graphics and Computer-Aided Design ,Software - Published
- 2023
- Full Text
- View/download PDF
39. Face Generation using DCGAN for Low Computing Resources
- Author
-
Weichen Liu, Yuxuan Gu, and Kenan Zhang
- Published
- 2021
- Full Text
- View/download PDF
40. Real-Time Scheduling of DAG Tasks with Arbitrary Deadlines
- Author
-
Qingxu Deng, Nan Guan, Kankan Wang, Xu Jiang, Di Liu, and Weichen Liu
- Subjects
Earliest deadline first scheduling ,020203 distributed computing ,business.industry ,Computer science ,Computation ,02 engineering and technology ,Parallel computing ,Computer Graphics and Computer-Aided Design ,020202 computer hardware & architecture ,Computer Science Applications ,Scheduling (computing) ,Software ,Single task ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,business - Abstract
Real-time and embedded systems are shifting from single-core to multi-core processors, on which software must be parallelized to fully utilize the computation capacity of the hardware. Recently, much work has been done on real-time scheduling of parallel tasks modeled as directed acyclic graphs (DAGs). However, most of these studies assume tasks have implicit or constrained deadlines. Much less work has considered the general case of arbitrary deadlines (i.e., the relative deadline is allowed to be larger than the period), which is more difficult to analyze due to intra-task interference among jobs. In this article, we study the analysis of Global Earliest Deadline First (GEDF) scheduling for DAG parallel tasks with arbitrary deadlines. We develop new analysis techniques for GEDF scheduling of a single DAG task that guarantee a better capacity augmentation bound of 2.41 (the best known result is 2.5) in the single-task case. Furthermore, the proposed techniques are extended to multiple DAG tasks under GEDF and federated scheduling. Finally, through empirical evaluation, we show that our schedulability tests generally outperform the state-of-the-art.
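Analyses of this kind are built on two quantities of a DAG task: its total workload (volume) and its critical-path length. As a minimal illustration of how they combine, here is the classical Graham-style makespan bound for one DAG job on m identical cores; the function names and DAG encoding are hypothetical, and the paper's arbitrary-deadline analysis is substantially more refined than this.

```python
# Sketch: Graham-style makespan bound for a single DAG job on m identical
# cores: len(G) + (vol(G) - len(G)) / m. Illustrative only, not the paper's
# actual GEDF analysis.
import functools

def dag_longest_path(succ, wcet):
    """Critical-path length, given successor lists and per-vertex WCETs."""
    @functools.lru_cache(maxsize=None)
    def dist(v):
        return wcet[v] + max((dist(u) for u in succ.get(v, ())), default=0)
    return max(dist(v) for v in wcet)

def makespan_bound(succ, wcet, m):
    vol = sum(wcet.values())               # total workload vol(G)
    length = dag_longest_path(succ, wcet)  # critical-path length len(G)
    return length + (vol - length) / m

# Fork-join example: v0 -> {v1, v2} -> v3.
succ = {"v0": ("v1", "v2"), "v1": ("v3",), "v2": ("v3",)}
wcet = {"v0": 1, "v1": 4, "v2": 2, "v3": 1}
print(makespan_bound(succ, wcet, m=2))  # 6 + (8 - 6) / 2 = 7.0
```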
- Published
- 2019
- Full Text
- View/download PDF
41. Timing-Anomaly Free Dynamic Scheduling of Conditional DAG Tasks on Multi-Core Systems
- Author
-
Qingqiang He, Nan Guan, Xu Jiang, Weichen Liu, Peng Chen, and School of Computer Science and Engineering
- Subjects
Multi-core processor ,Schedule ,Theoretical computer science ,Computer science ,0206 medical engineering ,Response time ,Timing Anomaly ,02 engineering and technology ,Dynamic priority scheduling ,020601 biomedical engineering ,Upper and lower bounds ,020202 computer hardware & architecture ,Scheduling (computing) ,Hardware and Architecture ,Dynamic Scheduling ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Computer science and engineering [Engineering] ,Time complexity ,Software - Abstract
In this paper, we propose a novel approach to schedule conditional DAG parallel tasks, with which we can derive safe response-time upper bounds significantly better than the state-of-the-art counterparts. The main idea is to eliminate the notorious timing anomaly in scheduling parallel tasks by enforcing certain order constraints among the vertices, so that the response-time bound can be accurately predicted off-line by, in effect, "simulating" the runtime scheduling. A key challenge in applying the timing-anomaly-free scheduling approach to conditional DAG parallel tasks is that, at runtime, a conditional DAG structure may generate exponentially many instances. To deal with this problem, we develop effective abstractions, based on which a safe response-time upper bound is computed in polynomial time. We also develop algorithms to explore the vertex orders to shorten the response-time bound. The effectiveness of the proposed approach is evaluated by experiments with randomly generated DAG tasks under different parameter configurations.
Accepted version This work is supported by the Research Grants Council of Hong Kong (GRF 15204917 and 15213818), the National Natural Science Foundation of China (Grant No. 61672140), and Nanyang Assistant Professorship (NAP) M4082282 and Start-Up Grant (SUG) M4082087 from Nanyang Technological University, Singapore.
- Published
- 2019
- Full Text
- View/download PDF
42. NV-eCryptfs: Accelerating Enterprise-Level Cryptographic File System with Non-Volatile Memory
- Author
-
Lei Zhang, Pengda Li, Linfeng Cheng, Yanyue Pan, Neil W. Bergmann, Weichen Liu, Chunhua Xiao, and School of Computer Science and Engineering
- Subjects
File system ,Address space ,business.industry ,Computer science ,ext4 ,Non-volatile Memory ,Cloud computing ,Cryptography ,02 engineering and technology ,computer.software_genre ,Encryption ,020202 computer hardware & architecture ,Theoretical Computer Science ,eCryptfs ,Computational Theory and Mathematics ,Hardware and Architecture ,Backup ,0202 electrical engineering, electronic engineering, information engineering ,Operating system ,Computer science and engineering [Engineering] ,Hardware acceleration ,business ,computer ,Software ,Block (data storage) - Abstract
The development of cloud computing and big data results in large amounts of data being transmitted and stored. To protect sensitive data from leakage and unauthorized access, many cryptographic file systems, such as eCryptfs, transparently encrypt file contents before storing them on storage devices. However, the time-consuming encryption operations cause serious performance degradation. We found that, compared with the non-crypto file system EXT4, the slowdown can be up to 58.53 and 86.89 percent for reads and writes, respectively, with eCryptfs. Although prior work has proposed techniques to improve the efficiency of cryptographic file systems through computation acceleration, no solution has focused on the inefficient working flow, which we demonstrate to be a major factor affecting system performance. To address this open problem, we present NV-eCryptfs, an asynchronous software stack for eCryptfs that utilizes NVM as a fast storage tier on top of slower block devices to fully parallelize encryption and data I/O. We design an efficient NVM management scheme to support fast parallel cryptographic operations. Besides providing an address space that can be directly accessed by the hardware accelerators, our mechanism records memory-allocation states and supplies a backup plan for NVM shortage. An additional index structure is built to accelerate lookups that determine whether a given data block resides in NVM. Moreover, we integrate adaptive scheduling in NV-eCryptfs to process I/O requests dynamically according to access pattern and request size, making full use of both software and hardware acceleration to boost crypto performance. Our evaluation shows that NV-eCryptfs outperforms the original eCryptfs with software routines by 23.41× and 5.82× for reads and writes, respectively.
This work is supported by National Natural Science Foundation of China: No.61502061, Chongqing application foundation and research in cutting-edge technologies: No. cstc2015jcyjA40016, the fundamental research funds for the central universities: 106112017CDJXY180004, and also the financial support from the program of China Scholarships Council No.201706055029.
- Published
- 2019
- Full Text
- View/download PDF
43. Optimal Application Mapping and Scheduling for Network-on-Chips with Computation in STT-RAM Based Router
- Author
-
Lei Yang, Nan Guan, Weichen Liu, and Nikil Dutt
- Subjects
Router ,Random access memory ,Hardware_MEMORYSTRUCTURES ,business.industry ,Computer science ,02 engineering and technology ,Energy consumption ,020202 computer hardware & architecture ,Theoretical Computer Science ,Scheduling (computing) ,Non-volatile memory ,Computational Theory and Mathematics ,Hardware and Architecture ,Embedded system ,0202 electrical engineering, electronic engineering, information engineering ,System on a chip ,Static random-access memory ,business ,Standby power ,Software ,Efficient energy use - Abstract
Spin-Torque Transfer Magnetic RAM (STT-RAM), one of the emerging nonvolatile memory (NVM) technologies explored as a replacement for SRAM memory architectures, is particularly promising due to its fast access speed, high integration density, and zero standby power consumption. Recently, hybrid designs with SRAM and STT-RAM buffers for routers in Network-on-Chip (NoC) systems have been widely implemented to exploit the mutually complementary characteristics of the different memory technologies and improve intra-router latency and system power consumption. With processing-in-memory enabled by STT-RAM, in this paper, we offload execution from processors to the STT-RAM-based on-chip routers to improve application performance. On top of the hybrid buffer design in routers, we further present system-level approaches, including an ILP model and polynomial-time heuristic algorithms, to fine-tune application mapping and scheduling on NoCs, with the objective of improving performance-energy efficiency. Network overhead caused by flit conflicts in conventional communication can be avoided by computing the contended flits in intermediate routers; meanwhile, the pressure of heavy workloads on processors can be relieved by transferring partial operations to routers, so that network latency and system power consumption are significantly reduced. Experimental results demonstrate that application schedule length and system energy consumption are reduced by 35.62 and 32.87 percent on average, respectively, in extensive evaluation on PARSEC benchmark applications. In particular, the improvements in application performance and energy efficiency, 36.44 and 33.19 percent on average, for the CNN application AlexNet verify the practicability and effectiveness of our approaches.
- Published
- 2019
- Full Text
- View/download PDF
44. Energy-efficient crypto acceleration with HW/SW co-design for HTTPS
- Author
-
Neil W. Bergmann, Chunhua Xiao, Xie Yuhua, Lei Zhang, Weichen Liu, and School of Computer Science and Engineering
- Subjects
Web server ,Energy Efficiency ,Computer Networks and Communications ,Computer science ,business.industry ,020206 networking & telecommunications ,02 engineering and technology ,computer.software_genre ,Encryption ,Instruction set ,Secure communication ,Hardware and Architecture ,Embedded system ,Cipher suite ,0202 electrical engineering, electronic engineering, information engineering ,Computer science and engineering [Engineering] ,Hardware acceleration ,020201 artificial intelligence & image processing ,HW/SW Co-design ,business ,computer ,Software ,Efficient energy use - Abstract
Entering the big data era has led to the rapid development of web applications that provide high-performance access to sensitive data in large cloud data centers. HTTPS has been widely deployed as an extension of HTTP that adds an encryption layer, the SSL/TLS protocol, for secure communication over the Internet. To accelerate the complex crypto computation, specific acceleration instruction sets and hardware accelerators are adopted. However, energy consumption has been ignored in the rush for performance. In fact, energy efficiency has become a challenge with the increasing demands for performance and energy saving in data centers. In this paper, we present EECA, an Energy-Efficient Crypto Acceleration system for HTTPS with OpenSSL. It provides highly energy-efficient encryption through HW/SW co-design. The essential idea is to make full use of system resources to exploit the strengths of different crypto acceleration approaches in an energy-efficient design. Experimental results show that, for pure crypto computation with the typical encryption algorithm AES-256-CBC, EECA improves PPW (performance per watt) by up to 1637.13%, 84.82%, and 966.23% compared with the original software encryption, instruction-set acceleration, and hardware accelerator, respectively. Considering the whole working flow of end-to-end secure HTTPS based on OpenSSL with the cipher suite ECDHE-RSA-AES256-SHA384, EECA also improves energy efficiency by up to 422.26%, 40.14%, and 96.05% compared with the original web server using software, instruction-set, and hardware acceleration, respectively.
- Published
- 2019
- Full Text
- View/download PDF
45. A Branch-and-Bound-Based Crossover Operator for the Traveling Salesman Problem
- Author
-
Qi Qi, Yan Jiang, Weichen Liu, and Thomas Weise
- Subjects
Mathematical optimization ,education.field_of_study ,021103 operations research ,Branch and bound ,Computer science ,Crossover ,Population ,0211 other engineering and technologies ,Evolutionary algorithm ,02 engineering and technology ,Travelling salesman problem ,Human-Computer Interaction ,Set (abstract data type) ,Operator (computer programming) ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,020201 artificial intelligence & image processing ,education ,Software - Abstract
In this article, we introduce BBX, a new crossover operator for Evolutionary Algorithms (EAs) solving traveling salesman problems (TSPs). It uses branch-and-bound to find the optimal combination of the (directed) edges present in the parent solutions. The offspring solutions created are at least as good as their parents and are composed only of parental building blocks. The operator is closer to the ideal concept of crossover in EAs than existing operators. This article provides the most extensive study of crossover operators for the TSP, comparing BBX to ten other operators on the 110 instances of the TSPLib benchmark set in EAs with four different population sizes. BBX, despite its better ability to reuse and combine building blocks, surprisingly does not generally outperform the other operators. However, it performs well in certain scenarios. Besides presenting a novel approach to crossover on the TSP, the study significantly extends and refines the body of knowledge in the field with new conclusions and comparison results.
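The core mechanism (search for the cheapest Hamiltonian cycle built only from edges of the two parents) can be sketched at toy scale with a small depth-first branch-and-bound. All names below are illustrative; the paper's operator works on directed edges and is engineered for far larger instances.

```python
# Illustrative sketch of the BBX idea: branch-and-bound over tours that use
# only edges appearing in the two parent tours. Toy-scale, undirected edges.

def parent_edges(tour):
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

def bbx(parent_a, parent_b, dist):
    n = len(parent_a)
    allowed = parent_edges(parent_a) | parent_edges(parent_b)
    best = {"cost": float("inf"), "tour": None}

    def dfs(path, cost):
        if cost >= best["cost"]:
            return  # bound: partial cost already no better than the incumbent
        if len(path) == n:  # close the cycle if the return edge is parental
            if frozenset((path[-1], path[0])) in allowed:
                total = cost + dist[path[-1]][path[0]]
                if total < best["cost"]:
                    best["cost"], best["tour"] = total, list(path)
            return
        for city in range(n):  # branch only on edges taken from the parents
            if city not in path and frozenset((path[-1], city)) in allowed:
                dfs(path + [city], cost + dist[path[-1]][city])

    dfs([0], 0)
    return best["tour"], best["cost"]

dist = [[0, 1, 4, 2], [1, 0, 1, 5], [4, 1, 0, 1], [2, 5, 1, 0]]
print(bbx([0, 1, 2, 3], [0, 2, 1, 3], dist))  # ([0, 1, 2, 3], 5)
```

Because every edge of the offspring comes from a parent, the result is guaranteed to be built from parental building blocks, which is exactly the property the operator is designed around.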
- Published
- 2019
- Full Text
- View/download PDF
46. Implementation issues in optimization algorithms: do they matter?
- Author
-
Yuezhong Wu, Weichen Liu, Raymond Chiong, and Thomas Weise
- Subjects
021103 operations research ,Theoretical computer science ,Optimization algorithm ,Computer science ,0211 other engineering and technologies ,Lin–Kernighan heuristic ,02 engineering and technology ,Travelling salesman problem ,Theoretical Computer Science ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Implementation ,Software - Abstract
Two factors that have a major impact on the performance of an optimization method are (1) formal algorithm specifications and (2) practical implementations. The impact of the latter is typically ignored...
- Published
- 2019
- Full Text
- View/download PDF
47. Evolutionary game-based incentive models for sustainable trust enhancement in a blockchained shared manufacturing network
- Author
-
Fuqiang Zhang, Lei Wu, Weichen Liu, Kai Ding, Jizhuang Hui, Jiewu Leng, and Xueliang Zhou
- Subjects
Artificial Intelligence ,Building and Construction ,Information Systems - Published
- 2022
- Full Text
- View/download PDF
48. Using Virtual Reality to Induce and Assess Objective Correlates of Nicotine Craving: Paradigm Development Study
- Author
-
Weichen Liu, Gianna Andrade, Jurgen Schulze, Neal Doran, and Kelly E Courtney
- Subjects
eye-tracking ,Tobacco Smoke and Health ,genetic structures ,craving ,pupillometry ,Rehabilitation ,Biomedical Engineering ,Physical Therapy, Sports Therapy and Rehabilitation ,attentional bias ,smoking ,Brain Disorders ,Computer Science Applications ,Substance Misuse ,Psychiatry and Mental health ,Good Health and Well Being ,Clinical Research ,cue-exposure ,Tobacco ,virtual reality ,addiction ,Drug Abuse (NIDA only) ,development ,nicotine - Abstract
Background Craving is a clinically important phenotype for the development and maintenance of nicotine addiction. Virtual reality (VR) paradigms are successful in eliciting cue-induced subjective craving and may even elicit stronger craving than traditional picture-cue methods. However, few studies have leveraged the advances of this technology to improve the assessment of craving. Objective This report details the development of a novel, translatable VR paradigm designed to both elicit nicotine craving and assess multiple eye-related characteristics as potential objective correlates of craving. Methods A VR paradigm was developed, which includes three Active scenes with nicotine and tobacco product (NTP) cues present, and three Neutral scenes devoid of NTP cues. A pilot sample (N=31) of NTP users underwent the paradigm and completed subjective measures of nicotine craving, sense of presence in the VR paradigm, and VR-related sickness. Eye-gaze fixation time (“attentional bias”) and pupil diameter toward Active versus Neutral cues, as well as spontaneous blink rate during the Active and Neutral scenes, were recorded. Results The NTP Cue VR paradigm was found to elicit a moderate sense of presence (mean Igroup Presence Questionnaire score 60.05, SD 9.66) and low VR-related sickness (mean Virtual Reality Sickness Questionnaire score 16.25, SD 13.94). Scene-specific effects on attentional bias and pupil diameter were observed, with two of the three Active scenes eliciting greater NTP versus control cue attentional bias and pupil diameter (Cohen d=0.30-0.92). The spontaneous blink rate metrics did not differ across Active and Neutral scenes. Conclusions This report outlines the development of the NTP Cue VR paradigm. Our results support the potential of this paradigm as an effective laboratory-based cue-exposure task and provide early evidence of the utility of attentional bias and pupillometry, as measured during VR, as useful markers for nicotine addiction.
- Published
- 2021
49. Using Virtual Reality to Induce and Assess Objective Correlates of Nicotine Craving: Paradigm Development Study (Preprint)
- Author
-
Weichen Liu, Gianna Andrade, Jurgen Schulze, Neal Doran, and Kelly E Courtney
- Subjects
genetic structures - Abstract
BACKGROUND Craving is a clinically important phenotype for the development and maintenance of nicotine addiction. Virtual reality (VR) paradigms are successful in eliciting cue-induced subjective craving and may even elicit stronger craving than traditional picture-cue methods. However, few studies have leveraged the advances of this technology to improve the assessment of craving. OBJECTIVE This report details the development of a novel, translatable VR paradigm designed to both elicit nicotine craving and assess multiple eye-related characteristics as potential objective correlates of craving. METHODS A VR paradigm was developed, which includes three Active scenes with nicotine and tobacco product (NTP) cues present, and three Neutral scenes devoid of NTP cues. A pilot sample (N=31) of NTP users underwent the paradigm and completed subjective measures of nicotine craving, sense of presence in the VR paradigm, and VR-related sickness. Eye-gaze fixation time (“attentional bias”) and pupil diameter toward Active versus Neutral cues, as well as spontaneous blink rate during the Active and Neutral scenes, were recorded. RESULTS The NTP Cue VR paradigm was found to elicit a moderate sense of presence (mean Igroup Presence Questionnaire score 60.05, SD 9.66) and low VR-related sickness (mean Virtual Reality Sickness Questionnaire score 16.25, SD 13.94). Scene-specific effects on attentional bias and pupil diameter were observed, with two of the three Active scenes eliciting greater NTP versus control cue attentional bias and pupil diameter (Cohen d=0.30-0.92). The spontaneous blink rate metrics did not differ across Active and Neutral scenes. CONCLUSIONS This report outlines the development of the NTP Cue VR paradigm. Our results support the potential of this paradigm as an effective laboratory-based cue-exposure task and provide early evidence of the utility of attentional bias and pupillometry, as measured during VR, as useful markers for nicotine addiction.
- Published
- 2021
- Full Text
- View/download PDF
50. The effect of public health awareness and behaviors on the transmission dynamics of syphilis in Northwest China, 2006-2018, based on a multiple-stages mathematical model
- Author
-
Yu Zhao, Weichen Liu, Wenjun Jing, and Ning Ma
- Subjects
medicine.medical_specialty ,Syphilis model ,Primary Syphilis ,Epidemiology of syphilis ,Infectious and parasitic diseases ,RC109-216 ,37H10 ,92B05 ,Control strategy ,Environmental health ,medicine ,Transmission (medicine) ,Applied Mathematics ,Health Policy ,Public health ,medicine.disease ,Basic reproduction number ,Infectious Diseases ,34F05 ,Data fitting ,Health education ,Syphilis ,Psychology ,Epidemic model ,Sensitivity analysis ,60J70 ,Research Paper - Abstract
Syphilis, a sexually transmitted infectious disease caused by the bacterium Treponema pallidum, has re-emerged as a global public health issue, with an estimated 12 million people infected each year. Understanding the impact of health awareness and behaviors on the transmission dynamics of syphilis can help to establish optimal control strategies in different regions. In this paper, we develop a multiple-stage SIRS epidemic model that takes into account public health awareness and behaviors regarding syphilis. First, the basic reproduction number R0 is obtained, which determines the global dynamic behavior of the model. We derive the necessary conditions for implementing optimal control, and the corresponding optimal solution for mitigating syphilis, using Pontryagin's Maximum Principle. Based on data on syphilis in Ningxia from 2006 to 2018, parameterization and model calibration are carried out, and the fitting results are in good agreement with the data. Moreover, sensitivity analysis shows that the awareness-induced protective behavior rate (Ce), the compliance with condom-induced preventability (e), and the treatment rate for primary syphilis (m1) play an important role in mitigating the risk of syphilis outbreaks. These results provide insight into the epidemiology of syphilis and guidance for public health authorities implementing health education programs.
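A multiple-stage SIRS structure of the kind described, two infectious stages (e.g. primary and secondary syphilis) with recovered individuals losing immunity, can be sketched with a simple forward-Euler integration. The compartment names, rates, and parameter values below are illustrative assumptions, not the paper's calibrated model:

```python
def simulate_sirs(beta, gamma1, gamma2, xi, s0, i1_0, i2_0, r_0, days, dt=0.1):
    """Forward-Euler integration of a toy two-stage SIRS model:
    S -> I1 -> I2 -> R -> S (immunity lost at rate xi).
    For this structure, R0 = beta * (1/gamma1 + 1/gamma2)."""
    s, i1, i2, r = s0, i1_0, i2_0, r_0
    n = s + i1 + i2 + r
    for _ in range(int(days / dt)):
        infection = beta * s * (i1 + i2) / n   # both stages transmit
        ds = -infection + xi * r
        di1 = infection - gamma1 * i1          # progression out of stage 1
        di2 = gamma1 * i1 - gamma2 * i2        # progression to recovery
        dr = gamma2 * i2 - xi * r
        s += ds * dt; i1 += di1 * dt; i2 += di2 * dt; r += dr * dt
    return s, i1, i2, r

# Illustrative run: 1000 individuals, 10 initially in the primary stage.
final = simulate_sirs(beta=0.5, gamma1=0.2, gamma2=0.1, xi=0.01,
                      s0=990.0, i1_0=10.0, i2_0=0.0, r_0=0.0, days=100)
```

Since the four derivatives sum to zero, the total population is conserved at every Euler step, which is a quick sanity check on any implementation of such a model.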
- Published
- 2021