2,729 results
Search Results
2. Parallel border tracking in binary images for multicore computers
- Author
-
Victor M. Garcia-Molla and Pedro Alonso-Jordá
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
Border tracking in binary images is an important operation in many computer vision applications. The problem consists of finding the borders in a 2D binary image (where every pixel is either 0 or 1). There are several algorithms available for this problem, but most of them are sequential. In a previous paper, a parallel border tracking algorithm was proposed. That algorithm was designed to run on Graphics Processing Units (GPUs) and was based on the sequential algorithm known as the Suzuki algorithm. In this paper, we adapt the previously proposed GPU algorithm so that it can be executed on multicore computers. The resulting algorithm is evaluated against its GPU counterpart. The results show that the performance of the GPU algorithm worsens (or even fails) for very large images or images with many borders. In contrast, the proposed multicore algorithm can efficiently cope with large images.
- Published
- 2023
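As a point of reference for the sequential baseline this entry builds on: the Suzuki border-following algorithm is what OpenCV's findContours implements, so a minimal single-threaded version of the task can be sketched as below. The parallel tile-splitting and border merging that the paper contributes is not reproduced here, and the input file name is a placeholder.

```python
# Minimal sketch of the sequential baseline only (not the authors' parallel code).
# Assumes OpenCV >= 4; "binary.png" is a placeholder input image.
import cv2

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 1, cv2.THRESH_BINARY)   # pixels become 0 or 1

# RETR_CCOMP retrieves outer borders and hole borders, as in Suzuki's algorithm.
contours, hierarchy = cv2.findContours(binary, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
print(f"{len(contours)} borders found")
```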
3. Performance evaluation of multi-exaflops machines using Equality network topology
- Author
-
Chi-Hsiu Liang, Chun-Ho Cheng, Hong-Lin Wu, Chao-Chin Li, Po-Lin Huang, and Chi-Chuan Hwang
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
In modern computing architectures, graph theory plays a central role as core counts keep rising, and it is indispensable to keep finding better ways to connect the cores. This paper revisits Equality, a novel chordal-ring interconnect topology, and compares it with several previous designs. It details the procedure for constructing Equality interconnects, their special routing procedures, and the strategies for selecting a configuration, and it evaluates their performance using the open-source cycle-accurate BookSim package. Four scenarios representing small- to large-scale computing facilities are presented to assess network performance. This work shows that in 16,384-endpoint systems, the Equality network turns out to be the most efficient of the systems compared. The results also show the steady scalability of Equality networks extending to 48–320K and a million endpoints. Equality networks are adjustable to fit commodity hardware and resilient under ten common traffic models. It is suggested that the Equality network topology can be used to construct efficient multi-exaflops supercomputers and data centers.
- Published
- 2022
4. Real-time monitoring solution with vibration analysis for industry 4.0 ventilation systems
- Author
-
Rubén Muñiz, Fernando Nuño, Juan Díaz, María González, Miguel J. Prieto, and Óliver Menéndez
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
Predictive maintenance has emerged as one of the paradigms of Industry 4.0. This paper addresses a complete system for the acquisition, computation, monitoring, and communication of data from ventilation equipment in underground tunnels, based on the TCP/IP protocol and accessible via web services. Not only does the proposed system collect different sensor data (temperatures, vibrations, pressures, tilt angles, or rotational speed), it performs local data processing as well. This feature is the newest and most important of all those provided by the system design, and no equipment in current ventilation systems offers similar performance. This paper presents the design and implementation of the equipment (system architecture and processing), as well as the experimental results obtained.
- Published
- 2022
5. Evaluation of e-learners’ concentration using recurrent neural networks
- Author
-
Young-Sang Jeong and Nam-Wook Cho
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
Recently, interest in e-learning has increased rapidly owing to the lockdowns imposed by COVID-19. A major disadvantage of e-learning is the difficulty in maintaining concentration because of the limited interaction between teachers and students. The objective of this paper is to develop a methodology to predict e-learners' concentration by applying recurrent neural network models to eye gaze and facial landmark data extracted from e-learners' video data. One hundred eighty-four videos of ninety-two e-learners were obtained, and their frame data were extracted using the OpenFace 2.0 toolkit. Recurrent neural networks, long short-term memory, and gated recurrent units were utilized to predict the concentration of e-learners, and a set of comparative experiments was conducted. As a result, gated recurrent units exhibited the best performance. The main contribution of this paper is to present a methodology to predict e-learners' concentration in a natural e-learning environment.
- Published
- 2022
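To make the modelling step of this entry concrete, the sketch below shows a gated recurrent unit classifier over per-frame feature sequences of the kind OpenFace 2.0 produces. The feature dimension, clip length, and binary concentration label are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch (assumptions: 100-frame clips, 140 OpenFace features per frame,
# binary concentrated / not-concentrated label); not the authors' exact model.
import torch
import torch.nn as nn

class ConcentrationGRU(nn.Module):
    def __init__(self, n_features=140, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):              # x: (batch, frames, features)
        _, h = self.gru(x)             # h: (1, batch, hidden), final hidden state
        return self.head(h[-1])        # logits over {not concentrated, concentrated}

model = ConcentrationGRU()
dummy = torch.randn(8, 100, 140)       # 8 clips of 100 frames each
logits = model(dummy)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
loss.backward()
```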
6. Low-latency and High-Reliability FBMC Modulation scheme using Optimized Filter design for enabling NextG Real-time Smart Healthcare Applications
- Author
-
Abhinav Adarsh, Shashwat Pathak, Digvijay Singh Chauhan, and Basant Kumar
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
This paper presents a prototype filter design using the orthant optimization technique to assist a filter bank multicarrier (FBMC) modulation scheme for a NextG smart e-healthcare network framework. Low latency and very high reliability are among the main requirements of a real-time e-healthcare system. In recent times, FBMC modulation has received more attention due to its spectral efficiency. The characteristics of a filter bank are determined by its prototype filter. A prototype filter cannot be designed to achieve arbitrary time localization (for low latency) and frequency localization (for spectral efficiency), as time and frequency spreading are conflicting goals; hence, an optimal trade-off needs to be achieved. In this paper, a constraint for perfect or nearly perfect reconstruction is formulated for the prototype filter design, and an orthant-based enriched sparse ℓ1-optimization method is applied to achieve the optimum performance in terms of higher availability of subcarrier spacing for a given signal-to-interference ratio requirement. Larger subcarrier spacing ensures lower latency and better performance in real-time applications. The proposed FBMC system, based on an optimum design of the prototype filter, also supports a higher data rate than traditional FBMC and OFDM systems, which is another requirement of real-time communication. The paper provides solutions for different technical issues of the physical layer design. The presented modulation scheme, through the proposed prototype filter-based FBMC, can suppress the side-lobe energy of the constituted filters to a large extent without compromising the recovery of the signal at the receiver end. The proposed system provides very high spectral efficiency; it can sacrifice large guard-band frequencies to increase the subcarrier spacing and thus provide low-latency communication to support the real-time e-healthcare network.
- Published
- 2022
7. Anti-aliasing convolution neural network of finger vein recognition for virtual reality (VR) human–robot equipment of metaverse
- Author
-
Nghi C. Tran, Jian‑Hong Wang, Toan H. Vu, Tzu-Chiang Tai, and Jia-Ching Wang
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
The Metaverse, which is anticipated to be the future of the internet, is a 3D virtual world in which users interact via highly customizable computer avatars. It is considerably promising for several industries, including gaming, education, and business. However, it still has drawbacks, particularly regarding privacy and identity threats. When a person joins the metaverse via virtual reality (VR) human-robot equipment, their avatar, digital assets, and private information may be compromised by cybercriminals. This paper introduces a finger vein recognition approach for the VR human-robot equipment of the Metaverse to prevent others from misappropriating it. The finger vein is a biometric feature hidden beneath our skin. It is considerably more secure for person verification than other hand-based biometric characteristics such as fingerprint and palm print since it is difficult to imitate. Most conventional finger vein recognition systems that use hand-crafted features are ineffective, especially for images with low quality, low contrast, scale variation, translation, and rotation. Deep learning methods have been demonstrated to be more successful than traditional methods in computer vision. This paper develops a finger vein recognition system based on a convolutional neural network and an anti-aliasing technique. We employ a contrast image enhancement algorithm in the preprocessing step to improve the performance of the system. The proposed approach is evaluated on three publicly available finger vein datasets. Experimental results show that our proposed method outperforms the current state-of-the-art methods, achieving 97.66% accuracy on the FVUSM dataset, 99.94% accuracy on the SDUMLA dataset, and 88.19% accuracy on the THUFV2 dataset.
- Published
- 2022
8. Vaccine sentiment analysis using BERT + NBSVM and geo-spatial approaches
- Author
-
Areeba Umair, Elio Masciari, and Muhammad Habib Ullah
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
Since the spread of the coronavirus disease in 2019 (hereafter referred to as COVID-19), millions of people worldwide have been affected by the pandemic, which has significantly impacted our habits in various ways. In order to eradicate the disease, a great help came from the unprecedentedly fast development of vaccines along with the adoption of strict preventive measures such as lockdowns. Thus, worldwide provisioning of vaccines was crucial in order to achieve the maximum immunization of the population. However, the fast development of vaccines, driven by the urge to limit the pandemic, caused skeptical reactions in a vast part of the population. More specifically, people's hesitancy in getting vaccinated was an additional obstacle in fighting COVID-19. To ameliorate this scenario, it is important to understand people's sentiments about vaccines in order to take proper actions to better inform the population. As a matter of fact, people continuously update their feelings and sentiments on social media; thus, a proper analysis of those opinions is an important challenge for providing proper information and avoiding misinformation. More in detail, sentiment analysis (Wankhade et al. in Artif Intell Rev 55(7):5731–5780, 2022. https://doi.org/10.1007/s10462-022-10144-1) is a powerful technique in natural language processing that enables the identification and classification of people's feelings (mainly) in text data. It involves the use of machine learning algorithms and other computational techniques to analyze large volumes of text and determine whether they express positive, negative, or neutral sentiment. Sentiment analysis is widely used in industries such as marketing, customer service, and healthcare, among others, to gain actionable insights from customer feedback, social media posts, and other forms of unstructured textual data. In this paper, sentiment analysis is used to examine people's reactions to COVID-19 vaccines in order to provide useful insights that improve the understanding of their correct usage and possible advantages. A framework that leverages artificial intelligence (AI) methods is proposed for classifying tweets based on their polarity values. We analyzed Twitter data related to COVID-19 vaccines after the most appropriate pre-processing. More specifically, we identified the word clouds of negative, positive, and neutral words using an artificial intelligence tool to determine the sentiment of tweets. After this pre-processing step, we performed classification using the BERT + NBSVM model to classify people's sentiments about vaccines. The reason for combining bidirectional encoder representations from transformers (BERT) with Naive Bayes and support vector machine (NBSVM) can be understood by considering the limitation of BERT-based approaches, which only leverage encoder layers, resulting in lower performance on short texts like the ones used in our analysis. Such a limitation can be ameliorated by using Naive Bayes and support vector machine approaches, which are able to achieve higher performance in short-text sentiment analysis. Thus, we took advantage of both BERT features and NBSVM features to define a flexible framework for our sentiment analysis goal of vaccine sentiment identification. Moreover, we enrich our results with a spatial analysis of the data by using geo-coding, visualization, and spatial correlation analysis to suggest the most suitable vaccination centers to users based on the sentiment analysis outcomes.
In principle, we do not need to implement a distributed architecture to run our experiments, as the available public data are not massive. However, we discuss a high-performance architecture that can be used if the collected data scale up dramatically. We compared our approach with state-of-the-art methods using the most widely used metrics: accuracy, precision, recall, and F-measure. The proposed BERT + NBSVM outperformed alternative models by achieving 73% accuracy, 71% precision, 88% recall, and 73% F-measure for the classification of positive sentiments, and 73% accuracy, 71% precision, 74% recall, and 73% F-measure for the classification of negative sentiments. These promising results are discussed in detail in the following sections. The use of artificial intelligence methods and social media analysis can lead to a better understanding of people's reactions and opinions about any trending topic. However, in the case of health-related topics like COVID-19 vaccines, proper sentiment identification could be crucial for implementing public health policies. More in detail, the availability of useful findings on user opinions about vaccines can help policymakers design proper strategies and implement ad-hoc vaccination protocols according to people's feelings, in order to provide a better public service. To this end, we leveraged geospatial information to support effective recommendations for vaccination centers.
- Published
- 2023
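The BERT + NBSVM combination described in this entry can be illustrated with a small feature-fusion sketch: BERT [CLS] embeddings are concatenated with NB log-count-ratio-weighted bag-of-words features and fed to a linear SVM. This is one interpretation of the general technique, not the authors' pipeline; `texts` and `labels` are assumed to exist, and the model name is an assumption.

```python
# Minimal sketch of BERT + NBSVM feature fusion (not the authors' exact pipeline).
# Assumes `texts` (list of tweets) and `labels` (0/1 polarity) already exist.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def bert_cls(texts, batch=32):
    """[CLS] embedding of each text from the last encoder layer."""
    out = []
    with torch.no_grad():
        for i in range(0, len(texts), batch):
            enc = tok(texts[i:i + batch], padding=True, truncation=True, return_tensors="pt")
            out.append(bert(**enc).last_hidden_state[:, 0].numpy())
    return np.vstack(out)

def nb_log_count_ratio(X, y, alpha=1.0):
    """NBSVM trick: log-count ratio between positive- and negative-class counts."""
    p = alpha + X[y == 1].sum(axis=0)
    q = alpha + X[y == 0].sum(axis=0)
    return np.log((p / p.sum()) / (q / q.sum()))

vec = CountVectorizer(ngram_range=(1, 2), binary=True)
X_bow = vec.fit_transform(texts)
r = np.asarray(nb_log_count_ratio(X_bow, np.asarray(labels))).ravel()
X_nb = X_bow.multiply(r).toarray()              # NB-weighted bag-of-words features
X = np.hstack([bert_cls(texts), X_nb])          # fused BERT + NBSVM features
clf = LinearSVC().fit(X, labels)
```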
9. YOLOOD: an arbitrary-oriented flexible flat cable detection method in robotic assembly
- Author
-
Yuxuan Bai, Shimin Wei, Mingshuai Dong, Jian Li, and Xiuli Yu
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
Flexible Flat Cable (FFC) detection is a prerequisite for robotic 3C assembly and is challenging because FFCs are often non-axis-aligned, with arbitrary orientations and cluttered surroundings. However, to date, traditional robotic object detection methods mainly regress the object's horizontal bounding box (HBB), whose size and aspect ratio do not reflect the actual shape of the target object and hardly separate densely packed FFCs. In this paper, rotated object detection is introduced into FFC detection, and a YOLO-based arbitrary-oriented FFC detection method named YOLOOD is proposed. Firstly, oriented bounding boxes (OBB) are used to reflect the object's physical size and angle information and to better separate the FFCs from the dense background. Secondly, the circular smooth label (CSL) angular classification algorithm is adopted to obtain the angle information of FFCs. Finally, a head point regression branch is introduced to distinguish between the head and the tail of the FFC and to expand the range of FFC detection angles to [0°, 360°). The proposed YOLOOD reaches an average precision of 90.82% and a detection speed of 112 FPS on an FFC dataset. Meanwhile, an actual FFC grasping experiment demonstrated YOLOOD's effectiveness and feasibility in practical assembly scenarios. In conclusion, this paper innovatively introduces rotated object detection into robotic object detection, and the proposed YOLOOD solves the problem of detecting and locating non-axis-aligned FFCs, which is of particular significance for robotic 3C assembly.
- Published
- 2023
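The circular smooth label (CSL) step mentioned in this entry converts angle regression into classification over angle bins, using a window that wraps around 360° so that small angular errors near the 0°/360° boundary are not penalized as large ones. A minimal sketch with an assumed Gaussian window and bin count (not the paper's settings):

```python
# Minimal sketch of circular smooth label (CSL) encoding for an angle in [0, 360).
# Window radius and bin count are illustrative assumptions.
import numpy as np

def circular_smooth_label(angle_deg, num_bins=360, radius=6.0):
    """Gaussian window centred on the target angle bin, wrapped circularly."""
    bins = np.arange(num_bins)
    # circular distance between each bin and the target bin
    d = np.abs(bins - angle_deg * num_bins / 360.0)
    d = np.minimum(d, num_bins - d)
    label = np.exp(-(d ** 2) / (2 * radius ** 2))
    label[d > radius] = 0.0          # truncate the window outside the radius
    return label

csl = circular_smooth_label(358.0)
print(csl[[356, 357, 358, 359, 0, 1]])   # non-zero on both sides of the 0/360 wrap
```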
10. Land consolidation through parcel exchange among landowners using a distributed Spark-based genetic algorithm
- Author
-
Diego Teijeiro, Margarita Amor, Ramón Doallo, Eduardo Corbelle, Juan Porta, and Jorge Parapar
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
Land consolidation is an essential tool for public administrations to reduce the fragmentation of land ownership. In particular, parcel exchange shows promising potential for restructuring parcel holdings, even more so when the number of parcels and owners involved is large. Unfortunately, the number of possible exchange combinations grows very quickly with the number of participating landowners and parcels, with the associated challenge of finding an acceptable solution. In this paper, we present a high-performance solution for parcel exchange based on genetic algorithms. Our proposal, built on the Apache Spark framework, exploits distributed-memory systems with effortless access in order to reduce the execution time. This also allows the search width to be increased through multiple populations that share their advances, without compromising the search depth, thanks to the larger amount of resources available in distributed-memory systems. Our proposal achieves better solutions in less time than previous works, showing that genetic algorithms on a high-performance system can be used to propose fair parcel exchanges under strict time constraints, even in complex scenarios. The performance achieved allows several options to be tried quickly, reducing the time usually needed to perform the administrative procedures associated with land fragmentation problems. Specifically, our proposal combines the benefits of both depth-focused and width-focused multithreaded parallelization: it matches the speedup gains of depth-focused multithreaded parallelization, while the width-focused parallelization provides resilience against local minima and the potential for further fitness value reduction. In this paper, both multithreading solutions and Spark-based solutions are tested.
- Published
- 2022
11. Advanced encryption schemes in multi-tier heterogeneous internet of things: taxonomy, capabilities, and objectives
- Author
-
Mahdi R. Alagheband and Atefeh Mashatan
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
The Internet of Things (IoT) is increasingly becoming widespread in different areas such as healthcare, transportation, and manufacturing. IoT networks comprise many diverse entities, including small smart devices for capturing sensitive information, which may be attainable targets for malicious parties. Thus, security and privacy are of utmost importance. To protect the confidentiality of data handled by IoT devices, conventional cryptographic primitives have generally been used in various IoT security solutions. While these primitives provide an acceptable level of security, they typically neither preserve privacy nor support advanced functionalities. Also, they rely too heavily on trusted third parties because of some limitations by design. This multidisciplinary survey paper connects the dots and explains how some advanced cryptosystems can achieve these ambitious goals. We begin by describing a multi-tiered heterogeneous IoT architecture that supports cloud, edge, fog, and blockchain technologies, together with the assumptions and capabilities of each layer. We then elucidate advanced encryption primitives, namely wildcarded, break-glass, proxy re-encryption, and registration-based encryption schemes, as well as IoT-friendly cryptographic accumulators. Our paper illustrates how they can augment the features mentioned above while simultaneously satisfying the architectural IoT requirements. We provide comparison tables and diverse IoT-based use cases for each advanced cryptosystem, as well as a guideline for selecting the best one in different scenarios, and depict how they can be integrated.
- Published
- 2022
12. A survey of HPC algorithms and frameworks for large-scale gradient-based nonlinear optimization
- Author
-
Felix Liu, Albin Fredriksson, and Stefano Markidis
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
Large-scale numerical optimization problems arise from many fields and have applications in both industrial and academic contexts. Finding solutions to such optimization problems efficiently requires algorithms that are able to leverage the increasing parallelism available in modern computing hardware. In this paper, we review previous work on parallelizing algorithms for nonlinear optimization. To introduce the topic, the paper starts by giving an accessible introduction to nonlinear optimization and high-performance computing. This is followed by a survey of previous work on parallelization and utilization of high-performance computing hardware for nonlinear optimization algorithms. Finally, we present a number of optimization software libraries and how they are able to utilize parallel computing today. This study can serve as an introduction point for researchers interested in nonlinear optimization or high-performance computing, as well as provide ideas and inspiration for future work combining these topics.
- Published
- 2022
13. RAPCHI: Robust authentication protocol for IoMT-based cloud-healthcare infrastructure
- Author
-
Vinod Kumar, Mahmoud Shuker Mahmoud, Ahmed Alkhayyat, Jangirala Srinivas, Musheer Ahmad, and Adesh Kumari
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
With the fast growth of technologies like cloud computing, big data, the Internet of Things, artificial intelligence, and cyber-physical systems, the demand for data security and privacy in communication networks is growing by the day. Patients and doctors connect securely through the Internet utilizing Internet of medical things devices in a cloud-healthcare infrastructure (CHI). In addition, doctors offer online treatment to patients. Unfortunately, hackers are gaining access to data at an alarming pace; in 2019, healthcare systems were compromised by attackers 41.4 million times. In this context, we provide a secure and lightweight authentication scheme (RAPCHI) for CHI employing the Internet of medical Things (IoMT) during a pandemic, based on cryptographic primitives. The suggested framework is more secure than existing frameworks and is resistant to a wide range of security threats. The paper also explains the random oracle model (ROM) and uses two alternative approaches to validate the formal security analysis of RAPCHI. Further, the paper shows that RAPCHI is safe against man-in-the-middle and replay attacks using the simulation programme AVISPA. In addition, the paper compares RAPCHI to related frameworks and finds that it is relatively lightweight in terms of computation and communication. These findings demonstrate that the proposed paradigm is suitable for use in real-world scenarios.
- Published
- 2022
14. Hyperbolic trees for efficient routing computation
- Author
-
Zalan Heszberger
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
Complex system theory is increasingly applied to develop control protocols for distributed computational and networking resources. The paper deals with the important subproblem of finding complex connected structures having excellent navigability properties using limited computational resources. Recently, the two-dimensional hyperbolic space turned out to be an efficient geometry for generative models of complex networks. The networks generated using the hyperbolic metric space share their basic structural properties (like small diameter or scale-free degree distribution) with several real networks. In the paper, a new model is proposed for generating navigation trees for complex networks embedded in the two-dimensional hyperbolic plane. The generative model is not based on known hyperbolic network models: the trees are not inferred from the existing links of any network; they are generated from scratch instead and based purely on the hyperbolic coordinates of nodes. We show that these hyperbolic trees have scale-free degree distributions and are present to a large extent both in synthetic hyperbolic complex networks and real ones (Internet autonomous system topology, US flight network) embedded in the hyperbolic plane. As the main result, we show that routing on the generated hyperbolic trees is optimal in terms of total memory usage of forwarding tables.
- Published
- 2022
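The navigability that these hyperbolic trees are built to support rests on greedy forwarding over hyperbolic distances. The sketch below shows that primitive for nodes given native polar coordinates (r, θ); the tree-generation rule itself is the paper's contribution and is not reproduced, and the toy graph and coordinates are assumptions.

```python
# Minimal sketch (not the paper's tree construction): greedy forwarding on
# hyperbolic coordinates, the navigation primitive such trees are built to support.
import math

def hyperbolic_distance(a, b):
    """Distance between points a=(r, theta) and b in the native hyperbolic plane."""
    (r1, t1), (r2, t2) = a, b
    dtheta = math.pi - abs(math.pi - abs(t1 - t2))          # angle wrapped to [0, pi]
    arg = math.cosh(r1) * math.cosh(r2) - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta)
    return math.acosh(max(arg, 1.0))

def greedy_route(neighbors, coords, src, dst):
    """At each hop, forward to the neighbor hyperbolically closest to the target."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        nxt = min(neighbors[here], key=lambda v: hyperbolic_distance(coords[v], coords[dst]))
        if hyperbolic_distance(coords[nxt], coords[dst]) >= hyperbolic_distance(coords[here], coords[dst]):
            return None                      # greedy routing failed (local minimum)
        path.append(nxt)
    return path

# toy 4-node example with assumed coordinates
coords = {0: (0.1, 0.0), 1: (1.0, 1.0), 2: (1.5, 2.0), 3: (2.0, 2.1)}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(greedy_route(neighbors, coords, 0, 3))
```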
15. On the performance evaluation of object classification models in low altitude aerial data
- Author
-
Payal Mittal, Akashdeep Sharma, Raman Singh, and Arun Kumar Sangaiah
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
This paper compares the classification performance of machine learning classifiers vs. deep learning-based handcrafted models and various pretrained deep networks. The proposed study performs a comprehensive analysis of object classification techniques implemented on low-altitude UAV datasets using various machine and deep learning models. Multiple UAV object classification is performed through widely deployed machine learning-based classifiers such as K nearest neighbor, decision trees, naïve Bayes, random forest, a deep handcrafted model based on convolutional layers, and pretrained deep models. The best result obtained using random forest classifiers on the UAV dataset is 90%. The handcrafted deep model's accuracy score suggests the efficacy of deep models over machine learning-based classifiers in low-altitude aerial images. This model attains 92.48% accuracy, which is a significant improvement over machine learning-based classifiers. Thereafter, we analyze several pretrained deep learning models, such as VGG-D, InceptionV3, DenseNet, Inception-ResNetV4, and Xception. The experimental assessment demonstrates nearly 100% accuracy values using pretrained VGG16- and VGG19-based deep networks. This paper provides a compilation of machine learning-based classifiers and pretrained deep learning models and a comprehensive classification report for the respective performance measures.
- Published
- 2022
16. HDNN: a cross-platform MLIR dialect for deep neural networks
- Author
-
Jose M. Garcia, Pablo A. Martínez, and Gregorio Bernabe
- Subjects
Hardware and Architecture, Software, Information Systems, Theoretical Computer Science - Abstract
This paper presents HDNN, a proof-of-concept MLIR dialect for cross-platform computing specialized in deep neural networks. As target devices, HDNN supports CPUs, GPUs, and TPUs. In this paper, we provide a comprehensive description of the HDNN dialect, outlining how this novel approach aims to solve the $$P^3$$ problem of parallel programming (portability, productivity, and performance). An HDNN program is device-agnostic, i.e., only the device specifier has to be changed to run a given workload on one device or another. Moreover, HDNN has been designed to be a domain-specific language, which ultimately helps programming productivity. Finally, HDNN relies on optimized libraries for heavy, performance-critical workloads. HDNN has been evaluated against other state-of-the-art machine learning frameworks on all the hardware platforms, achieving excellent performance. We conclude that the ideas and concepts used in HDNN can be crucial for designing future-generation compilers and programming languages to overcome the challenges of the forthcoming heterogeneous computing era.
- Published
- 2022
17. CAVLCU: an efficient GPU-based implementation of CAVLC
- Author
-
Nicolás Guil, Antonio Fuentes-Alventosa, Juan Gómez-Luna, José María González-Linares, and Rafael Medina-Carnicer
- Subjects
Computer science, CAVLC, GPU, CUDA, H.264, Parallel implementations, Data compression, Variable-length encoding, Frame (networking), Memory bandwidth, Parallel computing, Encryption, Theoretical Computer Science, Hardware and Architecture, Encoding (memory), Encoder, Software, Information Systems, Image compression, Block (data storage), Context-adaptive variable-length coding - Abstract
CAVLC (Context-Adaptive Variable Length Coding) is a high-performance entropy method for video and image compression. It is the most commonly used entropy method in the video standard H.264. In recent years, several hardware accelerators for CAVLC have been designed. In contrast, high-performance software implementations of CAVLC (e.g., GPU-based) are scarce. A high-performance GPU-based implementation of CAVLC is desirable in several scenarios. On the one hand, it can be exploited as the entropy component in GPU-based H.264 encoders, which are a very suitable solution when GPU built-in H.264 hardware encoders lack certain necessary functionality, such as data encryption and information hiding. On the other hand, a GPU-based implementation of CAVLC can be reused in a wide variety of GPU-based compression systems for encoding images and videos in formats other than H.264, such as medical images. This is not possible with hardware implementations of CAVLC, as they are non-separable components of hardware H.264 encoders. In this paper, we present CAVLCU, an efficient implementation of CAVLC on GPU, which is based on four key ideas. First, we use only one kernel to avoid the long-latency global memory accesses required to transmit intermediate results among different kernels, and the costly launches and terminations of additional kernels. Second, we apply an efficient synchronization mechanism for thread-blocks (in this paper, to prevent confusion, a block of pixels of a frame will be referred to simply as a block and a GPU thread block as a thread-block) that process adjacent frame regions (in horizontal and vertical dimensions) to share results in global memory space. Third, we fully exploit the available global memory bandwidth by using vectorized loads to move the quantized transform coefficients directly to registers. Fourth, we use register tiling to implement the zigzag sorting, thus obtaining high instruction-level parallelism. An exhaustive experimental evaluation showed that our approach is between 2.5x and 5.4x faster than the only state-of-the-art GPU-based implementation of CAVLC.
- Published
- 2021
18. The g-extra diagnosability of the balanced hypercube under the PMC and MM* model
- Author
-
Yuehong Chen, Qiao Sun, Lijuan Huang, Xin-Yang Wang, Naqin Zhou, Weiwei Lin, and Keqin Li
- Subjects
Discrete mathematics, Current (mathematics), Cover (topology), Hardware and Architecture, Computer science, Value (computer science), Fault tolerance, Hypercube, Upper and lower bounds, Software, Information Systems, Theoretical Computer Science - Abstract
Fault diagnosis plays an important role in measuring the fault tolerance of an interconnection network, which is of great value in the design and maintenance of large-scale multiprocessor systems. As a classical variant of the hypercube, the balanced hypercube, denoted by $$BH_n$$ ($$n \ge 1$$), has drawn a lot of research attention, and its $$g$$-extra diagnosability has been studied to improve the network's diagnostic ability. However, the current literature on the $$g$$-extra diagnosability of $$BH_n$$ under the PMC model only covers the cases of $$g < 6$$ and, moreover, seldom addresses its $$g$$-extra diagnosability under the MM* model, which is a great limitation on the research of $$BH_n$$ diagnosability. In this paper, the upper and lower bounds of the $$g$$-extra diagnosability of the balanced hypercube are proved based on the $$g$$-extra connectivity using the contradiction method, and finally, the $$g$$-extra diagnosability of $$BH_n$$ for $$2 \le g \le 2n - 1$$ under the PMC and MM* models is obtained, namely $$2\left[(n-2)\left\lceil \frac{g-1}{2} \right\rceil + n\right] + g$$. In addition, as a special case, the $$g$$-extra diagnosability of the balanced hypercube for $$g = 2n$$ is proved to be $$2^{2n-1} - 1$$ under the PMC and MM* models. In the end, simulation experiments are conducted to verify the effectiveness of the proposed theories. The conclusions of this paper have both theoretical and practical value for research on $$BH_n$$ fault diagnosis.
- Published
- 2021
19. Understanding human emotions through speech spectrograms using deep neural network
- Author
-
Yu-Chen Hu, Stuti Juyal, and Vedika Gupta
- Subjects
Artificial neural network, Computer science, Speech recognition, Feature extraction, Perceptron, Theoretical Computer Science, Support vector machine, Hardware and Architecture, Bag-of-words model in computer vision, Classifier (linguistics), Cepstrum, Mel-frequency cepstrum, Software, Information Systems - Abstract
This paper presents the analysis and classification of speech spectrograms for recognizing emotions in the RAVDESS dataset. Feature extraction from speech utterances is performed using Mel-Frequency Cepstrum Coefficients (MFCC). Thereafter, deep neural networks are employed to classify speech into six emotions (happy, sad, neutral, calm, disgust, and fear). Firstly, this paper presents a comprehensive comparative study of DNNs on prosodic features; the outcomes of all DNNs are reported. Secondly, the paper puts forward an analysis of a Bag of Visual Words (BoVW) approach that uses speeded-up robust features (SURF), clusters them using K-means, and further classifies them into the aforementioned emotions using a support vector machine (SVM). Out of the five DNNs deployed, (i) Long Short-Term Memory (LSTM) on MFCC and (ii) a Multi-Layer Perceptron (MLP) classifier on MFCC outperform the others, each giving an accuracy score of 0.70. Further, the BoVW technique achieved 53% correct classification. Therefore, the proposed methodology constructs a Hybrid of Acoustic Features (HAF) and feeds them into an ensemble of bagged multi-layer perceptron classifiers, imparting an accuracy of 85%. It also achieves a precision score between 0.77 and 0.88 for the classification of the six emotions.
- Published
- 2021
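The MFCC-plus-bagged-MLP pipeline summarized in this entry can be sketched as follows, using librosa for feature extraction and scikit-learn for the ensemble. The pooling of MFCCs into a fixed-length vector, the file list, and the hyperparameters are assumptions rather than the authors' HAF construction.

```python
# Minimal sketch (not the authors' HAF pipeline): MFCC features per utterance
# fed to a bagged ensemble of MLP classifiers. `wav_files` / `labels` are placeholders.
import numpy as np
import librosa
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier

def mfcc_features(path, n_mfcc=40):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # fixed-length vector

X = np.stack([mfcc_features(f) for f in wav_files])
clf = BaggingClassifier(MLPClassifier(hidden_layer_sizes=(128,), max_iter=500),
                        n_estimators=10)
clf.fit(X, labels)   # labels: happy / sad / neutral / calm / disgust / fear
```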
20. An effective SPMV based on block strategy and hybrid compression on GPU
- Author
-
Qilong Han, Nianbin Wang, Yuhua Wang, Huanyu Cui, and Yuezhu Xu
- Subjects
Computer science, Sparse matrix-vector multiplication, Serial code, Load balancing (computing), Theoretical Computer Science, Matrix (mathematics), Acceleration, Hardware and Architecture, Redundancy (engineering), Algorithm, Software, Information Systems, Sparse matrix, Block (data storage) - Abstract
Due to the non-uniformity of sparse matrices, the calculation of SPMV (sparse matrix-vector multiplication) leads to redundant computation, redundant storage, unbalanced load, and low GPU utilization. In this study, a new matrix compression method based on CSR and COO, the PBC algorithm, is proposed to address these problems. This method considers the load-balancing condition in the SPMV calculation process: blocks are divided following a row-major-order strategy to ensure a minimum standard deviation between blocks, aiming to make the number of nonzero elements in each block as similar as possible. This paper preprocesses the original matrix with the block-splitting algorithm so that each block, stored in CSR and COO form, meets the load-balancing conditions. Finally, the experimental results show that the SPMV preprocessing time is within an acceptable range for the algorithm. Compared with the serial code without CSR optimization, the parallel method in this paper achieves a speedup of 178x; compared with the CSR-optimized serial code, it achieves a speedup of 6x. A representative matrix compression method is also selected for comparative analysis, and the experimental results show that the PBC algorithm achieves a good efficiency improvement over the comparison algorithm.
- Published
- 2021
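The core idea of dividing rows in row-major order so that every block carries a similar number of nonzero elements can be sketched on top of SciPy's CSR format as below; the greedy split is an illustrative stand-in, not the exact PBC partitioning strategy.

```python
# Minimal sketch of nonzero-balanced row blocking for SpMV on a CSR matrix.
# The greedy split below is an assumption, not the paper's exact PBC strategy.
import numpy as np
import scipy.sparse as sp

def balanced_row_blocks(A_csr, num_blocks):
    """Split rows in row-major order so each block has a similar nonzero count."""
    nnz_per_row = np.diff(A_csr.indptr)
    target = nnz_per_row.sum() / num_blocks
    blocks, start, acc = [], 0, 0
    for row, nnz in enumerate(nnz_per_row):
        acc += nnz
        if acc >= target and len(blocks) < num_blocks - 1:
            blocks.append((start, row + 1))
            start, acc = row + 1, 0
    blocks.append((start, A_csr.shape[0]))
    return blocks

A = sp.random(1000, 1000, density=0.01, format="csr")
x = np.random.rand(1000)
y = np.zeros(1000)
for lo, hi in balanced_row_blocks(A, num_blocks=8):
    y[lo:hi] = A[lo:hi] @ x        # each block is an independent, balanced SpMV task
assert np.allclose(y, A @ x)
```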
21. Software-defect prediction within and across projects based on improved self-organizing data mining
- Author
-
Junhua Ren and Qing Zhang
- Subjects
Correlation coefficient, Computer science, Data imbalance, Software metric, Theoretical Computer Science, Software bug, Ranking, Hardware and Architecture, Data pre-processing, Data mining, Software, Information Systems - Abstract
This paper proposes a new method for software-defect prediction based on self-organizing data mining; this method can establish a causal relationship between software metrics and defects. Defect-prediction models were established for intra-project and cross-project scenarios. For intra-project forecasting, this article establishes a self-organizing data mining model and adds a smooth data-preprocessing method to solve the problem of data imbalance. For cross-project forecasting, this article establishes a self-organizing data mining model, bridges the difference between the two projects by finding source-project instances with a larger correlation coefficient with the target project, and establishes a defect-prediction model for the selected source-project instances. This paper aims to achieve both classification and ranking prediction. The proposed method is tested on public defect datasets. In the classification-prediction experiment, the method is evaluated with the precision, F-measure, and AUC indicators; in the ranking-prediction experiment, the AAE and ARE indicators obtained by this method are improved. The algorithm is found to be an efficient and feasible method for software-defect prediction.
- Published
- 2021
22. Dynamic weighted selective ensemble learning algorithm for imbalanced data streams
- Author
-
Du Hongle, Ke Gang, Zhang Lin, Yeh-Cheng Chen, and Zhang Yan
- Subjects
Data stream, Concept drift, Computer science, Data stream mining, Sample (statistics), Ensemble learning, Theoretical Computer Science, Hardware and Architecture, Resampling, Classifier (linguistics), Oversampling, Algorithm, Software, Information Systems - Abstract
Data stream mining is one of the hot topics in data mining. Most existing algorithms assume that data streams with concept drift are balanced. However, in the real world, data streams are imbalanced and exhibit concept drift, and the learning algorithm becomes more complex for an imbalanced data stream with concept drift. In online learning algorithms, an oversampling method is used to select a small number of samples from the previous data block through a certain strategy and add them into the current data block to amplify the current minority class. However, in this method, the number of stored samples, the oversampling method, and the weight calculation of the base classifiers all affect the classification performance of the ensemble classifier. This paper proposes a dynamic weighted selective ensemble (DWSE) learning algorithm for imbalanced data streams with concept drift. On the one hand, by resampling the minority samples in the previous data block, the minority samples of the current data block can be amplified, and the information in the previous data block can be absorbed into building a classifier to reduce the impact of concept drift. The calculation method for the information content of every sample is defined, and the resampling and updating methods for the minority samples are given in this paper. On the other hand, because of concept drift, the performance of a base classifier degrades over time, and a decay factor is usually used to describe this performance degradation. However, a static decay factor cannot accurately describe the performance degradation of the base classifier under concept drift. The DWSE algorithm therefore defines a calculation method for a dynamic decay factor of each base classifier and uses it to select sub-classifiers to eliminate according to their attenuation, which makes the algorithm deal better with concept drift. Compared with other algorithms, the results show that the DWSE algorithm has better classification performance for both majority-class and minority-class samples.
- Published
- 2021
23. A new software cache structure on Sunway TaihuLight
- Author
-
Zhaochu Deng, Panpan Du, Jie Lin, and Jianjiang Li
- Subjects
Computer science, Program optimization, Supercomputer, Theoretical Computer Science, Data access, Hardware and Architecture, Hit rate, Operating system, Overhead (computing), Cache, Software, Information Systems, Sunway TaihuLight, Data transmission - Abstract
The Sunway TaihuLight is the first supercomputer built entirely with domestic processors in China. On the Sunway TaihuLight, the local data memory (LDM) of the slave cores is limited, so data transmission with the main memory is frequent during calculation, and memory access efficiency is low. On the other hand, for many scientific computing programs, solving the storage problem of irregularly accessed data is the key to program optimization. A software cache (SWC) is one of the effective means to solve these problems. Based on the characteristics of the Sunway TaihuLight architecture and of irregular accesses, this paper designs and implements a new software cache structure that uses part of the LDM space to emulate the cache function, employing a new cache address mapping and conflict-resolution scheme to reduce the high data-access and storage overheads of a traditional cache. At the same time, the SWC uses register communication between the slave cores to share data across the LDMs of different slave cores, increasing the capacity of the software cache and improving the hit rate. In addition, we adopt a double-buffering strategy to access regular data in batches, which hides the communication overhead between the slave cores and the main memory. The test results on the Sunway TaihuLight platform show that the proposed software cache structure can effectively reduce program running time, improve the software cache hit rate, and achieve a better optimization effect.
- Published
- 2021
24. Comparative evaluation of task priorities for processing and bandwidth capacities-based workflow scheduling for cloud environment
- Author
-
Zheng Wei, Zhang De-Fu, and Emmanuel Bugingo
- Subjects
Flexibility (engineering), Job shop scheduling, Computer science, Heuristic, Distributed computing, Cloud computing, Theoretical Computer Science, Task (computing), Hardware and Architecture, Virtual machine, Bandwidth (computing), Software, Information Systems, Data transmission - Abstract
With the development of the cloud computing market, cloud computing providers are offering their users the flexibility to choose the desired capacity of both processing and data transfer (bandwidth) to use during the execution of their applications. The selected processing capacity can be shared among a number of virtual machines (VMs) capable of completing the user's application within optimal time. During the execution of the user's application, each task is scheduled on the VM that can minimize its execution time. However, the tasks are interdependent, which means that the execution delay of a parent task will lead to delays in its dependent tasks. Bandwidth can be used to reduce the waiting time and avoid this delay of the dependent tasks. The execution time of the user's application depends on factors such as task priorities, the application's structure and size, and the number of VMs selected to share the chosen processing capacity. Determining the number of VMs to share the user-selected capacities under the user-specified quality of service remains a big challenge. Determining the number of VMs to share the user-selected processing capacity was studied in our previous paper, where makespan and idle time were given the same weight. In this paper, we extend our previous work by adding the bandwidth capacity to the user's selection and use CRITIC, a multi-criteria decision-making technique, to determine the weight of each criterion. The evaluation results show that the proposed heuristic works well under different parameter settings.
- Published
- 2021
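CRITIC, the multi-criteria weighting technique this entry relies on, derives a weight for each criterion from its contrast intensity (standard deviation) and its conflict with the other criteria (one minus the pairwise correlations). A minimal sketch of the standard formulation follows; the toy decision matrix and criteria are assumptions.

```python
# Minimal sketch of CRITIC weighting (standard formulation, toy decision matrix).
# Rows = candidate configurations, columns = criteria (e.g., makespan, idle time, cost).
import numpy as np

def critic_weights(D):
    # normalize each criterion to [0, 1] (benefit-type normalization assumed)
    norm = (D - D.min(axis=0)) / (D.max(axis=0) - D.min(axis=0))
    sigma = norm.std(axis=0, ddof=1)              # contrast intensity of each criterion
    corr = np.corrcoef(norm, rowvar=False)        # conflict between criteria
    info = sigma * (1.0 - corr).sum(axis=0)       # information carried by each criterion
    return info / info.sum()

D = np.array([[120.0, 30.0, 5.0],
              [150.0, 10.0, 4.0],
              [100.0, 45.0, 6.0]])
print(critic_weights(D))   # one weight per criterion, summing to 1
```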
25. Automatic lane marking prediction using convolutional neural network and S-Shaped Binary Butterfly Optimization
- Author
-
Abrar M. Alajlan and Marwah Almasri
- Subjects
Hyperparameter, Receiver operating characteristic, Computer science, Stability (learning theory), Process (computing), Binary number, Pattern recognition, Convolutional neural network, Measure (mathematics), Theoretical Computer Science, Hardware and Architecture, Robustness (computer science), Artificial intelligence, Software, Information Systems - Abstract
Lane detection is a technique that uses geometric features as input to an autonomous vehicle to automatically distinguish lane markings. To process the intricate features present in lane images, traditional computer vision (CV) techniques are typically time-consuming, need more computing resources, and use complex algorithms. To address this problem, this paper presents a deep convolutional neural network (CNN) architecture that avoids the complexities of traditional CV techniques. The CNN is regarded as a reasonable method for lane marking prediction, but improved performance requires hyperparameter tuning. To enhance the initial parameter setting of the CNN, an S-Shaped Binary Butterfly Optimization Algorithm (SBBOA) is utilized in this paper. In this way, the relevant parameters of the CNN are selected for accurate lane marking. To evaluate the performance of the proposed SBBOA-CNN model, extensive experiments are conducted using the TUSimple and CULane datasets. The experimental results show that the proposed approach outperforms other state-of-the-art techniques in terms of classification accuracy, precision, F1-score, and recall. The proposed model also considerably outperforms the plain CNN in terms of classification accuracy, average elapsed time, and the receiver operating characteristic curve measure. This result demonstrates that the SBBOA-optimized CNN exhibits higher robustness and stability than the CNN.
- Published
- 2021
26. Data balancing-based intermediate data partitioning and check point-based cache recovery in Spark environment
- Author
-
Youlong Luo, Qianqian Cai, and Chunlin Li
- Subjects
Shuffling, Computer science, Distributed computing, Skew, Theoretical Computer Science, Data recovery, Task (computing), Hardware and Architecture, Spark (mathematics), Overhead (computing), Cache, Reservoir sampling, Software, Information Systems - Abstract
Both data shuffling and cache recovery are essential parts of the Spark system, and they directly affect Spark's parallel computing performance. Existing dynamic partitioning schemes for the data-skew problem in the shuffle phase suffer from poor dynamic adaptability and insufficient granularity. To address these problems, this paper proposes a dynamic balanced partitioning method for the shuffle phase based on reservoir sampling. The method mitigates the impact of data skew on Spark performance by sampling and preprocessing intermediate data, predicting the overall data skew, and producing the overall partitioning strategy executed by the application. In addition, an inappropriate failure-recovery strategy increases the recovery overhead and leads to an inefficient data recovery mechanism. To address this issue, this paper proposes a checkpoint-based fast recovery strategy for the RDD cache. The strategy analyzes the task-execution mechanism of the in-memory computing framework and forms a new failure-recovery strategy that combines the failure-recovery model with weight information derived from semantic analysis of the code, obtaining detailed information about each task so as to improve the efficiency of the data recovery mechanism. The experimental results show that the proposed dynamic balanced partitioning approach can effectively optimize the total completion time of the application and improve Spark's parallel computing performance, and that the proposed fast cache recovery strategy can effectively improve the computational speed of data recovery and the computational rate of Spark.
- Published
- 2021
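The sample-then-partition idea behind the shuffle-phase method in this entry can be illustrated in PySpark: sample the intermediate keys, estimate their frequencies, and give the heaviest keys dedicated partitions before the shuffle. Plain takeSample is used here as a stand-in for the paper's reservoir-sampling and skew-prediction steps, and the paths and sizes are placeholders.

```python
# Minimal sketch of sample-then-partition skew mitigation in PySpark (a stand-in
# for the paper's reservoir-sampling scheme, not its exact algorithm).
from collections import Counter
from pyspark import SparkContext

sc = SparkContext(appName="skew-aware-partitioning")
pairs = sc.textFile("hdfs:///input/events").map(lambda line: (line.split(",")[0], 1))

num_partitions = 32
sample = pairs.map(lambda kv: kv[0]).takeSample(False, 10_000)   # approximate key histogram
hot_keys = [k for k, c in Counter(sample).most_common(num_partitions // 4)]
hot_index = {k: i for i, k in enumerate(hot_keys)}

def skew_aware_partitioner(key):
    # heavy keys get their own partitions; the rest are hashed over what remains
    if key in hot_index:
        return hot_index[key]
    return len(hot_index) + hash(key) % (num_partitions - len(hot_index))

counts = pairs.partitionBy(num_partitions, skew_aware_partitioner) \
              .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("hdfs:///output/counts")
```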
27. Optimal channel estimation and interference cancellation in MIMO-OFDM system using MN-based improved AMO model
- Author
-
Chittetti Venkateswarlu and Nandanavanam Venkateswara Rao
- Subjects
Speedup, Computer science, MIMO-OFDM, Theoretical Computer Science, Single antenna interference cancellation, Rate of convergence, Interference (communication), Hardware and Architecture, Bit error rate, Minification, Algorithm, Software, Information Systems, Communication channel - Abstract
In recent years, MIMO-OFDM has played a significant role due to its high-speed transmission rate. Various research studies have been carried out on channel estimation to obtain optimal output without affecting system performance, but due to the increased bit error rate, achieving optimal channel estimation is a challenging task. Therefore, this paper proposes a modified Newton's (MN)-based Improved Animal Migration Optimization (IAMO) algorithm for MIMO-OFDM systems. The main objectives of the proposed approach are to minimize the bit error rate and to enhance system performance. In this paper, the modified Newton's method is utilized to improve the discovery capability and to speed up the convergence rate, thereby obtaining the optimal positions in the search space. In addition, the proposed method is utilized to restrict the interference in MIMO-OFDM systems. Finally, the performance of the proposed method is compared with other channel estimation methods to determine the effectiveness of the system. The experimental and comparative analyses are carried out, and the results demonstrate that the proposed approach performs better over frequency-selective channels than other state-of-the-art methods.
- Published
- 2021
28. Kubernetes in IT administration and serverless computing: An empirical study and research challenges
- Author
-
Hong-Ning Dai, Rui Pan, Subrota K. Mondal, Tan Tian, and H M Dipu Kabir
- Subjects
Computer science, Attack tree, Cloud computing, Computer security, Virtualization, Theoretical Computer Science, Debugging, Hardware and Architecture, Virtual machine, Container (abstract data type), Software design, Orchestration (computing), Software, Information Systems - Abstract
Today’s industry has gradually realized the importance of improving efficiency and saving costs during the life-cycle of an application. In particular, most cloud-based applications and services often consist of hundreds of micro-services; the traditional monolithic pattern is no longer suitable for today’s development life-cycle, due to the difficulties of maintenance, scaling, load balancing, and many other associated factors. Consequently, the focus has shifted to containerization, a lightweight virtualization technology. Its saving grace is that it can use machine resources more efficiently than a virtual machine (VM): a VM requires a guest OS to be simulated on the host machine, whereas containerization enables applications to share a common OS. Furthermore, containerization lets users create, delete, or deploy containers effortlessly. In order to manipulate and manage multiple containers, the leading cloud providers introduced container orchestration platforms such as Kubernetes, Docker Swarm, Nomad, and many others. In this paper, a rigorous study of Kubernetes from an administrator’s perspective is conducted. At a later stage, the serverless computing paradigm is revisited and integrated with Kubernetes to accelerate the development of software applications. Theoretical knowledge and experimental evaluation show that this novel approach can be adopted by developers to design software architecture and development more efficiently and effectively while minimizing the cost charged by public cloud providers (such as AWS, GCP, and Azure). However, serverless functions come with several issues, such as security threats, the cold-start problem, inadequate function debugging, and many others. Consequently, the challenge is to find ways to address these issues; however, addressing all of them at once is difficult. Accordingly, in this paper we narrow our analysis down to the security aspects of serverless computing. In particular, we quantitatively measure the success probability of attacks in serverless (using attack trees and attack-defense trees) along with the possible attack scenarios and the related countermeasures. Thereafter, we show how this quantification can contribute to end-to-end security enhancement. Finally, this study concludes with research challenges, such as the burdensome and error-prone steps of setting up the platform and investigating the existing security vulnerabilities of serverless computing, and possible future directions.
- Published
- 2021
29. Design and analysis of SRAM cell using reversible logic gates towards smart computing
- Author
-
M. Siva Kumar and O. Mohana chandrika
- Subjects
Computer science, Transistor, Theoretical Computer Science, Power (physics), CMOS, Hardware and Architecture, Video tracking, Logic gate, Electronic engineering, Electronics, Static random-access memory, Software, Information Systems, Voltage - Abstract
With the advancement of technology, the usage of electronics in applications involving large memories for storing and processing data has increased. In this sort of application, SRAM is mainly used because of its high speed. Moreover, with the high usage of memory cells, power consumption has increased to a great extent. The current literature shows that various parameters of SRAM, such as speed and power, need to be improved for memory cells used in object tracking applications. To improve these parameters, SRAM architectures must be combined with new techniques. In recent years, reversible circuits have gained extensive attention because of their low-power characteristics. In this brief, a low-power, high-speed reversible static RAM is proposed. The proposed SRAM combines data processing with low power dissipation and high speed. The proposed SRAM architecture yields better performance and is similar to the traditional SRAM architecture in terms of delay. This paper also implements a 32 × 64 memory block for object tracking applications. This work is carried out with 45 nm CMOS technology. In the proposed design, transistors are made to operate in the weak inversion region through the use of the EKV model. The design proposed in this paper reduces garbage outputs by 60%, the quantum cost by 70%, and the quantum delay by 70% compared to current architectures. The proposed design is simulated at different supply voltages to verify that the power dissipation and delay of the SRAM are proportional to the supply voltage.
- Published
- 2021
30. NodeRank: Finding influential nodes in social networks based on interests
- Author
-
Ibrahim Kamel, Mohammed Bahutair, and Zaher Al Aghbari
- Subjects
Structure (mathematical logic), Theoretical computer science, Social network, Computer science, Theoretical Computer Science, Set (abstract data type), Hardware and Architecture, Scalability, Spark (mathematics), Recursive algorithms, Software, Information Systems - Abstract
Finding influential members in social networks has received a lot of interest in the recent literature. Several algorithms have been proposed that provide techniques for extracting a set of the most influential people in a given social network. However, most of these algorithms find influential nodes based solely on the topological structure of the network. In this paper, a new algorithm, NodeRank, is proposed that ranks every user in a given social network based on the topological structure as well as the interests of the users (nodes). Higher ranks are given to people with greater influence on other members of the network. Furthermore, the paper investigates a MapReduce version of the algorithm that enables it to run on multiple machines simultaneously. Experiments showed that the MapReduce model is not suitable for the NodeRank algorithm, since MapReduce is only applicable to batch processes and NodeRank is highly iterative. For that reason, a parallel version of the algorithm is proposed that utilizes Hadoop Spark, a framework for parallel processing that supports batch operations as well as iterative and recursive algorithms. Several experiments have been carried out to test the accuracy as well as the scalability of the algorithm.
- Published
- 2021
31. TRAM: Technique for resource allocation and management in fog computing environment
- Author
-
Rajni Aron and Heena Wadhwa
- Subjects
Flexibility (engineering) ,business.industry ,Computer science ,Process (engineering) ,Distributed computing ,Cloud computing ,Energy consumption ,Theoretical Computer Science ,Resource (project management) ,Hardware and Architecture ,Computer data storage ,Wireless ,Resource allocation ,business ,Software ,Information Systems - Abstract
The traditional cloud computing technology provides services to a plethora of applications by providing resources. These services support numerous industries for computational purposes and data storage. However, the obstruction of the cloud computing framework is its inadequate flexibility and problem to accommodate the diverse requirements generated from an IoT-based environment. Cloud computing is emerging with the latest paradigms to ensure that the connected heterogeneous system can achieve high-performance computing (HPC). Furthermore, many of today’s requirements prefer diverse geographic distribution of resources and near to the end device location. Hence, the new fog computing paradigm provides some innovative solutions for real-time applications. The fog computing framework’s prime agenda is to support latency-sensitive applications by utilizing all available resources. In this paper, a novel approach is designed for resource allocation and management. TRAM , a technique for resource allocation and management, is proposed to ensure resource utilization at the fog layer. This approach is used to track the intensity level of existing tasks using expectation maximization (EM) algorithm and calculate the current status of resources. All the available resources manage by using a wireless system. This paper provides a scheduling algorithm for the resource grading process in the fog computing environment. The performance of this approach is tested on the iFogSim simulator and compared the results with SJF, FCFS and MPSO. The experimental results demonstrated that TRAM effectively minimizes execution time, network consumption, energy consumption and average loop delay of tasks.
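A minimal sketch of the EM step mentioned above, assuming task intensity levels are inferred by fitting a Gaussian mixture (whose parameters are estimated by EM) to task descriptors; the feature set and the number of levels are placeholders, not the paper's actual model.

```python
# Sketch: classify task intensity levels with a Gaussian mixture fitted by EM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder task descriptors: (CPU demand in MIPS, data size in MB).
tasks = np.vstack([rng.normal([200, 5], [30, 1], (40, 2)),     # light tasks
                   rng.normal([800, 20], [80, 4], (40, 2)),    # medium tasks
                   rng.normal([2000, 60], [150, 8], (40, 2))])  # heavy tasks

gm = GaussianMixture(n_components=3, random_state=0).fit(tasks)
levels = gm.predict(tasks)                  # intensity level per task
print("tasks per intensity level:", np.bincount(levels))
```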
- Published
- 2021
32. Token-based approach in distributed mutual exclusion algorithms: a review and direction to future research
- Author
-
Ashish Singh Parihar and Swarnendu Kumar Chakraborty
- Subjects
Computer science ,business.industry ,Wireless ad hoc network ,Distributed computing ,media_common.quotation_subject ,Security token ,Adaptability ,Theoretical Computer Science ,Shared resource ,Domain (software engineering) ,Hardware and Architecture ,The Internet ,Mutual exclusion ,Mobile telephony ,business ,Software ,Information Systems ,media_common - Abstract
The problem of mutual exclusion is a highly studied area in distributed architectures. To avoid data inconsistency, mutual exclusion ensures that no two processes running on different processors are allowed to enter the same shared resource simultaneously. In recent years, with the continuous development of internet and mobile communication technologies, the devices, infrastructure and resources in networking systems such as ad hoc networks have become more complex and heterogeneous. Various algorithms have been introduced over the years as solutions to the mutual exclusion problem in the domain of distributed architectures. The performance and adaptability of these solutions depend on the different strategies they use. These strategies have been classified in various ways, such as token-based and non-token-based (also called permission-based). This paper presents a survey of existing token-based distributed mutual exclusion algorithms (TBDMEA) with a focus on their performance measures and fault-tolerance capabilities, together with the associated open challenges and directions for future research. In addition to traditional and recently proposed TBDMEA, token-based distributed group mutual exclusion algorithms (TBDGMEA) and token-based self-stabilizing distributed mutual exclusion algorithms (TBStDMEA) are also surveyed in this paper as new variants of the token-based scheme.
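To make the token-based class concrete, the following is a minimal single-process simulation of a logical token ring, one of the simplest schemes in this family: only the current holder of the token may enter the critical section. It is an illustration of the category, not a specific algorithm from the survey.

```python
# Minimal simulation of the token-ring idea behind token-based mutual exclusion.
from collections import deque

class TokenRing:
    def __init__(self, n_processes):
        self.n = n_processes
        self.token_at = 0                 # process currently holding the token
        self.requests = deque()           # pending critical-section requests

    def request_cs(self, pid):
        self.requests.append(pid)

    def step(self):
        """Pass the token one hop; the holder enters the CS if it has a pending request."""
        if self.token_at in self.requests:
            self.requests.remove(self.token_at)
            print(f"process {self.token_at} enters and leaves the critical section")
        self.token_at = (self.token_at + 1) % self.n   # forward token to successor

ring = TokenRing(4)
ring.request_cs(2)
ring.request_cs(0)
for _ in range(8):      # circulate the token twice around the ring
    ring.step()
```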
- Published
- 2021
33. Smart home security: challenges, issues and solutions at different IoT layers
- Author
-
Mudassar Hussain, Muhammad Bilal, Fadi Al-Turjman, Shakir Zaman, Rashid Amin, and Haseeb Touqeer
- Subjects
020203 distributed computing ,Temperature monitoring ,business.industry ,Computer science ,Control (management) ,02 engineering and technology ,Computer security ,computer.software_genre ,Theoretical Computer Science ,Layered structure ,Hardware and Architecture ,Light control ,Home automation ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,DECIPHER ,Internet of Things ,business ,computer ,Software ,Information Systems - Abstract
The Internet of Things is a rapidly evolving technology in which interconnected computing devices and sensors share data over the network to solve different problems and deliver new services. For example, IoT is the key enabling technology for smart homes. Smart home technology provides many facilities to users, such as temperature monitoring, smoke detection, automatic light control and smart locks. However, it also opens the door to a new set of security and privacy issues; for example, the private data of users can be accessed by taking control of surveillance devices or by activating false fire alarms. These challenges make smart homes vulnerable to various types of security attacks, and people are reluctant to adopt the technology because of these security issues. In this survey paper, we throw light on IoT, how IoT is growing, objects and their specifications, the layered structure of the IoT environment, and the security challenges that arise at each layer in the smart home. This paper not only presents the challenges and issues that emerge in IoT-based smart homes but also presents solutions that can help to overcome these security challenges.
- Published
- 2021
34. AIEMLA: artificial intelligence enabled machine learning approach for routing attacks on internet of things
- Author
-
Saurabh Sharma and Vinod Kumar Verma
- Subjects
Hyperparameter ,Routing protocol ,Focus (computing) ,Artificial neural network ,Computer science ,business.industry ,Lossy compression ,Machine learning ,computer.software_genre ,Prime (order theory) ,Theoretical Computer Science ,Hardware and Architecture ,The Internet ,Artificial intelligence ,Routing (electronic design automation) ,business ,computer ,Software ,Information Systems - Abstract
The Internet of things (IoT) is emerging as a prime area of research in the modern era. The significance of IoT in daily life is increasing due to the growing number of objects or things connected to the internet. In this paper, the routing protocol for low power and lossy networks (RPL) is examined on the Contiki operating system. The RPL attack framework is used to simulate three RPL attacks, namely hello-flood, decreased-rank and increased-version. These attacks are simulated both separately and simultaneously. The focus is on the detection of these attacks through an artificial neural network (ANN)-based supervised machine learning approach. Accurate detection of the malicious nodes protects the network from the severe effects of an attack. The accuracy of the proposed model is computed with the hold-out approach and tenfold cross-validation. The hyperparameters have been optimized through parameter tuning. The model presented in this paper detected the aforesaid attacks, both simultaneously and individually, with 100% accuracy. This work also investigates other performance measures such as precision, recall, F1-score and Matthews correlation coefficient (MCC).
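A sketch of the evaluation protocol described above (an ANN classifier assessed with both hold-out and tenfold cross-validation, reporting precision, recall, F1 and MCC). The features and labels are synthetic placeholders; the paper's actual RPL traffic features and network architecture are not reproduced.

```python
# Sketch: ANN attack classifier evaluated with hold-out and 10-fold CV.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import precision_score, recall_score, f1_score, matthews_corrcoef

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                     # placeholder traffic features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)      # placeholder attack label

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)

# Hold-out evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
print("precision", precision_score(y_te, y_pred))
print("recall   ", recall_score(y_te, y_pred))
print("F1       ", f1_score(y_te, y_pred))
print("MCC      ", matthews_corrcoef(y_te, y_pred))

# Tenfold cross-validation.
print("10-fold accuracy", cross_val_score(clf, X, y, cv=10).mean())
```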
- Published
- 2021
35. Design and testing of a reversible ALU by quantum cells automata electro-spin technology
- Author
-
Rupsa Roy, Swarup Sarkar, and Sourav Dhar
- Subjects
020203 distributed computing ,Computer science ,business.industry ,Transistor ,Fault tolerance ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Dissipation ,Theoretical Computer Science ,law.invention ,Automaton ,Arithmetic logic unit ,Software ,CMOS ,Hardware and Architecture ,law ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,Electronic engineering ,business ,Hardware_LOGICDESIGN ,Information Systems ,Quantum cellular automaton - Abstract
The arithmetic logic unit (ALU), a core component of a processor, is one of the thrust areas of current research. At present, ALUs are designed with the transistor-based CMOS technique and their individual components are placed in different layers. This design approach is affected by the limitations of Moore's law and by design complexity. 'Quantum cellular automata electro-spin (QCA-ES)' technology is now widely accepted as an alternative to CMOS to mitigate these problems. In this research paper, the design of a novel multilayer, portable, dynamic, fault-tolerant, power-efficient, thermally stable reversible ALU, explored through QCA-ES, is proposed. All the arithmetic and logical components of the ALU are placed separately in different layers. Area density, delay, fault tolerance and thermal stability are investigated. A specific type of gate, a reversible gate (the modified 3:3 'TSG' gate), is used in the proposed design with QCA technology to obtain an optimized ALU design with low occupied area, complexity, delay and power dissipation. The fault-free behaviour of the design and the change in the saturated output amplitude with increasing temperature are also discussed in this paper. Besides the thermal stability (up to a temperature of 6 K), an investigation of the cell complexity of the 100% fault-free (against multiple cell omission, cell displacement, cell orientation change and extra cell deposition) multilayer nano-device is presented in this work. The 'QCA-Designer' software is used in this research work to design and develop the layout of the proposed components in the quantum field and to determine the occupied area, delay and complexity of the proposed design. The 'QCA-Pro' software is used to obtain the dissipated power.
- Published
- 2021
36. Performance evaluation and optimization of a task offloading strategy on the mobile edge computing with edge heterogeneity
- Author
-
Shunfu Jin and Wei Li
- Subjects
020203 distributed computing ,Karush–Kuhn–Tucker conditions ,Mobile edge computing ,Computer science ,business.industry ,Distributed computing ,Cloud computing ,02 engineering and technology ,Energy consumption ,Theoretical Computer Science ,System model ,Task (computing) ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Computation offloading ,business ,Software ,Information Systems ,Efficient energy use - Abstract
With the development of mobile edge computing (MEC) technology and the serious shortage of global energy, the problem of computation offloading in cloud computing frameworks is receiving increasing attention from network managers. In order to improve the quality of experience of users and increase the energy efficiency of the system, we focus on the task offloading strategy in MEC systems. In this paper, we propose a task offloading strategy for an MEC system with a heterogeneous edge. By considering the execution and transmission of tasks under the task offloading strategy, we present an architecture for the MEC system. We establish a system model composed of M/M/1, M/M/c and M/M/$$\infty$$ queues to capture the execution of tasks in the local mobile device (MD), the MEC server and the remote cloud servers, respectively. Moreover, by trading off the average delay of tasks, the energy consumption level of the MD and the offloading expense of the system, we construct a cost function for serving one task and formulate a joint optimization problem for the task offloading strategy accordingly. Furthermore, under the constraints of steady state and proportion scope, we use the Lagrangian function and the corresponding Karush–Kuhn–Tucker (KKT) conditions to obtain the optimal task offloading strategy with the minimum system cost. Finally, we carry out numerical experiments on the MEC system to investigate the influence of system parameters on the task offloading strategy and to obtain the optimal results. The experimental results show that the proposed task offloading strategy can balance the average delay, the energy consumption level and the offloading expense with the optimal allocation ratio.
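A worked sketch of the queueing building blocks named above: mean response times for M/M/1 (local device), M/M/c (MEC server) and M/M/$$\infty$$ (remote cloud), combined into an illustrative cost that is minimized by a simple grid search. The service rates, weights and the exact shape of the cost are assumptions, not the paper's cost function or its Lagrangian/KKT solution.

```python
# Sketch: mean response times of the three queue types and a toy offloading cost.
import math

def mm1_response(lam, mu):
    return 1.0 / (mu - lam) if lam < mu else math.inf

def mmc_response(lam, mu, c):
    rho = lam / (c * mu)
    if rho >= 1:
        return math.inf
    a = lam / mu
    # Erlang C probability that an arriving task must wait.
    p0 = 1.0 / (sum(a**k / math.factorial(k) for k in range(c))
                + a**c / (math.factorial(c) * (1 - rho)))
    erlang_c = (a**c / (math.factorial(c) * (1 - rho))) * p0
    return erlang_c / (c * mu - lam) + 1.0 / mu

def mminf_response(mu):
    return 1.0 / mu

def cost(p_local, p_edge, lam=8.0, mu_l=4.0, mu_e=5.0, c_e=4, mu_c=6.0,
         w_delay=1.0, w_energy=0.5, w_expense=0.3):
    p_cloud = 1.0 - p_local - p_edge
    delay = (p_local * mm1_response(p_local * lam, mu_l)
             + p_edge * mmc_response(p_edge * lam, mu_e, c_e)
             + p_cloud * mminf_response(mu_c))
    energy = p_local * lam / mu_l            # toy proxy for MD energy use
    expense = p_cloud * lam                  # toy proxy for offloading expense
    return w_delay * delay + w_energy * energy + w_expense * expense

best = min((cost(pl / 100, pe / 100), pl / 100, pe / 100)
           for pl in range(0, 101, 5) for pe in range(0, 101 - pl, 5))
print("min cost %.3f at local=%.2f edge=%.2f cloud=%.2f"
      % (best[0], best[1], best[2], 1 - best[1] - best[2]))
```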
- Published
- 2021
37. A novel approach for multilevel multi-secret image sharing scheme
- Author
-
Kanchan Bisht and Maroti Deshmukh
- Subjects
Scheme (programming language) ,Theoretical computer science ,Distribution (number theory) ,Computer science ,Image sharing ,Structure (category theory) ,Theoretical Computer Science ,Image (mathematics) ,Hardware and Architecture ,Multimedia data transmission ,computer ,Software ,Information Systems ,computer.programming_language - Abstract
Multi-secret sharing (MSS) is an effective technique that securely encodes multiple secrets to generate shares and distributes them among the participants in such a way that these shares can be used later to reconstruct the secrets. MSS schemes have a considerable advantage over single-secret sharing schemes for secure multimedia data transmission. This paper presents a novel secret image sharing approach, namely the '(n, m, l)-Multilevel Multi-Secret Image Sharing (MMSIS) scheme.' The proposed MMSIS scheme encodes 'n' distinct secret images to generate 'm' shares and distributes them among 'm' participants allocated to 'l' distinct levels. The paper proposes two variants of the MMSIS scheme. The first variant is an $$(n, n+1, l)$$-MMSIS scheme, which encodes 'n' secret images, each having a unique level id $$L_k$$, into $$(n+1)$$ shares. The image shares are then distributed among $$(n+1)$$ participants assigned to $$l = n$$ different levels. As the level id increases, the number of shares required to reconstruct the secret image also increases. To reconstruct a secret image at a particular level $$L_k$$, all the shares at level $$L_k$$ and its preceding levels must be acquired, which requires the consensus of all participants holding shares up to level $$L_k$$. The second variant, the extended-MMSIS (EMMSIS) scheme, is a generalized (n, m, l) version of the former scheme that allows more shares to be generated for a specific secret image at a particular level in accordance with the consensus requirements for its reconstruction. The multilevel structure of the scheme makes it useful for multi-secret distribution in a multilevel organizational structure.
- Published
- 2021
38. Simple method of selecting totalistic rules for pseudorandom number generator based on nonuniform cellular automaton
- Author
-
Miroslaw Szaban
- Subjects
Pseudorandom number generator ,Keyspace ,Selection (relational algebra) ,Computer science ,business.industry ,Cryptography ,Cellular automaton ,Theoretical Computer Science ,Set (abstract data type) ,Hardware and Architecture ,Entropy (information theory) ,business ,Algorithm ,Software ,Information Systems ,Generator (mathematics) - Abstract
This paper is devoted to selecting rules for one-dimensional (1D) totalistic cellular automata (TCA). These rules are used for the generation of pseudorandom sequences, which can be useful in cryptography. The power of a pseudorandom number generator (PRNG) based on a nonuniform TCA can be improved by using not just one rule but a large set of rules. For this purpose, each subset of rules, together with its assignment to cellular automaton (CA) cells, should be analyzed. We examine the subsets of totalistic rules with neighborhood radius equal to 1 and 2. The entropy of the bitstreams generated by the nonuniform TCA points out the best set of rules for the TCA-based generator. The paper also presents a simple method of selecting CA rules based on the cryptographic criterion known as balance. The proposed method selects a maximal-size set of available CA rules for a given neighborhood radius that is suitable for a PRNG. The method guarantees avoidance of conflicting rule assignments that would create unwanted stable bit sequences, and it provides high-quality pseudorandom sequences. This technique is used to verify the subsets of rules selected experimentally. The verified rules are proposed for a 1D TCA-based PRNG as a new subset of the best nonuniform TCA rules. The newly picked, examined and verified subset of rules can be used in a TCA-based PRNG and provides cryptographically strong bit sequences and a huge keyspace.
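A sketch of the kind of enumeration described above: listing 1D binary totalistic rules of radius 1, filtering them by a balance criterion, and scoring each surviving rule by the entropy of the bitstream it generates. The balance definition used here (equal numbers of 0 and 1 outputs over the full truth table) and the single-cell output tap are assumptions, not necessarily the paper's exact definitions.

```python
# Sketch: enumerate radius-1 totalistic CA rules, keep balanced ones, score entropy.
import math
import random
from itertools import product

R = 1                       # neighborhood radius
NEIGH = 2 * R + 1           # neighborhood size
SUMS = NEIGH + 1            # possible neighborhood sums 0..NEIGH

def balanced(rule):
    """rule[s] is the output bit for neighborhood sum s."""
    ones = sum(math.comb(NEIGH, s) for s in range(SUMS) if rule[s] == 1)
    return ones == 2 ** (NEIGH - 1)      # half of all neighborhoods map to 1

def evolve(rule, width=64, steps=256, seed=1):
    rng = random.Random(seed)
    cells = [rng.randint(0, 1) for _ in range(width)]
    stream = []
    for _ in range(steps):
        cells = [rule[sum(cells[(i + d) % width] for d in range(-R, R + 1))]
                 for i in range(width)]
        stream.append(cells[width // 2])  # tap one cell as the PRNG output
    return stream

def bit_entropy(bits):
    p1 = sum(bits) / len(bits)
    if p1 in (0.0, 1.0):
        return 0.0
    return -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))

for rule in product((0, 1), repeat=SUMS):
    if balanced(rule):
        print(rule, "entropy per tapped bit:", round(bit_entropy(evolve(rule)), 3))
```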
- Published
- 2021
39. E2LG: a multiscale ensemble of LSTM/GAN deep learning architecture for multistep-ahead cloud workload prediction
- Author
-
Saeed Sharifian and Peyman Yazdanian
- Subjects
020203 distributed computing ,Discriminator ,business.industry ,Computer science ,Deep learning ,Chaotic ,Cloud computing ,Workload ,02 engineering and technology ,computer.software_genre ,Standard deviation ,Autoscaling ,Hilbert–Huang transform ,Theoretical Computer Science ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Data mining ,business ,computer ,Software ,Information Systems - Abstract
Efficient resource demand prediction and management are two main challenges for cloud service providers seeking to control dynamic autoscaling and power consumption. The behavior of cloud workload time-series at the subminute scale is highly chaotic and volatile; therefore, traditional machine learning-based time-series analysis approaches fail to obtain accurate predictions. In recent years, deep learning-based schemes have been suggested to predict highly nonlinear cloud workloads, but they do not always achieve excellent prediction results. Hence, the demand for more accurate prediction algorithms remains. In this paper, we address this issue by proposing the hybrid E2LG algorithm, which decomposes the cloud workload time-series into its constituent components in different frequency bands using the empirical mode decomposition method, reducing the complexity and nonlinearity of the prediction model in each band. In addition, a new state-of-the-art ensemble GAN/LSTM deep learning architecture is proposed to predict each sub-band workload time-series individually, based on its degree of complexity and volatility. Our ensemble GAN/LSTM architecture, which employs stacked LSTM blocks as its generator and 1D ConvNets as its discriminator, can effectively exploit the long-term nonlinear dependencies of cloud workload time-series, especially in high-frequency, noise-like components. Validating our approach with an extensive set of experiments on standard real cloud workload traces, we confirm that E2LG provides significant improvements in cloud workload prediction accuracy with respect to the mean absolute error and the standard deviation of the prediction error, outperforming traditional and state-of-the-art deep learning approaches. It improves the prediction accuracy by at least 5%, and by 12% on average, compared to the main contemporary approaches, such as hybrid methods employing CNN, LSTM or SVR.
- Published
- 2021
40. Research on GPU parallel algorithm for direct numerical solution of two-dimensional compressible flows
- Author
-
Jun’an Zhang, Yongzhen Wang, and Xuefeng Yan
- Subjects
Speedup ,Computer science ,Computation ,Parallel algorithm ,Graphics processing unit ,Direct numerical simulation ,Finite difference ,Upwind scheme ,Theoretical Computer Science ,Computational science ,Hardware and Architecture ,Central processing unit ,Software ,Information Systems - Abstract
In this paper, a novel parallel algorithm is proposed to address the heavy computation and long simulation times encountered in the field of compressible flows. In this algorithm, a third-order upwind scheme and a fourth-order central difference scheme are employed for the spatial discretization, with a third-order Runge–Kutta method for time stepping. Considering the powerful floating-point computing ability of the Graphics Processing Unit (GPU), the algorithm is built on the GPU. Moreover, the direct numerical simulation method is adopted to improve the accuracy of the simulation results. To further enhance the efficiency of the algorithm, several optimization strategies are explored in its design as well. Both the accuracy and the feasibility of the algorithm are verified on a classical two-dimensional example. Compared with solving this example on a Central Processing Unit (CPU) platform, the experimental results demonstrate that the maximum speedup achieved by our approach is 18.03 times.
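A minimal NumPy sketch of two of the numerical ingredients named above: a third-order upwind-biased spatial difference and a third-order Runge–Kutta time step, applied to 1D linear advection on a periodic domain. This illustrates the schemes only, not the paper's GPU implementation; the SSP (Shu–Osher) Runge–Kutta form and the 1D test problem are assumptions.

```python
# Sketch: third-order upwind space discretization + third-order Runge-Kutta stepping.
import numpy as np

def dudx_upwind3(u, dx):
    """Third-order upwind-biased derivative for positive advection speed."""
    return (np.roll(u, 2) - 6 * np.roll(u, 1) + 3 * u + 2 * np.roll(u, -1)) / (6 * dx)

def rhs(u, a, dx):
    return -a * dudx_upwind3(u, dx)

def rk3_step(u, dt, a, dx):
    u1 = u + dt * rhs(u, a, dx)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1, a, dx))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * rhs(u2, a, dx))

n, a = 200, 1.0
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = x[1] - x[0]
dt = 0.4 * dx / a                      # CFL-limited time step
u = np.exp(-200 * (x - 0.5) ** 2)      # Gaussian pulse initial condition
for _ in range(int(0.25 / dt)):        # advect the pulse a quarter period
    u = rk3_step(u, dt, a, dx)
print("peak after advection:", u.max())
```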
- Published
- 2021
41. Parallel optimization of the ray-tracing algorithm based on the HPM model
- Author
-
Wang Yi-Ou, Ding Gangyi, Zhang Fu-quan, Li Yu-Gang, and Wang Jun-Feng
- Subjects
Basis (linear algebra) ,Image quality ,Computer science ,Node (networking) ,Parallel optimization ,Division (mathematics) ,Theoretical Computer Science ,Hardware and Architecture ,Parallelism (grammar) ,Ray tracing (graphics) ,Algorithm ,Time complexity ,Software ,Information Systems - Abstract
This paper proposes a parallel computing analysis model, HPM, and analyzes the CPU–GPU parallel architecture based on this model. On this basis, we study the parallel optimization of the ray-tracing algorithm on the CPU–GPU parallel architecture, exploiting the parallelism between nodes, the parallelism of the multi-core CPU inside each node, and the parallelism of the GPU, which improves the computation speed of the ray-tracing algorithm. The paper uses space-division technology to partition the ground data, constructs a KD-tree organization structure, and improves the KD-tree construction method to reduce the time complexity of the algorithm. The ground data is evenly distributed to each computing node, and the computing nodes use a combination of CPU and GPU for parallel optimization. This method dramatically improves rendering speed while preserving image quality and provides an effective means of quickly generating photorealistic images.
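A minimal sketch of building a KD-tree by recursive median splits, the kind of space-division structure described above for accelerating ray tracing. This simplified version splits point data and returns nested dictionaries; the paper's improved construction for scene primitives is not reproduced.

```python
# Sketch: KD-tree construction by recursive median splits.
import numpy as np

def build_kdtree(points, depth=0, leaf_size=4):
    """points: (n, k) array. Returns a nested dict, or a leaf holding a few points."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % points.shape[1]                # cycle through split axes
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2                        # median split
    return {
        "axis": axis,
        "split": points[mid, axis],
        "left": build_kdtree(points[:mid], depth + 1, leaf_size),
        "right": build_kdtree(points[mid:], depth + 1, leaf_size),
    }

tree = build_kdtree(np.random.default_rng(0).random((100, 3)))
print(tree["axis"], round(float(tree["split"]), 3))
```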
- Published
- 2021
42. Optimal multilevel media stream caching in cloud-edge environment
- Author
-
Chunlin Li, Yihan Zhang, Youlong Luo, and Hengliang Tang
- Subjects
020203 distributed computing ,Network architecture ,Hardware_MEMORYSTRUCTURES ,business.industry ,Computer science ,Multitier architecture ,Cloud computing ,02 engineering and technology ,Theoretical Computer Science ,Hardware and Architecture ,Knapsack problem ,Server ,0202 electrical engineering, electronic engineering, information engineering ,Enhanced Data Rates for GSM Evolution ,Cache ,Greedy algorithm ,business ,Software ,Information Systems ,Computer network - Abstract
Due to the high link load of edge caching and the limited storage space of edge servers, a caching architecture based on the collaboration of edge nodes and the cloud server is proposed. The content cache location is designed and optimized; it can be the content provider, the cloud server (CS) or an edge node (EN). In the proposed system, cloud servers collaborate with edge servers, and content caching performance is improved by coordinating caching on the cloud server with caching on the edge servers. In this paper, a cloud-edge collaborative caching model based on a greedy algorithm is proposed, which includes a content caching model and a collaborative caching model. Network architecture, file popularity estimation, link capacity and other factors are considered in the model. Correspondingly, a cloud-edge collaborative cache algorithm based on the greedy algorithm is proposed. The related optimization problem is decomposed into knapsack problems for the cache layout in each layer, and the greedy algorithm is then used to solve the knapsack problems of cache placement and cooperative caching proposed in this paper. The relationship between the CS cache and the EN caches in the layered architecture is also refined and exploited. The experimental results show that the proposed edge caching method reduces the link load, improves the cache hit rate, and offers clear advantages in average end-to-end service delay.
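A simplified sketch of the greedy knapsack idea described above: each cache layer is treated as a knapsack with limited capacity, and content is placed edge-first in decreasing popularity-per-byte order. The capacities, value metric and two-layer fallback are illustrative; the paper's full model (link capacity, popularity estimation, delay) is not reproduced.

```python
# Sketch: greedy knapsack-style cache placement across edge node and cloud server.
def greedy_cache_placement(files, en_capacity, cs_capacity):
    """files: list of (name, size, popularity). Returns a placement dict."""
    placement = {}
    remaining = {"EN": en_capacity, "CS": cs_capacity}
    # Greedy criterion: popularity per unit size, highest first.
    for name, size, pop in sorted(files, key=lambda f: f[2] / f[1], reverse=True):
        if size <= remaining["EN"]:                # prefer the edge node cache
            placement[name] = "EN"
            remaining["EN"] -= size
        elif size <= remaining["CS"]:              # fall back to the cloud server
            placement[name] = "CS"
            remaining["CS"] -= size
        else:
            placement[name] = "content provider"   # not cached, fetch from origin
    return placement

files = [("a.mp4", 40, 0.50), ("b.mp4", 30, 0.25), ("c.mp4", 50, 0.15), ("d.mp4", 20, 0.10)]
print(greedy_cache_placement(files, en_capacity=60, cs_capacity=80))
```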
- Published
- 2021
43. An intelligent IoT-based positioning system for theme parks
- Author
-
Sina Einavipour and Reza Javidan
- Subjects
020203 distributed computing ,Focus (computing) ,Positioning system ,Computer science ,business.industry ,Real-time computing ,02 engineering and technology ,Theoretical Computer Science ,Hardware and Architecture ,Video tracking ,0202 electrical engineering, electronic engineering, information engineering ,Frequency-hopping spread spectrum ,Radio-frequency identification ,business ,Theme (computing) ,Software ,Information Systems - Abstract
With the advent of the Internet of Things (IoT) and the ubiquitous presence of sensor nodes, positioning technologies have become a topic of interest among researchers. While the applications of positioning systems are very broad, determining the position of moving sensor nodes, finding missing people in large areas and object tracking are among the most popular. The focus of this paper is a positioning system for locating missing people in theme parks. Currently, radio frequency identification (RFID) systems are used in modern theme parks to locate lost visitors. In these systems, a wristband with an active RFID tag is given to each visitor, and RFID readers are deployed in predetermined locations. When a visitor is in the communication range of a reader, the visitor's location can be estimated based on the location of the reader. Therefore, the accuracy of these systems depends on the communication range of the readers. Another limitation of RFID-based systems is that readers cannot be placed within communication range of each other, as they would interfere with one another. The only way to increase the accuracy of such systems is to increase the number of readers and decrease the communication range of each reader. In this paper, a Bluetooth low energy (BLE)-based system is proposed for locating lost visitors in theme parks. The advantage of using BLE is that it employs frequency hopping spread spectrum (FHSS), so readers can be placed within communication range of each other without severe interference. In the proposed method, the optimal places for deploying readers are first obtained using ant colony optimization (ACO). Then, a fuzzy approach is used to increase the accuracy of the system. Three signal levels are defined for use in the fuzzy system, based on which the location of visitors can be estimated. By using three levels of signal strength, the accuracy of the system is increased compared with a similar system with the same number of readers. The simulation results show that the accuracy of the system is improved using this method, and the cost of the system is reduced, as BLE readers are much less expensive than their RFID counterparts.
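An illustration of the three-level signal-strength idea only: each reader maps its RSSI reading to a coarse proximity level, and the visitor's position is estimated as a level-weighted centroid of the readers that hear the tag. The RSSI thresholds and the centroid rule are assumptions; the paper's actual fuzzy inference system and ACO-based reader placement are not reproduced here.

```python
# Sketch: coarse three-level signal strength and a weighted-centroid position estimate.
def signal_level(rssi_dbm):
    if rssi_dbm > -60:
        return 3          # strong: tag is very close to this reader
    if rssi_dbm > -75:
        return 2          # medium
    if rssi_dbm > -90:
        return 1          # weak
    return 0              # not heard

def estimate_position(readings):
    """readings: list of ((x, y), rssi_dbm) for each BLE reader."""
    weighted = [(pos, signal_level(rssi)) for pos, rssi in readings]
    total = sum(w for _, w in weighted)
    if total == 0:
        return None
    x = sum(p[0] * w for p, w in weighted) / total
    y = sum(p[1] * w for p, w in weighted) / total
    return x, y

readings = [((0, 0), -55), ((10, 0), -72), ((0, 10), -88), ((10, 10), -95)]
print(estimate_position(readings))
```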
- Published
- 2021
44. Revisiting non-tree routing for maximum lifetime data gathering in wireless sensor networks
- Author
-
Xiaojun Zhu
- Subjects
020203 distributed computing ,Computer science ,Node (networking) ,Maximum flow problem ,02 engineering and technology ,Topology ,Theoretical Computer Science ,Tree structure ,Hardware and Architecture ,Path (graph theory) ,0202 electrical engineering, electronic engineering, information engineering ,Routing (electronic design automation) ,Time complexity ,Wireless sensor network ,Software ,Information Systems - Abstract
Wireless sensor networks usually adopt a tree structure for routing, where each node sends and forwards messages to its parent. However, lifetime maximization with a tree routing structure is NP-hard, and all algorithms attempting to find the optimal solution run in exponential time unless $$P=\mathrm{NP}$$. This paper revisits the problem with a non-tree routing structure, where a node can send different messages to different neighbors. Although lifetime maximization with non-tree routing can be solved in polynomial time, the existing method transforms it into a series of maximum flow problems, which are either complicated or have high running times. This paper proposes an algorithm with O(mn) running time, where m is the number of edges and n is the number of nodes. The heart of the algorithm is a method to find a routing path from any node to the sink in O(m) time without disconnecting existing routing paths. The proposed algorithm is also suitable for distributed implementation: when a node fails, each affected node can establish a new routing path in O(m) time. Simulations are conducted to compare the optimal lifetimes of the tree structure and the non-tree structure on random networks. The results verify the effectiveness of the proposed algorithm.
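An illustrative sketch only: finding a route from a node to the sink with a single breadth-first search, which runs in O(n + m) time. The paper's actual O(m) procedure, which also preserves existing routing paths and the lifetime objective, is more involved and is not reproduced; the `usable` predicate below is a hypothetical hook for such constraints.

```python
# Sketch: BFS route discovery from a node to the sink over an adjacency-list graph.
from collections import deque

def bfs_route_to_sink(adj, src, sink, usable):
    """adj: adjacency lists; usable(u, v) -> bool lets callers exclude edges."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == sink:
            path, node = [], sink
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and usable(u, v):
                parent[v] = u
                queue.append(v)
    return None   # no usable route to the sink

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_route_to_sink(adj, src=0, sink=4, usable=lambda u, v: True))
```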
- Published
- 2021
45. High performance of brain emotional intelligent controller for DTC-SVM based sensorless induction motor drive
- Author
-
Sridhar Savarapu and Yadaiah Narri
- Subjects
020203 distributed computing ,Electronic speed control ,Computer science ,Rotor (electric) ,Stator ,02 engineering and technology ,Theoretical Computer Science ,law.invention ,Support vector machine ,Stator voltage ,Direct torque control ,Hardware and Architecture ,Control theory ,law ,Adaptive system ,0202 electrical engineering, electronic engineering, information engineering ,Software ,Induction motor ,Information Systems - Abstract
This paper introduces the application of a brain emotional intelligent controller (BEIC) to an induction motor (IM) drive. Intelligent regulation, modelled on the human brain, is capable of generating impulses and is used as the controller. A model reference adaptive system is developed using the stator currents and stator voltages and is combined with the BEIC to estimate the rotor speed. This paper proposes speed estimation using the BEIC for direct torque control (DTC) of the IM drive. The experimental work is conducted with a hardware-in-the-loop mechanism using a real-time digital simulator (Op-RTDS OP5600). The simulation and test results are discussed, and the proposed method is compared to DTC-SVM-based IM drive speed control with existing controllers.
- Published
- 2021
46. VGL: a high-performance graph processing framework for the NEC SX-Aurora TSUBASA vector architecture
- Author
-
Vladimir V. Voevodin, Kazuhiko Komatsu, Ilya V. Afanasyev, and Hiroaki Kobayashi
- Subjects
Structure (mathematical logic) ,Connected component ,Speedup ,Computer science ,Parallel computing ,Supercomputer ,Graph ,Theoretical Computer Science ,Vector processor ,Vector graphics ,Hardware and Architecture ,Programming paradigm ,Software ,Information Systems - Abstract
Developing efficient implementations of graph algorithms is an extremely important problem in modern computer science, since graphs are frequently used in various real-world applications. Graph algorithms typically belong to the data-intensive class, and thus using architectures with high-bandwidth memory potentially allows many graph problems to be solved significantly faster than on modern multicore CPUs. Among other supercomputer architectures, vector systems, such as the SX family of NEC vector supercomputers, are equipped with high-bandwidth memory. However, the highly irregular structure of many real-world graphs makes it extremely challenging to implement graph algorithms on vector systems, since these implementations are usually bulky and complicated, and a deep understanding of vector architecture hardware features is required. This paper presents the world's first attempt to develop an efficient and at the same time simple graph processing framework for modern vector systems. Our vector graph library (VGL) framework targets NEC SX-Aurora TSUBASA as the primary vector architecture and provides relatively simple computational and data abstractions. These abstractions incorporate many vector-oriented optimization strategies into a high-level programming model, allowing new graph algorithms to be implemented quickly with a small amount of code and minimal knowledge of the features of vector systems. In this paper, we evaluate the VGL performance on four widely used graph processing problems: breadth-first search, single-source shortest paths, connected components and PageRank. The comparative performance analysis demonstrates that the VGL-based implementations achieve significant acceleration over existing high-performance frameworks and libraries: up to 14 times speedup over multicore CPU implementations (Ligra, Galois, GAPBS) and up to 3 times speedup compared to NVIDIA GPU implementations (Gunrock, nvGRAPH).
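A small sketch of the kind of frontier-based processing over a compressed-sparse-row (CSR) graph that such frameworks abstract away, using breadth-first search as the example. This is plain Python for illustration and is not VGL's actual API; the CSR layout and level-by-level loops merely show the data-parallel-friendly structure.

```python
# Sketch: level-synchronous BFS over a CSR-stored graph.
import numpy as np

def bfs_csr(row_ptr, col_idx, source, n):
    dist = np.full(n, -1, dtype=np.int64)
    dist[source] = 0
    frontier = np.array([source])
    level = 0
    while frontier.size:
        level += 1
        nxt = []
        for u in frontier:                               # per-vertex loop over the frontier
            neigh = col_idx[row_ptr[u]:row_ptr[u + 1]]
            new = neigh[dist[neigh] == -1]               # unvisited neighbors
            dist[new] = level
            nxt.append(new)
        frontier = np.unique(np.concatenate(nxt)) if nxt else np.array([], dtype=np.int64)
    return dist

# Tiny undirected graph in CSR form: edges 0-1, 0-2, 1-3, 2-3, 3-4.
row_ptr = np.array([0, 2, 4, 6, 9, 10])
col_idx = np.array([1, 2, 0, 3, 0, 3, 1, 2, 4, 3])
print(bfs_csr(row_ptr, col_idx, source=0, n=5))
```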
- Published
- 2021
47. GPU-based embedded edge server configuration and offloading for a neural network service
- Author
-
Joo-Hwan Kim, Shan Ullah, and Deok-Hwan Kim
- Subjects
020203 distributed computing ,Artificial neural network ,business.industry ,Computer science ,Graphics processing unit ,Cloud computing ,02 engineering and technology ,Theoretical Computer Science ,Edge server ,Computer architecture ,Hardware and Architecture ,Server ,0202 electrical engineering, electronic engineering, information engineering ,The Internet ,Enhanced Data Rates for GSM Evolution ,Latency (engineering) ,business ,Software ,Edge computing ,Information Systems - Abstract
Recently, edge computing has been proposed as a new paradigm that compensates for the disadvantages of current cloud computing. In particular, edge computing is used for low-latency service applications that process local data. For this emerging technology, a neural network approach is required to run large-scale machine learning on edge servers. In this paper, we propose a pod allocation method that adds various graphics processing unit (GPU) resources to increase the efficiency of a Kubernetes-based edge server built from GPU-based embedded boards and running a TensorFlow-based neural network service application. The experiments performed on the proposed edge server show the following: 1) the bandwidth available to service applications varies with time and data size, from 20.4 to 42.4 Mbps in the local environment and from 6.31 to 25.5 Mbps in the Internet environment; 2) when two neural network applications are run on an edge server consisting of Xavier, TX2 and Nano boards, the network times of the object detection application range from 112.2 ms (Xavier) to 515.8 ms (Nano), and the network times of the driver profiling application range from 321.8 ms (Xavier) to 495.7 ms (Nano); 3) the proposed pod allocation method outperforms the default pod allocation method. We observe that the number of allocatable pods on three worker nodes increases from five to seven and, compared to other reported results, the proposed offloading shows similar or better response times in environments where multiple deep learning applications are deployed.
- Published
- 2021
48. HSAC-ALADMM: an asynchronous lazy ADMM algorithm based on hierarchical sparse allreduce communication
- Author
-
Yongmei Lei, Dongxia Wang, Jinyang Xie, and Guozheng Wang
- Subjects
Optimization problem ,Computer science ,Node (networking) ,Payload (computing) ,Filter (signal processing) ,Theoretical Computer Science ,Hardware and Architecture ,Asynchronous communication ,Multithreading ,Scalability ,Algorithm ,Software ,Information Systems ,Sparse matrix - Abstract
The distributed alternating direction method of multipliers (ADMM) is an effective algorithm for solving large-scale optimization problems. However, its high communication cost limits its scalability. An asynchronous lazy ADMM algorithm based on a hierarchical sparse allreduce communication mode (HSAC-ALADMM) is proposed to reduce the communication cost of distributed ADMM. First, this paper proposes a lazy parameter aggregation strategy to filter the transmitted parameters of the distributed ADMM, which reduces the payload of each node per iteration. Second, a hierarchical sparse allreduce communication mode is tailored to sparse data to aggregate the filtered parameters effectively. Finally, a Calculator-Communicator-Manager framework is designed to implement the proposed algorithm, which combines the asynchronous communication protocol and the allreduce communication mode effectively. It separates calculation and communication via multithreading, thus improving the efficiency of both. Experimental results for the L1-regularized logistic regression problem with public datasets show that the HSAC-ALADMM algorithm is faster than existing asynchronous ADMM algorithms. Compared with existing sparse allreduce algorithms, the hierarchical sparse allreduce algorithm proposed in this paper makes better use of the characteristics of sparse data to reduce system time on a multi-core cluster.
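A sketch of the lazy parameter aggregation idea described above: a worker only transmits the coordinates of its local variable that changed significantly since the last transmission, so each allreduce payload becomes a small sparse delta. The threshold rule below is illustrative; the paper's exact filtering condition and the hierarchical allreduce itself are not reproduced.

```python
# Sketch: transmit only parameter coordinates whose change exceeds a threshold.
import numpy as np

class LazyParameterSender:
    def __init__(self, dim, threshold=1e-3):
        self.last_sent = np.zeros(dim)   # what the other nodes currently know
        self.threshold = threshold

    def sparse_update(self, current):
        """Return (indices, values) worth transmitting this iteration."""
        delta = current - self.last_sent
        idx = np.nonzero(np.abs(delta) > self.threshold)[0]
        self.last_sent[idx] = current[idx]        # remember what was sent
        return idx, current[idx]

rng = np.random.default_rng(0)
sender = LazyParameterSender(dim=10_000)
x = rng.normal(scale=1e-4, size=10_000)           # mostly tiny changes
x[:20] += 0.5                                     # a few coordinates moved a lot
idx, vals = sender.sparse_update(x)
print(f"transmitting {idx.size} of {x.size} coordinates")   # payload shrinks
```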
- Published
- 2021
49. Efficient design and implementation of a robust coplanar crossover and multilayer hybrid full adder–subtractor using QCA technology
- Author
-
Mukesh Patidar and Namit Gupta
- Subjects
020203 distributed computing ,Adder ,Computer science ,Circuit design ,Crossover ,Quantum dot cellular automaton ,Hardware_PERFORMANCEANDRELIABILITY ,02 engineering and technology ,Theoretical Computer Science ,Euler method ,symbols.namesake ,Hardware and Architecture ,Subtractor ,Hardware_INTEGRATEDCIRCUITS ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,Electronic engineering ,Software ,Hardware_LOGICDESIGN ,Information Systems ,Electronic circuit - Abstract
Quantum dot cellular automata (QCA) is an emerging nanometer-scale circuit design approach based on nanocomputing technology that overcomes the limitations of complementary MOS technology in terms of circuit area, power and latency/delay. This paper presents an efficient design of crossover single-layer (coplanar) and multilayer novel hybrid full adder–subtractor circuits, implemented with a majority gate minimization functional J-map technique. The proposed circuits are more efficient than existing QCA adder–subtractor designs in terms of the number of QCA cells, latency, required area in µm² and quantum cost, and the applied synchronization clocking method also avoids the thermodynamic problems that arise from long QCA wires. In this paper, we introduce QCA circuits with 14 nm × 14 nm and 16 nm × 16 nm cell sizes and compare them with existing designs and with the proposed novel 18 nm × 18 nm single-layer and multilayer designs. Both designs are implemented with the QCADesigner-E tool using the bistable vector and coherence vector energy setups, with the Euler method and the Runge–Kutta method.
- Published
- 2021
50. Design and implementation of an academic expert system through big data analysis
- Author
-
Jaesoo Yoo, Hyeonbyeong Lee, Kyoungsoo Bok, and Dojin Choi
- Subjects
Influence factor ,Impact factor ,Computer science ,business.industry ,media_common.quotation_subject ,Big data ,computer.software_genre ,Data science ,Expert system ,Field (computer science) ,Theoretical Computer Science ,Hardware and Architecture ,Factor (programming language) ,Quality (business) ,business ,computer ,Software ,Information Systems ,media_common ,computer.programming_language - Abstract
When studying new fields, most researchers establish their research directions with the help of expert advice or papers published by experts. Existing academic search services display papers by field but do not provide experts by field. Therefore, researchers are left to identify experts in each field by analyzing the papers themselves. In this paper, we design and implement an expert search system based on papers that have been published by academic societies. The academic expert search system is built on a big data processing system to handle the large amount of data in academic fields. It calculates an expert score using quality and influence factors. The quality factor is calculated based on the citations, impact factor and recentness of a paper. The influence factor is measured by the sparsity of a field and the degree of contribution of an author. The proposed system provides various services such as expert search, keyword search, hot topics, expert relationships and academic society statistics. By finding experts in a specific field, our system can support researchers' research activities.
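An illustrative sketch of how the two factors described above could be combined into an expert score. All individual formulas, weights and field names below are assumptions; the abstract only states that quality depends on citations, impact factor and recentness, and that influence depends on field sparsity and author contribution.

```python
# Sketch: toy expert score combining a quality factor and an influence factor.
import math
from datetime import date

def quality_factor(citations, impact_factor, year, half_life=5.0):
    recency = math.exp(-(date.today().year - year) / half_life)  # newer counts more
    return (1 + math.log1p(citations)) * impact_factor * recency

def influence_factor(papers_in_field, total_papers, author_position, n_authors):
    # Rarer fields and earlier author positions contribute more (both assumptions).
    sparsity = 1.0 / math.log1p(papers_in_field / max(total_papers, 1) * 1000 + 1)
    contribution = (n_authors - author_position + 1) / n_authors
    return sparsity * contribution

def expert_score(papers):
    """papers: list of dicts with the per-paper fields used above."""
    return sum(quality_factor(p["citations"], p["impact_factor"], p["year"])
               * influence_factor(p["papers_in_field"], p["total_papers"],
                                  p["author_position"], p["n_authors"])
               for p in papers)

papers = [dict(citations=42, impact_factor=2.5, year=2019,
               papers_in_field=120, total_papers=50_000,
               author_position=1, n_authors=3)]
print(round(expert_score(papers), 3))
```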
- Published
- 2021