841 results for "Big data processing"
Search Results
2. A Data Modeling Process for Achieving Interoperability
- Author
-
Kouremenou, Eleftheria, Kiourtis, Athanasios, Kyriazis, Dimosthenis, Magjarević, Ratko, Series Editor, Ładyżyński, Piotr, Associate Editor, Ibrahim, Fatimah, Associate Editor, Lackovic, Igor, Associate Editor, Rock, Emilio Sacristan, Associate Editor, Costin, Hariton-Nicolae, editor, and Petroiu, Gladiola Gabriela, editor
- Published
- 2024
- Full Text
- View/download PDF
3. Walks on Algebraic Small World Graphs of Large Girth and New Secure Stream Ciphers
- Author
-
Ustimenko, Vasyl, Chojecki, Tymoteusz, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
- Full Text
- View/download PDF
4. A Unified Approach to Real-Time Public Transport Data Processing
- Author
-
Lazúr, Juraj, Hynek, Jiří, Hruška, Tomáš, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Rocha, Álvaro, editor, Adeli, Hojjat, editor, Dzemyda, Gintautas, editor, Moreira, Fernando, editor, and Poniszewska-Marańda, Aneta, editor
- Published
- 2024
- Full Text
- View/download PDF
5. Big Data Driven Map Reduce Framework for Automated Flood Disaster Detection Based on Heuristic-Based Ensemble Learning.
- Author
-
Ali Shatat, Abdallah Saleh, Mobin Akhtar, Md., Zamani, Abu Sarwar, Dilshad, Sara, and Samdani, Faizan
- Abstract
Flood disasters have huge socio-economic impacts. In inundated areas, geo-referenced images are shared through media posts, which help alert critical volunteers and manage the financial-loss crisis. In this work, Adaptive Billiards-Inspired Optimization (A-BIO) and Optimized Ensemble-learning-based Detection (OED) with a MapReduce framework are proposed for flood disaster detection. Initially, the big data are gathered and processed for detection. During the map phase, data preprocessing is performed to enhance the quality of the data, which helps remove noise and unwanted attributes. The reduce phase is carried out through weighted feature selection, where the selected features and their weights are optimized through A-BIO, which yields the most significant features for improving performance and reducing the complexity of the designed model. Finally, OED is performed by a set of classifiers comprising Convolutional Neural Networks, AdaBoost, XGBoost, Long Short-Term Memory, and Deep Neural Networks, where the parameters of the ensemble-learning classifiers are optimized by the A-BIO algorithm. The performance analysis shows that this detection model provides high accuracy and better detection performance, helping to avoid the huge impacts of flood disasters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
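The map/shuffle/reduce flow this abstract describes (preprocessing in the map phase, aggregation in the reduce phase) can be sketched with a minimal in-memory MapReduce skeleton. This is an illustrative toy, not the paper's A-BIO/OED pipeline; the example records, the noise-filtering mapper, and the mean-aggregating reducer are all invented for demonstration.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Minimal in-memory MapReduce skeleton: map, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase: emit (key, value) pairs
            shuffled[key].append(value)     # shuffle: group values by key
    return {key: reducer(key, values) for key, values in shuffled.items()}

def mapper(record):
    """Toy map-phase preprocessing: drop noisy (None) attributes."""
    label, value = record
    if value is not None:
        yield label, value

def reducer(key, values):
    """Toy reduce phase: aggregate a per-class feature mean."""
    return sum(values) / len(values)

data = [("flood", 0.9), ("flood", 0.7), ("dry", 0.1), ("dry", None)]
result = run_mapreduce(data, mapper, reducer)
# result ≈ {"flood": 0.8, "dry": 0.1}
```

The same skeleton extends to any mapper/reducer pair; real frameworks add partitioning, fault tolerance, and distribution on top of this shape.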
6. An efficient architecture for processing real-time traffic data streams using apache flink.
- Author
-
Deepthi, B. Gnana, Rani, K. Sandhya, Krishna, P. Venkata, and Saritha, V.
- Abstract
Big Data technologies are emerging day by day and making drastic changes in various real-world applications. Traditional data mining tools were adequate for processing moderate volumes of data, but the rapid growth of data over the past decades has made such processing difficult. Due to the continuous flow of data, data streams require more computational processing than traditional data. Big data stream processing must consider different features of data streams: heterogeneity, scalability, fault tolerance, and query optimization. Efficiently implementing these features in real-world applications using big data analytics is challenging during the data storage, processing, and analysis phases. Therefore, the proposed model FRTSPS is a generic architecture, influenced by the popular Lambda architecture for big data processing and based on a distributed computing platform. The architecture uses the open-source platform Apache Flink for data processing. Flink is a popular platform for processing historical and streaming data flows at once, in parallel. Its stateful streaming achieves more scalability and flexibility, along with higher throughput and lower latency, than the remaining stream processing programming models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
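The core stream-processing pattern behind such architectures (a continuous event flow aggregated with low latency) can be illustrated with a tumbling-window count in plain Python. This is a conceptual sketch only: it does not use Apache Flink or its APIs, and the sensor IDs and timestamps are invented.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Group a (timestamp, key) event stream into fixed tumbling windows
    and count events per key within each window."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_size) * window_size  # window assignment
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

# Toy traffic stream: (timestamp in seconds, road sensor id)
stream = [(0, "s1"), (3, "s2"), (4, "s1"), (11, "s1"), (13, "s2"), (14, "s2")]
result = tumbling_window_counts(stream, window_size=10)
# result == {0: {"s1": 2, "s2": 1}, 10: {"s1": 1, "s2": 2}}
```

A real Flink job expresses the same logic declaratively (keyed streams, window operators) and adds state management, parallelism, and fault tolerance.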
7. Big data processing and analysis platform based on deep neural network model
- Author
-
Sheng Huang
- Subjects
Big data processing, Analytics platform, Deep neural network, Stock prediction, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95 - Abstract
Users are increasingly turning to big data processing systems to extract valuable information from massive datasets as the field of big data grows. Data analytics platforms are used by e-commerce enterprises to improve product suggestions and model business processes. To meet the needs of large-scale data center operation and maintenance management, Internet companies often use Flink to process log data. This paper takes the big data processing and analysis platforms built by Internet financial companies and large banks as examples, and embeds a stock prediction model based on a Deep Neural Network (DNN). In this context, the paper completes the following work: 1) The research status of big data processing and analysis platforms at home and abroad is reviewed. 2) Drawing on modular design ideas, a commercial bank big data platform is designed and the functions of each sub-module are introduced; the basic principles and structure of Convolutional Neural Networks (CNN) are then expounded. 3) The optimal parameters of the Convolutional Neural Network are selected through experiments, and the trained model is then used for experiments. The stock prediction model proposed in this article shows higher prediction accuracy than existing models, which verifies its validity. The data are input, the obtained results are compared with the actual results, and the model is shown to perform well on stock prediction.
- Published
- 2024
- Full Text
- View/download PDF
8. A mapreduce-based approach for shortest path problem in road networks.
- Author
-
Zhang, Dongbo, Shou, Yanfang, and Xu, Jianmin
- Abstract
In the era of big data, the use of data mining instead of data collection represents a new challenge for researchers and engineers. In the field of transportation, computing the shortest path based on MapReduce using widely available vehicle data is meaningful both in theory and in practice. Therefore, this article proposes a simple shortest-path approach to relieve urban traffic congestion. The objective is not to guarantee optimality but to provide high-quality solutions in acceptable computational time. The proposed approach partitions the original graph into a set of subgraphs and solves the shortest path for each subgraph in parallel in order to obtain a solution for the original graph. An iterative procedure is introduced to improve the accuracy. The experimental results show that the proposed approach significantly reduces computational time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
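The per-subgraph step in such a partition-and-solve approach is a single-source shortest-path computation. A minimal Dijkstra sketch on a hypothetical subgraph follows; the partitioning and iterative stitching across subgraphs, which the paper contributes, are omitted here.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source within one subgraph.

    graph is an adjacency dict: node -> {neighbor: edge_weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# One subgraph of a partitioned road network (weights = travel times)
subgraph = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 5},
    "B": {"D": 1},
}
distances = dijkstra(subgraph, "A")
# distances == {"A": 0, "C": 1, "B": 3, "D": 4}
```

In a MapReduce setting, each map task would run this routine on its own subgraph, and the reduce step would merge boundary-node distances.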
9. Harmonizing Dimensionality: Unveiling the Prowess of Variational Auto-Encoder in Spark for Big Data Processing.
- Author
-
Jawad, Wasnaa and Al-Bakry, Abbas
- Subjects
DISTRIBUTED computing, MACHINE learning, BIG data - Abstract
In the dynamic realm of big data processing, conquering the challenges imposed by high-dimensional datasets is imperative. This paper introduces a groundbreaking advancement in dimensionality reduction, employing a Variational Auto-Encoder (VAE) within the Spark distributed framework. The deliberate selection of the "TLC" dataset, representative of New York City taxi trips with inherent high dimensionality, highlights the practicality of our approach. Our research showcases the virtuoso performance of VAE, achieving an impressive 95.12% reduction ratio and 89.26% accuracy. This highlights VAE's ability to elegantly distill essential information while discarding superfluous dimensions, achieving a harmonious balance between reduction and accuracy. Furthermore, building on the demonstrated superiority of Spark over Hadoop in prior successes, our adoption of VAE aligns with the overarching goal of enhancing big data processing. Spark's consistent advantage as a distributed framework reaffirms its reliability in handling diverse machine learning algorithms. This paper not only contributes to the advancement of machine learning in big data processing but also underscores the adaptability, versatility, and consistent performance of our approach across various methodologies and frameworks. The success of VAE in reducing dimensionality, coupled with Spark's inherent advantages, positions this research as a valuable contribution to the exploration of advanced techniques in distributed big data processing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Cellular automata-based MapReduce design: Migrating a big data processing model from Industry 4.0 to Industry 5.0
- Author
-
Arnab Mitra
- Subjects
Cyber-physical systems (CPS), Industry 4.0, Industry 5.0, Big data processing, MapReduce model, Elementary cellular automata (ECAs), Electrical engineering. Electronics. Nuclear engineering, TK1-9971 - Abstract
A successful deployment of Industry 5.0 depends significantly on the synergetic integration of several advanced technologies, such as big data processing, Artificial Intelligence (AI) integration, and effective digitization techniques emphasizing the use of Robotics, the Internet of Things (IoT), Cloud Computing, etc., with active participation from human workers. Several researchers have explored the importance of big data processing in Industry 4.0, as it facilitated enhanced production at smart manufacturing lines by ensuring efficient process management, typically involving big data processing. Researchers have presented several MapReduce-based data processing models for smart manufacturing lines to facilitate big data processing. Among them, the Elementary Cellular Automata (ECAs)-based MapReduce model was introduced as an energy-efficient, low-cost model for big data processing in the Industry 4.0 scenario. In the present research, an investigation is proposed to explore the true potential (if any) of the ECAs-based MapReduce model, with reference to several available MapReduce models, to migrate an existing big data processing model from Industry 4.0 into the future, i.e., Industry 5.0. Results from the comparison among several MapReduce models, together with further examination of the inherent quality of shuffle in those MapReduce blocks, demonstrate the inherent advantages of the ECAs-based MapReduce model for big data processing in the Industry 5.0 scenario.
- Published
- 2024
- Full Text
- View/download PDF
11. Big Data Processing in Smart City Application Using 6G Driven IoT Framework
- Author
-
Sun, Maojin and Sun, Minghui
- Published
- 2024
- Full Text
- View/download PDF
12. Application of Artificial Intelligence in Computer Neural Network Algorithm Technology in the Age of Big Data
- Author
-
Zhou Sheng
- Subjects
neural network algorithm, particle swarm algorithm, beetle antennae search algorithm, mapreduce, big data processing, 62-07, Mathematics, QA1-939 - Abstract
The arrival of the big data era has made the amount of data grow explosively, which puts forward new challenges and demands for computer network technology; the integration of big data and network technology has become an important trend. This paper uses the optimization strategy and elimination mechanism of the genetic algorithm to optimize the inertia weight and the particle position and velocity updating mechanism of the particle swarm algorithm, and combines the searching method of the beetle antennae search algorithm with the sharing mechanism of the particle swarm algorithm to achieve optimal data searching ability. Finally, the improved artificial intelligence algorithm and MapReduce are combined to improve the performance of the computer neural network algorithm in big data processing. According to simulation experiments, the average data redundancy rate of this paper's algorithm for big data processing is only 1.18%, and the resource integration checking rate always exceeds 85%. In addition, the algorithm shows good performance in practical applications: it achieves accurate classification in big data label classification tasks while maintaining low energy overhead, and it accurately recognizes electronic medical record data in large medical databases. Big data processing can benefit greatly from the neural network algorithm proposed in this paper.
- Published
- 2024
- Full Text
- View/download PDF
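The particle swarm mechanics this abstract optimizes (inertia weight plus position/velocity updates) can be illustrated with a plain PSO. This is a baseline sketch with a fixed inertia weight and conventional parameter defaults, not the paper's hybrid genetic/beetle-antennae variant; the sphere objective is a standard toy benchmark.

```python
import numpy as np

def pso(objective, dim=2, particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer with a fixed inertia weight w."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, (particles, dim))
    vel = np.zeros((particles, dim))
    pbest = pos.copy()
    pbest_val = np.apply_along_axis(objective, 1, pos)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((particles, dim)), rng.random((particles, dim))
        # velocity update: inertia term + cognitive pull + social pull
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.apply_along_axis(objective, 1, pos)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

best = pso(lambda x: np.sum(x**2))   # sphere function, minimum at the origin
```

Hybrid variants like the one described above replace the fixed `w` with an adaptive schedule and mix in search moves from other metaheuristics.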
13. An Exploration of the Application of Principal Component Analysis in Big Data Processing
- Author
-
Li Guo and Qin Yi
- Subjects
principal component analysis (pca), big data processing, data dimensionality reduction, data compression, 68q05, Mathematics, QA1-939 - Abstract
With the arrival of the big data era, efficiently processing large-scale multidimensional data has become challenging. As a powerful data dimensionality reduction tool, Principal Component Analysis (PCA) plays a vital role in big data processing, showing unique advantages especially in information extraction and data simplification. The research aims to simplify the data processing process and improve data processing efficiency with the PCA method. The research method adopts the basic theory of PCA, an improved weighted principal component analysis algorithm, and standardized and homogenized data processing techniques to process large-scale multidimensional data sets. The results show that data dimensionality is significantly reduced after using PCA; for example, in the analysis of the earnings quality of listed companies in the e-commerce industry, the cumulative variance contribution rate of the first four principal components extracted by PCA reaches 81.623%, effectively extracting the primary information of the original data. PCA not only reduces the complexity of the data but also retains a large amount of crucial information, which is of significant application value for big data processing, especially in the fields of data compression and pattern recognition.
- Published
- 2024
- Full Text
- View/download PDF
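The PCA workflow this abstract relies on (center the data, extract components, report the cumulative variance contribution rate) can be sketched with NumPy's SVD. The dataset below is synthetic, invented for illustration; its high contribution rate is a property of this toy data, not of the paper's results.

```python
import numpy as np

def pca(X, k):
    """Project X onto its first k principal components (via SVD).

    Returns the projected data and the cumulative explained-variance ratio."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (len(X) - 1)                    # per-component variances
    cum_ratio = var[:k].sum() / var.sum()        # cumulative contribution rate
    return Xc @ Vt[:k].T, cum_ratio

rng = np.random.default_rng(1)
# 200 samples, 6 correlated dimensions: most variance lies in 2 directions
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))
Z, ratio = pca(X, k=2)
# ratio is close to 1: two components capture nearly all the variance
```

Weighted PCA variants, as used in the paper, scale the centered columns by feature weights before the decomposition.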
14. Production Planning Digitalization Using Open-Source Big Data Technologies
- Author
-
Suleykin, Alexander, Panfilov, Peter, Chaari, Fakher, Series Editor, Gherardini, Francesco, Series Editor, Ivanov, Vitalii, Series Editor, Cavas-Martínez, Francisco, Editorial Board Member, di Mare, Francesca, Editorial Board Member, Haddar, Mohamed, Editorial Board Member, Kwon, Young W., Editorial Board Member, Trojanowska, Justyna, Editorial Board Member, Xu, Jinyang, Editorial Board Member, Durakbasa, Numan M., editor, and Gençyılmaz, M. Güneş, editor
- Published
- 2023
- Full Text
- View/download PDF
15. Electric Meters Monitoring System for Residential Buildings
- Author
-
Nataliia, Fedorova, Yevgen, Havrylko, Artem, Kovalchuk, Denys, Smakovskiy, Iryna, Husyeva, Xhafa, Fatos, Series Editor, Hu, Zhengbing, editor, Wang, Yong, editor, and He, Matthew, editor
- Published
- 2023
- Full Text
- View/download PDF
16. Educational Management Data Based on Performance Appraisal Model
- Author
-
Shang, Zhuo, Li, Kan, Editor-in-Chief, Li, Qingyong, Associate Editor, Fournier-Viger, Philippe, Series Editor, Hong, Wei-Chiang, Series Editor, Liang, Xun, Series Editor, Wang, Long, Series Editor, Xu, Xuesong, Series Editor, Zhan, Zehui, editor, Zou, Bin, editor, and Yeoh, William, editor
- Published
- 2023
- Full Text
- View/download PDF
17. Research on key architecture and model of coal mine water hazard intelligent early warning system
- Author
-
Hao QIU, Hongjie LI, Wen LI, Jianghua LI, Mingze DU, and Peng JIANG
- Subjects
mine water hazard ,intelligent early warning ,deep learning ,big data processing ,intelligent computing ,Mining engineering. Metallurgy ,TN1-997 - Abstract
In order to ensure the safe production of mines threatened by water hazards, speed up the intelligent development of mine water hazard prediction and early warning technology, and improve the effect of mine water hazard prediction and early warning, four types of key technical issues for constructing water hazard monitoring and intelligent early warning systems are analyzed, based on the research status of water hazard mechanisms and monitoring and early warning at home and abroad. The complexity of early warning requirements and data access standards, the classification and spatio-temporal matching of multi-source heterogeneous big data information, the intelligent processing and analysis of water hazard big data information, and the timeliness of early warning and intelligent decision information release are discussed in detail. From the perspective of early warning system resource integration and data-driven operation, water hazard warning resources are divided into information collection resources and computing resources; water hazard warning big data information is divided into static source information and dynamic monitoring information; data processing is divided into basic geological model data processing, numerical computation and simulation, and information fusion data processing; and coal mine disaster early warning is divided into primary monitoring parameter early warning, intermediate index grading early warning, and advanced intelligent model early warning. The key technical architecture of an intelligent warning system for coal mine water hazards is proposed and analyzed. A software service architecture that meets the technical requirements is proposed, comprising an infrastructure layer, data resource layer, application support layer, business application layer, and user presentation layer.
Based on the water hazard warning construction process, a Gated Recurrent Unit algorithm warning model for water hazard monitoring data is proposed, and the network structure of the warning model is given. The forward calculation, backward propagation calculation, and weight gradient calculation methods of the warning model are studied. The classification of access, storage, and encoding for different types of perception data, the construction and testing of intelligent deep learning models, and the technical paths for warning information release are analyzed. This provides a reference for the intelligent construction of coal mine water hazard early warning.
- Published
- 2023
- Full Text
- View/download PDF
18. Designing a Data Warehouse for Collected Data About User Activity in Social Networks Using Elasticsearch
- Author
-
Iryna Mysiuk
- Subjects
social networks, data warehouse, data analytics, big data processing, system design, Science - Abstract
In this paper, a data warehouse is designed to store data collected from social networks. Creating indexes with data and selecting a configuration with an appropriate number of shards and replicas are described, along with the primary states of the cluster and the possibilities for scaling it. The features of working with the non-relational Elasticsearch database are described for data on user activity in social network posts. Among social networks, Facebook and Instagram were chosen for analysis. The paper describes the advantages and disadvantages of using such a data store compared to Apache Kafka. Existing data insertion Application Program Interfaces (APIs) and data visualisation tools integrated with Elasticsearch are analysed. The study describes the use of the Bulk API to insert many records into the database at once. The designed data warehouse uses Kibana, a data visualisation and analytics tool integrated with the selected database. The ability to insert and view logs using Elasticsearch, Logstash, and Kibana (the ELK stack) is also shown, and data ingestion by logging into the database using Beats is tested. The obtained results can help implement a system for analysing user activities from social network data with Elasticsearch as a central component.
- Published
- 2023
- Full Text
- View/download PDF
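The Bulk API usage mentioned in this abstract alternates action-metadata lines with document lines in an NDJSON body. The sketch below only builds that payload; no live cluster is contacted, and the index name and document fields are invented for illustration.

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON payload for the Elasticsearch Bulk API.

    Each document becomes an action/metadata line followed by a source line;
    the payload must end with a trailing newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return "\n".join(lines) + "\n"

posts = [
    {"network": "facebook", "post_id": "p1", "likes": 120, "comments": 14},
    {"network": "instagram", "post_id": "p2", "likes": 560, "comments": 33},
]
body = build_bulk_body("social-activity", posts)
# POST this body to /_bulk with Content-Type: application/x-ndjson
```

Official clients wrap this in helpers, but the wire format is exactly these alternating lines.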
19. Research on key architecture and model of coal mine water hazard intelligent early warning system.
- Author
-
Hao Qiu, Hongjie Li, Wen Li, Jianghua Li, Mingze Du, and Peng Jiang
- Subjects
MINE water, DEEP learning, COAL mining, GEOLOGICAL modeling, SOFTWARE architecture, SYSTEM integration - Abstract
Copyright of Coal Science & Technology (0253-2336) is the property of Coal Science & Technology. Users should refer to the original published version of the material for the full abstract.
- Published
- 2023
- Full Text
- View/download PDF
20. Big Data-Based Smart Health Monitoring System: Using Deep Ensemble Learning
- Author
-
Mustufa Haider Abidi, Usama Umer, Syed Hammad Mian, and Abdulrahman Al-Ahmari
- Subjects
Smart health monitoring system, eHealth, health care 4.0, big data processing, deep ensemble learning, extreme learning machine, Electrical engineering. Electronics. Nuclear engineering, TK1-9971 - Abstract
Human life has become smarter through big data, telecommunication technologies, and wearable sensors over pervasive computing, giving better healthcare services. Big data has the potential to improve the healthcare industry: it interconnects patients, wearable sensors, healthcare caregivers, and providers through Information and Communication Technology (ICT) and software. Many economic challenges in developing countries are caused by the healthcare sector, predominantly due to a growing population requiring more quality of care, particularly for older people. Older people need great attention and care, as a minor accident or seemingly insignificant disease can leave them with irreparable damage. Therefore, new technologies and tools must be implemented to support senior citizens' healthcare. Advancements in wireless technology, miniaturization, computing power, and processing have produced diverse healthcare innovations that led to connected medical devices. Hence, this proposal develops a new healthcare monitoring system for tracking the activities of elderly people, where the Hadoop MapReduce technique is used for parallel processing of the large-sized data. Data collection, as described in the available datasets, is performed using numerous wearable sensors fixed on the "subject's left ankle, right arm, and chest"; the data are transferred to the cloud platform and to the data analytics layer via Internet of Medical Things (IoMT) devices. The given input undergoes data splitting to produce tiny chunks of the input files, which are then treated as Map tasks. In the map phase, features are optimally selected by Hybrid Dingo Coyote Optimization (HDCO).
The combiner phase classifies the physical activities using the developed Deep Ensemble Learning (DEL), consisting of classifiers such as "Extreme Learning Machine (ELM), deep Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Deep Belief Network (DBN), and Deep Neural Network (DNN)". Parameter tuning in these classifiers is done by the same HDCO. The reducer phase extracts data from the different chunks by merging the same classes. The developed HDCO-DEL secured 13.66%, 16.01%, 17.33%, 13.6%, and 14.01% better accuracy than ELM, CNN, LSTM, DBN, DNN, and HealthFog, respectively, on the second dataset. Comparison with existing methods shows its better performance; it also predicts physical activities with overall high accuracy.
- Published
- 2023
- Full Text
- View/download PDF
21. Programming and processing of big data using python language in medicine
- Author
-
Ergashev Otabek, Mamadaliev Nurillo, Khonturaev Sardorbek, and Sobirov Muzaffar
- Subjects
neural network, big data, data analysis, big data processing, python library, Environmental sciences, GE1-350 - Abstract
This article is devoted to the use and further application of Python libraries in the medical industry. These libraries include NumPy, Pandas, Scikit-learn, Keras and TensorFlow, Matplotlib, Seaborn, and Plotly. Using the Keras library as an example, a problem in medical data analysis was considered.
- Published
- 2024
- Full Text
- View/download PDF
22. EODIE — Earth Observation Data Information Extractor
- Author
-
Samantha Wittke, Anne Fouilloux, Petteri Lehti, Juuso Varho, Arttu Kivimäki, Maiju Karhu, Mika Karjalainen, Matti Vaaja, and Eetu Puttonen
- Subjects
Remote sensing, Big data processing, Earth observation, Open-source software, Computer software, QA76.75-76.765 - Abstract
Remote sensing satellites provide a vast amount of data to monitor and observe Earth's surface and events on it. To use these data efficiently in subsequent analysis and decision-making, highly automated, easy-to-use tools are needed. Here, we present the Earth Observation Data Information Extractor (EODIE). EODIE is a toolkit to extract object-level time-series information from several multispectral satellite remote sensing platforms and to produce analysis-ready products for subsequent data analysis. EODIE has a modular design that makes it adjustable to end-user requirements. Users can exchange and add modules in EODIE for flexible processing in different computing environments. With EODIE, remote sensing data can be processed into object-level array, GeoTIFF, or statistics information for different (vegetation) indices or plain wavelength intervals.
- Published
- 2023
- Full Text
- View/download PDF
23. TM‐HOL: Topic memory model for detection of hate speech and offensive language.
- Author
-
Chen, Jing, Ma, Kun, Ji, Ke, and Chen, Zhenxiang
- Subjects
HATE speech, SOCIAL media, DIGITAL technology, PSYCHOLOGICAL factors, SOCIAL networks - Abstract
In the era of the explosion of digital content from large-scale self-media, user-friendly social platforms such as Twitter and Facebook provide opportunities for people to express their ideas and opinions freely. Due to a lack of restrictions, hateful speech and exposure to it can have profound psychological impacts on society. Current social networking platforms are over-reliant on manual checks, which are labor-intensive and time-consuming. Although there are many machine learning methods for the detection of hate speech, the short text with character limits on social platforms makes detecting hate speech and offensive language more challenging. To address the problem of data sparsity, we propose a topic memory model for hate speech and offensive language detection (abbreviated as TM-HOL). Potential topics are generated with our encoder and decoder to enrich short-text features. Two memory matrices correspond to the topic words and the text, and the hate feature matrix is used to learn the syntactic features. Our proposed method is demonstrated to be effective on three datasets, achieving better weighted-F1 scores. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
24. Design and Implementation of a Park Operation Supervision Platform Based on a Data Middle Platform.
- Author
-
Wen Zhang and Mingsheng Zhou
- Abstract
Copyright of Cyber Security & Data Governance is the property of the Editorial Office of Information Technology & Network Security. Users should refer to the original published version of the material for the full abstract.
- Published
- 2023
- Full Text
- View/download PDF
25. Big Data Analytics Using Graph Signal Processing.
- Author
-
Amin, Farhan, Barukab, Omar M., and Gyu Sang Choi
- Subjects
SIGNAL processing ,DIGITAL signal processing ,SIGNAL filtering ,BIG data ,FOURIER transforms ,COMMUNITIES - Abstract
Networks are fundamental to our modern world, and they appear throughout science and society. Access to a massive amount of data presents a unique opportunity to the research community. As networks grow in size, their complexity increases, and our ability to analyze them using the current state of the art is at severe risk of failing to keep pace. Therefore, this paper initiates a discussion on graph signal processing for large-scale data analysis. We first provide a comprehensive overview of the core ideas in Graph Signal Processing (GSP) and their connection to conventional Digital Signal Processing (DSP). We then summarize recent developments in basic GSP tools, including methods for graph filtering and graph learning, graph signals, the Graph Fourier Transform (GFT), spectrum, graph frequency, etc. Graph filtering is a basic task that isolates the contributions of individual frequencies and therefore enables the removal of noise. We then consider a graph filter as a model that helps extend the application of GSP methods to large datasets. To show its suitability and effectiveness, we first created a noisy graph signal and then applied it to the filter. After several rounds of simulation, we see that the filtered signal appears smoother and is closer to the original noise-free distance-based signal. With this example application, we thoroughly demonstrate that graph filtering is efficient for big data analytics. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
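The filtering experiment this abstract describes (build a noisy graph signal, filter it, observe a smoother result) can be reproduced in miniature with NumPy: the graph Fourier basis is the set of Laplacian eigenvectors, and an ideal low-pass filter zeroes the high-frequency coefficients. The path graph and signal below are invented for illustration, not taken from the paper.

```python
import numpy as np

# Path graph on 6 nodes: adjacency matrix and combinatorial Laplacian L = D - A
A = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
L = np.diag(A.sum(axis=1)) - A

# Graph Fourier basis: Laplacian eigenvectors, ordered by graph frequency
freqs, U = np.linalg.eigh(L)

# A smooth (bandlimited) graph signal built from the 3 lowest frequencies
smooth = U[:, :3] @ np.array([1.0, 0.5, 0.3])
rng = np.random.default_rng(0)
noisy = smooth + 0.3 * rng.normal(size=6)

# Ideal low-pass graph filter: GFT, zero the high frequencies, inverse GFT
coeffs = U.T @ noisy            # graph Fourier transform (GFT)
coeffs[3:] = 0.0                # discard high-frequency (noise) content
denoised = U @ coeffs           # inverse GFT

err_noisy = np.linalg.norm(noisy - smooth)
err_denoised = np.linalg.norm(denoised - smooth)
# the filtered signal is closer to the clean signal than the noisy input
```

Because the clean signal here is exactly bandlimited, the filter removes only noise energy; on real data the cutoff trades signal fidelity against noise suppression.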
26. Characterization of the Spatiotemporal Behavior of a Sweeping System Using Supervised Machine Learning Enhanced with Feature Engineering
- Author
-
Daya, Bechir Ben, Audy, Jean-François, Lamghari, Amina, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Goedicke, Michael, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Tröltzsch, Fredi, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Reis, Ricardo, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Camarinha-Matos, Luis M., editor, Ortiz, Angel, editor, Boucher, Xavier, editor, and Osório, A. Luís, editor
- Published
- 2022
- Full Text
- View/download PDF
27. Supporting Semantic Data Enrichment at Scale
- Author
-
Ciavotta, Michele, Cutrona, Vincenzo, De Paoli, Flavio, Nikolov, Nikolay, Palmonari, Matteo, Roman, Dumitru, Curry, Edward, editor, Auer, Sören, editor, Berre, Arne J., editor, Metzger, Andreas, editor, Perez, Maria S., editor, and Zillner, Sonja, editor
- Published
- 2022
- Full Text
- View/download PDF
28. Artificial Intelligence, Big Data Analytics and Big Data Processing for IoT-Based Sensing Data
- Author
-
Ilmudeen, Aboobucker, Al-Turjman, Fadi, editor, Yadav, Satya Prakash, editor, Kumar, Manoj, editor, Yadav, Vibhash, editor, and Stephan, Thompson, editor
- Published
- 2022
- Full Text
- View/download PDF
29. Development of an Algorithm for Energy Efficient Resource Scheduling of a Multi-cloud Platform for Big Data Processing
- Author
-
Legashev, Leonid V., Zabrodina, Lyubov S., Parfenov, Denis I., Bolodurina, Irina P., Xhafa, Fatos, Series Editor, Hu, Zhengbing, editor, Petoukhov, Sergey, editor, and He, Matthew, editor
- Published
- 2022
- Full Text
- View/download PDF
30. Hybrid Gradient Descent Golden Eagle Optimization (HGDGEO) Algorithm-Based Efficient Heterogeneous Resource Scheduling for Big Data Processing on Clouds.
- Author
-
Jagadish Kumar, N. and Balasubramanian, C.
- Subjects
GOLDEN eagle ,ELECTRONIC data processing ,BIG data ,HETEROGENEOUS computing ,RESOURCE allocation ,SCHEDULING ,CLOUD computing ,ON-demand computing - Abstract
Resource scheduling is indispensable for enhancing system performance during big data processing on clouds. It is highly useful for attaining full utilization of computing resources while facilitating resource scalability and on-demand services. The resources essential for running different applications are highly heterogeneous in cloud computing. This heterogeneous resource demand introduces a resource gap, in which some resource capacities are drained while other capacities remain available on the same server, resulting in imbalanced resource utilization. This imbalance is more apparent when the computing resources are more heterogeneous. At this juncture, an intelligent resource scheduling strategy becomes essential to distribute resources for big data processing by adopting a decision-making process that focuses on achieving the required tasks over time. In this paper, a Hybrid Gradient Descent Golden Eagle Optimization (HGDGEO) algorithm-based efficient heterogeneous resource scheduling process is proposed for handling the challenges that arise during big data processing in the Hadoop heterogeneous cloud environment. The HGDGEO algorithm is an adaptive resource scheduling strategy that handles the dynamic characteristics of resources and users' fluctuating demand during big data stream processing by mimicking the golden eagles' intelligence, which alternates the speed of tuning at different spiral trajectory stages of hunting. It handles big data processing by adopting two adaptive parameters that concentrate on optimal resource allocation to suitable VMs in the shortest possible time, depending on their requirements.
The simulation results of the HGDGEO algorithm confirmed its predominance in terms of makespan, load balance and throughput over competitive resource scheduling algorithms. [ABSTRACT FROM AUTHOR]
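The metrics this abstract evaluates (makespan, load balance, throughput) can be sketched for a candidate task-to-VM assignment; any scheduler such as HGDGEO would score candidates this way. The task durations and the assignment below are invented.

```python
import numpy as np

def schedule_metrics(task_times, assignment, n_vms):
    """Score a task-to-VM assignment the way a cloud scheduler would."""
    loads = np.zeros(n_vms)
    for t, vm in zip(task_times, assignment):
        loads[vm] += t
    makespan = loads.max()                      # finish time of slowest VM
    balance = loads.std() / loads.mean()        # lower means better balanced
    throughput = len(task_times) / makespan     # tasks completed per unit time
    return makespan, balance, throughput

task_times = [4.0, 2.0, 3.0, 1.0, 5.0]
assignment = [0, 1, 1, 0, 2]                    # task i runs on VM assignment[i]
makespan, balance, throughput = schedule_metrics(task_times, assignment, 3)
```

Here every VM carries a load of 5, so the assignment is perfectly balanced; an optimizer would search the assignment space to minimize makespan and imbalance jointly.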
- Published
- 2023
- Full Text
- View/download PDF
31. 大数据技术前瞻 (Prospects of Big Data Technologies).
- Author
-
梅宏, 杜小勇, 金海, 程学旗, 柴云鹏, 石宣化, 靳小龙, 王亚沙, and 刘驰
- Abstract
Major countries around the world attach great importance to the development of big data technology, and China has likewise made big data a national strategy of great long-term significance. Big data technologies span data collection, transmission, management, processing, analysis, and application, forming a data life cycle together with the data governance related to each stage. Four areas (big data management, processing, analysis, and governance) were selected to identify the gap between China and the rest of the world. On the other hand, driven by diverse successful big data applications, the system architecture of computing technology is being restructured: in moving from "computation-centric" to "data-centric", fundamental computing theories and core technologies need to be redesigned, so a new type of big data system technology is becoming an important research direction. Against this background, four technical challenges and ten future development trends of big data technologies were identified. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
32. An innovative wavelet transformation method optimization in the noise-canceling application within intelligent building occupancy detection monitoring
- Author
-
Jan Vanus, Jan Kubicek, Dominik Vilimek, Marek Penhaker, and Petr Bilik
- Subjects
Smart home ,Prediction of room occupancy ,Big data processing ,Presence of person monitoring ,Activities monitoring with indirect methods ,Science (General) ,Q1-390 ,Social sciences (General) ,H1-99 - Abstract
The study deals with detecting the occupancy of an Intelligent Building (IB) using data obtained from indirect methods with Big Data analysis within the IoT. In the area of daily living activity monitoring, one of the most challenging tasks is occupancy prediction, which gives us information about people's mobility in the building. This task can be done via monitoring of CO2, a reliable method with the potential to predict the presence of people in specific areas. In this paper, we propose a novel hybrid system based on Support Vector Machine (SVM) prediction of the CO2 waveform using sensors that measure indoor/outdoor temperature and relative humidity. For each such prediction, we also record the gold-standard CO2 signal to objectively compare and evaluate the quality of the proposed system. Unfortunately, this prediction is often linked with the presence of predicted signal activities in the form of glitches, often of an oscillating character, which inaccurately approximate the real CO2 signals. Thus, the difference between the gold standard and the SVM prediction increases. Therefore, as the second part of the proposed system we employ a smoothing procedure based on the wavelet transform, which aims to reduce inaccuracies in the predicted signal via smoothing and thereby increase the accuracy of the whole prediction system. The system is completed with an optimization procedure based on the Artificial Bee Colony (ABC) algorithm, which classifies the wavelet responses to recommend the most suitable wavelet settings for data smoothing.
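The wavelet smoothing step can be sketched with a minimal one-level Haar decomposition that soft-thresholds detail coefficients, suppressing glitch-like oscillations in a predicted CO2-like signal. The study selects wavelet settings with an ABC optimizer; the signal and threshold here are invented stand-ins.

```python
import numpy as np

def haar_smooth(signal, threshold=0.5):
    """One-level Haar transform, soft-threshold details, reconstruct."""
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)      # low-pass coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)      # high-pass coefficients
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    even = (approx + detail) / np.sqrt(2)          # inverse transform
    odd = (approx - detail) / np.sqrt(2)
    out = np.empty_like(x)
    out[0::2], out[1::2] = even, odd
    return out

t = np.linspace(0, 1, 64)
clean = 400 + 20 * np.sin(2 * np.pi * t)           # slow CO2-like trend (ppm)
noisy = clean + np.random.default_rng(1).normal(0, 5, 64)
smoothed = haar_smooth(noisy, threshold=4.0)
```

With the threshold set to zero the transform is perfectly invertible; with a threshold near the noise scale, oscillating detail coefficients are shrunk and the smoothed curve moves closer to the noise-free trend.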
- Published
- 2023
- Full Text
- View/download PDF
33. Methodology for Creating a Digital Bathymetric Model Using Neural Networks for Combined Hydroacoustic and Photogrammetric Data in Shallow Water Areas
- Author
-
Małgorzata Łącka and Jacek Łubczonek
- Subjects
digital bathymetric model ,big data processing ,MLP neural network ,data reduction ,USV ,UAV ,Chemical technology ,TP1-1185 - Abstract
This study proposes a neural-network-based methodology for creating digital bathymetric models of shallow water areas that are partially covered by a mix of hydroacoustic and photogrammetric data. A key challenge of this approach is preparing the training dataset from such data. Focusing on cases in which the training dataset covers only part of the measured depths, the approach employs generalized linear regression for data optimization, followed by multilayer perceptron neural networks for bathymetric model creation. The research assessed the impact of data reduction, outlier elimination, and regression surface-based filtering on neural network learning. The average root mean square (RMS) errors obtained for the studied nearshore, middle, and deep water areas were 0.12 m, 0.03 m, and 0.06 m, respectively; the corresponding mean absolute errors (MAE) were 0.11 m, 0.02 m, and 0.04 m. Following detailed quantitative and qualitative error analyses, the results indicate variable accuracy across the study areas. Nonetheless, the methodology demonstrated effectiveness in depth calculation for water bodies, although it faces accuracy challenges, especially in preserving nearshore values in shallow areas.
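The per-area RMS and MAE figures reported in this abstract are computed from predicted versus surveyed depths; a quick sketch, with invented depth values:

```python
import numpy as np

def rms_and_mae(predicted, reference):
    """Depth-error metrics used to evaluate a bathymetric model."""
    err = np.asarray(predicted) - np.asarray(reference)
    rms = np.sqrt(np.mean(err ** 2))   # root mean square error
    mae = np.mean(np.abs(err))         # mean absolute error
    return rms, mae

predicted = [2.10, 3.55, 5.02, 7.48]   # modeled depths in metres (invented)
reference = [2.00, 3.50, 5.00, 7.50]   # surveyed depths in metres (invented)
rms, mae = rms_and_mae(predicted, reference)
```

RMS is always at least as large as MAE and penalizes occasional large errors more, which is why the study reports both per area.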
- Published
- 2023
- Full Text
- View/download PDF
34. Non-linear block least-squares adjustment for a large number of observations.
- Author
-
Mahboub, Vahid and Ebrahimzadeh, Somayeh
- Subjects
- *
NONLINEAR equations , *NUMBER systems , *LEAST squares - Abstract
In this contribution, two algorithms are developed to solve non-linear systems of equations that can contain a large number of measurements. These algorithms are based on non-linear block least-squares (BLS). Although block least-squares has been investigated by several researchers, the non-linear case has not been examined until now. The first algorithm is proposed to solve a special class of non-linear problems that do not require linearization; such an algorithm can be called total block least-squares. The second algorithm is based on linearization within a general non-linear mixed model, using a new notation that agrees with the rigorous linearization presented by Pope. Both algorithms can handle constraints on the parameters. With these algorithms, big data processing becomes feasible on inexpensive computers, and expensive processors can solve systems with a large number of equations faster. Two case studies with more than 120,000 equations show that fast and accurate computation is possible with these algorithms, without any loss of accuracy. [ABSTRACT FROM AUTHOR]
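The block idea behind this kind of adjustment can be sketched for the linear case: instead of forming one huge design matrix, the normal equations are accumulated block by block, so hundreds of thousands of observations fit in modest memory. The model and data below are invented; the paper's algorithms additionally handle non-linear models and constraints.

```python
import numpy as np

def block_lsq(blocks):
    """Least-squares solution from (A_i, y_i) observation blocks."""
    AtA, Aty = None, None
    for A, y in blocks:
        if AtA is None:
            AtA, Aty = A.T @ A, A.T @ y       # first block initializes
        else:
            AtA += A.T @ A                    # accumulate normal equations
            Aty += A.T @ y
    return np.linalg.solve(AtA, Aty)          # solve once at the end

rng = np.random.default_rng(2)
x_true = np.array([1.5, -0.5])

def make_block(n):
    A = rng.standard_normal((n, 2))
    return A, A @ x_true                      # noise-free observations

# 120,000 observations processed in 12 blocks of 10,000 each
x_hat = block_lsq(make_block(10_000) for _ in range(12))
```

Only the small normal-equation matrices are kept in memory, so the total observation count is limited by disk, not RAM, which is what makes big data adjustment feasible on inexpensive computers.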
- Published
- 2022
- Full Text
- View/download PDF
35. Review On Technologies And Tools Of Big Data Analytics.
- Author
-
Kavya, P., Vineela, B., madhavi, G. Bindhu, saranya, K. Vahini sai, and Likhitha, M.
- Abstract
Big data and data science are two of the most prominent contemporary developments. Big data analysis increases the requirement for innovative system designs, which promotes the creation of procedures that can manage massive data volumes while maintaining the agility, flexibility, and interactive feel that a data scientist requires. Big data is the outcome of data being created at an exponential rate. This data is diverse and includes structured, unstructured, and semi-structured data types. It provides valuable information for many sorts of stakeholders based on their needs, but these needs cannot be met using standard tools and procedures. Big data technologies play a critical role in handling, storing, and processing this massive quantity of data. Big data analytics is further classified as text analytics, audio analytics, video analytics, and social media analytics. When combined with big data analysis, big data analytics has a major impact. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. JHTD: An Efficient Joint Scheduling Framework Based on Hypergraph for Task Placement and Data Transfer Across Geographically Distributed Data Centers
- Author
-
Chao Jing and Penggao Dan
- Subjects
Big data processing ,geographically distributed data centers ,joint scheduling framework ,hypergraph ,task placement ,data transferring ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
With the explosive growth of data volume, data centers play a critical role in storing and processing huge amounts of data. A traditional single data center can no longer adapt to such fast-growing data, and recent research has extended tasks such as data processing to geographically distributed data centers. However, because task placement and data transfer must be considered jointly, it is complex and difficult to design a proper scheduling approach that minimizes makespan under the constraints of task dependencies, processing capability, network bandwidth, etc. Our work therefore proposes JHTD: an efficient joint scheduling framework based on hypergraphs for task placement and data transfer across geographically distributed data centers. JHTD has two crucial stages. In the first stage, since hypergraphs excel at modeling complex problems, we leverage a hypergraph-based model to establish the relationships among tasks, data files, and data centers, and develop a hypergraph-based partitioning method for task placement. In the second stage, a task reallocation scheme is devised based on each task-to-data dependency, and a data-dependency-aware transfer scheme is designed to minimize the makespan. Finally, the real-world China-VO project is used to conduct a variety of simulation experiments. The results demonstrate that JHTD effectively optimizes task placement and data transfer across geographically distributed data centers: compared with three other state-of-the-art algorithms, JHTD reduces the makespan by up to 20.6%. Various impacts (data transfer volume and load balancing) are also taken into account to discuss the effectiveness of JHTD.
- Published
- 2022
- Full Text
- View/download PDF
37. Advances in MapReduce Big Data Processing: Platform, Tools, and Algorithms
- Author
-
Abualigah, Laith, Masri, Bahaa Al, Kacprzyk, Janusz, Series Editor, Manoharan, Kalaiselvi Geetha, editor, Nehru, Jawaharlal Arun, editor, and Balasubramanian, Sivaraman, editor
- Published
- 2021
- Full Text
- View/download PDF
38. Fast SQL/Row Pattern Recognition Query Processing Using Parallel Primitives on GPUs
- Author
-
Ohara, Tsubasa, Chang, Qiong, Miyazaki, Jun, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Strauss, Christine, editor, Kotsis, Gabriele, editor, Tjoa, A Min, editor, and Khalil, Ismail, editor
- Published
- 2021
- Full Text
- View/download PDF
39. Using Multi-Dimensional Dynamic Time Warping to Identify Time-Varying Lead-Lag Relationships.
- Author
-
Stübinger, Johannes and Walter, Dominik
- Subjects
- *
TIME series analysis , *TIME - Abstract
This paper develops a multi-dimensional Dynamic Time Warping (DTW) algorithm to identify varying lead-lag relationships between two different time series. Specifically, this manuscript contributes to the literature by improving the use of DTW for lead-lag estimation. Our two-step procedure computes the multi-dimensional DTW alignment with the aid of shapeDTW and then uses the output to extract the estimated time-varying lead-lag relationship between the original time series. An extensive simulation study then compares the performance of the algorithm with the state-of-the-art methods Thermal Optimal Path (TOP), Symmetric Thermal Optimal Path (TOPS), Rolling Cross-Correlation (RCC), Dynamic Time Warping (DTW), and Derivative Dynamic Time Warping (DDTW). We observe that the algorithm strongly outperforms these methods in terms of efficiency, robustness, and feasibility. [ABSTRACT FROM AUTHOR]
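The core idea of reading a lead-lag off a DTW alignment can be sketched in one dimension: compute the optimal warping path, then take the index offset along it as the (time-varying) lag. The paper's method is multi-dimensional and uses shapeDTW; this scalar version and its test series are simplifications.

```python
import numpy as np

def dtw_path(a, b):
    """Classic O(nm) DTW with backtracking; returns the alignment path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m                      # backtrack the optimal alignment
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

t = np.arange(40)
leader = np.sin(t / 4.0)
laggard = np.sin((t - 3) / 4.0)                # same series delayed by 3 steps
path = dtw_path(leader, laggard)
lags = [j - i for i, j in path]                # per-point lead-lag estimate
mean_lag = float(np.mean(lags))
```

In the interior of the alignment the path matches index i of the leader to roughly index i + 3 of the laggard, so the average offset recovers the built-in delay (the pinned endpoints pull the average slightly below 3).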
- Published
- 2022
- Full Text
- View/download PDF
40. Edge computing for big data processing in underwater applications.
- Author
-
Periola, A. A., Alonge, A. A., and Ogudo, K. A.
- Subjects
- *
BIG data , *ELECTRONIC data processing , *EDGE computing , *AUTOMATED teller machines , *SERVER farms (Computer network management) - Abstract
Underwater data acquisition entities acquire big data that is processed in terrestrial data centres. However, processing this big data on terrestrial computing entities involves high-latency data transfer, and processing data in a terrestrial environment is challenging when edge node capacity is inadequate. These challenges are addressed here. The paper proposes a heterogeneous edge computing paradigm to realize low-latency transfer of growing underwater big data, achieved by using underwater computing entities instead of terrestrial computing entities to process the acquired big data. The proposed paradigm presents the multi-mode automated teller machine (ATM) as a low-cost terrestrial edge network entity, suitable when edge nodes have inadequate computing capacity. Performance evaluation shows that using underwater computing entities instead of terrestrial computing entities (as in existing work) enhances network performance and related capital costs: the number of hops, computing entity access latency and required autonomous underwater vehicle acquisition costs are reduced by an average of (5.3-88.4)%, 63.5% and (31.8-95.4)%, respectively. Evaluation also shows that using the multi-mode ATM in the context of terrestrial cloud computing reduces the number of hops and latency by 44.4% and 37.3% on average, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
41. An Elastic Data Processing Method Based on Data-Center-Platform
- Author
-
Pan, Zhang, Fenggang, Lai, Jing, Du, Zhangchi, Ying, Rui, Kong, Yi, Zhou, Xiao, Yu, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Xiang, Yang, editor, Liu, Zheli, editor, and Li, Jin, editor
- Published
- 2020
- Full Text
- View/download PDF
42. A Novel Throughput Based Temporal Violation Handling Strategy for Instance-Intensive Cloud Business Workflows
- Author
-
Wang, Futian, Liu, Xiao, Zhang, Wei, Zhang, Cheng, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Barbosa, Simone Diniz Junqueira, Founding Editor, He, Jing, editor, Yu, Philip S., editor, Shi, Yong, editor, Li, Xingsen, editor, Xie, Zhijun, editor, Huang, Guangyan, editor, Cao, Jie, editor, and Xiao, Fu, editor
- Published
- 2020
- Full Text
- View/download PDF
43. Latency Estimation of Big Data Processing Under the MapReduce Framework with Coupling Effects
- Author
-
Lin, Di, Cai, Lingshuang, Zhang, Xiaofeng, Zhang, Xiao, Huo, Jiazhi, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zhang, Junjie James, Series Editor, Wang, Wei, editor, Mu, Jiasong, editor, Liu, Xin, editor, Na, Zhenyu, editor, and Chen, Bingcai, editor
- Published
- 2020
- Full Text
- View/download PDF
44. Efficient Processing of Recursive Joins on Large-Scale Datasets in Spark
- Author
-
Phan, Thuong-Cang, Phan, Anh-Cang, Tran, Thi-To-Quyen, Trieu, Ngoan-Thanh, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Le Thi, Hoai An, editor, Le, Hoai Minh, editor, and Pham Dinh, Tao, editor
- Published
- 2020
- Full Text
- View/download PDF
45. Modeling Big Data Processing Programs
- Author
-
de Souza Neto, João Batista, Moreira, Anamaria Martins, Vargas-Solar, Genoveva, Musicante, Martin A., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Carvalho, Gustavo, editor, and Stolz, Volker, editor
- Published
- 2020
- Full Text
- View/download PDF
46. Emerging Hardware Technologies for IoT Data Processing
- Author
-
Bojnordi, Mahdi Nazm, Behnam, Payman, Firouzi, Farshad, editor, Chakrabarty, Krishnendu, editor, and Nassif, Sani, editor
- Published
- 2020
- Full Text
- View/download PDF
47. DRIIS: MapReduce Parameter Optimization of Hadoop Using Genetic Algorithm.
- Author
-
Anusha, R. J. and RamaParvathy, L.
- Subjects
GENETIC algorithms ,VIRTUAL machine systems ,JOB performance ,SEARCH algorithms ,GENETIC programming - Abstract
In this era of big data, Hadoop, one of the most commonly used big data processing platforms, has a variety of parameters that are closely associated with resource utilization, particularly CPU and memory. Tuning these parameters through optimization can increase Hadoop's resource utilization, but manual tuning is practically impossible because of the time cost; in the big data industry, there is a need to set parameters automatically and thereby maximize resource usage. Previous automatic tuning strategies take a long time to reach the optimal configuration, reducing the cluster's overall performance. With the help of a novel perceptive procedure supported by both genetic programming and a genetic algorithm, we propose an optimal configuration finder that enhances the performance of Hadoop MapReduce jobs, using these algorithms to search for the best parameter values. Experiments were performed on four common applications (WordCount, TeraSort, Index, and Grep) and eight virtual machines (VMs) in a typical Hadoop cluster. Our proposed method increases MapReduce job efficiency by 53.63% on a 1 GB dataset, 67.4% on a 5 GB dataset, and 73.68% on a 10 GB dataset; for the TeraSort application, MapReduce job efficiency increases by 52.62% on a 1 GB dataset, 61.2% on a 5 GB dataset, and 55.17% on a 10 GB dataset. For the Grep application, performance improves by 44.4% on a 1 GB dataset, 56.25% on a 5 GB dataset, and 49.44% on a 10 GB dataset. [ABSTRACT FROM AUTHOR]
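The genetic search loop this abstract automates can be sketched over a toy parameter space. The two Hadoop-style parameters (mapper memory, reducer count) and the surrogate cost function below are invented; real tuning would launch MapReduce jobs and measure their runtimes instead.

```python
import random

random.seed(7)
MEM = list(range(512, 4097, 256))       # candidate mapper memory values (MB)
RED = list(range(1, 33))                # candidate reducer counts

def cost(mem, red):
    # invented smooth surrogate with its optimum at (2048 MB, 16 reducers)
    return ((mem - 2048) / 1024) ** 2 + ((red - 16) / 8) ** 2

def evolve(generations=40, pop_size=20):
    pop = [(random.choice(MEM), random.choice(RED)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: cost(*p))          # rank by simulated job cost
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = (a[0], b[1])                  # one-point crossover
            if random.random() < 0.2:             # mutate memory gene
                child = (random.choice(MEM), child[1])
            if random.random() < 0.2:             # mutate reducer gene
                child = (child[0], random.choice(RED))
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda p: cost(*p))

best = evolve()
```

Elitism guarantees the best configuration found so far is never lost, so the search converges toward the low-cost region of the parameter grid.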
- Published
- 2022
- Full Text
- View/download PDF
48. Fast cluster-based computation of exact betweenness centrality in large graphs
- Author
-
Cecile Daniel, Angelo Furno, Lorenzo Goglia, and Eugenio Zimeo
- Subjects
Complex networks analysis ,Betweenness centrality ,Distributed computation ,Big data processing ,Computer engineering. Computer hardware ,TK7885-7895 ,Information technology ,T58.5-58.64 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Nowadays a large amount of data originates from complex systems such as social networks, transportation systems, and computer and service networks. These systems can be modeled as graphs and studied by exploiting graph metrics such as betweenness centrality (BC), a popular metric for analyzing node centrality. In spite of its great potential, this metric requires long computation times, especially for large graphs. In this paper, we present a very fast algorithm for computing the BC of undirected graphs by exploiting clustering. The algorithm leverages structural properties of graphs to find classes of equivalent nodes: by selecting one representative node for each class, we are able to compute BC while significantly reducing the number of single-source shortest-path explorations adopted by Brandes' algorithm. We formally prove the graph properties that we exploit to define the algorithm and present an implementation based on Scala for both sequential and parallel map-reduce executions. The experimental evaluation of both versions, conducted with synthetic and real graphs, reveals that our solution largely outperforms Brandes' algorithm and significantly improves upon known heuristics.
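Brandes' algorithm, the baseline whose single-source shortest-path explorations the paper's clustering reduces, can be sketched for unweighted, undirected graphs. The graph below is an invented example; scores are halved so each unordered pair of endpoints is counted once.

```python
from collections import deque

def brandes_bc(adj):
    """Exact betweenness centrality via Brandes' accumulation scheme."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1     # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:                                      # BFS from source s
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                  # dependency accumulation
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}          # undirected correction

# path graph a-b-c-d: inner nodes lie on shortest paths between the ends
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
bc = brandes_bc(adj)
```

One full BFS plus accumulation per source node is exactly the per-source cost the paper's equivalence classes avoid repeating for structurally identical nodes.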
- Published
- 2021
- Full Text
- View/download PDF
49. PERIDOT: Modeling Execution Time of Spark Applications
- Author
-
Sarah Shah, Yasaman Amannejad, Diwakar Krishnamurthy, and Mea Wang
- Subjects
Apache spark ,Big Data processing ,performance prediction ,performance engineering ,Electronic computers. Computer science ,QA75.5-76.95 ,Information technology ,T58.5-58.64 - Abstract
A data analytics application submitted to a Spark cluster often has to finish executing by a specified time target. To use cluster resources effectively, the key challenge is having the ability to gain quick insights on how the execution time of any given application is likely to be impacted by the resources allocated to the application, e.g., the number of Spark executor cores assigned, and the size of the data to be processed. Such insights can be used to quickly estimate the required resources and configure a Spark application for a desired execution time using the least amount of resources. Our paper proposes an automated execution time estimation approach called PERIDOT that involves executing a given application under a fixed resource setting with two different-sized, small subsets of its input data to offer fast, lightweight execution time predictions. It analyzes logs from these two executions to estimate the dependencies between internal stages of the application. Information on these dependencies combined with knowledge of Spark's data partitioning mechanisms is used to derive an analytic model that can estimate execution times for other resource settings and input data sizes. Our results from a wide range of applications and multiple Spark clusters show that PERIDOT can accurately estimate the execution time of an application from limited historical data, and suggest the minimum amount of resources required to closely meet an execution time target.
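The core idea described in this abstract, two small pilot runs extrapolated to the full input, can be sketched with a simple linear model t(d) = t0 + k * d. The timings below are invented; the actual tool also models stage dependencies and Spark's data partitioning.

```python
def fit_two_runs(d1, t1, d2, t2):
    """Fit runtime = t0 + k * data_size from two pilot executions."""
    k = (t2 - t1) / (d2 - d1)      # marginal seconds per MB of input
    t0 = t1 - k * d1               # fixed startup/overhead time
    return t0, k

def predict(t0, k, data_mb):
    return t0 + k * data_mb

# two pilot runs on small subsets: 100 MB -> 30 s, 200 MB -> 50 s
t0, k = fit_two_runs(100, 30.0, 200, 50.0)
estimate = predict(t0, k, 1024)    # estimated runtime on the full 1 GB input
```

Given an execution-time target, the same model can be inverted to suggest the smallest input partitioning or resource setting that meets it, which is how such predictions guide resource allocation.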
- Published
- 2021
- Full Text
- View/download PDF
50. Spark‐based parallel processing whale optimization algorithm.
- Author
-
Alshayeji, Mohammad, Behbehani, Bader, and Ahmad, Imtiaz
- Subjects
MATHEMATICAL optimization ,SWARM intelligence ,PARALLEL processing ,HUMPBACK whale behavior ,PROCESS optimization ,DISTRIBUTED computing - Abstract
Summary: Swarm intelligence meta-heuristic optimization algorithms have become increasingly popular for optimizing engineering applications. The whale optimization algorithm (WOA) is a recent and effective swarm intelligence algorithm that mimics the behavior of humpback whales when optimizing a problem, and it has shown good results in reaching optimal solutions compared to most meta-heuristic optimization algorithms. However, complex applications may require large-scale computations, which reduces the computational throughput of WOA. Apache Spark is a recent, well-known parallel data processing framework that has proven highly efficient for distributed computing. In this article, we propose a WOA implementation on top of Apache Spark, denoted SBWOA, to enhance its computational performance while providing higher scalability for handling more complex problems. Compared with the recently reported MapReduce WOA (MR-WOA) and a serial implementation of WOA, our approach achieves significant computational performance enhancements for the highest population size with the maximum number of iterations. SBWOA successfully handles higher-complexity problems that require complex computations. [ABSTRACT FROM AUTHOR]
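The whale update rules this abstract parallelizes (shrinking encircling plus a logarithmic spiral toward the best whale) can be sketched with a compact serial WOA on a 2-D sphere function. Population size, iteration count, bounds, and the objective below are invented for illustration.

```python
import numpy as np

def woa(obj, dim=2, agents=20, iters=200, lb=-5.0, ub=5.0, seed=3):
    """Minimal serial whale optimization algorithm sketch."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (agents, dim))
    best = X[np.argmin([obj(x) for x in X])].copy()
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                  # linearly decreases 2 -> 0
        for i in range(agents):
            r = rng.random(dim)
            A, C = 2 * a * r - a, 2 * rng.random(dim)
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1):          # exploit: encircle the best
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                              # explore: move on a random whale
                    rand = X[rng.integers(agents)]
                    X[i] = rand - A * np.abs(C * rand - X[i])
            else:                                  # spiral bubble-net move
                l = rng.uniform(-1, 1)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            if obj(X[i]) < obj(best):
                best = X[i].copy()
    return best, obj(best)

sphere = lambda x: float(np.sum(x ** 2))
best, fitness = woa(sphere)
```

Because every agent's update is independent within an iteration, the inner loop maps naturally onto Spark partitions, which is the parallelization opportunity SBWOA exploits.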
- Published
- 2022
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library