127 results on '"Lambda architecture"'
Search Results
2. The Impact of Data Ingestion Layer in an Improved Lambda Architecture
- Author
-
Foko Sindjoung, Miguel Landry, Fotseu Fotseu, Ernest Basile, Velempini, Mthulisi, Fotsing Talla, Bernard, Bomgni (PI), Alain Bertrand, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
- Full Text
- View/download PDF
3. Big Data Architectures
- Author
-
Garriga, Martin, Monsieur, Geert, Tamburri, Damian, Liebregts, Werner, editor, van den Heuvel, Willem-Jan, editor, van den Born, Arjan, editor, Van den Heuvel, Willem-Jan, Section Editor, Tamburri, Damian A., Section Editor, Böing-Messing, Florian, Section Editor, and Lafarre, Anne J. F., Section Editor
- Published
- 2023
- Full Text
- View/download PDF
4. Real-Time Assessment of Live Feeds in Big Data
- Author
-
Bhagat, Amol, Deshpande, Makrand, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Joshi, Amit, editor, Mahmud, Mufti, editor, and Ragel, Roshan G., editor
- Published
- 2023
- Full Text
- View/download PDF
5. Performance Analysis of Lambda Architecture-Based Big-Data Systems on Air/Ground Surveillance Application with ADS-B Data.
- Author
-
Demirezen, Mustafa Umut and Navruz, Tuğba Selcen
- Subjects
- *AUTOMATIC dependent surveillance-broadcast, *OXYGEN consumption, *SOFTWARE frameworks, *COMPUTER software development, *ELECTRONIC data processing
- Abstract
This study introduces a novel methodology designed to assess the accuracy of data processing in the Lambda Architecture (LA), an advanced big-data framework qualified for processing streaming (data in motion) and batch (data at rest) data. Distinct from prior studies that have focused on hardware performance and scalability evaluations, our research uniquely targets the intricate aspects of data-processing accuracy within the various layers of LA. The salient contribution of this study lies in its empirical approach. For the first time, we provide empirical evidence that validates previously theoretical assertions about LA, which have remained largely unexamined due to LA's intricate design. Our methodology encompasses the evaluation of prospective technologies across all levels of LA, the examination of layer-specific design limitations, and the implementation of a uniform software development framework across multiple layers. Specifically, our methodology employs a unique set of metrics, including data latency and processing accuracy under various conditions, which serve as critical indicators of LA's accurate data-processing performance. Our findings compellingly illustrate LA's "eventual consistency". Despite potential transient inconsistencies during real-time processing in the Speed Layer (SL), the system ultimately converges to deliver precise and reliable results, as informed by the comprehensive computations of the Batch Layer (BL). This empirical validation not only confirms but also quantifies the claims posited by previous theoretical discourse, with our results indicating a 100% accuracy rate under various severe data-ingestion scenarios. We applied this methodology in a practical case study involving air/ground surveillance, a domain where data accuracy is paramount. This application demonstrates the effectiveness of the methodology using real-world data-intake scenarios, therefore distinguishing this study from hardware-centric evaluations. 
This study not only contributes to the existing body of knowledge on LA but also addresses a significant literature gap. By offering a novel, empirically supported methodology for testing LA, a methodology with potential applicability to other big-data architectures, this study sets a precedent for future research in this area, advancing beyond previous work that lacked empirical validation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
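The "eventual consistency" described in the abstract of result 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event-count workload, the simulated event loss, and the names `batch_view`, `speed_view`, and `query` are assumptions made purely for illustration. The sketch shows a speed layer that is temporarily wrong and a batch layer that later recomputes the exact answer from the full log.

```python
from collections import Counter

def batch_view(master_log, upto):
    """Batch layer: recompute exact counts from the immutable log up to index `upto`."""
    return Counter(master_log[:upto])

def speed_view(recent_events, drop_every=0):
    """Speed layer: incremental counts over events not yet covered by a batch run.

    `drop_every` (hypothetical) simulates transient loss: every n-th event is missed.
    """
    view = Counter()
    for i, key in enumerate(recent_events, start=1):
        if drop_every and i % drop_every == 0:
            continue  # simulated real-time inaccuracy
        view[key] += 1
    return view

def query(batch, speed):
    """Serving layer: merge the batch view with the speed view."""
    return batch + speed

log = ["a", "b", "a", "c", "a", "b"]
exact = Counter(log)

# Before the batch layer has processed the last three events,
# the answer depends on the (lossy) speed layer.
stale = query(batch_view(log, 3), speed_view(log[3:], drop_every=2))

# Once the batch layer catches up, the speed view is discarded
# and the merged result matches the exact counts.
fresh = query(batch_view(log, len(log)), speed_view([]))
```

In this toy run `stale` undercounts one key while `fresh` equals `exact`, which is the convergence behaviour the abstract reports measuring.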
6. An ensemble deep learning based IDS for IoT using Lambda architecture
- Author
-
Rubayyi Alghamdi and Martine Bellaiche
- Subjects
- IoT, IDS, Lambda architecture, Cyber-attacks, Deep learning, Ensemble learning, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
- Abstract
The Internet of Things (IoT) has revolutionized our world today by providing greater levels of accessibility, connectivity and ease to our everyday lives. It enables massive amounts of data to be traversed across multiple heterogeneous devices that are all interconnected. This phenomenon makes IoT networks vulnerable to various network attacks and intrusions. Building an Intrusion Detection System (IDS) for IoT networks is challenging as they enable a massive amount of data to be aggregated, which is difficult to handle and analyze in real time mainly because of the heterogeneous nature of IoT devices. This inefficient, traditional IDS approach accentuates the need to develop advanced IDS techniques by employing Machine or Deep Learning. This paper presents a deep ensemble-based IDS using Lambda architecture by following a multi-pronged classification approach. Binary classification uses Long Short Term Memory (LSTM) to differentiate between malicious and benign traffic, while the multi-class classifier uses an ensemble of LSTM, Convolutional Neural Network and Artificial Neural Network classifiers to detect the type of attacks. The model training is performed in the batch layer, while real-time evaluation is carried out through model inferences in the speed layer of the Lambda architecture. The proposed approach gives high accuracy of over 99.93% and saves useful processing time due to the multi-pronged classification strategy and using the lambda architecture.
- Published
- 2023
- Full Text
- View/download PDF
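The multi-pronged classification in the abstract of result 6 (a binary check first, then a multi-class ensemble only for traffic flagged as malicious) can be sketched as follows. This is a rough illustration, not the paper's method: the abstract does not say how the ensemble combines its members, so majority voting is an assumption, and `binary_stub`, `model_a`, `model_b`, and `model_c` are hypothetical stand-ins for models that would be trained in the batch layer and called for inference in the speed layer.

```python
from collections import Counter
from typing import Callable, List, Sequence

Features = Sequence[float]

def majority_vote(labels: List[str]) -> str:
    """Return the most frequent label (assumed combination rule)."""
    return Counter(labels).most_common(1)[0][0]

def classify(
    x: Features,
    is_malicious: Callable[[Features], bool],
    attack_models: List[Callable[[Features], str]],
) -> str:
    """Stage 1: binary check. Stage 2: ensemble vote, only if malicious."""
    if not is_malicious(x):
        return "benign"  # benign traffic skips the costlier ensemble
    return majority_vote([model(x) for model in attack_models])

# Hypothetical stand-ins for trained models (simple thresholds).
def binary_stub(x: Features) -> bool:
    return sum(x) > 1.0

def model_a(x: Features) -> str:
    return "dos" if x[0] > 0.5 else "scan"

def model_b(x: Features) -> str:
    return "dos" if x[1] > 0.5 else "scan"

def model_c(x: Features) -> str:
    return "scan"

models = [model_a, model_b, model_c]
benign_result = classify([0.1, 0.2], binary_stub, models)
attack_result = classify([0.9, 0.8], binary_stub, models)
```

Skipping the second stage for benign traffic is one plausible reading of how the multi-pronged strategy "saves useful processing time".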
7. Lambda Architecture-Based Big Data System for Large-Scale Targeted Social Engineering Email Detection.
- Author
-
Demirezen, Mustafa Umut and Navruz, Tuğba Selcen
- Subjects
- *MACHINE learning, *SOCIAL engineering (Fraud), *LANGUAGE models, *ELECTRONIC data processing, *BIG data, *SOCIAL engineering (Political science)
- Abstract
In this research, we delve deep into the realm of Targeted Social Engineering Email Detection, presenting a novel approach that harnesses the power of Lambda Architecture (LA). Our innovative methodology strategically segments the BERT model into two distinct components: the embedding generator and the classification segment. This segmentation not only optimizes resource consumption but also improves system efficiency, making it a pioneering step in the field. Our empirical findings, derived from a rigorous comparison between the fastText and BERT models, underscore the superior performance of the latter. Specifically, the BERT model has high precision rates for identifying malicious and benign emails, with impressive recall values and F1 scores. Its overall accuracy rate was 0.9988, with a Matthews Correlation Coefficient value of 0.9978. In comparison, the fastText model showed lower precision rates. Leveraging principles reminiscent of the Lambda architecture, our study delves into the performance dynamics of data processing models. The Separated-BERT (Sep-BERT) model emerges as a robust contender, adept at managing both real-time (stream) and large-scale (batch) data processing. Compared to the traditional BERT, Sep-BERT showcased superior efficiency, with reduced memory and CPU consumption across diverse email sizes and ingestion rates. This efficiency, combined with rapid inference times, positions Sep-BERT as a scalable and cost-effective solution, aligning well with the demands of Lambda-inspired architectures. This study marks a significant step forward in the fields of big data and cybersecurity. By introducing a novel methodology and demonstrating its efficacy in detecting targeted social engineering emails, we not only advance the state of knowledge in these domains but also lay a robust foundation for future research endeavors, emphasizing the transformative potential of integrating advanced big data frameworks with machine learning models.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
8. An ensemble deep learning based IDS for IoT using Lambda architecture.
- Author
-
Alghamdi, Rubayyi and Bellaiche, Martine
- Subjects
- CONVOLUTIONAL neural networks, DEEP learning, DENIAL of service attacks, LONG-term memory, INTERNET of things, MACHINE learning
- Abstract
The Internet of Things (IoT) has revolutionized our world today by providing greater levels of accessibility, connectivity and ease to our everyday lives. It enables massive amounts of data to be traversed across multiple heterogeneous devices that are all interconnected. This phenomenon makes IoT networks vulnerable to various network attacks and intrusions. Building an Intrusion Detection System (IDS) for IoT networks is challenging as they enable a massive amount of data to be aggregated, which is difficult to handle and analyze in real time mainly because of the heterogeneous nature of IoT devices. This inefficient, traditional IDS approach accentuates the need to develop advanced IDS techniques by employing Machine or Deep Learning. This paper presents a deep ensemble-based IDS using Lambda architecture by following a multi-pronged classification approach. Binary classification uses Long Short Term Memory (LSTM) to differentiate between malicious and benign traffic, while the multi-class classifier uses an ensemble of LSTM, Convolutional Neural Network and Artificial Neural Network classifiers to detect the type of attacks. The model training is performed in the batch layer, while real-time evaluation is carried out through model inferences in the speed layer of the Lambda architecture. The proposed approach gives high accuracy of over 99.93% and saves useful processing time due to the multi-pronged classification strategy and using the lambda architecture. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
9. An Urban Rail Signal Fault Diagnosis System Based on Knowledge Model
- Author
-
YANG Jiang, YANG Yibin, OU Shengfen, and DENG Yongqi
- Subjects
- signal system, knowledge model, ontology web language (owl), fault diagnosis, lambda architecture, hadoop technical ecosystem, Control engineering systems. Automatic machinery (General), TJ212-225, Technology
- Abstract
At present, urban rail signal maintenance systems can only raise an alarm for a single fault source; they cannot quickly locate the cause of a fault or guide operation and maintenance personnel in dealing with it. Moreover, urban rail signal systems have a large variety of faults and complex diagnosis and analysis logic, so customized development of fault diagnosis procedures for different scenarios cannot respond quickly to operation and maintenance needs, and its cost is high. To solve this problem, this paper develops an information-based and platform-based urban rail signal fault diagnosis system based on a knowledge model to realize signal system fault knowledge modeling, fault diagnosis semantic correlation, and fault diagnosis process modeling. To realize complete reasoning for signal system fault diagnosis, OWL DL (ontology web language description logic) is used to model knowledge, extract and describe the analysis logic of signal system fault diagnosis, and establish the knowledge model. The operation state of system equipment is matched to the fault cause, and the mapping from the signal system equipment state space to the fault cause space is used to realize fault self-diagnosis of signal system equipment, thereby providing decision support for the production, operation, and maintenance management of signal system equipment. Application results show that the system can reduce the operation safety accident rate by 15% and improve fault handling efficiency by 20%.
- Published
- 2022
- Full Text
- View/download PDF
10. Architecture Patterns—Batch and Real-Time Capabilities
- Author
-
Kraetz, Dennis, Morawski, Michael, Liermann, Volker, editor, and Stegmann, Claus, editor
- Published
- 2021
- Full Text
- View/download PDF
11. A Conceptual Framework for Sensitive Big Data Publishing
- Author
-
Victor, Nancy, Lopez, Daphne, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Purohit, Sunil Dutt, editor, Singh Jat, Dharm, editor, Poonia, Ramesh Chandra, editor, Kumar, Sandeep, editor, and Hiranwal, Saroj, editor
- Published
- 2021
- Full Text
- View/download PDF
12. Big Data Accident Prediction System in Green Networks and Intelligent Transportation Systems
- Author
-
Tantaoui, Mouad, Laanaoui, My Driss, Kabil, Mustapha, Pisello, Anna Laura, Editorial Board Member, Hawkes, Dean, Editorial Board Member, Bougdah, Hocine, Editorial Board Member, Rosso, Federica, Editorial Board Member, Abdalla, Hassan, Editorial Board Member, Boemi, Sofia-Natalia, Editorial Board Member, Mohareb, Nabil, Editorial Board Member, Mesbah Elkaffas, Saleh, Editorial Board Member, Bozonnet, Emmanuel, Editorial Board Member, Pignatta, Gloria, Editorial Board Member, Mahgoub, Yasser, Editorial Board Member, De Bonis, Luciano, Editorial Board Member, Kostopoulou, Stella, Editorial Board Member, Pradhan, Biswajeet, Editorial Board Member, Abdul Mannan, Md., Editorial Board Member, Alalouch, Chaham, Editorial Board Member, O. Gawad, Iman, Editorial Board Member, Nayyar, Anand, Editorial Board Member, Amer, Mourad, Series Editor, Ben Ahmed, Mohamed, editor, Mellouli, Sehl, editor, Braganca, Luis, editor, Anouar Abdelhakim, Boudhir, editor, and Bernadetta, Kwintiana Ane, editor
- Published
- 2021
- Full Text
- View/download PDF
13. Toward Public Opinion Monitoring System of Large-Scale Data with Lambda Architecture
- Author
-
Zhang, Weijuan, Lu, Yue, Ma, Kun, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Panda, Mrutyunjaya, editor, Pradhan, Subhrajit, editor, Garcia-Hernandez, Laura, editor, and Ma, Kun, editor
- Published
- 2021
- Full Text
- View/download PDF
14. Lambda+, the Renewal of the Lambda Architecture: Category Theory to the Rescue
- Author
-
Gillet, Annabelle, Leclercq, Éric, Cullot, Nadine, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, La Rosa, Marcello, editor, Sadiq, Shazia, editor, and Teniente, Ernest, editor
- Published
- 2021
- Full Text
- View/download PDF
15. A Scalable Analytical Framework for Complex Event Episode Mining With Various Domains Applications
- Author
-
Jerry C. C. Tseng, Sun-Yuan Hsieh, and Vincent S. Tseng
- Subjects
- Complex event sequence, data stream, episode pattern mining, incremental mining, lambda architecture, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
With the ubiquity of sensor networks and smart devices that continuously collect data, we face the challenge of analyzing the growing stream of data in real time. In recent years, there has been a huge need to gain useful knowledge by incrementally analyzing event sequence data. Although episode pattern mining techniques have existed for years, people have recently become more aware of their practical value in solving real-life domain problems such as manufacturing records, stock markets, and weather forecasts. The effective and efficient application of episode pattern mining techniques to analyze complex event data is becoming increasingly important for solving real-life problems in wide domains. However, few studies have focused on developing a scalable framework based on episode pattern mining of complex event sequences for applications in various domains. In this work, we propose a novel framework named SAAF (Scalable Analytical Application Framework) based on complex event episode mining techniques, including batch episode mining, delta episode mining, incremental episode mining, and pattern merging, to consider both efficiency and accuracy. Moreover, to enhance scalability, we adopt the lambda architecture with Apache Spark and Apache Spark Streaming as the system development framework. Finally, the experimental results on three real datasets of different domains and two benchmark datasets showed that the proposed SAAF framework exhibits excellent performance in terms of efficiency, accuracy, and scalability.
- Published
- 2022
- Full Text
- View/download PDF
16. Cloud and distributed architectures for data management in agriculture 4.0 : Review and future trends.
- Author
-
Debauche, Olivier, Mahmoudi, Saïd, Manneback, Pierre, and Lebeau, Frédéric
- Subjects
- DATA management, AGRICULTURE, COST effectiveness, CLOUD computing, AGRICULTURAL technology, MAINTAINABILITY (Engineering)
- Abstract
• Cloud architectures used in Agriculture 4.0.
• Distributed Architectures and Cloud Computing complements.
• Strategies of association between Edge, Fog, Cloud.
• New architectural and computing trends.
Agriculture 4.0, also called Smart Agriculture or Smart Farming, is at the origin of the production of a huge amount of data that must be collected, stored, and processed in a very short time. Processing this massive quantity of data requires specific infrastructure that uses adapted IoT architectures. Our review offers a comparative panorama of Central Cloud, Distributed Cloud Architectures, Collaborative Computing Strategies, and new trends used in the context of Agriculture 4.0. In this review, we try to answer 4 research questions: (1) Which storage and processing architectures are best suited to Agriculture 4.0 applications and respond to its peculiarities? (2) Can generic architectures meet the needs of Agriculture 4.0 application cases? (3) What are the horizontal development possibilities that allow the transition from research to industrialization? (4) What are the vertical valuations possibilities to move from algorithms trained in the cloud to embedded or stand-alone products? For this, we compare architectures with 8 criteria (User Proximity, Latency & Jitter, Network stability, high throughput, Reliability, Scalability, Cost Effectiveness, Maintainability), and analyze the advantages and disadvantages of each of them. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
17. A scalable data store and analytic platform for real-time monitoring of data-intensive scientific infrastructure
- Author
-
Suthakar, Uthayanath, Smith, D., Khan, A., and Magnoni, L.
- Subjects
- 004, Big data, Data science, Distributed system, Lambda Architecture, Parallel computing
- Abstract
Monitoring data-intensive scientific infrastructures in real-time such as jobs, data transfers, and hardware failures is vital for efficient operation. Due to the high volume and velocity of events that are produced, traditional methods are no longer optimal. Several techniques, as well as enabling architectures, are available to support the Big Data issue. In this respect, this thesis complements existing survey work by contributing an extensive literature review of both traditional and emerging Big Data architecture. Scalability, low-latency, fault-tolerance, and intelligence are key challenges of the traditional architecture. However, Big Data technologies and approaches have become increasingly popular for use cases that demand the use of scalable, data intensive processing (parallel), and fault-tolerance (data replication) and support for low-latency computations. In the context of a scalable data store and analytics platform for monitoring data-intensive scientific infrastructure, Lambda Architecture was adapted and evaluated on the Worldwide LHC Computing Grid, which has been proven effective. This is especially true for computationally and data-intensive use cases. In this thesis, an efficient strategy for the collection and storage of large volumes of data for computation is presented. By moving the transformation logic out from the data pipeline and moving to analytics layers, it simplifies the architecture and overall process. Time utilised is reduced, untampered raw data are kept at storage level for fault-tolerance, and the required transformation can be done when needed. An optimised Lambda Architecture (OLA), which involved modelling an efficient way of joining batch layer and streaming layer with minimum code duplications in order to support scalability, low-latency, and fault-tolerance is presented. A few models were evaluated; pure streaming layer, pure batch layer and the combination of both batch and streaming layers. 
Experimental results demonstrate that OLA performed better than the traditional architecture as well as the Lambda Architecture. The OLA was also enhanced by adding an intelligence layer for predicting data access patterns. The intelligence layer actively adapts and updates the model built by the batch layer, which eliminates the re-training time while providing a high level of accuracy using the Deep Learning technique. The fundamental contribution to knowledge is a scalable, low-latency, fault-tolerant, intelligent, and heterogeneous-based architecture for monitoring a data-intensive scientific infrastructure that can benefit from Big Data technologies and approaches.
- Published
- 2017
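The storage strategy mentioned in the abstract of result 17 (keep raw data untouched at the storage level and apply transformations later, in the analytics layer) can be sketched in Python as below. This is only an illustration under stated assumptions: the JSON record layout, the `site` and `bytes` field names, and the functions `ingest`, `transform`, and `analyse` are hypothetical and do not come from the thesis.

```python
import json

raw_store = []  # stands in for the raw storage layer

def ingest(line: str) -> None:
    """Ingestion: append the record exactly as received, with no transformation."""
    raw_store.append(line)

def transform(line: str) -> dict:
    """Transformation, applied at analysis time rather than in the pipeline."""
    rec = json.loads(line)
    return {"site": rec["site"].upper(), "mb": rec["bytes"] / 1_000_000}

def analyse() -> dict:
    """Analytics layer: transform on read, then aggregate megabytes per site."""
    totals = {}
    for line in raw_store:
        rec = transform(line)
        totals[rec["site"]] = totals.get(rec["site"], 0.0) + rec["mb"]
    return totals

ingest('{"site": "cern", "bytes": 1000000}')
ingest('{"site": "cern", "bytes": 2000000}')
totals = analyse()
```

Because `raw_store` still holds the original lines, a changed `transform` can be re-run over the same data without re-ingesting it, which is the fault-tolerance argument the abstract makes for keeping raw data.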
18. Lambda Architecture for Anomaly Detection in Online Process Mining Using Autoencoders
- Author
-
Krajsic, Philippe, Franczyk, Bogdan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Hernes, Marcin, editor, Wojtkiewicz, Krystian, editor, and Szczerbicki, Edward, editor
- Published
- 2020
- Full Text
- View/download PDF
19. Stream and Event Processing Services for Real-time Linked Dataspaces
- Author
-
Curry, Edward
- Published
- 2020
- Full Text
- View/download PDF
20. Architecting IoT Cloud
- Author
-
Firouzi, Farshad, Farahani, Bahar, Firouzi, Farshad, editor, Chakrabarty, Krishnendu, editor, and Nassif, Sani, editor
- Published
- 2020
- Full Text
- View/download PDF
21. The Next-Generation NIDS Platform: Cloud-Based Snort NIDS Using Containers and Big Data.
- Author
-
Saputra, Ferry Astika, Salman, Muhammad, Hasim, Jauari Akhmad Nur, Nadhori, Isbat Uzzin, and Ramli, Kalamullah
- Subjects
- SENSOR placement, BIG data, COMPUTER network security, CONTAINERS, SERVER farms (Computer network management), DATA logging
- Abstract
Snort is a well-known, signature-based network intrusion detection system (NIDS). The Snort sensor must be placed within the same physical network, and the defense centers in the typical NIDS architecture offer limited network coverage, especially for remote networks with a restricted bandwidth and network policy. Additionally, the growing number of sensor instances, followed by a quick increase in log data volume, has caused the present system to face big data challenges. This research paper proposes a novel design for a cloud-based Snort NIDS using containers and implementing big data in the defense center to overcome these problems. Our design consists of Docker as the sensor's platform, Apache Kafka as the distributed messaging system, and big data technology orchestrated on lambda architecture. We conducted experiments to measure sensor deployment, optimum message delivery from the sensors to the defense center, aggregation speed, and efficiency in the data-processing performance of the defense center. We successfully developed a cloud-based Snort NIDS and found the optimum method for message-delivery from the sensor to the defense center. We also succeeded in developing the dashboard and attack maps to display the attack statistics and visualize the attacks. Our first design is reported to implement the big data architecture, namely, lambda architecture, as the defense center and utilize rapid deployment of Snort NIDS using Docker technology as the network security monitoring platform. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
22. A Machine Hearing Framework for Real-Time Streaming Analytics Using Lambda Architecture
- Author
-
Demertzis, Konstantinos, Iliadis, Lazaros, Anezakis, Vardis-Dimitris, Barbosa, Simone Diniz Junqueira, Editorial Board Member, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Yuan, Junsong, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Macintyre, John, editor, Iliadis, Lazaros, editor, Maglogiannis, Ilias, editor, and Jayne, Chrisina, editor
- Published
- 2019
- Full Text
- View/download PDF
23. The Big Data-RTAP: Toward a Secured Video Surveillance System in Smart Environment
- Author
-
Ezzahout, Abderrahmane, Oubaha, Jawad, Kacprzyk, Janusz, Series Editor, Zbakh, Mostapha, editor, Essaaidi, Mohammed, editor, Manneback, Pierre, editor, and Rong, Chunming, editor
- Published
- 2019
- Full Text
- View/download PDF
24. Toward Information System Architecture to Support Predictive Maintenance Approach
- Author
-
Sarazin, Alexandre, Truptil, Sébastien, Montarnal, Aurélie, Lamothe, Jacques, Popplewell, Keith, editor, Thoben, Klaus-Dieter, editor, Knothe, Thomas, editor, and Poler, Raúl, editor
- Published
- 2019
- Full Text
- View/download PDF
25. Lambda architecture for cost-effective batch and speed big data processing
- Author
-
Kiran, M, Murphy, P, Monga, I, Dugan, J, and Baveja, SS
- Subjects
- big data processing, lambda architecture, Amazon EC2, sensor data analysis, Networking and Information Technology R&D
- Abstract
Sensor and smart phone technologies present opportunities for data explosion, streaming and collecting from heterogeneous devices every second. Analyzing these large datasets can unlock multiple behaviors previously unknown, and help optimize approaches to city wide applications or societal use cases. However, collecting and handling of these massive datasets presents challenges in how to perform optimized online data analysis 'on-the-fly', as current approaches are often limited by capability, expense and resources. This presents a need for developing new methods for data management particularly using public clouds to minimize cost, network resources and on-demand availability. This paper presents an implementation of the lambda architecture design pattern to construct a data-handling backend on Amazon EC2, providing high throughput, dense and intense data demand delivered as services, minimizing the cost of the network maintenance. This paper combines ideas from database management, cost models, query management and cloud computing to present a general architecture that could be applied in any given scenario where affordable online data processing of Big Datasets is needed. The results are presented with a case study of processing router sensor data on the current ESnet network data as a working example of the approach. The results showcase a reduction in cost and argue benefits for performing online analysis and anomaly detection for sensor data.
- Published
- 2015
26. Lambda Architecture for Cost-Effective Batch and Speed Big Data Processing
- Author
-
Kiran, Mariam, Murphy, Peter, Monga, Inder, Dugan, Jon, and Baveja, Sartaj Singh
- Subjects
- Networking and Information Technology R&D (NITRD), big data processing, lambda architecture, Amazon EC2, sensor data analysis
- Abstract
Sensor and smart phone technologies present opportunities for data explosion, streaming and collecting from heterogeneous devices every second. Analyzing these large datasets can unlock multiple behaviors previously unknown, and help optimize approaches to city wide applications or societal use cases. However, collecting and handling of these massive datasets presents challenges in how to perform optimized online data analysis 'on-the-fly', as current approaches are often limited by capability, expense and resources. This presents a need for developing new methods for data management particularly using public clouds to minimize cost, network resources and on-demand availability. This paper presents an implementation of the lambda architecture design pattern to construct a data-handling backend on Amazon EC2, providing high throughput, dense and intense data demand delivered as services, minimizing the cost of the network maintenance. This paper combines ideas from database management, cost models, query management and cloud computing to present a general architecture that could be applied in any given scenario where affordable online data processing of Big Datasets is needed. The results are presented with a case study of processing router sensor data on the current ESnet network data as a working example of the approach. The results showcase a reduction in cost and argue benefits for performing online analysis and anomaly detection for sensor data.
- Published
- 2015
27. Lifelong Machine Learning and root cause analysis for large-scale cancer patient data
- Author
-
Gautam Pal, Xianbin Hong, Zhuo Wang, Hongyi Wu, Gangmin Li, and Katie Atkinson
- Subjects
Lifelong learning ,Real-time data processing ,Lambda Architecture ,Streaming k-means ,Random Decision Forest ,Dimension reduction ,Computer engineering. Computer hardware ,TK7885-7895 ,Information technology ,T58.5-58.64 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Abstract Introduction This paper presents a lifelong learning framework which constantly adapts with changing data patterns over time through incremental learning approach. In many big data systems, iterative re-training high dimensional data from scratch is computationally infeasible since constant data stream ingestion on top of a historical data pool increases the training time exponentially. Therefore, the need arises on how to retain past learning and fast update the model incrementally based on the new data. Also, the current machine learning approaches do the model prediction without providing a comprehensive root cause analysis. To resolve these limitations, our framework lays foundations on an ensemble process between stream data with historical batch data for an incremental lifelong learning (LML) model. Case description A cancer patient’s pathological tests like blood, DNA, urine or tissue analysis provide a unique signature based on the DNA combinations. Our analysis allows personalized and targeted medications and achieves a therapeutic response. Model is evaluated through data from The National Cancer Institute’s Genomic Data Commons unified data repository. The aim is to prescribe personalized medicine based on the thousands of genotype and phenotype parameters for each patient. Discussion and evaluation The model uses a dimension reduction method to reduce training time at an online sliding window setting. We identify the Gleason score as a determining factor for cancer possibility and substantiate our claim through Lilliefors and Kolmogorov–Smirnov test. We present clustering and Random Decision Forest results. The model’s prediction accuracy is compared with standard machine learning algorithms for numeric and categorical fields. Conclusion We propose an ensemble framework of stream and batch data for incremental lifelong learning. 
The framework first applies a streaming clustering technique and then a Random Decision Forest Regressor/Classifier to isolate anomalous patient data, and provides reasoning through root cause analysis by feature correlations, with the aim of improving the overall survival rate. While the stream clustering technique creates groups of patient profiles, RDF further drills down into each group for comparison and reasoning, yielding actionable insights. The proposed MALA architecture retains past learned knowledge, transfers it to future learning, and iteratively becomes more knowledgeable over time.
- Published
- 2019
28. The Next-Generation NIDS Platform: Cloud-Based Snort NIDS Using Containers and Big Data
- Author
-
Ferry Astika Saputra, Muhammad Salman, Jauari Akhmad Nur Hasim, Isbat Uzzin Nadhori, and Kalamullah Ramli
- Subjects
Snort ,big data ,cloud-based IDS ,docker ,lambda architecture ,Technology - Abstract
Snort is a well-known, signature-based network intrusion detection system (NIDS). The Snort sensor must be placed within the same physical network, and the defense centers in the typical NIDS architecture offer limited network coverage, especially for remote networks with a restricted bandwidth and network policy. Additionally, the growing number of sensor instances, followed by a quick increase in log data volume, has caused the present system to face big data challenges. This research paper proposes a novel design for a cloud-based Snort NIDS using containers and implementing big data in the defense center to overcome these problems. Our design consists of Docker as the sensor’s platform, Apache Kafka, as the distributed messaging system, and big data technology orchestrated on lambda architecture. We conducted experiments to measure sensor deployment, optimum message delivery from the sensors to the defense center, aggregation speed, and efficiency in the data-processing performance of the defense center. We successfully developed a cloud-based Snort NIDS and found the optimum method for message-delivery from the sensor to the defense center. We also succeeded in developing the dashboard and attack maps to display the attack statistics and visualize the attacks. Our first design is reported to implement the big data architecture, namely, lambda architecture, as the defense center and utilize rapid deployment of Snort NIDS using Docker technology as the network security monitoring platform.
- Published
- 2022
29. Cloud architecture for plant phenotyping research.
- Author
-
Debauche, Olivier, Mahmoudi, Sidi Ahmed, De Cock, Nicolas, Mahmoudi, Saïd, Manneback, Pierre, and Lebeau, Frédéric
- Subjects
DATA management ,BIG data ,CLOUD computing ,DISTRIBUTED databases - Abstract
Summary: Digital phenotyping is an emergent science mainly based on imagery techniques. The tremendous amount of data generated requires substantial cloud computing resources for processing. The coupling of recent advances in distributed databases and cloud computing offers new possibilities for big data management and data sharing in scientific research. In this paper, we present a solution combining a lambda architecture built around Apache Druid and a hosting platform leaning on Apache Mesos. Lambda architecture has already proved its performance and robustness. However, the database's capacity for ingestion and querying is essential and can constitute a bottleneck for the architecture, in particular in terms of data availability and response time. We focused our experimentation on the response time of different databases to choose the one best suited to our phenotyping architecture. Apache Druid has shown its ability to respond to typical queries of phenotyping applications in times generally under one second. [ABSTRACT FROM AUTHOR]
- Published
- 2020
30. Renewable estimation and incremental inference in generalized linear models with streaming data sets.
- Author
-
Luo, Lan and Song, Peter X.‐K.
- Subjects
ASYMPTOTIC normality ,DISTRIBUTED computing ,DATA modeling ,DATA analysis - Abstract
Summary: The paper presents an incremental updating algorithm to analyse streaming data sets using generalized linear models. The method proposed is formulated within a new framework of renewable estimation and incremental inference, in which the maximum likelihood estimator is renewed with current data and summary statistics of historical data. Our framework can be implemented within a popular distributed computing environment, known as Apache Spark, to scale up computation. Consisting of two data‐processing layers, the rho architecture enables us to accommodate inference‐related statistics and to facilitate sequential updating of the statistics used in both estimation and inference. We establish estimation consistency and asymptotic normality of the proposed renewable estimator, in which the Wald test is utilized for an incremental inference. Our methods are examined and illustrated by various numerical examples from both simulation experiments and a real world data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2020
31. λ-CoAP: An Internet of Things and Cloud Computing Integration Based on the Lambda Architecture and CoAP
- Author
-
Díaz, Manuel, Martín, Cristian, Rubio, Bartolomé, Akan, Ozgur, Series editor, Cao, Jiannong, Series editor, Coulson, Geoffrey, Series editor, Dressler, Falko, Series editor, Ferrari, Domenico, Series editor, Gerla, Mario, Series editor, Kobayashi, Hisashi, Series editor, Palazzo, Sergio, Series editor, Sahni, Sartaj, Series editor, Shen, Xuemin (Sherman), Series editor, Stan, Mircea, Series editor, Xiaohua, Jia, Series editor, Zomaya, Albert, Series editor, Bellavista, Paolo, Series editor, Guo, Song, editor, Liao, Xiaofei, editor, Liu, Fangming, editor, and Zhu, Yanmin, editor
- Published
- 2016
32. Online Anomaly Energy Consumption Detection Using Lambda Architecture
- Author
-
Liu, Xiufeng, Iftikhar, Nadeem, Nielsen, Per Sieverts, Heller, Alfred, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Madria, Sanjay, editor, and Hara, Takahiro, editor
- Published
- 2016
33. Information Flow Monitoring System
- Author
-
Sang Hun Han, Aziz Nasridinov, and Keun Ho Ryu
- Subjects
Information flow ,social media data ,skyline ,Lambda architecture ,MapReduce ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Vast quantities of data are generated by social networks in seconds. The information generated in a social network is transformed into a flow by the subjects who produce, transmit, and consume it. This flow can be represented in a very complicated directional graph where each subject is represented as a node, and the flow of information is represented as a directed edge. In this paper, we introduce a method of dividing this complex directional graph by user and quantifying the flow of information between and among users based on information flow vectors. We propose a system that can monitor the flow of information in social networks using information flow vectors extracted from social media data. We also introduce an improved skyline algorithm that can respond quickly to a user's various queries.
- Published
- 2018
34. Stream Processing on Demand for Lambda Architectures
- Author
-
Kroß, Johannes, Brunnert, Andreas, Prehofer, Christian, Runkler, Thomas A., Krcmar, Helmut, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Beltrán, Marta, editor, Knottenbelt, William, editor, and Bradley, Jeremy, editor
- Published
- 2015
35. NEW APPROACH OF STORING AND RETRIEVING LARGE DATA VOLUMES.
- Author
-
Šikanjić, Nedeljko and Avramović, Zoran Ž.
- Subjects
BIG data ,DATA quality ,DATA warehousing ,DATA - Abstract
In today's world of advanced information technologies, society is facing a huge amount of data that is becoming impossible to store, process and analyze. In these big data volumes, some important information that could help us improve the quality of personal and business life is being lost. This paper focuses on finding the best possible approach to this issue, seeking a feasible solution for increasing the efficiency and quality of data. [ABSTRACT FROM AUTHOR]
- Published
- 2019
36. Cloud services integration for farm animals' behavior studies based on smartphones as activity sensors.
- Author
-
Debauche, Olivier, Mahmoudi, Saïd, Andriamandroso, Andriamasinoro Lalaina Herinaina, Manneback, Pierre, Bindelle, Jérôme, and Lebeau, Frédéric
- Abstract
Smartphones, particularly iPhone, can be relevant instruments for researchers in animal behavior because they are readily available on the planet, contain many sensors and require no hardware development. They are equipped with high performance Inertial Measurement Units (IMU) and absolute positioning systems analyzing users' movements, but they can easily be diverted to analyze likewise the behaviors of domestic animals such as cattle. The study of animal behavior using smartphones requires the storage of many high frequency variables from a large number of individuals and their processing through various relevant variables combinations for modeling and decision-making. Transferring, storing, treating and sharing such an amount of data is a big challenge. In this paper, a lambda cloud architecture innovatively coupled to a scientific sharing platform used to archive, and process high-frequency data are proposed to integrate future developments of the Internet of Things applied to the monitoring of domestic animals. An application to the study of cattle behavior on pasture based on the data recorded with the IMU of iPhone 4s is exemplified. Performances comparison between iPhone 4s and iPhone 5s is also achieved. The package comes also with a web interface to encode the actual behavior observed on videos and to synchronize observations with the sensor signals. Finally, the use of Edge computing on the iPhone reduced by 43.5% on average the size of the raw data by eliminating redundancies. The limitation of the number of digits on individual variable can reduce data redundancy up to 98.5%. [ABSTRACT FROM AUTHOR]
- Published
- 2019
37. An Approach to Implementing the Batch Layer in an Energy Management System.
- Author
-
Marinov, Milko
- Subjects
*ENERGY management , *BIG data , *ELECTRONIC data processing - Abstract
Recently, Energy Management Systems (EMS) have become more integrated and created higher demand of big data processing, thus challenging the real time analysis based on big data technologies. Marz's Lambda architecture is used for solving problems with querying high amounts of petabyte data. In relation to this, this article presents the Lambda architecture of a particular EMS. The quick and inaccurate results received from the speed layer are replaced by the more precise results of the batch layer. With reference to this, the author proposes a realization of the batch layer based on Hadoop MapReduce technology. [ABSTRACT FROM AUTHOR]
- Published
- 2019
38. The Next Generation Cognitive Security Operations Center: Adaptive Analytic Lambda Architecture for Efficient Defense against Adversarial Attacks.
- Author
-
Demertzis, Konstantinos, Tziritas, Nikos, Kikiras, Panayiotis, Sanchez, Salvador Llopis, and Iliadis, Lazaros
- Subjects
COMPUTER network security ,INDUSTRIAL security ,COGNITIVE computing ,INTERNET security ,CYBERTERRORISM - Abstract
A Security Operations Center (SOC) is a central technical level unit responsible for monitoring, analyzing, assessing, and defending an organization's security posture on an ongoing basis. The SOC staff works closely with incident response teams, security analysts, network engineers and organization managers using sophisticated data processing technologies such as security analytics, threat intelligence, and asset criticality to ensure security issues are detected, analyzed and finally addressed quickly. Those techniques are part of a reactive security strategy because they rely on the human factor, experience and the judgment of security experts, using supplementary technology to evaluate the risk impact and minimize the attack surface. This study suggests an active security strategy that adopts a vigorous method including ingenuity, data analysis, processing and decision-making support to face various cyber hazards. Specifically, the paper introduces a novel intelligence driven cognitive computing SOC that is based exclusively on progressive fully automatic procedures. The proposed λ-Architecture Network Flow Forensics Framework (λ-NF3) is an efficient cybersecurity defense framework against adversarial attacks. It implements the Lambda machine learning architecture that can analyze a mixture of batch and streaming data, using two accurate novel computational intelligence algorithms. Specifically, it uses an Extreme Learning Machine neural network with Gaussian Radial Basis Function kernel (ELM/GRBFk) for the batch data analysis and a Self-Adjusting Memory k-Nearest Neighbors classifier (SAM/k-NN) to examine patterns from real-time streams. It is a forensics tool for big data that can enhance the automate defense strategies of SOCs to effectively respond to the threats their environments face. [ABSTRACT FROM AUTHOR]
- Published
- 2019
39. Big Data Technology to Exploit Climate Information/Consumption Models and to Predict Future Behaviours
- Author
-
Cortés, A., Téllez, A. E., Gallardo, M., Peralta, J. J., Tzafestas, S.G., Series editor, and González Alonso, Ignacio, editor
- Published
- 2014
40. Forecast Model Update Based on a Real-Time Data Processing Lambda Architecture for Estimating Partial Discharges in Hydrogenerator
- Author
-
Fabio Henrique Pereira, Francisco Elânio Bezerra, Diego Oliva, Gilberto Francisco Martha de Souza, Ivan Eduardo Chabu, Josemir Coelho Santos, Shigueru Nagao Junior, and Silvio Ikuyo Nabeta
- Subjects
autoregressive forecasting model ,lambda architecture ,partial discharges ,power hydrogenerators ,real-time data processing ,Chemical technology ,TP1-1185 - Abstract
The prediction of partial discharges in hydrogenerators depends on data collected by sensors and prediction models based on artificial intelligence. However, forecasting models are trained with a set of historical data that is not automatically updated due to the high cost to collect sensors’ data and insufficient real-time data analysis. This article proposes a method to update the forecasting model, aiming to improve its accuracy. The method is based on a distributed data platform with the lambda architecture, which combines real-time and batch processing techniques. The results show that the proposed system enables real-time updates to be made to the forecasting model, allowing partial discharge forecasts to be improved with each update with increasing accuracy.
- Published
- 2020
41. Applying the ETL Process to Blockchain Data. Prospect and Findings
- Author
-
Roberta Galici, Laura Ordile, Michele Marchesi, Andrea Pinna, and Roberto Tonelli
- Subjects
ETL ,Bitcoin ,blockchain ,lambda architecture ,blockchain analytics ,Information technology ,T58.5-58.64 - Abstract
We present a novel strategy, based on the Extract, Transform and Load (ETL) process, to collect data from a blockchain, elaborate and make it available for further analysis. The study aims to satisfy the need for increasingly efficient data extraction strategies and effective representation methods for blockchain data. For this reason, we conceived a system to make scalable the process of blockchain data extraction and clustering, and to provide a SQL database which preserves the distinction between transaction and addresses. The proposed system satisfies the need to cluster addresses in entities, and the need to store the extracted data in a conventional database, making possible the data analysis by querying the database. In general, ETL processes allow the automation of the operation of data selection, data collection and data conditioning from a data warehouse, and produce output data in the best format for subsequent processing or for business. We focus on the Bitcoin blockchain transactions, which we organized in a relational database to distinguish between the input section and the output section of each transaction. We describe the implementation of address clustering algorithms specific for the Bitcoin blockchain and the process to collect and transform data and to load them in the database. To balance the input data rate with the elaboration time, we manage blockchain data according to the lambda architecture. To evaluate our process, we first analyzed the performances in terms of scalability, and then we checked its usability by analyzing loaded data. Finally, we present the results of a toy analysis, which provides some findings about blockchain data, focusing on a comparison between the statistics of the last year of transactions, and previous results of historical blockchain data found in the literature. 
The ETL process we realized to analyze blockchain data is proven to be able to perform a reliable and scalable data acquisition process, whose result makes stored data available for further analysis and business.
- Published
- 2020
42. Lambda Architecture
- Author
-
Sakr, Sherif, editor and Zomaya, Albert Y., editor
- Published
- 2019
43. Scalable prediction-based online anomaly detection for smart meter data.
- Author
-
Liu, Xiufeng and Nielsen, Per Sieverts
- Subjects
*ANOMALY detection (Computer security) , *SMART meters , *SCALABILITY , *ELECTRIC meters , *PREDICTION models , *ENERGY consumption , *DATA mining - Abstract
Today smart meters are widely used in the energy sector to record energy consumption in real time. Large amounts of smart meter data have been accumulated and used for diverse analysis purposes. Anomaly detection, namely the detection of abnormal events or unusual consumption behaviors, raises a big data problem. However, there is a lack of appropriate online systems that can handle anomaly detection for large-scale smart meter data effectively and efficiently. This paper proposes a lambda system for detecting anomalous consumption patterns, aiming to assist decision making for smart energy management. The proposed system uses a prediction-based detection method, combined with a novel lambda architecture for iterative model updates and real-time anomaly detection. This paper evaluates the system using a real-world data set and a large synthetic data set, and compares it with three baselines. The results show that the proposed system has good scalability, and has a competitive advantage over others in anomaly detection. [ABSTRACT FROM AUTHOR]
- Published
- 2018
44. A real-time recommendation engine using lambda architecture.
- Author
-
Numnonda, Thanisa
- Abstract
In data science, recommendation is one of the most popular methodologies and has been deployed in many real industries. However, one of the most challenging problems these days is how to recommend items with massively streaming data. Therefore, this paper aims to build a real-time recommendation engine using the Lambda architecture. The Apache Hadoop and Apache Spark frameworks were used in this research to process the MovieLens datasets comprising 100 K and 20 M ratings from GroupLens Research. Using alternating least squares (ALS) and k-means algorithms, the top K recommended movies and the top K trending movies for each user were shown as results. Additionally, the mean squared error (MSE) and within cluster sum of squared error (WCSS) were computed to evaluate the performance of the ALS and k-means algorithms, respectively. The results are acceptable, since the MSE and WCSS values are low compared with the size of the data. However, they can still be improved by tuning some parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2018
45. Monitoring System Using Internet of Things For Potential Landslides.
- Author
-
Moulat, Meryem El, Debauche, Olivier, Mahmoudi, Saïd, Brahim, Lahcen Aït, Manneback, Pierre, and Lebeau, Frédéric
- Subjects
INTERNET of things ,LOGISTIC regression analysis ,LANDSLIDES ,ROBUST control ,ELECTRIC power system faults - Abstract
The North-Western RIF of Morocco is considered one of the most mountainous zones in the Middle East and North Africa. The situation is more serious in the corridor faults region, where the recent reactivation of those tectonic layers may greatly contribute to the triggering of landslides. The consequences of this phenomenon can be enormous property damage and human casualties. Furthermore, this disaster can disrupt progress and destroy the developmental efforts of government, often pushing nations back by many years. In our previous works on the Tetouan-Ras-Mazari region, we identified the areas that are prone to landslides by different methods like Weights of Evidence (WofE) and Logistic Regression (LR). In fact, these zones are built-up and susceptible. Undoubtedly, the challenge to save human lives is vital. For this reason, we develop a robust monitoring model as part of an alert system to evacuate populations in case of imminent danger. This model is a ground-based remote monitoring system that consists of more than just field sensors: it employs data acquisition units to record sensor measurements, automated data processing, and display of current conditions, usually via the Internet of Things (IoT). To sum up, this paper outlines a new monitoring approach to detect when hillslopes are primed for sliding, which can provide early indications of rapid and catastrophic movement. It also reports continuous information from up-to-the-minute or real-time monitoring, provides prompt notification of landslide activities, advances our understanding of landslide behaviors, and enables more effective engineering and planning efforts. [ABSTRACT FROM AUTHOR]
- Published
- 2018
46. Web Monitoring of Bee Health for Researchers and Beekeepers Based on the Internet of Things.
- Author
-
Debauche, Olivier, Moulat, Meryem El, Mahmoudi, Saïd, Boukraa, Slimane, Manneback, Pierre, and Lebeau, Frédéric
- Subjects
COLONY collapse disorder of honeybees ,BIODIVERSITY ,BEEKEEPERS ,INTERNET of things ,POLLINATION by bees ,PRECISION farming - Abstract
Colony Collapse Disorder (CCD), also called ‘Colony Loss’, has a significant impact on biodiversity, on the pollination of crops and on profitability. The Internet of Things associated with cloud computing offers possibilities to collect and process a wide range of data to monitor and follow the health status of the colony. Monitoring animal pollination by collecting data at large scale is an important issue in order to ensure the animals’ survival and pollination, which is mandatory for food production. Moreover, new network technologies like Low Power Wide Area Networks (LPWAN) or 3GPP protocols, and the appearance on the market of easily programmable nodes, make it possible to create low-cost sensors and effectors for the Internet of Things. In this paper, we propose an easily replicable technical solution, based on accurate and affordable sensors and a cloud architecture, to monitor and follow bees’ behavior. This solution provides a platform for researchers to better understand and measure the factors which lead to the mass extinction of bees. The suggested model is also a useful digital tool for beekeepers to better follow up on their beehives. It helps them regularly inspect their hives to check the health of the colony. The massive collection of data opens new research avenues for a better understanding of the factors that influence the life of bees. [ABSTRACT FROM AUTHOR]
- Published
- 2018
47. Data lake for electronic commerce
- Author
-
Vale, Filip and Vrdoljak, Boris
- Subjects
data store ,data lake ,big data ,TECHNICAL SCIENCES. Computing ,electronic commerce ,lambda architecture - Abstract
A data lake is a data repository that stores raw unstructured, semi-structured, and structured data. This paper presents the properties of data lakes as well as large data sets, explains the lambda architecture, and implements a data lake for electronic commerce. The Apache tools Sqoop, Flume, Kafka and Flink were used to ingest data into the data lake. Hadoop was used for data storage while Hive was used for analytical queries.
- Published
- 2022
48. Adaptive real-time anomaly detection in cloud infrastructures.
- Author
-
Agrawal, Bikash, Wiktorski, Tomasz, and Rong, Chunming
- Subjects
ANOMALY detection (Computer security) ,CLOUD computing ,LAMBDA algebra ,OUTLIER detection ,REAL-time computing ,SCALABILITY - Abstract
Cloud computing has become increasingly popular, which has led many individuals and organizations towards cloud storage systems. This move is motivated by benefits such as shared storage, computation, and transparent service among a massive number of users. However, cloud-computing systems require the maintenance of complex and large-scale systems with practically unavoidable runtime problems caused by hardware and software faults. Large systems are very complex due to heterogeneity, dynamicity, scalability, hidden complexity, and time limitations. Automatic anomaly detection is a critical technique for managing such complex cloud resources. This paper proposes a scalable model for automatic anomaly detection on a large system like a cloud. The anomaly detection process is capable of issuing a correct early warning of unusual behavior in dynamic environments after learning the system characteristic of normal operation. To detect unusual activity in the cloud, we need to monitor the data center and collect cloud performance logs. In this paper, we propose an adaptive anomaly detection mechanism, which investigates principal components of the performance metrics. It transforms the performance metrics into a low-rank matrix and calculates the orthogonal distance using the Robust PCA algorithm. The proposed model updates itself recursively, while learning and adjusting the new threshold value, to minimize reconstruction errors. This paper also investigates robust principal component analysis in distributed environments using Apache Spark as the underlying framework. It specifically addresses cases in which normal operation might exhibit multiple hidden modes. The accuracy and sensitivity of the model were tested on Amazon CloudWatch datasets, and Yahoo! datasets. The model achieved an accuracy of 88.54 %. [ABSTRACT FROM AUTHOR]
- Published
- 2017
49. Ahab: A cloud-based distributed big data analytics framework for the Internet of Things.
- Author
-
Vögler, Michael, Schleicher, Johannes M., Inzinger, Christian, and Dustdar, Schahram
- Subjects
SMART cities ,DATA analytics ,URBAN planning ,SOFTWARE maintenance ,INTERNET of things - Abstract
Smart city applications generate large amounts of operational data during their execution, such as information from infrastructure monitoring, performance and health events from used toolsets, and application execution logs. These data streams contain vital information about the execution environment that can be used to fine-tune or optimize different layers of a smart city application infrastructure. Current approaches do not sufficiently address the efficient collection, processing, and storage of this information in the smart city domain. In this paper, we present Ahab, a generic, scalable, and fault-tolerant data processing framework based on the cloud that allows operators to perform online and offline analyses on gathered data to better understand and optimize the behavior of the available smart city infrastructure. Ahab is designed for easy integration of new data sources, provides an extensible API to perform custom analysis tasks, and a domain-specific language to define adaptation rules based on analysis results. We demonstrate the feasibility of the proposed approach using an example application for autonomous intersection management in smart city environments. Our framework is able to autonomously optimize application deployment topologies by distributing processing load over available infrastructure resources when necessary based on both online analysis of the current state of the environment and patterns learned from historical data. Copyright © 2016 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2017
50. Comparative Study of Real Time Machine Learning Models for Stock Prediction through Streaming Data
- Author
-
Santanu Kumar Rath, Ranjan Kumar Behera, Robertas Damasevicius, Sanjay Misra, and Sushree Das
- Subjects
General Computer Science ,Twitter API ,Computer science ,NodeJS ,QA75.5-76.95 ,Machine learning ,Stock prediction ,Theoretical Computer Science ,MLlib ,Spark Streaming ,Electronic computers. Computer science ,Streaming data ,Artificial intelligence ,Lambda Architecture - Abstract
Stock prediction is one of the emerging applications in the field of data science, helping companies make better decision strategies. Machine learning models play a vital role in the field of prediction. In this paper, we propose various machine learning models which predict the stock price from real-time streaming data. Streaming data is a potential source for real-time prediction; it deals with a continuous flow of data carrying information from various sources like social networking websites, server logs, mobile phone applications, trading floors, etc. We have adopted the distributed platform Spark to analyze the streaming data collected from two different sources, as represented in two case studies in this paper. The first case study is based on stock prediction from the historical data collected from Google finance websites through NodeJs, and the second one is based on the sentiment analysis of Twitter data collected through the Twitter API available in the Stanford NLP package. Several studies have developed models for stock prediction based on static data. In this work, an effort has been made to develop scalable, fault-tolerant models for stock prediction from real-time streaming data. The proposed model is based on a distributed architecture known as Lambda architecture. An extensive comparison is made between actual and predicted output for different machine learning models. Support vector regression is found to have better accuracy than the other models. The historical data is considered as ground truth data for validation.
- Published
- 2020