Descriptor: "Lambda architecture" - Searchworks@Jio Institute Digital Library Search Results

127 results on '"Lambda architecture"'

1. Databases, Data Warehousing, and Data Analytics

Author: Eble, Michael, Hoch, Julian M., Krone, Jan, editor, and Pellegrini, Tassilo, editor
Published: 2024
Full Text: View/download PDF

2. The Impact of Data Ingestion Layer in an Improved Lambda Architecture

Author: Foko Sindjoung, Miguel Landry, Fotseu Fotseu, Ernest Basile, Velempini, Mthulisi, Fotsing Talla, Bernard, Bomgni (PI), Alain Bertrand, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
Published: 2024
Full Text: View/download PDF

3. Big Data Architectures

Author: Garriga, Martin, Monsieur, Geert, Tamburri, Damian, Liebregts, Werner, editor, van den Heuvel, Willem-Jan, editor, van den Born, Arjan, editor, Van den Heuvel, Willem-Jan, Section Editor, Tamburri, Damian A., Section Editor, Böing-Messing, Florian, Section Editor, and Lafarre, Anne J. F., Section Editor
Published: 2023
Full Text: View/download PDF

4. Real-Time Assessment of Live Feeds in Big Data

Author: Bhagat, Amol, Deshpande, Makrand, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Joshi, Amit, editor, Mahmud, Mufti, editor, and Ragel, Roshan G., editor
Published: 2023
Full Text: View/download PDF

5. Performance Analysis of Lambda Architecture-Based Big-Data Systems on Air/Ground Surveillance Application with ADS-B Data.

Author: Demirezen, Mustafa Umut and Navruz, Tuğba Selcen
Subjects: *AUTOMATIC dependent surveillance-broadcast, *OXYGEN consumption, *SOFTWARE frameworks, *COMPUTER software development, *ELECTRONIC data processing
Abstract: This study introduces a novel methodology designed to assess the accuracy of data processing in the Lambda Architecture (LA), an advanced big-data framework qualified for processing streaming (data in motion) and batch (data at rest) data. Distinct from prior studies that have focused on hardware performance and scalability evaluations, our research uniquely targets the intricate aspects of data-processing accuracy within the various layers of LA. The salient contribution of this study lies in its empirical approach. For the first time, we provide empirical evidence that validates previously theoretical assertions about LA, which have remained largely unexamined due to LA's intricate design. Our methodology encompasses the evaluation of prospective technologies across all levels of LA, the examination of layer-specific design limitations, and the implementation of a uniform software development framework across multiple layers. Specifically, our methodology employs a unique set of metrics, including data latency and processing accuracy under various conditions, which serve as critical indicators of LA's accurate data-processing performance. Our findings compellingly illustrate LA's "eventual consistency". Despite potential transient inconsistencies during real-time processing in the Speed Layer (SL), the system ultimately converges to deliver precise and reliable results, as informed by the comprehensive computations of the Batch Layer (BL). This empirical validation not only confirms but also quantifies the claims posited by previous theoretical discourse, with our results indicating a 100% accuracy rate under various severe data-ingestion scenarios. We applied this methodology in a practical case study involving air/ground surveillance, a domain where data accuracy is paramount. This application demonstrates the effectiveness of the methodology using real-world data-intake scenarios, therefore distinguishing this study from hardware-centric evaluations. 
This study not only contributes to the existing body of knowledge on LA but also addresses a significant literature gap. By offering a novel, empirically supported methodology for testing LA, a methodology with potential applicability to other big-data architectures, this study sets a precedent for future research in this area, advancing beyond previous work that lacked empirical validation. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
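
Note: the "eventual consistency" described in the abstract of record 5 can be illustrated with a small sketch. This is not the authors' code or test methodology. The event fields (`id`, `key`), the function names, and the duplicate-delivery scenario are all assumptions chosen for illustration. The speed layer here counts naively and so over-counts a duplicated message, while the batch layer recomputes from the full log with de-duplication, so the merged result converges once the batch run catches up.

```python
from collections import Counter

def batch_view(master_log):
    """Batch layer (assumed): recompute exact counts from the full log,
    de-duplicating events by their id."""
    key_by_id = {}
    for event in master_log:
        key_by_id[event["id"]] = event["key"]
    return Counter(key_by_id.values())

def speed_view(recent_events):
    """Speed layer (assumed): count events as they arrive, without de-duplication."""
    return Counter(event["key"] for event in recent_events)

def serve(batch, speed):
    """Serving layer (assumed): merge the batch view with the real-time view."""
    return batch + speed

# Hypothetical log: event 3 was delivered twice.
log = [
    {"id": 1, "key": "AC1"},
    {"id": 2, "key": "AC2"},
    {"id": 3, "key": "AC1"},
    {"id": 3, "key": "AC1"},
]
covered, pending = log[:2], log[2:]

# Before the next batch run: the duplicate inflates the merged count for AC1.
before = serve(batch_view(covered), speed_view(pending))

# After the batch run has absorbed the whole log, the speed view is discarded.
after = serve(batch_view(log), speed_view([]))
```

In this sketch `before` and `after` differ only for the key affected by the duplicate, which is the kind of transient inconsistency the abstract refers to.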

6. An ensemble deep learning based IDS for IoT using Lambda architecture

Author: Rubayyi Alghamdi and Martine Bellaiche
Subjects: IoT, IDS, Lambda architecture, Cyber-attacks, Deep learning, Ensemble learning, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
Abstract: The Internet of Things (IoT) has revolutionized our world today by providing greater levels of accessibility, connectivity and ease to our everyday lives. It enables massive amounts of data to be traversed across multiple heterogeneous devices that are all interconnected. This phenomenon makes IoT networks vulnerable to various network attacks and intrusions. Building an Intrusion Detection System (IDS) for IoT networks is challenging as they enable a massive amount of data to be aggregated, which is difficult to handle and analyze in real time mainly because of the heterogeneous nature of IoT devices. This inefficient, traditional IDS approach accentuates the need to develop advanced IDS techniques by employing Machine or Deep Learning. This paper presents a deep ensemble-based IDS using Lambda architecture by following a multi-pronged classification approach. Binary classification uses Long Short Term Memory (LSTM) to differentiate between malicious and benign traffic, while the multi-class classifier uses an ensemble of LSTM, Convolutional Neural Network and Artificial Neural Network classifiers to detect the type of attacks. The model training is performed in the batch layer, while real-time evaluation is carried out through model inferences in the speed layer of the Lambda architecture. The proposed approach gives high accuracy of over 99.93% and saves useful processing time due to the multi-pronged classification strategy and using the lambda architecture.
Published: 2023
Full Text: View/download PDF

7. Lambda Architecture-Based Big Data System for Large-Scale Targeted Social Engineering Email Detection.

Author: Demirezen, Mustafa Umut and Navruz, Tuğba Selcen
Subjects: *MACHINE learning, *SOCIAL engineering (Fraud), *LANGUAGE models, *ELECTRONIC data processing, *BIG data, *SOCIAL engineering (Political science)
Abstract: In this research, we delve deep into the realm of Targeted Social Engineering Email Detection, presenting a novel approach that harnesses the power of Lambda Architecture (LA). Our innovative methodology strategically segments the BERT model into two distinct components: the embedding generator and the classification segment. This segmentation not only optimizes resource consumption but also improves system efficiency, making it a pioneering step in the field. Our empirical findings, derived from a rigorous comparison between the fastText and BERT models, underscore the superior performance of the latter. Specifically, the BERT model has high precision rates for identifying malicious and benign emails, with impressive recall values and F1 scores. Its overall accuracy rate was 0.9988, with a Matthews Correlation Coefficient value of 0.9978. In comparison, the fastText model showed lower precision rates. Leveraging principles reminiscent of the Lambda architecture, our study delves into the performance dynamics of data processing models. The Separated-BERT (Sep-BERT) model emerges as a robust contender, adept at managing both real-time (stream) and large-scale (batch) data processing. Compared to the traditional BERT, Sep-BERT showcased superior efficiency, with reduced memory and CPU consumption across diverse email sizes and ingestion rates. This efficiency, combined with rapid inference times, positions Sep-BERT as a scalable and cost-effective solution, aligning well with the demands of Lambda-inspired architectures. This study marks a significant step forward in the fields of big data and cybersecurity. By introducing a novel methodology and demonstrating its efficacy in detecting targeted social engineering emails, we not only advance the state of knowledge in these domains but also lay a robust foundation for future research endeavors, emphasizing the transformative potential of integrating advanced big data frameworks with machine learning models. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

8. An ensemble deep learning based IDS for IoT using Lambda architecture.

Author: Alghamdi, Rubayyi and Bellaiche, Martine
Subjects: CONVOLUTIONAL neural networks, DEEP learning, DENIAL of service attacks, LONG-term memory, INTERNET of things, MACHINE learning
Abstract: The Internet of Things (IoT) has revolutionized our world today by providing greater levels of accessibility, connectivity and ease to our everyday lives. It enables massive amounts of data to be traversed across multiple heterogeneous devices that are all interconnected. This phenomenon makes IoT networks vulnerable to various network attacks and intrusions. Building an Intrusion Detection System (IDS) for IoT networks is challenging as they enable a massive amount of data to be aggregated, which is difficult to handle and analyze in real time mainly because of the heterogeneous nature of IoT devices. This inefficient, traditional IDS approach accentuates the need to develop advanced IDS techniques by employing Machine or Deep Learning. This paper presents a deep ensemble-based IDS using Lambda architecture by following a multi-pronged classification approach. Binary classification uses Long Short Term Memory (LSTM) to differentiate between malicious and benign traffic, while the multi-class classifier uses an ensemble of LSTM, Convolutional Neural Network and Artificial Neural Network classifiers to detect the type of attacks. The model training is performed in the batch layer, while real-time evaluation is carried out through model inferences in the speed layer of the Lambda architecture. The proposed approach gives high accuracy of over 99.93% and saves useful processing time due to the multi-pronged classification strategy and using the lambda architecture. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

9. An Urban Rail Signal Fault Diagnosis System Based on Knowledge Model

Author: YANG Jiang, YANG Yibin, OU Shengfen, and DENG Yongqi
Subjects: signal system, knowledge model, ontology web language(owl), fault diagnosis, lambda architecture, hadoop technical ecosystem, Control engineering systems. Automatic machinery (General), TJ212-225, Technology
Abstract: At present, urban rail signal maintenance systems can only raise an alarm for a single fault source and cannot quickly locate the cause of a fault or guide operation and maintenance personnel in handling it. Moreover, urban rail signal systems have a large variety of faults and complex diagnosis and analysis logic, so customized development of fault diagnosis procedures for different scenarios cannot respond quickly to operation and maintenance needs, and its cost is high. To solve this problem, this paper develops an information-based and platform-based urban rail signal fault diagnosis system based on a knowledge model to realize signal system fault knowledge modeling, fault diagnosis semantic correlation and fault diagnosis process modeling. In order to realize the complete reasoning of signal system fault diagnosis, OWL DL (ontology web language description logic) is used to model knowledge, extract and describe the analysis logic of signal system fault diagnosis, and establish the knowledge model. The operation state of system equipment is used to match the fault cause, and the mapping from the signal system equipment state space to the fault cause space is used to realize fault self-diagnosis of signal system equipment, so as to provide decision support for the production, operation and maintenance management of signal system equipment. Application results show that it can reduce the operation safety accident rate by 15% and improve the fault handling efficiency by 20%.
Published: 2022
Full Text: View/download PDF

10. Architecture Patterns—Batch and Real-Time Capabilities

Author: Kraetz, Dennis, Morawski, Michael, Liermann, Volker, editor, and Stegmann, Claus, editor
Published: 2021
Full Text: View/download PDF

11. A Conceptual Framework for Sensitive Big Data Publishing

Author: Victor, Nancy, Lopez, Daphne, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Purohit, Sunil Dutt, editor, Singh Jat, Dharm, editor, Poonia, Ramesh Chandra, editor, Kumar, Sandeep, editor, and Hiranwal, Saroj, editor
Published: 2021
Full Text: View/download PDF

12. Big Data Accident Prediction System in Green Networks and Intelligent Transportation Systems

Author: Tantaoui, Mouad, Laanaoui, My Driss, Kabil, Mustapha, Pisello, Anna Laura, Editorial Board Member, Hawkes, Dean, Editorial Board Member, Bougdah, Hocine, Editorial Board Member, Rosso, Federica, Editorial Board Member, Abdalla, Hassan, Editorial Board Member, Boemi, Sofia-Natalia, Editorial Board Member, Mohareb, Nabil, Editorial Board Member, Mesbah Elkaffas, Saleh, Editorial Board Member, Bozonnet, Emmanuel, Editorial Board Member, Pignatta, Gloria, Editorial Board Member, Mahgoub, Yasser, Editorial Board Member, De Bonis, Luciano, Editorial Board Member, Kostopoulou, Stella, Editorial Board Member, Pradhan, Biswajeet, Editorial Board Member, Abdul Mannan, Md., Editorial Board Member, Alalouch, Chaham, Editorial Board Member, O. Gawad, Iman, Editorial Board Member, Nayyar, Anand, Editorial Board Member, Amer, Mourad, Series Editor, Ben Ahmed, Mohamed, editor, Mellouli, Sehl, editor, Braganca, Luis, editor, Anouar Abdelhakim, Boudhir, editor, and Bernadetta, Kwintiana Ane, editor
Published: 2021
Full Text: View/download PDF

13. Toward Public Opinion Monitoring System of Large-Scale Data with Lambda Architecture

Author: Zhang, Weijuan, Lu, Yue, Ma, Kun, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Panda, Mrutyunjaya, editor, Pradhan, Subhrajit, editor, Garcia-Hernandez, Laura, editor, and Ma, Kun, editor
Published: 2021
Full Text: View/download PDF

14. Lambda+, the Renewal of the Lambda Architecture: Category Theory to the Rescue

Author: Gillet, Annabelle, Leclercq, Éric, Cullot, Nadine, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, La Rosa, Marcello, editor, Sadiq, Shazia, editor, and Teniente, Ernest, editor
Published: 2021
Full Text: View/download PDF

15. A Scalable Analytical Framework for Complex Event Episode Mining With Various Domains Applications

Author: Jerry C. C. Tseng, Sun-Yuan Hsieh, and Vincent S. Tseng
Subjects: Complex event sequence, data stream, episode pattern mining, incremental mining, lambda architecture, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: With the ubiquity of sensor networks and smart devices that continuously collect data, we face the challenge of analyzing the growing stream of data in real time. In recent years, there has been a huge need to gain useful knowledge by incrementally analyzing event sequence data. Although episode pattern mining techniques have existed for years, people have recently become more aware of their practical value in solving real-life domain problems such as manufacturing records, stock markets, and weather forecasts. The effective and efficient application of episode pattern mining techniques to analyze complex event data is becoming increasingly important for solving real-life problems in wide domains. However, few studies have focused on developing a scalable framework based on episode pattern mining of complex event sequences for applications in various domains. In this work, we propose a novel framework named SAAF (Scalable Analytical Application Framework) based on complex event episode mining techniques, including batch episode mining, delta episode mining, incremental episode mining, and pattern merging, to consider both efficiency and accuracy. Moreover, to enhance scalability, we adopt the lambda architecture with Apache Spark and Apache Spark Streaming as the system development framework. Finally, the experimental results on three real datasets of different domains and two benchmark datasets showed that the proposed SAAF framework exhibits excellent performance in terms of efficiency, accuracy, and scalability.
Published: 2022
Full Text: View/download PDF

16. Cloud and distributed architectures for data management in agriculture 4.0 : Review and future trends.

Author: Debauche, Olivier, Mahmoudi, Saïd, Manneback, Pierre, and Lebeau, Frédéric
Subjects: DATA management, AGRICULTURE, COST effectiveness, CLOUD computing, AGRICULTURAL technology, MAINTAINABILITY (Engineering)
Abstract: • Cloud architectures used in Agriculture 4.0. • Distributed Architectures and Cloud Computing complements. • Strategies of association between Edge, Fog, Cloud. • New architectural and computing trends. Agriculture 4.0, also called Smart Agriculture or Smart Farming, produces a huge amount of data that must be collected, stored, and processed in a very short time. Processing this massive quantity of data requires specific infrastructure that uses adapted IoT architectures. Our review offers a comparative panorama of Central Cloud, Distributed Cloud Architectures, Collaborative Computing Strategies, and new trends used in the context of Agriculture 4.0. In this review, we try to answer 4 research questions: (1) Which storage and processing architectures are best suited to Agriculture 4.0 applications and respond to its peculiarities? (2) Can generic architectures meet the needs of Agriculture 4.0 application cases? (3) What are the horizontal development possibilities that allow the transition from research to industrialization? (4) What are the vertical valuations possibilities to move from algorithms trained in the cloud to embedded or stand-alone products? For this, we compare architectures with 8 criteria (User Proximity, Latency & Jitter, Network stability, high throughput, Reliability, Scalability, Cost Effectiveness, Maintainability), and analyze the advantages and disadvantages of each of them. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

17. A scalable data store and analytic platform for real-time monitoring of data-intensive scientific infrastructure

Author: Suthakar, Uthayanath, Smith, D., Khan, A., and Magnoni, L.
Subjects: 004, Big data, Data science, Distributed system, Lambda Architecture, Parallel computing
Abstract: Monitoring data-intensive scientific infrastructures in real-time such as jobs, data transfers, and hardware failures is vital for efficient operation. Due to the high volume and velocity of events that are produced, traditional methods are no longer optimal. Several techniques, as well as enabling architectures, are available to support the Big Data issue. In this respect, this thesis complements existing survey work by contributing an extensive literature review of both traditional and emerging Big Data architecture. Scalability, low-latency, fault-tolerance, and intelligence are key challenges of the traditional architecture. However, Big Data technologies and approaches have become increasingly popular for use cases that demand the use of scalable, data intensive processing (parallel), and fault-tolerance (data replication) and support for low-latency computations. In the context of a scalable data store and analytics platform for monitoring data-intensive scientific infrastructure, Lambda Architecture was adapted and evaluated on the Worldwide LHC Computing Grid, which has been proven effective. This is especially true for computationally and data-intensive use cases. In this thesis, an efficient strategy for the collection and storage of large volumes of data for computation is presented. By moving the transformation logic out from the data pipeline and moving to analytics layers, it simplifies the architecture and overall process. Time utilised is reduced, untampered raw data are kept at storage level for fault-tolerance, and the required transformation can be done when needed. An optimised Lambda Architecture (OLA), which involved modelling an efficient way of joining batch layer and streaming layer with minimum code duplications in order to support scalability, low-latency, and fault-tolerance is presented. A few models were evaluated; pure streaming layer, pure batch layer and the combination of both batch and streaming layers. 
Experimental results demonstrate that OLA performed better than the traditional architecture as well as the Lambda Architecture. The OLA was also enhanced by adding an intelligence layer for predicting data access pattern. The intelligence layer actively adapts and updates the model built by the batch layer, which eliminates the re-training time while providing a high level of accuracy using the Deep Learning technique. The fundamental contribution to knowledge is a scalable, low-latency, fault-tolerant, intelligent, and heterogeneous-based architecture for monitoring a data-intensive scientific infrastructure that can benefit from Big Data technologies and approaches.
Published: 2017
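
Note: the abstract of record 17 mentions joining the batch and streaming layers "with minimum code duplications". One common way to get that property, sketched below, is to write the transformation once and call it from both paths. This is not the thesis's OLA implementation. The event shape `(site, bytes)`, the function names, and the fixed-size window are assumptions for illustration only.

```python
def summarize(events):
    """Shared transformation (assumed): total bytes transferred per site."""
    totals = {}
    for site, nbytes in events:
        totals[site] = totals.get(site, 0) + nbytes
    return totals

def batch_job(all_events):
    """Batch path: apply the shared logic to the complete data set."""
    return summarize(all_events)

def stream_job(event_iter, window):
    """Streaming path: apply the same logic to fixed-size windows of events."""
    buffer = []
    for event in event_iter:
        buffer.append(event)
        if len(buffer) == window:
            yield summarize(buffer)
            buffer = []
    if buffer:
        # Flush the final, possibly partial, window.
        yield summarize(buffer)

# Hypothetical transfer events.
events = [("A", 10), ("B", 5), ("A", 7)]
batch_result = batch_job(events)
stream_results = list(stream_job(iter(events), window=2))
```

Because both paths call `summarize`, a change to the aggregation logic only has to be made in one place.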

18. Lambda Architecture for Anomaly Detection in Online Process Mining Using Autoencoders

Author: Krajsic, Philippe, Franczyk, Bogdan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Hernes, Marcin, editor, Wojtkiewicz, Krystian, editor, and Szczerbicki, Edward, editor
Published: 2020
Full Text: View/download PDF

19. Stream and Event Processing Services for Real-time Linked Dataspaces

Author: Curry, Edward
Published: 2020
Full Text: View/download PDF

20. Architecting IoT Cloud

Author: Firouzi, Farshad, Farahani, Bahar, Firouzi, Farshad, editor, Chakrabarty, Krishnendu, editor, and Nassif, Sani, editor
Published: 2020
Full Text: View/download PDF

21. The Next-Generation NIDS Platform: Cloud-Based Snort NIDS Using Containers and Big Data.

Author: Saputra, Ferry Astika, Salman, Muhammad, Hasim, Jauari Akhmad Nur, Nadhori, Isbat Uzzin, and Ramli, Kalamullah
Subjects: SENSOR placement, BIG data, COMPUTER network security, CONTAINERS, SERVER farms (Computer network management), DATA logging
Abstract: Snort is a well-known, signature-based network intrusion detection system (NIDS). The Snort sensor must be placed within the same physical network, and the defense centers in the typical NIDS architecture offer limited network coverage, especially for remote networks with a restricted bandwidth and network policy. Additionally, the growing number of sensor instances, followed by a quick increase in log data volume, has caused the present system to face big data challenges. This research paper proposes a novel design for a cloud-based Snort NIDS using containers and implementing big data in the defense center to overcome these problems. Our design consists of Docker as the sensor's platform, Apache Kafka as the distributed messaging system, and big data technology orchestrated on lambda architecture. We conducted experiments to measure sensor deployment, optimum message delivery from the sensors to the defense center, aggregation speed, and efficiency in the data-processing performance of the defense center. We successfully developed a cloud-based Snort NIDS and found the optimum method for message-delivery from the sensor to the defense center. We also succeeded in developing the dashboard and attack maps to display the attack statistics and visualize the attacks. Our first design is reported to implement the big data architecture, namely, lambda architecture, as the defense center and utilize rapid deployment of Snort NIDS using Docker technology as the network security monitoring platform. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

22. A Machine Hearing Framework for Real-Time Streaming Analytics Using Lambda Architecture

Author: Demertzis, Konstantinos, Iliadis, Lazaros, Anezakis, Vardis-Dimitris, Barbosa, Simone Diniz Junqueira, Editorial Board Member, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Yuan, Junsong, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Macintyre, John, editor, Iliadis, Lazaros, editor, Maglogiannis, Ilias, editor, and Jayne, Chrisina, editor
Published: 2019
Full Text: View/download PDF

23. The Big Data-RTAP: Toward a Secured Video Surveillance System in Smart Environment

Author: Ezzahout, Abderrahmane, Oubaha, Jawad, Kacprzyk, Janusz, Series Editor, Zbakh, Mostapha, editor, Essaaidi, Mohammed, editor, Manneback, Pierre, editor, and Rong, Chunming, editor
Published: 2019
Full Text: View/download PDF

24. Toward Information System Architecture to Support Predictive Maintenance Approach

Author: Sarazin, Alexandre, Truptil, Sébastien, Montarnal, Aurélie, Lamothe, Jacques, Popplewell, Keith, editor, Thoben, Klaus-Dieter, editor, Knothe, Thomas, editor, and Poler, Raúl, editor
Published: 2019
Full Text: View/download PDF

25. Lambda architecture for cost-effective batch and speed big data processing

Author: Kiran, M, Murphy, P, Monga, I, Dugan, J, and Baveja, SS
Subjects: big data processing, lambda architecture, Amazon EC2, sensor data analysis, Networking and Information Technology R&D
Abstract: Sensor and smart phone technologies present opportunities for data explosion, streaming and collecting from heterogeneous devices every second. Analyzing these large datasets can unlock multiple behaviors previously unknown, and help optimize approaches to city wide applications or societal use cases. However, collecting and handling of these massive datasets presents challenges in how to perform optimized online data analysis 'on-the-fly', as current approaches are often limited by capability, expense and resources. This presents a need for developing new methods for data management particularly using public clouds to minimize cost, network resources and on-demand availability. This paper presents an implementation of the lambda architecture design pattern to construct a data-handling backend on Amazon EC2, providing high throughput, dense and intense data demand delivered as services, minimizing the cost of the network maintenance. This paper combines ideas from database management, cost models, query management and cloud computing to present a general architecture that could be applied in any given scenario where affordable online data processing of Big Datasets is needed. The results are presented with a case study of processing router sensor data on the current ESnet network data as a working example of the approach. The results showcase a reduction in cost and argue benefits for performing online analysis and anomaly detection for sensor data.
Published: 2015

26. Lambda Architecture for Cost-Effective Batch and Speed Big Data Processing

Author: Kiran, Mariam, Murphy, Peter, Monga, Inder, Dugan, Jon, and Baveja, Sartaj Singh
Subjects: Networking and Information Technology R&D (NITRD), big data processing, lambda architecture, Amazon EC2, sensor data analysis
Abstract: Sensor and smart phone technologies present opportunities for data explosion, streaming and collecting from heterogeneous devices every second. Analyzing these large datasets can unlock multiple behaviors previously unknown, and help optimize approaches to city wide applications or societal use cases. However, collecting and handling of these massive datasets presents challenges in how to perform optimized online data analysis 'on-the-fly', as current approaches are often limited by capability, expense and resources. This presents a need for developing new methods for data management particularly using public clouds to minimize cost, network resources and on-demand availability. This paper presents an implementation of the lambda architecture design pattern to construct a data-handling backend on Amazon EC2, providing high throughput, dense and intense data demand delivered as services, minimizing the cost of the network maintenance. This paper combines ideas from database management, cost models, query management and cloud computing to present a general architecture that could be applied in any given scenario where affordable online data processing of Big Datasets is needed. The results are presented with a case study of processing router sensor data on the current ESnet network data as a working example of the approach. The results showcase a reduction in cost and argue benefits for performing online analysis and anomaly detection for sensor data.
Published: 2015

27. Lifelong Machine Learning and root cause analysis for large-scale cancer patient data

Author: Gautam Pal, Xianbin Hong, Zhuo Wang, Hongyi Wu, Gangmin Li, and Katie Atkinson
Subjects: Lifelong learning, Real-time data processing, Lambda Architecture, Streaming k-means, Random Decision Forest, Dimension reduction, Computer engineering. Computer hardware, TK7885-7895, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
Abstract: Introduction: This paper presents a lifelong learning framework which constantly adapts with changing data patterns over time through incremental learning approach. In many big data systems, iterative re-training high dimensional data from scratch is computationally infeasible since constant data stream ingestion on top of a historical data pool increases the training time exponentially. Therefore, the need arises on how to retain past learning and fast update the model incrementally based on the new data. Also, the current machine learning approaches do the model prediction without providing a comprehensive root cause analysis. To resolve these limitations, our framework lays foundations on an ensemble process between stream data with historical batch data for an incremental lifelong learning (LML) model. Case description: A cancer patient’s pathological tests like blood, DNA, urine or tissue analysis provide a unique signature based on the DNA combinations. Our analysis allows personalized and targeted medications and achieves a therapeutic response. Model is evaluated through data from The National Cancer Institute’s Genomic Data Commons unified data repository. The aim is to prescribe personalized medicine based on the thousands of genotype and phenotype parameters for each patient. Discussion and evaluation: The model uses a dimension reduction method to reduce training time at an online sliding window setting. We identify the Gleason score as a determining factor for cancer possibility and substantiate our claim through Lilliefors and Kolmogorov–Smirnov test. We present clustering and Random Decision Forest results. The model’s prediction accuracy is compared with standard machine learning algorithms for numeric and categorical fields. Conclusion: We propose an ensemble framework of stream and batch data for incremental lifelong learning. The framework successively applies first streaming clustering technique and then Random Decision Forest Regressor/Classifier to isolate anomalous patient data and provides reasoning through root cause analysis by feature correlations with an aim to improve the overall survival rate. While the stream clustering technique creates groups of patient profiles, RDF further drills down into each group for comparison and reasoning for useful actionable insights. The proposed MALA architecture retains the past learned knowledge and transfer to future learning and iteratively becomes more knowledgeable over time.
Published: 2019
Full Text: View/download PDF

28. The Next-Generation NIDS Platform: Cloud-Based Snort NIDS Using Containers and Big Data

Author: Ferry Astika Saputra, Muhammad Salman, Jauari Akhmad Nur Hasim, Isbat Uzzin Nadhori, and Kalamullah Ramli
Subjects: Snort, big data, cloud-based IDS, docker, lambda architecture, Technology
Abstract: Snort is a well-known, signature-based network intrusion detection system (NIDS). The Snort sensor must be placed within the same physical network, and the defense centers in the typical NIDS architecture offer limited network coverage, especially for remote networks with a restricted bandwidth and network policy. Additionally, the growing number of sensor instances, followed by a quick increase in log data volume, has caused the present system to face big data challenges. This research paper proposes a novel design for a cloud-based Snort NIDS using containers and implementing big data in the defense center to overcome these problems. Our design consists of Docker as the sensor’s platform, Apache Kafka, as the distributed messaging system, and big data technology orchestrated on lambda architecture. We conducted experiments to measure sensor deployment, optimum message delivery from the sensors to the defense center, aggregation speed, and efficiency in the data-processing performance of the defense center. We successfully developed a cloud-based Snort NIDS and found the optimum method for message-delivery from the sensor to the defense center. We also succeeded in developing the dashboard and attack maps to display the attack statistics and visualize the attacks. Our first design is reported to implement the big data architecture, namely, lambda architecture, as the defense center and utilize rapid deployment of Snort NIDS using Docker technology as the network security monitoring platform.
Published: 2022
Full Text: View/download PDF

29. Cloud architecture for plant phenotyping research.

Author: Debauche, Olivier, Mahmoudi, Sidi Ahmed, De Cock, Nicolas, Mahmoudi, Saïd, Manneback, Pierre, and Lebeau, Frédéric
Subjects: DATA management, BIG data, CLOUD computing, DISTRIBUTED databases
Abstract: Summary: Digital phenotyping is an emergent science mainly based on imagery techniques. The tremendous amount of data generated needs important cloud computing for their processing. The coupling of recent advance of distributed databases and cloud computing offers new possibilities of big data management and data sharing for the scientific research. In this paper, we present a solution combining a lambda architecture built around Apache Druid and a hosting platform leaning on Apache Mesos. Lambda architecture has already proved its performance and robustness. However, the capacity of ingesting and requesting of the database is essential and can constitute a bottleneck for the architecture, in particular, for in terms of availability and response time of data. We focused our experimentation on the response time of different databases to choose the most adapted for our phenotyping architecture. Apache Druid has shown its ability to respond to typical queries of phenotyping applications in times generally inferior to the second. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

30. Renewable estimation and incremental inference in generalized linear models with streaming data sets.

Author: Luo, Lan and Song, Peter X.‐K.
Subjects: ASYMPTOTIC normality, DISTRIBUTED computing, DATA modeling, DATA analysis
Abstract: Summary: The paper presents an incremental updating algorithm to analyse streaming data sets using generalized linear models. The method proposed is formulated within a new framework of renewable estimation and incremental inference, in which the maximum likelihood estimator is renewed with current data and summary statistics of historical data. Our framework can be implemented within a popular distributed computing environment, known as Apache Spark, to scale up computation. Consisting of two data‐processing layers, the rho architecture enables us to accommodate inference‐related statistics and to facilitate sequential updating of the statistics used in both estimation and inference. We establish estimation consistency and asymptotic normality of the proposed renewable estimator, in which the Wald test is utilized for an incremental inference. Our methods are examined and illustrated by various numerical examples from both simulation experiments and a real world data analysis. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

31. λ-CoAP: An Internet of Things and Cloud Computing Integration Based on the Lambda Architecture and CoAP

Author: Díaz, Manuel, Martín, Cristian, Rubio, Bartolomé, Akan, Ozgur, Series editor, Cao, Jiannong, Series editor, Coulson, Geoffrey, Series editor, Dressler, Falko, Series editor, Ferrari, Domenico, Series editor, Gerla, Mario, Series editor, Kobayashi, Hisashi, Series editor, Palazzo, Sergio, Series editor, Sahni, Sartaj, Series editor, Shen, Xuemin (Sherman), Series editor, Stan, Mircea, Series editor, Xiaohua, Jia, Series editor, Zomaya, Albert, Series editor, Bellavista, Paolo, Series editor, Guo, Song, editor, Liao, Xiaofei, editor, Liu, Fangming, editor, and Zhu, Yanmin, editor
Published: 2016
Full Text: View/download PDF

32. Online Anomaly Energy Consumption Detection Using Lambda Architecture

Author: Liu, Xiufeng, Iftikhar, Nadeem, Nielsen, Per Sieverts, Heller, Alfred, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Madria, Sanjay, editor, and Hara, Takahiro, editor
Published: 2016
Full Text: View/download PDF

33. Information Flow Monitoring System

Author: Sang Hun Han, Aziz Nasridinov, and Keun Ho Ryu
Subjects: Information flow, social media data, skyline, Lambda architecture, MapReduce, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Vast quantities of data are generated by social networks in seconds. The information generated in a social network is transformed into a flow by the subjects who produce, transmit, and consume it. This flow can be represented in a very complicated directional graph where each subject is represented as a node, and the flow of information is represented as a directed edge. In this paper, we introduce a method of dividing this complex directional graph by user and quantifying the flow of information between and among users based on information flow vectors. We propose a system that can monitor the flow of information in social networks using information flow vectors extracted from social media data. We also introduce an improved skyline algorithm that can respond quickly to a user's various queries.
Published: 2018
Full Text: View/download PDF

34. Stream Processing on Demand for Lambda Architectures

Author: Kroß, Johannes, Brunnert, Andreas, Prehofer, Christian, Runkler, Thomas A., Krcmar, Helmut, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Beltrán, Marta, editor, Knottenbelt, William, editor, and Bradley, Jeremy, editor
Published: 2015
Full Text: View/download PDF

35. NEW APPROACH OF STORING AND RETRIEVING LARGE DATA VOLUMES.

Author: Šikanjić, Nedeljko and Avramović, Zoran Ž.
Subjects: BIG data, DATA quality, DATA warehousing, DATA
Abstract: In today's world of advanced informational technologies, society is facing a huge amount of data that is just getting impossible to store, process and analyze. In these big data volumes, some of the important information is being lost, that could help us improve the quality of personal and business life. This paper focus is on finding the best possible way of approaching this issue to find a feasible solution in increasing the efficiency and quality of data. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

36. Cloud services integration for farm animals' behavior studies based on smartphones as activity sensors.

Author: Debauche, Olivier, Mahmoudi, Saïd, Andriamandroso, Andriamasinoro Lalaina Herinaina, Manneback, Pierre, Bindelle, Jérôme, and Lebeau, Frédéric
Abstract: Smartphones, particularly iPhone, can be relevant instruments for researchers in animal behavior because they are readily available on the planet, contain many sensors and require no hardware development. They are equipped with high performance Inertial Measurement Units (IMU) and absolute positioning systems analyzing users' movements, but they can easily be diverted to analyze likewise the behaviors of domestic animals such as cattle. The study of animal behavior using smartphones requires the storage of many high frequency variables from a large number of individuals and their processing through various relevant variables combinations for modeling and decision-making. Transferring, storing, treating and sharing such an amount of data is a big challenge. In this paper, a lambda cloud architecture innovatively coupled to a scientific sharing platform used to archive, and process high-frequency data are proposed to integrate future developments of the Internet of Things applied to the monitoring of domestic animals. An application to the study of cattle behavior on pasture based on the data recorded with the IMU of iPhone 4s is exemplified. Performances comparison between iPhone 4s and iPhone 5s is also achieved. The package comes also with a web interface to encode the actual behavior observed on videos and to synchronize observations with the sensor signals. Finally, the use of Edge computing on the iPhone reduced by 43.5% on average the size of the raw data by eliminating redundancies. The limitation of the number of digits on individual variable can reduce data redundancy up to 98.5%. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

37. An Approach to Implementing the Batch Layer in an Energy Management System.

Author: Marinov, Milko
Subjects: *ENERGY management, *BIG data, *ELECTRONIC data processing
Abstract: Recently, Energy Management Systems (EMS) have become more integrated and created higher demand of big data processing, thus challenging the real time analysis based on big data technologies. Marz's Lambda architecture is used for solving problems with querying high amounts of petabyte data. In relation to this, this article presents the Lambda architecture of a particular EMS. The quick and inaccurate results received from the speed layer are replaced by the more precise results of the batch layer. With reference to this, the author proposes a realization of the batch layer based on Hadoop MapReduce technology. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

38. The Next Generation Cognitive Security Operations Center: Adaptive Analytic Lambda Architecture for Efficient Defense against Adversarial Attacks.

Author: Demertzis, Konstantinos, Tziritas, Nikos, Kikiras, Panayiotis, Sanchez, Salvador Llopis, and Iliadis, Lazaros
Subjects: COMPUTER network security, INDUSTRIAL security, COGNITIVE computing, INTERNET security, CYBERTERRORISM
Abstract: A Security Operations Center (SOC) is a central technical level unit responsible for monitoring, analyzing, assessing, and defending an organization's security posture on an ongoing basis. The SOC staff works closely with incident response teams, security analysts, network engineers and organization managers using sophisticated data processing technologies such as security analytics, threat intelligence, and asset criticality to ensure security issues are detected, analyzed and finally addressed quickly. Those techniques are part of a reactive security strategy because they rely on the human factor, experience and the judgment of security experts, using supplementary technology to evaluate the risk impact and minimize the attack surface. This study suggests an active security strategy that adopts a vigorous method including ingenuity, data analysis, processing and decision-making support to face various cyber hazards. Specifically, the paper introduces a novel intelligence driven cognitive computing SOC that is based exclusively on progressive fully automatic procedures. The proposed λ-Architecture Network Flow Forensics Framework (λ-NF3) is an efficient cybersecurity defense framework against adversarial attacks. It implements the Lambda machine learning architecture that can analyze a mixture of batch and streaming data, using two accurate novel computational intelligence algorithms. Specifically, it uses an Extreme Learning Machine neural network with Gaussian Radial Basis Function kernel (ELM/GRBFk) for the batch data analysis and a Self-Adjusting Memory k-Nearest Neighbors classifier (SAM/k-NN) to examine patterns from real-time streams. It is a forensics tool for big data that can enhance the automate defense strategies of SOCs to effectively respond to the threats their environments face. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

39. Big Data Technology to Exploit Climate Information/Consumption Models and to Predict Future Behaviours

Author: Cortés, A., Téllez, A. E., Gallardo, M., Peralta, J. J., Tzafestas, S.G., Series editor, and González Alonso, Ignacio, editor
Published: 2014
Full Text: View/download PDF

40. Forecast Model Update Based on a Real-Time Data Processing Lambda Architecture for Estimating Partial Discharges in Hydrogenerator

Author: Fabio Henrique Pereira, Francisco Elânio Bezerra, Diego Oliva, Gilberto Francisco Martha de Souza, Ivan Eduardo Chabu, Josemir Coelho Santos, Shigueru Nagao Junior, and Silvio Ikuyo Nabeta
Subjects: autoregressive forecasting model, lambda architecture, partial discharges, power hydrogenerators, real-time data processing, Chemical technology, TP1-1185
Abstract: The prediction of partial discharges in hydrogenerators depends on data collected by sensors and prediction models based on artificial intelligence. However, forecasting models are trained with a set of historical data that is not automatically updated due to the high cost to collect sensors’ data and insufficient real-time data analysis. This article proposes a method to update the forecasting model, aiming to improve its accuracy. The method is based on a distributed data platform with the lambda architecture, which combines real-time and batch processing techniques. The results show that the proposed system enables real-time updates to be made to the forecasting model, allowing partial discharge forecasts to be improved with each update with increasing accuracy.
Published: 2020
Full Text: View/download PDF

41. Applying the ETL Process to Blockchain Data. Prospect and Findings

Author: Roberta Galici, Laura Ordile, Michele Marchesi, Andrea Pinna, and Roberto Tonelli
Subjects: ETL, Bitcoin, blockchain, lambda architecture, blockchain analytics, Information technology, T58.5-58.64
Abstract: We present a novel strategy, based on the Extract, Transform and Load (ETL) process, to collect data from a blockchain, elaborate and make it available for further analysis. The study aims to satisfy the need for increasingly efficient data extraction strategies and effective representation methods for blockchain data. For this reason, we conceived a system to make scalable the process of blockchain data extraction and clustering, and to provide a SQL database which preserves the distinction between transaction and addresses. The proposed system satisfies the need to cluster addresses in entities, and the need to store the extracted data in a conventional database, making possible the data analysis by querying the database. In general, ETL processes allow the automation of the operation of data selection, data collection and data conditioning from a data warehouse, and produce output data in the best format for subsequent processing or for business. We focus on the Bitcoin blockchain transactions, which we organized in a relational database to distinguish between the input section and the output section of each transaction. We describe the implementation of address clustering algorithms specific for the Bitcoin blockchain and the process to collect and transform data and to load them in the database. To balance the input data rate with the elaboration time, we manage blockchain data according to the lambda architecture. To evaluate our process, we first analyzed the performances in terms of scalability, and then we checked its usability by analyzing loaded data. Finally, we present the results of a toy analysis, which provides some findings about blockchain data, focusing on a comparison between the statistics of the last year of transactions, and previous results of historical blockchain data found in the literature. 
The ETL process we realized to analyze blockchain data is proven to be able to perform a reliable and scalable data acquisition process, whose result makes stored data available for further analysis and business.
Published: 2020
Full Text: View/download PDF

42. Lambda Architecture

Author: Sakr, Sherif, editor and Zomaya, Albert Y., editor
Published: 2019
Full Text: View/download PDF

43. Scalable prediction-based online anomaly detection for smart meter data.

Author: Liu, Xiufeng and Nielsen, Per Sieverts
Subjects: *ANOMALY detection (Computer security), *SMART meters, *SCALABILITY, *ELECTRIC meters, *PREDICTION models, *ENERGY consumption, *DATA mining
Abstract: Today smart meters are widely used in the energy sector to record energy consumption in real time. Large amounts of smart meter data have been accumulated and used for diverse analysis purposes. Anomaly detection raises the big data problem, namely the detection of abnormal events or unusual consumption behaviors. However, there is a lack of appropriate online systems that can handle anomaly detection for large-scale smart meter data effectively and efficiently. This paper proposes a lambda system for detecting anomalous consumption patterns, aiming at assisting decision makings for smart energy management. The proposed system uses a prediction-based detection method, combined with a novel lambda architecture for iterative model updates and real-time anomaly detection. This paper evaluates the system using a real-world data set and a large synthetic data set, and compares with three baselines. The results show that the proposed system has good scalability, and has a competitive advantage over others in anomaly detection. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

44. A real-time recommendation engine using lambda architecture.

Author: Numnonda, Thanisa
Abstract: In a data science theory, the recommended methodology is one of the most popular theories and has been deployed in many real industries. However, one of the most challenging problems these days is how to recommend items with massively streaming data. Therefore, this paper aims to do a real-time recommendation engine using the Lambda architecture. The Apache Hadoop and Apache Spark frameworks were used in this research to process the MovieLens dataset comprised 100 K and 20 M ratings from the GroupLens research. Using alternating least squares (ALS) and k-means algorithms, the top K recommendation movies and the top K trending movies for each user were shown as results. Additionally, the mean squared error (MSE) and within cluster sum of squared error (WCSS) had been computed to evaluate the performance of the ALS and k-means algorithms, sequentially. The results showed that they are acceptable since the MSE and WCSS values are low when comparing to the size of data. However, they can still be improved by tuning some parameters. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

45. Monitoring System Using Internet of Things For Potential Landslides.

Author: Moulat, Meryem El, Debauche, Olivier, Mahmoudi, Saïd, Brahim, Lahcen Aït, Manneback, Pierre, and Lebeau, Frédéric
Subjects: INTERNET of things, LOGISTIC regression analysis, LANDSLIDES, ROBUST control, ELECTRIC power system faults
Abstract: The North-Western RIF of Morocco is considered as one of the most mountainous zone in the Middle East and North Africa. This area is more serious in the corridor faults region, where the recent reactivation of those tectonic layering may greatly contribute to the triggering of landslides. The consequences of this phenomenon can be enormous property damage and human casualties. Furthermore, this disaster can disrupt progress and destroy developmental efforts of government, and often pushing nations back by many years. In our previous works of Tetouan-Ras-Mazari region, we identified the areas that are prone to landslides by different methods like Weights of Evidence (WofE) and Logistic Regression (LR). In fact, these zones are built and susceptible. Undoubtedly, the challenge to save human lives is vital. For this reason, we develop a robust monitoring model as part of an alert system to evacuate populations in case of imminent danger risks. This model is ground-based remote monitoring system consist of more than just field sensors; they employ data acquisition units to record sensor measurements, automated data processing, and display of current conditions usually via the Internet of Things (IoT). To sum up, this paper outlines a new approach of monitoring to detect when hillslopes are primed for sliding and can provide early indications of rapid and catastrophic movement. It reports also continuous information from up-to-the-minute or real-time monitoring, provides prompt notification of landslide activities, advances our understanding of landslide behaviors, and enables more effective engineering and planning efforts. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

46. Web Monitoring of Bee Health for Researchers and Beekeepers Based on the Internet of Things.

Author: Debauche, Olivier, Moulat, Meryem El, Mahmoudi, Saïd, Boukraa, Slimane, Manneback, Pierre, and Lebeau, Frédéric
Subjects: COLONY collapse disorder of honeybees, BIODIVERSITY, BEEKEEPERS, INTERNET of things, POLLINATION by bees, PRECISION farming
Abstract: The Colony Collapse Disorder (CCD) also entitled ‘Colony Loss’ has a significant impact on the biodiversity, on the pollination of crops and on the profitability. The Internet of Things associated with cloud computing offers possibilities to collect and treat a wide range of data to monitor and follow the health status of the colony. The surveillance of the animals’ pollination by collecting data at large scale is an important issue in order to ensure their survival and pollination, which is mandatory for food production. Moreover, new network technologies like Low Power Wide Area (LPWAN) or 3GPP protocols and the appearance on the market easily programmable nodes allow to create, at low-cost, sensors and effectors for the Internet of Things. In this paper, we propose a technical solution easily replicable, based on accurate and affordable sensors and a cloud architecture to monitor and follow bees’ behavior. This solution provides a platform for researchers to better understand and measure the impacts factors which lead to the mass extinction of bees. The suggested model is also a digital and useful tool for beekeepers to better follow up with their beehives. It helps regularly inspect their hives to check the health of the colony. The massive collection of data opens new research for a better understanding of factors that influence the life of bees. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

47. Data lake for electronic commerce

Author: Vale, Filip and Vrdoljak, Boris
Subjects: data store, data lake, veliki podaci, big data, TECHNICAL SCIENCES. Computing, elektronička trgovina, TEHNIČKE ZNANOSTI. Računarstvo, electronic commerce, jezero podataka, lambda arhitektura, lambda architecture, spremište podataka
Abstract: A data lake is a data repository that stores raw unstructured, semi-structured, and structured data. This paper presents the properties of data lakes as well as large data sets, explains the lambda architecture, and implements a data lake for electronic commerce. Apache's tools Sqoop, Flume, Kafka and Flink were used to insert data in the data lake. Hadoop was used for data storage while Hive was used for analytical queries.
Published: 2022

48. Adaptive real-time anomaly detection in cloud infrastructures.

Author: Agrawal, Bikash, Wiktorski, Tomasz, and Rong, Chunming
Subjects: ANOMALY detection (Computer security), CLOUD computing, LAMBDA algebra, OUTLIER detection, REAL-time computing, SCALABILITY
Abstract: Cloud computing has become increasingly popular, which has led many individuals and organizations towards cloud storage systems. This move is motivated by benefits such as shared storage, computation, and transparent service among a massive number of users. However, cloud-computing systems require the maintenance of complex and large-scale systems with practically unavoidable runtime problems caused by hardware and software faults. Large systems are very complex due to heterogeneity, dynamicity, scalability, hidden complexity, and time limitations. Automatic anomaly detection is a critical technique for managing such complex cloud resources. This paper proposes a scalable model for automatic anomaly detection on a large system like a cloud. The anomaly detection process is capable of issuing a correct early warning of unusual behavior in dynamic environments after learning the system characteristic of normal operation. To detect unusual activity in the cloud, we need to monitor the data center and collect cloud performance logs. In this paper, we propose an adaptive anomaly detection mechanism, which investigates principal components of the performance metrics. It transforms the performance metrics into a low-rank matrix and calculates the orthogonal distance using the Robust PCA algorithm. The proposed model updates itself recursively, while learning and adjusting the new threshold value, to minimize reconstruction errors. This paper also investigates robust principal component analysis in distributed environments using Apache Spark as the underlying framework. It specifically addresses cases in which normal operation might exhibit multiple hidden modes. The accuracy and sensitivity of the model were tested on Amazon CloudWatch datasets, and Yahoo! datasets. The model achieved an accuracy of 88.54 %. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

49. Ahab: A cloud-based distributed big data analytics framework for the Internet of Things.

Author: Vögler, Michael, Schleicher, Johannes M., Inzinger, Christian, and Dustdar, Schahram
Subjects: SMART cities, DATA analytics, URBAN planning, SOFTWARE maintenance, INTERNET of things
Abstract: Smart city applications generate large amounts of operational data during their execution, such as information from infrastructure monitoring, performance and health events from used toolsets, and application execution logs. These data streams contain vital information about the execution environment that can be used to fine-tune or optimize different layers of a smart city application infrastructure. Current approaches do not sufficiently address the efficient collection, processing, and storage of this information in the smart city domain. In this paper, we present Ahab, a generic, scalable, and fault-tolerant data processing framework based on the cloud that allows operators to perform online and offline analyses on gathered data to better understand and optimize the behavior of the available smart city infrastructure. Ahab is designed for easy integration of new data sources, provides an extensible API to perform custom analysis tasks, and a domain-specific language to define adaptation rules based on analysis results. We demonstrate the feasibility of the proposed approach using an example application for autonomous intersection management in smart city environments. Our framework is able to autonomously optimize application deployment topologies by distributing processing load over available infrastructure resources when necessary based on both online analysis of the current state of the environment and patterns learned from historical data. Copyright © 2016 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

50. Comparative Study of Real Time Machine Learning Models for Stock Prediction through Streaming Data

Author: Santanu Kumar Rath, Ranjan Kumar Behera, Robertas Damasevicius, Sanjay Misra, and Sushree Das
Subjects: General Computer Science, Twitter API, Computer science, business.industry, Lambda Archi, NodeJS, QA75.5-76.95, Machine learning, computer.software_genre, Stock prediction, Theoretical Computer Science, MLlib, Spark Streaming, Electronic computers. Computer science, Streaming data, Artificial intelligence, business, computer, Lambda Architecture
Abstract: Stock prediction is one of the emerging applications in the field of data science which help the companies to make better decision strategy. Machine learning models play a vital role in the field of prediction. In this paper, we have proposed various machine learning models which predicts the stock price from the real-time streaming data. Streaming data has been a potential source for real-time prediction which deals with continuous flow of data having information from various sources like social networking websites, server logs, mobile phone applications, trading floors etc. We have adopted the distributed platform, Spark to analyze the streaming data collected from two different sources as represented in two case studies in this paper. The first case study is based on stock prediction from the historical data collected from Google finance websites through NodeJs and the second one is based on the sentiment analysis of Twitter collected through Twitter API available in Stanford NLP package. Several researches have been made in developing models for stock prediction based on static data. In this work, an effort has been made to develop scalable, fault tolerant models for stock prediction from the real-time streaming data. The Proposed model is based on a distributed architecture known as Lambda architecture. The extensive comparison is made between actual and predicted output for different machine learning models. Support vector regression is found to have better accuracy as compared to other models. The historical data is considered as a ground truth data for validation.
Published: 2020
Full Text: View/download PDF
