Descriptor: "data enrichment" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"data enrichment"' showing total 309 results

1. Physics-Informed Learning

Author: Neuer, Marcus J.
Published: 2025
Full Text: View/download PDF

2. Enhancing TRIZ Contradiction Resolution with AI-Driven Contradiction Navigator (AICON)

Author: Brad, Stelian, Brad, Emilia, Cîrlejan, Alexandru, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Carette, Jacques, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Stettner, Lukasz, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, M. Davison, Robert, Editorial Board Member, Rettberg, Achim, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Cavallucci, Denis, editor, Brad, Stelian, editor, and Livotov, Pavel, editor
Published: 2025
Full Text: View/download PDF

3. Detection of Municipal Heat Plan Documents Using Semantic Recognition Methods

Author: Doms, Nicolas, Schlachter, Thorsten, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Jørgensen, Bo Nørregaard, editor, Ma, Zheng Grace, editor, Wijaya, Fransisco Danang, editor, Irnawan, Roni, editor, and Sarjiya, Sarjiya, editor
Published: 2025
Full Text: View/download PDF

4. ViSL model: The model automatically generates sentences of Vietnamese sign language

Author: Khanh Dang and Igor A. Bessmertny
Subjects: vietnamese sign language, sign language model, automatic sentence generation, n-gram, markov model, breadth-first search, data enrichment, grammatical rules, Optics. Light, QC350-467, Electronic computers. Computer science, QA75.5-76.95
Abstract: The main problem in building intelligent systems is the lack of data for machine learning, which is especially important for sign language recognition for the deaf and hard of hearing. One of the ways to increase the amount of data for training is synthesis. Unlike speech synthesis, it is impossible to create a sequence of gestures in Vietnamese and some other languages that exactly repeat the text. This is due to the significant limitations of the gesture dictionary and the different word order in sentences. The aim of the work is to enrich the educational corpus of video data for use in creating recognition systems for the Vietnamese Sign Language (ViSL). Since it is impossible to translate the words of the source text into gestures one to one, the problem of translating from a regular language into a sign language arises. The paper proposes to use a two-phase process for this. The first phase involves pre-processing the text with standardization of the text format, segmentation of words and sentences, and then encoding the words using the sign language dictionary. At this stage, it should be noted that there is no need to remove punctuation marks and stop words, since they are related to the accuracy of the N-gram model. Next, instead of using syntactic analysis, a statistical method for forming a sequence of gestures is used, and the Markov model on the transition graph between words is taken as a basis in which the probability of the next word depends only on the two previous words. Transition probabilities are calculated on the existing marked corpus of the ViSL. The Breadth-first Search method is used to compile a list of all sentences generated based on a given grammatical rule and a matrix of semantic interactions between words. The inverse of the logarithm of the product of the probabilities of co-occurrence of consecutive 3-word phrases in a sentence is used to estimate the frequency of occurrence of that sentence in a given data set. 
Based on the ViSL data of 3,234 words, we calculated probability matrices representing the relationships between words based on Vietnamese natural language data with 50 million sentences collected from Vietnamese newspapers and magazines. For different grammar rules, we compare the number of generated sentences and evaluate the accuracy of the 50 most frequent sentences. The average accuracy is 88%. The accuracy of the generated sentences is estimated by manual statistical methods. The number of generated sentences depends on the number of word parts that are labeled according to the grammar rules. The semantic accuracy of the generated sentences will be very high if the search words are labeled with the correct part-of-speech tags. Compared with machine learning methods, our proposed method gives very good results for languages without inflections and with word order that follows certain rules, such as Vietnamese, and does not require large computational resources. The disadvantage of this method is that its accuracy largely depends on the type of word, sentence, and word segmentation. The relationship of words depends on the observed dataset. A future research direction is to generate paragraphs in sign language. The obtained data can be used in machine learning models for sign language processing tasks.
Published: 2024
Full Text: View/download PDF
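
Illustration for record 4 (not taken from the paper): the abstract describes a Markov model in which the next word depends only on the two previous words, with candidate sentences ranked by the log of the product of their 3-word probabilities. A minimal sketch of that idea follows. The toy corpus, the add-alpha smoothing, the `<s>`/`</s>` padding tokens, and the function names are all assumptions made for illustration; the paper's exact scoring formula and its breadth-first sentence generation are not reproduced here.

```python
import math
from collections import Counter

def train_trigrams(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    tri, ctx = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx

def log_prob(tokens, tri, ctx, vocab_size, alpha=1.0):
    """Sum of log P(w_i | w_{i-2}, w_{i-1}), with add-alpha smoothing (an assumption)."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    total = 0.0
    for a, b, c in zip(padded, padded[1:], padded[2:]):
        p = (tri[(a, b, c)] + alpha) / (ctx[(a, b)] + alpha * vocab_size)
        total += math.log(p)
    return total

# Hypothetical toy corpus standing in for a labeled sign-language corpus.
corpus = [["i", "go", "school"], ["i", "go", "home"], ["you", "go", "school"]]
tri, ctx = train_trigrams(corpus)
vocab = {w for s in corpus for w in s} | {"</s>"}

seen = log_prob(["i", "go", "school"], tri, ctx, len(vocab))
unseen = log_prob(["school", "go", "i"], tri, ctx, len(vocab))
```

A word order that occurs in the corpus (`seen`) receives a higher log-probability than a scrambled one (`unseen`), which is the property a ranking of generated sentences relies on.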

5. Analysis of data-driven approaches for radar target classification

Author: Coşkun, Aysu and Bilicz, Sándor
Published: 2024
Full Text: View/download PDF

6. Integration of web scraping, fine-tuning, and data enrichment in a continuous monitoring context via large language model operations.

Author: Bodor, Anas, Hnida, Meriem, and Daoudi, Najima
Subjects: LANGUAGE models, REAL estate listings, ACQUISITION of data, DATA modeling
Abstract: This paper presents and discusses a framework that leverages large-scale language models (LLMs) for data enrichment and continuous monitoring, emphasizing its essential role in optimizing the performance of deployed models. It introduces a comprehensive large language model operations (LLMOps) methodology based on continuous monitoring and continuous improvement of the data, the primary determinant of the model, in order to optimize the prediction of a given phenomenon. To this end, we first examine real-time web scraping with tools such as Kafka and Spark Streaming for data acquisition and processing. In addition, we explore the integration of LLMOps for complete lifecycle management of machine learning models. Focusing on continuous monitoring and improvement, we highlight the importance of this approach for ensuring optimal performance of deployed models based on data and machine learning (ML) model monitoring. We also illustrate this methodology through a case study based on real data from several real estate listing sites, demonstrating how MLflow can be integrated into an LLMOps pipeline to guarantee complete development traceability, proactive detection of performance degradations and effective model lifecycle management. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF

7. Let's explain crisis: deep multi-scale hierarchical attention framework for crisis-task identification.

Author: Priya, Shalini, Joshi, Vaishali, and Chandra, Joydeep
Subjects: *IDENTIFICATION, *CRISIS management, *CRISES, *EMERGENCY medical services, *DATA quality
Abstract: Emergency services rely heavily on Twitter for early detection of crisis tasks to enhance crisis management systems. However, state-of-the-art models often face data sparsity and handle long-range dependencies between tweet tokens inadequately. Additionally, the authorities need to gain confidence in the model's prediction so that the detected task information can be better believed and prioritized. In this study, we present a generalized framework named explainable attentive model for crisis task identification (ExACT) to handle the above-mentioned challenges: it identifies crisis-task-relevant tweets and provides model explainability while using a very small corpus of tweets. The novelty of ExACT is two-fold: (1) Data enrichment has been introduced by nondynamic contextual attributes derived from tweets to overcome the sparsity and improve data quality. (2) Feature enrichment has been incorporated using hierarchical attention at both local and global levels using residual self-attention and correlation attention to capture long-range dependencies. Additionally, a LIME-based explainability approach is added to understand the task-important tokens. Experiments reveal that ExACT has a competitive performance improvement over various state-of-the-art models in terms of F1-score (20% and 14%, respectively) and accuracy (14% and 16%, respectively) across two different crisis tasks, infrastructure damage and support signal identification. Consistent performance improvement for two different tasks considered from publicly available crisis event datasets depicts the model's generalizability. While the LIME-supported explainability mechanism in ExACT can identify the important keywords, it does not guarantee a high score in terms of plausibility and faithfulness metrics. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. Evaluating the Role of Data Enrichment Approaches towards Rare Event Analysis in Manufacturing.

Author: Shyalika, Chathurangi, Wickramarachchi, Ruwan, El Kalach, Fadi, Harik, Ramy, and Sheth, Amit
Subjects: *DATA distribution, *DATA augmentation, *PREDICTION models, *TIME series analysis, *MANUFACTURING processes
Abstract: Rare events are occurrences that take place with a significantly lower frequency than more common, regular events. These events can be categorized into distinct categories, from frequently rare to extremely rare, based on factors like the distribution of data and significant differences in rarity levels. In manufacturing domains, predicting such events is particularly important, as they lead to unplanned downtime, a shortening of equipment lifespans, and high energy consumption. Usually, the rarity of events is inversely correlated with the maturity of a manufacturing industry. Typically, the rarity of events causes the multivariate data generated within a manufacturing process to be highly imbalanced, which leads to bias in predictive models. This paper evaluates the role of data enrichment techniques combined with supervised machine learning techniques for rare event detection and prediction. We use time series data augmentation and sampling to address the data scarcity, maintaining its patterns, and imputation techniques to handle null values. Evaluating 15 learning models, we find that data enrichment improves the F1 measure by up to 48% in rare event detection and prediction. Our empirical and ablation experiments provide novel insights, and we also investigate model interpretability. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
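
Illustration for record 8 (not taken from the paper): the abstract mentions imputation for null values and augmentation plus sampling for scarce rare-event data. The sketch below shows one generic way such steps can look. Forward-fill imputation, Gaussian jitter, the window length, and every name and number in it are assumptions chosen for illustration, not the techniques the authors evaluated.

```python
import random

def forward_fill(series):
    """Replace None with the most recent observed value (leading gaps stay None)."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def oversample_with_jitter(windows, labels, rare_label, factor, sigma, seed=0):
    """Append `factor` noisy copies of every window labeled `rare_label`."""
    rng = random.Random(seed)
    out_windows, out_labels = list(windows), list(labels)
    for window, label in zip(windows, labels):
        if label == rare_label:
            for _ in range(factor):
                out_windows.append([x + rng.gauss(0.0, sigma) for x in window])
                out_labels.append(label)
    return out_windows, out_labels

# Hypothetical sensor readings with gaps; the last window holds the rare event.
raw = [1.0, None, 1.2, None, None, 5.0]
clean = forward_fill(raw)
windows = [clean[0:3], clean[3:6]]
labels = [0, 1]

aug_windows, aug_labels = oversample_with_jitter(
    windows, labels, rare_label=1, factor=3, sigma=0.05
)
```

After this step the rare class has four windows instead of one, while the jitter keeps each copy close to the original pattern.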

9. Recognition of Logo of Pirated Content Using Deep Learning-Based Regression Classification Algorithm

Author: Patalappa, Kiran Kumar Jakkur, Chandramouli, Supriya Maganahalli, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Bandyopadhyay, Sivaji, editor, Balas, Valentina Emilia, editor, Biswas, Saroj Kumar, editor, Saha, Anish Kumar, editor, and Thounaojam, Dalton Meitei, editor
Published: 2024
Full Text: View/download PDF

10. Operational Sustainability Perspective for Fresh-Produce Small-Medium Enterprises (SMEs)

Author: Chaudhry, Maria, Khalilpour, Kaveh, Karimi, Faezeh, Kumar, Chhabi, Section editor, Leal Filho, Walter, Series Editor, Ng, Theam Foo, editor, Iyer-Raniga, Usha, editor, Ng, Artie, editor, and Sharifi, Ayyoob, editor
Published: 2024
Full Text: View/download PDF

11. Artificial Intelligence and MicroRNA: Role in Cancer Evolution

Author: Koroliouk, Dimitri, Mattei, Maurizio, Zoziuk, Maxym, Montesano, Carla, Bernardini, Roberta, Potestà, Marina, Wondeu, Laure Deutou, Pirrò, Stefano, Galgani, Andrea, Colizzi, Vittorio, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Tan, Kay Chen, Series Editor, Luntovskyy, Andriy, editor, Klymash, Mikhailo, editor, Melnyk, Igor, editor, Beshley, Mykola, editor, and Schill, Alexander, editor
Published: 2024
Full Text: View/download PDF

12. Semantic Enrichment and Analysis of Building Energy Consumption Data for the City of Sofia

Author: Koleva, Teodora, Vitanova, Lidia, Petrova-Antonova, Dessislava, Kostadinov, Alexander, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Carette, Jacques, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Stettner, Lukasz, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Rettberg, Achim, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Maglogiannis, Ilias, editor, Iliadis, Lazaros, editor, Karydis, Ioannis, editor, Papaleonidas, Antonios, editor, and Chochliouros, Ioannis, editor
Published: 2024
Full Text: View/download PDF

13. Improving Vietnamese Legal Question–Answering System Based on Automatic Data Enrichment

Author: Vuong, Thi-Hai-Yen, Nguyen, Ha-Thanh, Nguyen, Quang-Huy, Nguyen, Le-Minh, Phan, Xuan-Hieu, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Bono, Mayumi, editor, Takama, Yasufumi, editor, Satoh, Ken, editor, Nguyen, Le-Minh, editor, and Kurahashi, Setsuya, editor
Published: 2024
Full Text: View/download PDF

14. Enhanced Prediction Model for Blast-Induced Air Over-Pressure in Open-Pit Mines Using Data Enrichment and Random Walk-Based Grey Wolf Optimization–Two-Layer ANN Model.

Author: Nguyen, Hoang, Bui, Xuan-Nam, Drebenstedt, Carsten, and Choi, Yosoon
Subjects: WOLVES, OPTIMIZATION algorithms, ARTIFICIAL neural networks, PREDICTION models, PARTICLE swarm optimization, RANDOM effects model, RANDOM walks
Abstract: In this study, two innovative techniques were introduced, including data enrichment and optimization, with the aim of significantly improving the accuracy of air over-pressure (AOP) prediction models in mine blasting. Firstly, the Extra Trees algorithm was applied to enrich the collected dataset with the goal of enhancing the understanding of the predictive models for AOP prediction. Then, a neural network model with two hidden layers (ANN) was designed to predict AOP using both the original and enriched datasets. Secondly, to further enhance the accuracy of the ANN model, a novel optimization algorithm based on a random walk strategy and the grey wolf optimization algorithm (RWGWO) was employed to optimize the weights of the ANN model. This optimized model, referred to as the RWGWO–ANN model, was developed and evaluated for predicting AOP using both the original and enriched datasets. To comprehensively assess the impact of data enrichment and the proposed RWGWO-ANN model, three other optimization algorithms—particle swarm optimization (PSO), fruit-fly optimization algorithm (FOA), and single-based genetic algorithm (SGA)—were also applied to optimize the ANN model for AOP prediction. These models were named PSO–ANN, FOA–ANN, and SGA–ANN, respectively. The tenfold cross-validation procedure was applied and repeated three times to ensure the objectivity and consistency of the models. Additionally, conventional ANN and the United States Bureau of Mines empirical model were developed for comparison, serving similar purposes to evaluate the efficiency of the optimization algorithms employed in this study. To demonstrate the advantages of the proposed method and models, a dataset comprising 312 blasting events and six input parameters at the Coc Sau open-pit coal mine in Vietnam was gathered and analyzed. These parameters included burden, spacing, rock hardness, powder factor, monitoring distance, and maximum explosive charge per delay. 
An additional input variable, Extra Trees, was introduced, making the total number of input variables seven in the enriched dataset. The proposed hybrid model, along with others, was developed based on both the original and enriched datasets. The results revealed that the Extra Trees algorithm is robust and effectively enriches the raw dataset, enhancing the understanding of predictive models and providing improved accuracy. Sensitivity analysis results also highlighted the robust contribution of the Extra Trees variable in the enriched dataset. Compared to the original dataset, the performance of AOP predictive models was improved by 7–24% using the dataset enriched by the Extra Trees algorithm. Furthermore, the findings indicated that the RWGWO–ANN model exhibited the highest accuracy in predicting AOP in this study, achieving an accuracy of 96.2%. This marked a 16–20% improvement over the accuracy of the conventional ANN model. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
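
Illustration for record 14 (not taken from the paper): the abstract describes enriching a six-variable dataset with a seventh input derived from a model. The sketch below shows the general pattern of appending a model-derived column. It deliberately swaps in a simple k-nearest-neighbour average in place of the Extra Trees algorithm so that it runs with the standard library alone, and it uses leave-one-out estimates to avoid leaking each row's own target; both choices, along with the toy data and function names, are assumptions.

```python
def knn_predict(train_X, train_y, x, k=3):
    """Mean target of the k nearest training rows (squared Euclidean distance)."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)

def enrich(X, y, k=3):
    """Append a leave-one-out model estimate to each row as an extra column."""
    enriched = []
    for i, row in enumerate(X):
        rest_X = X[:i] + X[i + 1:]   # every row except the current one
        rest_y = y[:i] + y[i + 1:]
        enriched.append(row + [knn_predict(rest_X, rest_y, row, k)])
    return enriched

# Hypothetical two-feature dataset; the real study used six blasting parameters.
X = [[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [5.2, 6.1], [0.9, 1.9], [5.1, 5.9]]
y = [10.0, 11.0, 50.0, 52.0, 9.0, 51.0]

X_enriched = enrich(X, y, k=2)
```

Each row of `X_enriched` has one more column than the original, and a downstream model would be trained on the widened rows.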

15. Reshaping Smart Cities through NGSI-LD Enrichment

Author: González, Víctor, Martín, Laura, Santana, Juan Ramón, Sotres, Pablo, Lanza, Jorge, and Sánchez, Luis
Abstract: The vast amount of information stemming from the deployment of the Internet of Things and open data portals is poised to provide significant benefits for both the private and public sectors, such as the development of value-added services or an increase in the efficiency of public services. This is further enhanced due to the potential of semantic information models such as NGSI-LD, which enable the enrichment and linkage of semantic data, strengthened by the contextual information present by definition. In this scenario, advanced data processing techniques need to be defined and developed for the processing of harmonised datasets and data streams. Our work is based on a structured approach that leverages the principles of linked-data modelling and semantics, as well as a data enrichment toolchain framework developed around NGSI-LD. Within this framework, we reveal the potential for enrichment and linkage techniques to reshape how data are exploited in smart cities, with a particular focus on citizen-centred initiatives. Moreover, we showcase the effectiveness of these data processing techniques through specific examples of entity transformations. The findings, which focus on improving data comprehension and bolstering smart city advancements, set the stage for the future exploration and refinement of the symbiosis between semantic data and smart city ecosystems. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. Missing values and data enrichment: an application to social media liking.

Author: Mariani, Paolo, Marletta, Andrea, and Locci, Matteo
Subjects: *MISSING data (Statistics), *SPARSE matrices, *SOCIAL networks, *STATISTICS, *BIG data, *SOCIAL media
Abstract: In the big data context, analyses very frequently have to deal with missing values. This is especially relevant in the field of statistical analysis, where this represents a thorny issue. This study proposes a strategy for data enrichment in the presence of sparse matrices. The research objective is to evaluate a possible distinction of behaviour among observations in sparse matrices with missing data. After selecting among the multiple imputation methods, an innovative technique will be presented to impute missing observations as a negative position or a neutral opinion. This method has been applied to a dataset measuring the interaction between users and social network pages for some Italian newspapers. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Using machine learning and data enrichment in the selection of roads for small-scale maps.

Author: Karsznia, Izabela, Adolf, Albert, Leyk, Stefan, and Weibel, Robert
Subjects: *MACHINE learning, *MAP design, *DATABASES, *STRUCTURAL models, *DECISION making
Abstract: Making decisions about which objects to keep or omit is challenging in map design. This process, called selection, constitutes the first operation in cartographic generalization. In this research, a method of automatic road selection for creating small-scale maps using machine learning and data enrichment is proposed. First, the problem of contextual information scarcity concerning roads in the source database is addressed. Additional information concerning the relations between roads and other objects was added (such as centrality and proximity measures). Second, machine learning is used to design automatic selection models based on enriched information. Third, three different road selection approaches are implemented. The baseline approach is following the official map design guidelines. The second approach is based on machine learning using the enriched road database. The third approach is based on an existing structural model. The results of all approaches are compared to existing atlas maps designed by experienced cartographers. The results of the Machine Learning Approaches were most similar to the atlas maps (between 81% and 90% accuracy). The least efficient approaches were the Structural Approach with 32% and the Guidelines Approach with 44% accuracy. We conclude that enriching road data with new contextual information concerning roads and using machine learning is beneficial as the achieved results outperform both Guidelines and Structural Approaches. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. Ricgraph: A flexible and extensible graph to explore research in context from various systems

Author: Rik D.T. Janssen
Subjects: Data harvesting, Data enrichment, Data linkage, Linked data, Knowledge graph, Metadata, Computer software, QA76.75-76.765
Abstract: Ricgraph, also known as Research in context graph, enables the exploration of researchers, teams, their results, collaborations, skills, projects, and the relations between these items. Ricgraph can store many types of items in a single graph. These items can be obtained from various systems and from multiple organizations. Ricgraph facilitates reasoning about these items because it infers new relations between items, relations that are not present in any of the separate source systems. Ricgraph is flexible and extensible, and can be adapted to new application areas. In this article, we illustrate how Ricgraph works by applying it to the application area of research information.
Published: 2024
Full Text: View/download PDF

19. SHP2SIM: a python pipeline for Modelica based district and urban scale energy simulations

Author: Theresa Boiger and Gerald Schweiger
Subjects: building energy simulation, data enrichment, modelica model, district scale, urban scale, Renewable energy sources, TJ807-830
Abstract: Energy simulation models are crucial to estimate the energy demand of buildings, especially for prospective planning on a district or city scale. As required input data is not available in many cases, an automated model generation workflow is needed. Existing workflows have several disadvantages, including: (i) dependence on large input datasets of existing buildings; (ii) no 3D representation to support the planning process; (iii) they are proprietary solutions. The pipeline ‘SHP2SIM’ is an open-source python pipeline enabling enrichment and generation of building energy simulation models based on little input data for district and urban scale. The pipeline is tested by simulating the heat load for a district with 27 buildings and validated for one building: R squared is 0.9825, CV(RMSE) is 22.10%, and NMBE is 4.06% on a monthly basis. To enable reproducibility and encourage open science, input data, output models, and the pipeline are openly available (https://github.com/tug-cps/shp2sim).
Published: 2023
Full Text: View/download PDF
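
Illustration for record 19 (not taken from the paper): the abstract reports R squared, CV(RMSE), and NMBE on a monthly basis. The sketch below shows how these three metrics are commonly computed from paired measured and simulated values. Conventions vary between sources, so the residual sign (measured minus simulated), the degrees-of-freedom correction `p = 1`, and the sample numbers are assumptions; they are not claimed to match the authors' calculation.

```python
import math

def validation_metrics(measured, simulated, p=1):
    """Return (R squared, CV(RMSE) in %, NMBE in %) for paired series."""
    n = len(measured)
    mean_m = sum(measured) / n
    residuals = [m - s for m, s in zip(measured, simulated)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((m - mean_m) ** 2 for m in measured)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    cv_rmse = 100.0 * math.sqrt(ss_res / (n - p)) / mean_m
    nmbe = 100.0 * sum(residuals) / ((n - p) * mean_m)
    return r2, cv_rmse, nmbe

# Hypothetical heat-load values, e.g. three months in MWh.
measured = [10.0, 20.0, 30.0]
simulated = [12.0, 18.0, 33.0]
r2, cv_rmse, nmbe = validation_metrics(measured, simulated)
```

With this sign convention a negative NMBE means the simulation over-predicts on average, and CV(RMSE) expresses the typical error as a percentage of the mean measured value.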

20. Evaluating the Role of Data Enrichment Approaches towards Rare Event Analysis in Manufacturing

Author: Chathurangi Shyalika, Ruwan Wickramarachchi, Fadi El Kalach, Ramy Harik, and Amit Sheth
Subjects: rare events, event detection, event prediction, time series, data enrichment, smart manufacturing, Chemical technology, TP1-1185
Abstract: Rare events are occurrences that take place with a significantly lower frequency than more common, regular events. These events can be categorized into distinct categories, from frequently rare to extremely rare, based on factors like the distribution of data and significant differences in rarity levels. In manufacturing domains, predicting such events is particularly important, as they lead to unplanned downtime, a shortening of equipment lifespans, and high energy consumption. Usually, the rarity of events is inversely correlated with the maturity of a manufacturing industry. Typically, the rarity of events causes the multivariate data generated within a manufacturing process to be highly imbalanced, which leads to bias in predictive models. This paper evaluates the role of data enrichment techniques combined with supervised machine learning techniques for rare event detection and prediction. We use time series data augmentation and sampling to address the data scarcity, maintaining its patterns, and imputation techniques to handle null values. Evaluating 15 learning models, we find that data enrichment improves the F1 measure by up to 48% in rare event detection and prediction. Our empirical and ablation experiments provide novel insights, and we also investigate model interpretability.
Published: 2024
Full Text: View/download PDF

21. Using Quantitative Metabolomics and Data Enrichment to Interpret the Biochemistry of a Novel Disease

Author: Wishart, David S., Levatte, Marcia A., Ivanisevic, Julijana, editor, and Giera, Martin, editor
Published: 2023
Full Text: View/download PDF

22. Multi-agent Architecture for Passive Rootkit Detection with Data Enrichment

Author: Trinks, Maickel, Gondim, João, Albuquerque, Robson, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Garcia, Marcelo V., editor, and Gordón-Gallegos, Carlos, editor
Published: 2023
Full Text: View/download PDF

23. SHP2SIM: a python pipeline for Modelica based district and urban scale energy simulations.

Author: Boiger, Theresa and Schweiger, Gerald
Subjects: *ZONING, *PYTHON programming language, *OPEN scholarship, *ENERGY consumption, *HEATING load
Abstract: Energy simulation models are crucial to estimate the energy demand of buildings, especially for prospective planning on a district or city scale. As required input data is not available in many cases, an automated model generation workflow is needed. Existing workflows have several disadvantages, including: (i) dependence on large input datasets of existing buildings; (ii) no 3D representation to support the planning process; (iii) they are proprietary solutions. The pipeline 'SHP2SIM' is an open-source python pipeline enabling enrichment and generation of building energy simulation models based on little input data for district and urban scale. The pipeline is tested by simulating the heat load for a district with 27 buildings and validated for one building: R squared is 0.9825, CV(RMSE) is 22.10%, and NMBE is 4.06% on a monthly basis. To enable reproducibility and encourage open science, input data, output models, and the pipeline are openly available. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

24. Spatio-historical data enrichment for toponomastics in Bali, The Island of Gods.

Author: Kersapati, Muhamad Iko
Subjects: ONLINE databases, HISTORICAL maps, GEOGRAPHIC names, DATA entry, DATA scrubbing, DISCOURSE analysis, TOPONYMY, GODS
Abstract: This study examines the leverage of big data to enrich a toponymy repository with the objective of facilitating the spatial–historical investigation of toponymy in Bali. The strong Hindu cultural background provides the sacred values that generally affect toponymic formation. The dataset comprises historic maps from the Dutch Royal Tropical Institute digital library and online databases from several sources such as GeoNames, OpenStreetMap, and the Indonesian Geospatial Information Agency. The mixed method leverages the data enrichment technique with descriptive and historical analysis to strengthen the discourse of toponymy formation. The geographical variables used to enrich the toponym data include elevation, geomorphology, land cover, and rivers, prepared as shapefiles. Several technical procedures are conducted to create a geodatabase, such as georeferencing, digitisation, data manipulation and cleaning, and overlay analysis, resulting in 16,923 clean toponym data entries. The overlay analysis indicates different trends in each class of geographic variables. The functional-semantic and lexical–morphological basis of name-giving defines name formation in correlation with the name models that examine the linguistic elements. Some main groups of place names and their etymologies are identified: names such as Banjar, Gunung, Tukad, Padang, Bukit, and Gili are associated with geomorphological factors such as volcanic formations, freshwaters, and marine features. Further explorations are suggested to dig deeper into the discourse of toponymy in Bali as a complex system that involves the typical landscapes and the cultural, political, and social characteristics of the region. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

25. Enhancing Causal Analysis through Hypothetical Query Systems and Data Integration

Author: Heravi, Kayvon
Subjects: Computer science, Causal Inference, Data Discovery, Data Enrichment, Hypothetical Queries
Abstract: Causal inference plays a pivotal role in statistical analysis and decision-making across various disciplines, including epidemiology, economics, and social sciences. Database systems, as the backbone of information storage and retrieval across multiple sectors, require sophisticated analytical capabilities to support decision-making processes. However, real-world datasets often present complexities such as redundancy, incompleteness, and the lack of critical attributes. This thesis proposes multifaceted approaches to address these complexities and limitations of traditional causal inference methods in database environments. Specifically, it introduces a graphical user interface that leverages what-if and how-to queries. Additionally, we create a framework that enriches datasets from diverse data sources to uncover complex relationships, ensuring robust causal analysis.
Published: 2024

26. Machine Learning Algorithms for Data Enrichment: A Promising Solution for Enhancing Accuracy in Predicting Blast-Induced Ground Vibration in Open-Pit Mines.

Author: Hoang NGUYEN, Xuan-Nam BUI, and DREBENSTEDT, Carsten
Abstract: The issue of blast-induced ground vibration poses a significant environmental challenge in open-pit mines, necessitating precise prediction and control measures. While artificial intelligence and machine learning models hold promise in addressing this concern, their accuracy remains a notable issue due to constrained input variables, dataset size, and potential environmental impact. To mitigate these challenges, data enrichment emerges as a potential solution to enhance the efficacy of machine learning models, not only in blast-induced ground vibration prediction but also across various domains within the mining industry. This study explores the viability of utilizing machine learning for data enrichment, with the objective of generating an augmented dataset that offers enhanced insights based on existing data points for the prediction of blast-induced ground vibration. Leveraging the support vector machine (SVM), we uncover intrinsic relationships among input variables and subsequently integrate them as supplementary inputs. The enriched dataset is then harnessed to construct multiple machine learning models, including k-nearest neighbors (KNN), classification and regression trees (CART), and random forest (RF), all designed to predict blast-induced ground vibration. Comparative analysis between the enriched models and their original counterparts, established on the initial dataset, provides a foundation for extracting insights into optimizing the performance of machine learning models not only in the context of predicting blast-induced ground vibration but also in addressing broader challenges within the mining industry. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

27. HBIM TO SUPPORT THE EXECUTIVE DESIGN OF A RESTORATION. CRITICAL ISSUES RELATED TO GEOMETRIC AND SEMANTIC MODELING.

Author: Valentini, Margherita, Battini, Carlo, and Vecchiattini, Rita
Subjects: GEOMETRIC modeling, BUILDING information modeling, SCIENTIFIC literature, STANDARDIZATION, PARAMETRIC modeling, CRITICAL analysis
Abstract: In recent years, the Building Information Modeling (BIM) has been the subject of extensive scientific literature. However, its application in historical construction still requires theoretical research and experimentation. The primary drawback is linked to the fact that historical heritage does not align with the standardization principles inherent in BIM modeling, which is predominantly tailored for new construction endeavors. The application of BIM modeling to the Oratory of S. Giovanni Battista in Bussana Vecchia, Sanremo (Imperia) aims to conduct a critical analysis of the methodology by creating an intelligent parametric 3D model capable of containing all the information necessary for an executive restoration project of the object. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

28. Data Enrichment Toolchain: A Data Linking and Enrichment Platform for Heterogeneous Data

Author: Luis Sanchez, Jorge Lanza, Juan Ramon Santana, Pablo Sotres, Victor Gonzalez, Laura Martin, Gurkan Solmaz, Erno Kovacs, Maren Dietzel, Anja Summa, Amir Reza Jafari, Roberto Minerva, and Noel Crespi
Subjects: Data enrichment, semantic annotation, data linking, data processing, heterogenous data, data interoperability, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Proliferation of data sources associated to Internet of Things (IoT) deployment as well as those bound to Open Data Portals (e.g. European Data Portal, Municipalities Open Data Portals, etc.) and Social Media platforms is creating an abundance of information that is called to bring benefits for both the private and public sectors, through the development of added-value services, increasing administrations’ transparency and availability or fostering efficiency of public services. However, pieces of information without a context are significantly less valuable. Raw data lacks semantics and it is highly heterogeneous from one data-source to another. This poses a challenge to make it useful. To turn all this data into valuable information it is necessary to enable its combination so that meaningful context can be created. Moreover, it is fundamental to define the mechanisms enabling the adoption and orchestration of advanced (typically AI-enabled) data processing techniques to be applied over the harmonized datasets and data-streams. This paper presents the Data Enrichment Toolchain (DET) that provides the necessary harmonization and enrichment to datasets and data-streams coming from heterogeneous sources. The value of the enriched data lies on the one hand in the transfer of the data into a semantically grounded knowledge graph and, on the other hand, in the creation of new data through linking, aggregating and reasoning on the data. In both cases, the benefit of employing linked-data modelling and semantics comes from the extension of the metadata that is associated to every piece of information. Furthermore, the experimental evaluation of the DET implementation that we have carried out is also presented in the paper.
Published: 2023
Full Text: View/download PDF

29. How Your Cultural Dataset is Connected to the Rest Linked Open Data?

Author: Mountantonakis, Michalis, Tzitzikas, Yannis, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Moropoulou, Antonia, editor, Georgopoulos, Andreas, editor, Doulamis, Anastasios, editor, Ioannides, Marinos, editor, and Ronchi, Alfredo, editor
Published: 2022
Full Text: View/download PDF

30. Artificially Intelligent Solutions: Detection, Debunking, and Fact-Checking

Author: Rubin, Victoria L.
Published: 2022
Full Text: View/download PDF

31. Supporting Semantic Data Enrichment at Scale

Author: Ciavotta, Michele, Cutrona, Vincenzo, De Paoli, Flavio, Nikolov, Nikolay, Palmonari, Matteo, Roman, Dumitru, Curry, Edward, editor, Auer, Sören, editor, Berre, Arne J., editor, Metzger, Andreas, editor, Perez, Maria S., editor, and Zillner, Sonja, editor
Published: 2022
Full Text: View/download PDF

32. Review of Literature on Open Data for Scalability and Operation Efficiency of Electric Bus Fleets

Author: Graczyk, Tomasz, Lewańska, Elżbieta, Stróżyna, Milena, Michalak, Dariusz, van der Aalst, Wil, Series Editor, Mylopoulos, John, Series Editor, Ram, Sudha, Series Editor, Rosemann, Michael, Series Editor, Szyperski, Clemens, Series Editor, Abramowicz, Witold, editor, Auer, Sören, editor, and Stróżyna, Milena, editor
Published: 2022
Full Text: View/download PDF

33. A Machine Learning Pipeline to Forecast the Electricity and Heat Consumption in a City District.

Author: Antonesi, Gabriel, Cioara, Tudor, Toderean, Liana, Anghel, Ionut, and De Mulder, Chaim
Subjects: MULTILAYER perceptrons, ELECTRIC power consumption, DEMAND forecasting, MACHINE learning, CITIES & towns, NONLINEAR statistical models, ENERGY consumption
Abstract: The shift towards renewable energy integration into smart grids has led to complex management processes, which require finer-grained energy and heat generation/demand forecasting while considering data from monitoring devices and the integration of smaller multi-energy sub-systems at the community, district, or buildings level. However, energy prediction is challenging due to the high variability in the electrical and thermal energy demands of building occupants, the heterogeneous characteristics of the energy assets or buildings in a district, and the length of the forecasting horizon. In this paper, we define a data-driven machine-learning pipeline to predict the electricity and thermal consumption of buildings and energy assets from a city district in 24 h intervals. Each pipeline's step is divided into sensors' data processing and model integration, data enrichment and features engineering, and multilayer perceptron model training. To address some of the drawbacks of using the multi-layer perceptron model, such as slow convergence rate and risk of overfitting, and to ensure a lower error in the energy prediction process, a features engineering technique was employed. We incorporated weather data features and interaction features derived from fusing the energy data with statistical models to capture the nonlinear patterns of the electrical and heat demands. The proposed approach was successfully validated in a real-world environment, a city district in Gent, Belgium. It featured good prediction results for electricity and heat production and consumption of various assets without considering the physical characteristics, making it viable and easily applicable in broader urban areas. The evaluation of energy prediction accuracy yielded good results, with a Mean Absolute Error (MAE) falling within the range of 0.003 to 3.27, and a Mean Absolute Scaled Error (MASE) ranging from 7 × 10⁻⁵ to 2.57 × 10⁻³. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

34. LOW-PROCESSING DATA ENRICHMENT AND CALIBRATION FOR PM2.5 LOW-COST SENSORS.

Author: STOJANOVIĆ, Danka B., KLEUT, Duška N., DAVIDOVIĆ, Miloš D., DE VITO, Saverio, JOVASEVIĆ-STOJANOVIĆ, Milena V., BARTONOVA, Alena, and LEPIOUFLE, Jean-Marie
Subjects: *CALIBRATION, *PARTICULATE matter, *DETECTORS, *RANDOM forest algorithms, *REGRESSION analysis, *AIR pollutants
Abstract: Particulate matter (PM) in air has been proven to be hazardous to human health. Here we focused on analysis of PM data we obtained from the same campaign which was presented in our previous study. Multivariate linear and random forest models were used for the calibration and analysis. In our linear regression model the inputs were PM, temperature and humidity measured with low-cost sensors, and the target was the reference PM measurements obtained from SEPA in the same timeframe. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

35. Improvement of multi-task learning by data enrichment: application for drug discovery.

Author: Sosnina, Ekaterina A., Sosnin, Sergey, and Fedorov, Maxim V.
Subjects: *DRUG discovery, *ARTIFICIAL neural networks, *DEEP learning
Abstract: Multi-task learning in deep neural networks has become a topic of growing importance in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated the potential of training data enrichment to enhance multi-task model prediction quality in drug discovery. The study evaluated four scenarios with varying degrees of information capacity of the training data and applied two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, which consisted of binary activities of compounds against viral species, was applied for the classification task; pQSAR(159) and pQSAR(4267), which consisted of bio-activities of compounds and assays from the research of the profile-QSAR method, were applied for regression tasks. We built multi-task models based on the feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment could be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data included, the more new compound-target interactions are required for prediction improvement. Also, we found out that even using multi-task learning, one could not predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides some recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

36. Reshaping Smart Cities through NGSI-LD Enrichment

Author: Víctor González, Laura Martín, Juan Ramón Santana, Pablo Sotres, Jorge Lanza, and Luis Sánchez
Subjects: data enrichment, linked data, data understandability, semantic annotation, data processing, smart cities, Chemical technology, TP1-1185
Abstract: The vast amount of information stemming from the deployment of the Internet of Things and open data portals is poised to provide significant benefits for both the private and public sectors, such as the development of value-added services or an increase in the efficiency of public services. This is further enhanced due to the potential of semantic information models such as NGSI-LD, which enable the enrichment and linkage of semantic data, strengthened by the contextual information present by definition. In this scenario, advanced data processing techniques need to be defined and developed for the processing of harmonised datasets and data streams. Our work is based on a structured approach that leverages the principles of linked-data modelling and semantics, as well as a data enrichment toolchain framework developed around NGSI-LD. Within this framework, we reveal the potential for enrichment and linkage techniques to reshape how data are exploited in smart cities, with a particular focus on citizen-centred initiatives. Moreover, we showcase the effectiveness of these data processing techniques through specific examples of entity transformations. The findings, which focus on improving data comprehension and bolstering smart city advancements, set the stage for the future exploration and refinement of the symbiosis between semantic data and smart city ecosystems.
Published: 2024
Full Text: View/download PDF

37. Data Enrichment as a Method of Data Preprocessing to Enhance Short-Term Wind Power Forecasting.

Author: Zhou, Yingya, Ma, Linwei, Ni, Weidou, and Yu, Colin
Subjects: *WIND power, *WIND forecasting, *WEATHER forecasting, *STANDARD deviations, *FORECASTING, *WIND power plants
Abstract: Wind power forecasting involves data preprocessing and modeling. In pursuit of better forecasting performance, most previous studies focused on creating various wind power forecasting models, but few studies have been published with an emphasis on new types of data preprocessing methods. Effective data preprocessing techniques and the fusion with the physical nature of the wind have been called upon as potential future research directions in recent reviews in this area. Data enrichment as a method of data preprocessing has been widely applied to forecasting problems in the consumer data universe but has not seen application in the wind power forecasting area. This study proposes data enrichment as a new addition to the existing library of data preprocessing methods to improve wind power forecasting performance. A methodological framework of data enrichment is developed with four executable steps: add error features of weather prediction sources, add features of weather prediction at neighboring nodes, add time series features of weather prediction sources, and add complementary weather prediction sources. The proposed data enrichment method takes full advantage of multiple commercially available weather prediction sources and the physical continuity nature of wind. It can cooperate with any existing forecasting models that have weather prediction data as inputs. The controlled experiments on three actual individual wind farms have verified the effectiveness of the proposed data enrichment method: The normalized root mean square error (NRMSE) of the day-ahead wind power forecast of XGBoost and LSTM with data enrichment is 11% to 27% lower than that of XGBoost and LSTM without data enrichment. In the future, variations on the data enrichment methods can be further explored as a promising direction of enhancing short-term wind power forecasting performance. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

38. An eco-aware framework for AI-based analysis of contextually enriched automotive trip data.

Author: Gace, Ivana, Vdovic, Hrvoje, Babic, Jurica, and Podobnik, Vedran
Subjects: *ARTIFICIAL intelligence, *GREENHOUSE gases, *GREENHOUSE effect, *PASSENGER traffic, *PUBLIC transit
Abstract: The transport sector is one of the main contributors to gas emissions and the greenhouse effect. Mobility as a Service seeks to combine different means of transport (public transport, bicycles, cars, etc.) in an integrated way to solve this problem. Raising drivers' awareness and advising them to drive more efficiently and economically can help reduce environmental impact. Ultimately, eco-driving can help save fuel, money and provide higher passenger satisfaction during the trip. The research uses a contextually enriched automotive data set collected via OBD, during passenger transportation in private cars. A multilevel analysis was conducted using a variety of approaches. With AI-based analysis, using unsupervised learning, we identified five groups of similarities as a function of speed and acceleration. The efficiency of each driver was examined and speed and acceleration/deceleration were noted as the most influential factors. Also, passengers' perception of driving quality is negatively affected by sudden acceleration and deceleration. Passengers have a poor perception of the driving efficiency and additional efforts should be made to educate them about the environmental aspect of driving. Finally, the main contribution is the eco-aware framework, which covers and describes the entire process from data collection to analysis. The collected data are publicly available and include 110 trips represented by more than 65,000 data points. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

39. Feature/vector entity retrieval and disambiguation techniques to create a supervised and unsupervised semantic table interpretation approach.

Author: Avogadro, Roberto, D'Adda, Fabio, and Cremaschi, Marco
Abstract: Recently, there has been an increasing interest in extracting and annotating tables on the Web. This activity allows the transformation of textual data into machine-readable formats to enable the execution of various artificial intelligence tasks, e.g., semantic search and dataset extension. Semantic Table Interpretation (STI) is the process of annotating elements in a table. The paper explores Semantic Table Interpretation, addressing the challenges of Entity Retrieval and Entity Disambiguation in the context of Knowledge Graphs (KGs). It introduces LamAPI, an Information Retrieval system with string/type-based filtering, and s-elBat, an Entity Disambiguation technique that combines heuristic and ML-based approaches. By applying the acquired know-how in the field and extracting algorithms, techniques and components from our previous STI approaches and the state of the art, we have created a new platform capable of annotating any tabular data, ensuring a high level of quality. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. Vehicle Data Management System for Scenario-Based Validation of Automated Driving Functions

Author: Klitzke, Lars, Koch, Carsten, Haja, Andreas, Köster, Frank, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Helfert, Markus, editor, Klein, Cornel, editor, Donnellan, Brian, editor, and Gusikhin, Oleg, editor
Published: 2021
Full Text: View/download PDF

41. Smart Fleet Analysis with Focus on Target Fulfillment and Test Coverage

Author: Ramschak, Erich, Voegl, Rainer, Quinz, Philipp, Hammer, Michael Erich, Freidekind, Rudolf, and Bertram, Torsten, editor
Published: 2021
Full Text: View/download PDF

42. Imbalanced Learning in Assessing the Risk of Corruption in Public Administration

Author: Vasconcelos, Marcelo Oliveira, Chaim, Ricardo Matos, Cavique, Luís, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Marreiros, Goreti, editor, Melo, Francisco S., editor, Lau, Nuno, editor, Lopes Cardoso, Henrique, editor, and Reis, Luís Paulo, editor
Published: 2021
Full Text: View/download PDF

43. Evaluating Elements of Web-Based Data Enrichment for Pseudo-relevance Feedback Retrieval

Author: Breuer, Timo, Pest, Melanie, Schaer, Philipp, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Candan, K. Selçuk, editor, Ionescu, Bogdan, editor, Goeuriot, Lorraine, editor, Larsen, Birger, editor, Müller, Henning, editor, Joly, Alexis, editor, Maistro, Maria, editor, Piroi, Florina, editor, Faggioli, Guglielmo, editor, and Ferro, Nicola, editor
Published: 2021
Full Text: View/download PDF

44. A Hybrid Deep Learning-Based (HYDRA) Framework for Multifault Diagnosis Using Sparse MDT Reports

Author: Muhammad Sajid Riaz, Haneya Naeem Qureshi, Usama Masood, Ali Rizwan, Adnan Abu-Dayya, and Ali Imran
Subjects: Root cause analysis, cellular data sparsity, data enrichment, multi-fault diagnosis, minimization of drive tests, hybrid deep learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Diminishing viability of manual fault diagnosis in the increasingly complex emerging cellular network has motivated research towards artificial intelligence (AI)-based fault diagnosis using the minimization of drive test (MDT) reports. However, existing AI solutions in the literature remain limited to either diagnosis of faults in a single base station only or the diagnosis of a single fault in a multiple BS scenario. Moreover, lack of robustness to MDT reports spatial sparsity renders these solutions unsuitable for practical deployment. To address this problem, in this paper we present a novel framework named Hybrid Deep Learning-based Root Cause Analysis (HYDRA) that uses a hybrid of convolutional neural networks, extreme gradient boosting, and the MDT data enrichment techniques to diagnose multiple faults in a multiple base station network. Performance evaluation under realistic and extreme settings shows that HYDRA yields an accuracy of 93% and compared to the state-of-the-art fault diagnosis solutions, HYDRA is far more robust to MDT report sparsity.
Published: 2022
Full Text: View/download PDF

45. Ricgraph: A flexible and extensible graph to explore research in context from various systems

Author: Janssen, Rik D.T.
Abstract: Ricgraph, also known as Research in context graph, enables the exploration of researchers, teams, their results, collaborations, skills, projects, and the relations between these items. Ricgraph can store many types of items into a single graph. These items can be obtained from various systems and from multiple organizations. Ricgraph facilitates reasoning about these items because it infers new relations between items, relations that are not present in any of the separate source systems. Ricgraph is flexible and extensible, and can be adapted to new application areas. In this article, we illustrate how Ricgraph works by applying it to the application area research information.
Published: 2024

46. Enriching Product Catalogs with User Opinions

Author: de Melo, Tiago, da Silva, Altigran S., de Moura, Edleno S., Calado, Pável, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Boratto, Ludovico, editor, Faralli, Stefano, editor, Marras, Mirko, editor, and Stilo, Giovanni, editor
Published: 2020
Full Text: View/download PDF

47. Click and Sales Prediction for Digital Advertisements: Real World Application for OTAs

Author: Tekin, Ahmet Tezcan, Cebi, Ferhan, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Kahraman, Cengiz, editor, Cebi, Selcuk, editor, Cevik Onar, Sezi, editor, Oztaysi, Basar, editor, Tolga, A. Cagri, editor, and Sari, Irem Ucal, editor
Published: 2020
Full Text: View/download PDF

48. An interactive approach to semantic enrichment with geospatial data.

Author: De Paoli, Flavio, Ciavotta, Michele, Avogadro, Roberto, Hristov, Emil, Borukova, Milena, Petrova-Antonova, Dessislava, and Krasteva, Iva
Subjects: *LOCATION data, *DATA integration, *URBAN planning, *ARTIFICIAL intelligence, *DATA analytics
Abstract: The ubiquitous availability of datasets has spurred the utilization of Artificial Intelligence methods and models to extract valuable insights, unearth hidden patterns, and predict future trends. However, the current process of data collection and linking heavily relies on expert knowledge and domain-specific understanding, which engenders substantial costs in terms of both time and financial resources. Therefore, streamlining the data acquisition, harmonization, and enrichment procedures to deliver high-fidelity datasets readily usable for analytics is paramount. This paper explores the capabilities of SemTUI, a comprehensive framework designed to support the enrichment of tabular data by leveraging semantics and user interaction. Utilizing SemTUI, an iterative and interactive approach is proposed to enhance the flexibility, usability and efficiency of geospatial data enrichment. The approach is evaluated through a pilot case study focused on urban planning, with a particular emphasis on geocoding. Using a real-world scenario involving the analysis of kindergarten accessibility within walking distance, the study demonstrates the proficiency of SemTUI in generating precise and semantically enriched location data. The incorporation of human feedback in the enrichment process successfully enhances the quality of the resulting dataset, highlighting SemTUI's potential for broader applications in geospatial analysis and its usability for users with limited expertise in manipulating geospatial data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

49. A Machine Learning Pipeline to Forecast the Electricity and Heat Consumption in a City District

Author: Gabriel Antonesi, Tudor Cioara, Liana Toderean, Ionut Anghel, and Chaim De Mulder
Subjects: machine learning pipeline, energy prediction, heat demand prediction, multilayer perceptron, data enrichment, features engineering, Building construction, TH1-9745
Abstract: The shift towards renewable energy integration into smart grids has led to complex management processes, which require finer-grained energy and heat generation/demand forecasting while considering data from monitoring devices and the integration of smaller multi-energy sub-systems at the community, district, or buildings level. However, energy prediction is challenging due to the high variability in the electrical and thermal energy demands of building occupants, the heterogeneous characteristics of the energy assets or buildings in a district, and the length of the forecasting horizon. In this paper, we define a data-driven machine-learning pipeline to predict the electricity and thermal consumption of buildings and energy assets from a city district in 24 h intervals. Each pipeline’s step is divided into sensors’ data processing and model integration, data enrichment and features engineering, and multilayer perceptron model training. To address some of the drawbacks of using the multi-layer perceptron model, such as slow convergence rate and risk of overfitting, and to ensure a lower error in the energy prediction process, a features engineering technique was employed. We incorporated weather data features and interaction features derived from fusing the energy data with statistical models to capture the nonlinear patterns of the electrical and heat demands. The proposed approach was successfully validated in a real-world environment, a city district in Gent, Belgium. It featured good prediction results for electricity and heat production and consumption of various assets without considering the physical characteristics, making it viable and easily applicable in broader urban areas. The evaluation of energy prediction accuracy yielded good results, with a Mean Absolute Error (MAE) falling within the range of 0.003 to 3.27, and a Mean Absolute Scaled Error (MASE) ranging from 7 × 10⁻⁵ to 2.57 × 10⁻³.
Published: 2023
Full Text: View/download PDF

50. Semi-automated Augmentation of Pandas DataFrames

Author: Lynden, Steven, Taveekarn, Waran, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Barbosa, Simone Diniz Junqueira, Editorial Board Member, Yuan, Junsong, Editorial Board Member, Tan, Ying, editor, and Shi, Yuhui, editor
Published: 2019
Full Text: View/download PDF
