970 results
Search Results
2. Special Issue of CAiSE 2023 Best Papers.
- Author
- Reinhartz-Berger, Iris and Indulska, Marta
- Published
- 2025
3. Editorial Note to the special issue of the Information Systems journal on Web Engineering with selected papers from ICWE 2021 conference.
- Author
- Chbeir, Richard, Frasincar, Flavius, and Manolopoulos, Yannis
- Subjects
- INFORMATION storage & retrieval systems; ENGINEERING; CONFERENCES & conventions
- Published
- 2023
4. Special issue: BPM 2020 Selected Papers in Foundations and Engineering.
- Author
- Fahland, Dirk, Ghidini, Chiara, Dumas, Marlon, and Reichert, Manfred
- Subjects
- BUILDING foundations
- Published
- 2022
5. Special issue: Selected papers of ICPM 2019.
- Author
- Carmona, Josep, Jans, Mieke, and La Rosa, Marcello
- Published
- 2022
6. Selected Papers of BPM 2019 - Editorial to the Special Issue.
- Author
- Hildebrandt, Thomas, van Dongen, Boudewijn F., Röglinger, Maximilian, and Mendling, Jan
- Published
- 2022
7. Special issue: BPM 2018 selected papers in foundations and engineering.
- Author
- Montali, Marco, Weber, Ingo, Weske, Mathias, and Reichert, Manfred
- Published
- 2022
8. Editorial special issue: Selected papers of BPM 2016.
- Author
- La Rosa, Marcello, Loos, Peter, Pastor, Oscar, and Reichert, Manfred
- Subjects
- BUSINESS process management; INFORMATION technology; INDUSTRIAL engineering; OPERATIONS management; SOCIAL computing; CLOUD computing
- Published
- 2018
9. Special issue: Selected papers of BPM 2012.
- Author
- Barros, Alistair, Gal, Avigdor, and Kindler, Ekkart
- Subjects
- PUBLISHED articles; PUBLISHING; PERIODICAL articles; PUBLICATIONS; PERIODICAL publishing; INFORMATION storage & retrieval systems periodicals
- Published
- 2015
10. Information server for highly-connected cross-media publishing
- Author
- Norrie, Moira C. and Signer, Beat
- Subjects
- INFORMATION resources management; INFORMATION theory; COMPUTER network architectures; MASS media
- Abstract
Over the last decade, we have seen a significant increase in the number of projects aiming for integration of different kinds of media (mixed-media integration). However, most existing approaches tend to focus on the media technologies rather than on concepts for information integration and linking that enable users to move freely back and forth between various media information sources. In this paper, we discuss the issues of information semantics and granularity that arise in the design of highly interactive mixed-media information systems and present a general, flexible information server that meets the requirements of publishing information on different output channels (cross-media publishing). Specifically, we introduce the iServer framework as a generic link management and extensible integration platform, and digitally augmented paper is presented as one specific application of the iServer technology. A case study shows how cross-media publishers could profit from using more elaborate information systems, and some of the authoring issues of mixed-media information environments are discussed. [Copyright Elsevier]
- Published
- 2005
11. Electricity behaviors anomaly detection based on multi-feature fusion and contrastive learning.
- Author
- Guan, Yongming, Shi, Yuliang, Wang, Gang, Zhang, Jian, Wang, Xinjun, Chen, Zhiyong, and Li, Hui
- Subjects
- MULTISENSOR data fusion; CONSUMPTION (Economics); ELECTRICITY; DATA quality
- Abstract
Abnormal electricity usage detection is the process of discovering and diagnosing abnormal electricity usage behavior by monitoring and analyzing the electricity usage in the power system. How to improve the accuracy of anomaly detection is a popular research topic. Most studies use neural networks for anomaly detection but ignore the effect of missing electricity data on anomaly detection performance. Missing value completion is an important method to improve the quality of electricity data and to optimize the anomaly detection performance. Moreover, most studies model only the temporal features of electricity data and ignore the potential correlations among spatial features. Therefore, this paper proposes an electricity anomaly detection model based on multi-feature fusion and contrastive learning. The model integrates temporal and spatial features to jointly accomplish electricity anomaly detection. For temporal feature representation learning, an improved bi-directional LSTM is designed to complete missing values in the electricity data, combined with a CNN to capture electricity consumption behavior patterns in the temporal data. For spatial feature representation learning, a GCN and a Transformer are used to fully explore the complex correlations among the data. In addition, to improve anomaly detection performance, the paper designs a gated fusion module and draws on contrastive learning to strengthen the representation ability of the electricity data. Finally, experiments demonstrate that the proposed method effectively improves the performance of electricity behavior anomaly detection.
• Integrating temporal and spatial features to jointly accomplish electricity anomaly detection.
• Designing a gated fusion module to capture the dependencies between multi-features.
• Introducing contrastive learning to enhance the representation ability of electricity data.
[ABSTRACT FROM AUTHOR]
- Published
- 2025
12. Making cyber-human systems smarter.
- Author
- Alter, Steven
- Subjects
- SYSTEMS engineering; CITIES & towns; INFORMATION storage & retrieval systems; DESIGN services; INFORMATION processing
- Abstract
The term smart is often used carelessly in relation to systems, devices, and other entities such as cities that capture or otherwise process or use information. This conceptual paper treats the idea of smartness in a way that suggests directions for making cyber-human systems smarter. Cyber-human systems can be viewed as work systems. This paper defines work system, cyber-human system, algorithmic agent, and smartness of systems and devices. It links those ideas to challenges that can be addressed by applying ideas that managers and IS designers discuss rarely, if at all, such as dimensions of smartness for devices and systems, facets of work, roles and responsibilities of algorithmic agents, different types of engagement and patterns of interaction between people and algorithmic agents, explicit use of various types of knowledge objects, and performance criteria that are often deemphasized. In combination, those ideas reveal many opportunities for IS analysis and design practice to make cyber-human systems smarter. [ABSTRACT FROM AUTHOR]
- Published
- 2025
13. PathEL: A novel collective entity linking method based on relationship paths in heterogeneous information networks.
- Author
- Zu, Lizheng, Lin, Lin, Fu, Song, Liu, Jie, Suo, Shiwei, He, Wenhui, Wu, Jinlei, and Lv, Yancheng
- Subjects
- TIME complexity; INFORMATION networks; ELECTRONIC data processing; PROBLEM solving
- Abstract
Collective entity linking generally outperforms independent entity linking because it considers the interdependencies among entities. However, existing collective entity linking methods often have high time complexity, do not fully utilize the relationship information in heterogeneous information networks (HIN), and largely depend on special features associated with Wikipedia. To address these problems, this paper proposes a novel collective entity linking method based on relationship paths in heterogeneous information networks (PathEL). PathEL classifies complex relationships in HIN into 1-hop paths and three types of 2-hop paths, measures entity correlation by the path information among entities, and ultimately combines textual semantic information to realize collective entity linking. In addition, to handle the high complexity of collective entity linking, the paper combines a variable sliding window data processing method with a two-step pruning strategy: the variable sliding window limits the number of entity mentions in each window, and the pruning strategy reduces the number of candidate entities. Finally, experimental results on three benchmark datasets verify that the proposed model performs better in entity linking than the baseline models. On the AIDA CoNLL dataset, compared to the second-ranked model, our model improves P, R, and F1 scores by 1.61%, 1.54%, and 1.57%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Heart failure prognosis prediction: Let's start with the MDL-HFP model.
- Author
- Ma, Huiting, Li, Dengao, Fu, Jian, Zhao, Guiji, and Zhao, Jumin
- Subjects
- PROGNOSTIC models; HEART failure; FEATURE selection; HEART diseases; BRANCHING processes
- Abstract
Heart failure, as a critical symptom or terminal stage of assorted heart diseases, is a major global public health problem. Establishing a prognostic model can help identify high-risk patients, save their lives promptly, and reduce the medical burden. Although integrating structured indicators and unstructured text for complementary information has been proven effective in disease prediction tasks, there are still certain limitations. Firstly, the processing of single-branch modes is easily overlooked, which can affect the final fusion result. Secondly, simple fusion loses complementary information between modalities, limiting the network's learning ability. Thirdly, incomplete interpretability can hinder the practical application and development of the model. To overcome these challenges, this paper proposes the MDL-HFP multimodal model for predicting patient prognosis using the MIMIC-III public database. Firstly, the ADASYN algorithm is used to handle the imbalance of data categories. Then, the proposed improved Deep&Cross Network is used for automatic feature selection to encode structured sparse features, and implicit graph structure information is introduced to encode unstructured clinical notes based on the HR-BGCN model. Finally, the information of the two modalities is fused through a cross-modal dynamic interaction layer. By comparing multiple advanced multimodal deep learning models, the model's effectiveness is verified, with an average F1 score of 90.42% and an average accuracy of 90.70%. The model proposed in this paper can accurately classify the readmission status of patients, thereby assisting doctors in making judgments and improving the patient's prognosis. Further visual analysis demonstrates the usability of the model, providing a comprehensive explanation for clinical decision-making. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. TRGST: An enhanced generalized suffix tree for topological relations between paths.
- Author
- Quijada-Fuentes, Carlos, Rodríguez, M. Andrea, and Seco, Diego
- Subjects
- PUBLIC transit; SUFFIXES & prefixes (Grammar); SCALABILITY; ALGORITHMS; INSPIRATION
- Abstract
This paper introduces the TRGST data structure, which is designed to handle queries related to topological relations between paths represented as sequences of stops in a network. As an example, these paths could correspond to stops on a public transport network, and a query of interest is to retrieve paths that share at least k consecutive stops. While topological relations among spatial objects have received extensive attention, the efficient processing of these relations in the context of trajectory paths, considering both time and space efficiency, remains a relatively less explored domain. Taking inspiration from pattern matching implementations, the TRGST data structure is constructed on the foundation of the Generalized Suffix Tree. Its purpose is to provide a compact representation of a set of paths and to efficiently handle topological relation queries by leveraging the pattern search capabilities inherent in this structure. The paper provides a detailed account of the structure and algorithms of TRGST, followed by a performance analysis utilizing both real and synthetic data. The results underscore the remarkable scalability of TRGST in terms of both query time and space utilization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. ArZiGo: A recommendation system for scientific articles.
- Author
- Pinedo, Iratxe, Larrañaga, Mikel, and Arruarte, Ana
- Subjects
- RECOMMENDER systems; HYBRID systems; OPEN scholarship; COMPUTER science; INTERNET searching
- Abstract
The large number of scientific publications around the world is increasing at a rate of approximately 4%–5% per year. This fact has resulted in the need for tools that help identify relevant and high-quality publications. To address this necessity, search and reference management tools that include some recommendation algorithms have been developed. However, many of these solutions are proprietary tools, and the full potential of recommender systems is rarely exploited. Some solutions provide recommendations for specific domains by using ad-hoc resources, and some other systems do not consider any personalization strategy to generate the recommendations. This paper presents ArZiGo, a web-based full prototype system for the search, management, and recommendation of scientific articles, which feeds on the Semantic Scholar Open Research Corpus, a corpus that is growing continually with more than 190M papers from all fields of science so far. ArZiGo combines different recommendation approaches within a hybrid system, in a configurable way, to recommend those papers that best suit the preferences of the users. A group of 30 human experts participated in the evaluation of 500 recommendations in 10 research areas, 7 of which belong to the area of Computer Science and 3 to the area of Medicine, obtaining quite satisfactory results. Besides the appropriateness of the articles recommended, the execution time of the implemented algorithms has also been analyzed.
• A web system for the search, management, and recommendation of scientific articles.
• A hybrid and multidisciplinary scientific article recommendation system.
• A modular and scalable recommendation system.
• Use of the Semantic Scholar Open Research Corpus as a corpus of articles.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
17. Secure multi-dimensional data retrieval with access control and range query in the cloud.
- Author
- Mei, Zhuolin, Yu, Jin, Zhang, Caicai, Wu, Bin, Yao, Shimao, Shi, Jiaoli, and Wu, Zongda
- Subjects
- ACCESS control; INFORMATION retrieval; DATA encryption; DATA security
- Abstract
Outsourcing data to the cloud offers various advantages, such as improved reliability, enhanced flexibility, and accelerated deployment. However, data security concerns arise due to potential threats such as malicious attacks and internal misuse of privileges, resulting in data leakage. Data encryption is a recognized solution to address these issues and ensure data confidentiality even in the event of a breach. However, encrypted data presents challenges for common operations like access control and range queries. To address these challenges, this paper proposes Secure Multi-dimensional Data Retrieval with Access Control and Range Search in the Cloud (SMDR). We propose the SMDR policy, which supports both access control and range queries. The design of the SMDR policy cleverly utilizes the minimum and maximum points of buckets, making the policy highly appropriate for supporting range queries on multi-dimensional data. Additionally, we modify Ciphertext Policy-Attribute Based Encryption (CP-ABE) to enable effective integration with the SMDR policy, and then construct a secure index using the SMDR policy and CP-ABE. By utilizing the secure index, access control and range queries can be effectively supported over encrypted multi-dimensional data. To evaluate the efficiency of SMDR, extensive experiments have been conducted. The experimental results demonstrate the effectiveness and suitability of SMDR in handling encrypted multi-dimensional data. Additionally, we provide a detailed security analysis of SMDR. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Process Query Language: Design, Implementation, and Evaluation.
- Author
- Polyvyanyy, Artem, ter Hofstede, Arthur H.M., La Rosa, Marcello, Ouyang, Chun, and Pika, Anastasiia
- Subjects
- PROGRAMMING languages; BUSINESS process management; STRUCTURAL models
- Abstract
Organizations can benefit from the use of practices, techniques, and tools from the area of business process management. Through the focus on processes, they create process models that require management, including support for versioning, refactoring and querying. Querying thus far has primarily focused on structural properties of models rather than on exploiting behavioral properties capturing aspects of model execution. While the latter is more challenging, it is also more effective, especially when models are used for auditing or process automation. The focus of this paper is to overcome the challenges associated with behavioral querying of process models in order to unlock its benefits. The first challenge concerns determining decidability of the building blocks of the query language, which are the possible behavioral relations between process tasks. The second challenge concerns achieving acceptable performance of query evaluation. The evaluation of a query may require expensive checks in all process models, of which there may be thousands. In light of these challenges, this paper proposes a special-purpose programming language, namely Process Query Language (PQL) for behavioral querying of process model collections. The language relies on a set of behavioral predicates between process tasks, whose usefulness has been empirically evaluated with a pool of process model stakeholders. This study resulted in a selection of the predicates to be implemented in PQL, whose decidability has also been formally proven. The computational performance of the language has been extensively evaluated through a set of experiments against two large process model collections. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Introduction to special issue with best papers from KDD 2002
- Author
- Keim, Daniel and Koudas, Nick
- Published
- 2004
20. Introduction to special issue with best papers from CAiSE 2002
- Author
- Mylopoulos, John and Woo, Carson
- Published
- 2004
21. Reproducible experiments with Learned Metric Index Framework.
- Author
- Slanináková, Terézia, Antol, Matej, Ol'ha, Jaroslav, Dohnal, Vlastislav, Ladra, Susana, and Martínez-Prieto, Miguel A.
- Subjects
- MACHINE learning; METRIC spaces; LIBRARY software; SOURCE code
- Abstract
This work is a companion reproducible paper of a previous paper (Antol et al., 2021) in which we presented an alternative to the traditional paradigm of similarity searching in metric spaces called the Learned Metric Index. Inspired by the advances in learned indexing of structured data, we used machine learning models to replace index pivots, thus posing similarity search as a classification problem. This implementation proved to be more than competitive with the conventional methods in terms of speed and recall, proving the concept viable. The aim of this publication is to make our source code, datasets, and experiments publicly available. For this purpose, we create a collection of Python3 software libraries, YAML reproducible experiment files, and JSON ground-truth files, all bundled in a Docker image – the Learned Metric Index Framework (LMIF) – which can be run using any Docker-compatible operating system on a CPU with Advanced Vector Extensions (AVX). We introduce a reproducibility protocol for our experiments using LMIF and provide a closer look at the experimental process. We present new experimental results obtained by running the reproducibility protocol introduced herein and discuss the differences with the results reported in our primary work (Antol et al., 2021). Finally, we argue that these results can be considered weakly reproducible (in both of the performance metrics), since they point to the same conclusions derived in the primary paper. [ABSTRACT FROM AUTHOR]
- Published
- 2023
22. Introduction to special issue with best papers from EDBT 2002
- Author
- Jensen, Christian S.
- Published
- 2003
23. Big data analytics deep learning techniques and applications: A survey.
- Author
- Selmy, Hend A., Mohamed, Hoda K., and Medhat, Walaa
- Subjects
- DEEP learning; BIG data; IMAGE recognition (Computer vision); MACHINE learning
- Abstract
• This paper provides an in-depth review of the latest deep learning methods for use in big data analytics.
• Explains the importance of deep learning, its taxonomy, and big data analytics techniques.
• Explores deep learning approaches in IoT data applications, including their complexities and limitations.
• Suggests deep learning techniques for many data-intensive applications, along with benchmarked frameworks and datasets.
• Offers a comparison of established approaches with deep learning methods in big data analytics.
Deep learning (DL), as one of the most active machine learning research fields, has achieved great success in numerous scientific and technological disciplines, including speech recognition, image classification, language processing, big data analytics, and many more. Big data analytics (BDA), where raw data is often unlabeled or uncategorized, can greatly benefit from DL because of its ability to analyze and learn from enormous amounts of unstructured data. This survey paper provides a comprehensive overview of state-of-the-art DL techniques applied in BDA. The main goal of this survey is to illustrate the significance of DL and its taxonomy, and to detail the basic techniques used in BDA. It also explains the DL techniques used in big IoT data applications as well as their various complexities and challenges. The survey presents various real-world data-intensive applications where DL techniques can be applied. In particular, it concentrates on the DL techniques in accordance with the BDA type for each application domain. Additionally, the survey examines DL benchmarked frameworks used in BDA and reviews the available benchmarked datasets, besides analyzing the strengths and limitations of each DL technique and their suitable applications. Further, a comparative analysis is presented by comparing existing approaches to the DL methods used in BDA. Finally, the challenges of DL modeling and future directions are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Reproducible experiments for generating pre-processing pipelines for AutoETL.
- Author
- Giovanelli, Joseph, Bilalli, Besim, Abelló, Alberto, Silva-Coira, Fernando, and de Bernardo, Guillermo
- Subjects
- MACHINE learning; SUPERVISED learning; SOFTWARE development tools; EVALUATION methodology
- Abstract
This work is a companion reproducibility paper of the experiments and results reported in Giovanelli et al. (2022), where data pre-processing pipelines are evaluated in order to find pipeline prototypes that reduce the classification error of supervised learning algorithms. With the recent shift towards data-centric approaches, where instead of the model, the dataset is systematically changed for better model performance, data pre-processing is receiving a lot of attention. Yet, its impact on the final analysis is not widely recognized, primarily due to the lack of publicly available experiments that quantify it. To bridge this gap, this work introduces a set of reproducible experiments on the impact of data pre-processing by providing a detailed reproducibility protocol together with a software tool and a set of extensible datasets, which allow all the experiments and results of our aforementioned work to be reproduced. We introduce a set of strongly reproducible experiments based on a collection of intermediate results, and a set of weakly reproducible experiments (Lastra-Díaz, 0000) that allow reproducing our end-to-end optimization process and the evaluation of all the methods reported in our primary paper. The reproducibility protocol is created in Docker and tested on Windows and Linux. In brief, our primary work (i) develops a method for generating effective prototypes, as templates or logical sequences of pre-processing transformations, and (ii) instantiates the prototypes into pipelines, in the form of executable or physical sequences of actual operators that implement the respective transformations. For the first, a set of heuristic rules learned from extensive experiments is used, and for the second, techniques from Automated Machine Learning (AutoML) are applied.
• Assess the impact of data pre-processing on classification tasks.
• Generate pre-processing pipelines that improve the classification accuracy.
• Strongly reproduce the experiments that use intermediate results in the optimization.
• Weakly reproduce the end-to-end experiments, without reusing intermediate results.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
25. DOLAP data warehouse research over two decades: Trends and challenges.
- Author
- Wrembel, Robert, Abelló, Alberto, and Song, Il-Yeol
- Subjects
- DATA warehousing; INFORMATION storage & retrieval systems; DATA integration
- Abstract
This paper introduces the Information Systems special issue that includes the four best papers submitted to DOLAP 2018. Additionally, the 20th anniversary of DOLAP motivated an analysis of DOLAP topics, as follows. First, the DOLAP topics of the last five years were compared with those of VLDB, SIGMOD, and ICDE. Next, the DOLAP topics were analyzed across the venue's 20-year history. Finally, the analysis concludes with the list of the most frequent research topics of the aforementioned conferences and still-open research problems. [ABSTRACT FROM AUTHOR]
- Published
- 2019
26. Approximate OLAP of document-oriented databases: A variety-aware approach.
- Author
- Gallinucci, Enrico, Golfarelli, Matteo, and Rizzi, Stefano
- Subjects
- BIG data; DATABASES; RELATIONAL databases; INFORMATION resources; DEFINITIONS
- Abstract
Schemaless databases, and document-oriented databases in particular, are preferred to relational ones for storing heterogeneous data with variable schemas and structural forms. However, the absence of a unique schema adds complexity to analytical applications, in which a single analysis often involves large sets of data with different schemas. In this paper we propose an original approach to OLAP on collections stored in document-oriented databases. The basic idea is to stop fighting against schema variety and welcome it as an inherent source of information wealth in schemaless sources. Our approach builds on four stages: schema extraction, schema integration, FD enrichment, and querying; these stages are discussed in detail in the paper. To make users aware of the impact of schema variety, we propose a set of indicators inspired by the definition of attribute density. Finally, we experimentally evaluate our approach in terms of efficiency and effectiveness.
• The inherent variety of documents hinders proper OLAP analyses.
• We propose an approximated OLAP approach that captures and exploits schema variety.
• A multidimensional view is given by detecting approximated functional dependencies.
• We propose indicators to predict and evaluate the quality of OLAP queries.
• We show that the approach improves the coverage and precision of OLAP queries.
[ABSTRACT FROM AUTHOR]
- Published
- 2019
27. Process-related user interaction logs: State of the art, reference model, and object-centric implementation.
- Author
- Abb, Luka and Rehse, Jana-Rebecca
- Subjects
- ROBOTIC process automation; USER interfaces; ACQUISITION of data
- Abstract
User interaction (UI) logs are high-resolution event logs that record low-level activities performed by a user during the execution of a task in an information system. Each event in such a log represents an interaction between the user and the interface, such as clicking a button, ticking a checkbox, or typing into a text field. UI logs are used in many different application contexts for purposes such as usability analysis, task mining, or robotic process automation (RPA). However, UI logs suffer from a lack of standardization. Each research study and processing tool relies on a different conceptualization and implementation of the elements and attributes of user interactions. This exacerbates or even prohibits the integration of UI logs from different sources or the combination of UI data collection tools with downstream analytics or automation solutions. In this paper, our objective is to address this issue and facilitate the exchange and analysis of UI logs in research and practice. Therefore, we first review process-related UI logs in scientific publications and industry tools to determine commonalities and differences between them. Based on our findings, we propose a universally applicable reference data model for process-related UI logs, which includes all core attributes but remains flexible regarding the scope, level of abstraction, and case notion. Finally, we provide exemplary implementations of the reference model in XES and OCED.
• We review existing user interaction logs to find common attributes and differences.
• We develop a reference data model that includes the core attributes of UI logs.
• We provide implementations in XES and two Object-Centric Event Data formats.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
28. Read-safe snapshots: An abort/wait-free serializable read method for read-only transactions on mixed OLTP/OLAP workloads.
- Author
- Shioi, Takamitsu, Kambayashi, Takashi, Arakawa, Suguru, Kurosawa, Ryoji, Hikida, Satoshi, and Yokota, Haruo
- Subjects
- ONLINE data processing; OLAP technology
- Abstract
This paper proposes Read-Safe Snapshots (RSS), a concurrency control method that ensures reading the latest serializable version on multiversion concurrency control (MVCC) for read-only transactions without creating any serializability anomaly, thereby enhancing the transaction processing throughput under mixed workloads of online transactional processing (OLTP) and online analytical processing (OLAP). Ensuring serializability for data consistency between OLTP and OLAP is vital to prevent OLAP from obtaining nonserializable results. Existing serializability methods achieve this consistency by making OLTP or OLAP transactions abort or wait, but this can lead to throughput degradation when implemented for large read sets in read-only OLAP transactions under the mixed workloads of recent real-time analysis applications. To deal with this problem, we present an RSS construction algorithm that does not affect the conventional OLTP performance and simultaneously avoids producing additional aborts and waits. Moreover, the RSS construction method can be easily applied to the read-only replica of a multinode system as well as a single-node system because no validation for serializability is required. Our experimental findings showed that RSS could prevent read-only OLAP transactions from creating anomaly cycles in a multinode environment with master-copy replication, which led to the achievement of serializability with a low overhead of about 15% compared to baseline OLTP/OLAP throughputs under snapshot isolation (SI). The OLTP throughput under our proposed method in a mixed OLTP/OLAP workload was about 45% better than SafeSnapshots, a serializable snapshot isolation (SSI) equipped with a read-only optimization method, and did not degrade the OLAP throughput. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Efficiently using contextual influence to recommend new items to ephemeral groups.
- Author
- Quintarelli, Elisa, Rabosio, Emanuele, and Tanca, Letizia
- Subjects
- RECOMMENDER systems; INFLUENCE
- Abstract
Group recommender systems suggest items to groups of users that want to utilize those items together. These systems can support several activities that can be performed together with other people and are typically social, like watching TV or going to the restaurant. In this paper we study ephemeral groups, i.e., groups constituted by users who are together for the first time, and for which therefore there is no history of past group activities. Recent works have studied ephemeral group recommendations, proposing techniques that learn complex models of users and items. These techniques, however, are not appropriate to recommend items that are new in the system, while we propose a method able to deal with new items too. Specifically, our technique determines the preference of a group for a given item by combining the individual preferences of the group members on the basis of their contextual influence, i.e., the ability of an individual, in a given situation, to guide the group's decision. Moreover, while many works on recommendations do not consider the problem of efficiently producing recommendation lists at runtime, in this paper we speed up the recommendation process by applying techniques conceived for the top-K query processing problem. Finally, we present extensive experiments, evaluating: (i) the accuracy of the recommendations, using a real TV dataset containing a log of viewings performed by real groups, and (ii) the efficiency of the online recommendation task, exploiting also a bigger, partially synthetic dataset.
• An approach to recommend new items to ephemeral groups is proposed.
• Our strategy relies on contextual influence.
• Contextual influence is the ability of a user, in a given context, to guide the group's decision.
• An algorithm to efficiently generate recommendation lists at runtime is also presented.
• Accuracy and efficiency are experimentally evaluated using a real TV dataset.
[ABSTRACT FROM AUTHOR]
- Published
- 2019
30. A modular approach to the specification and management of time duration constraints in BPMN.
- Author
- Combi, Carlo, Oliboni, Barbara, and Zerbato, Francesca
- Subjects
- PETRI nets; BUSINESS process management; TECHNICAL specifications; TIME management
- Abstract
The modeling and management of business processes deals with temporal aspects both in the inherent representation of activity coordination and in the specification of activity properties and constraints. In this paper, we address the modeling and specification of constraints related to the duration of process activities. In detail, we consider the Business Process Model and Notation (BPMN) standard and propose an approach to define re-usable duration-aware process models that make use of existing BPMN elements for representing different nuances of activity duration at design time. Moreover, we show how advanced event-handling techniques may be exploited for detecting the violation of duration constraints during the process run-time. The set of process models specified in this paper suitably captures duration constraints at different levels of abstraction, by allowing designers to specify the duration of atomic tasks and of selected process regions in a way that is conceptually and semantically BPMN-compliant. Without loss of generality, we refer to real-world clinical working environments to exemplify our approach, as their intrinsic complexity makes them a particularly challenging and rewarding application environment.
• We represent different kinds of duration constraints through re-usable BPMN process models.
• The presented processes provide a clear conceptualization of duration-aware process activities.
• A formal description of the proposed patterns is provided along with real-world motivating examples.
• Being fully BPMN-compliant, the proposed approach benefits from existing tool support.
• The soundness of the obtained process models is verified by mapping them to time Petri nets.
[ABSTRACT FROM AUTHOR]
- Published
- 2019
31. On the reproducibility of experiments of indexing repetitive document collections.
- Author
- Fariña, Antonio, Martínez-Prieto, Miguel A., Claude, Francisco, Navarro, Gonzalo, Lastra-Díaz, Juan J., Prezza, Nicola, and Seco, Diego
- Subjects
- SOURCE code; SYSTEMS on a chip; COLLECTIONS
- Abstract
This work introduces a companion reproducible paper with the aim of allowing the exact replication of the methods, experiments, and results discussed in a previous work (Claude et al., 2016). In that parent paper, we proposed many and varied techniques for compressing indexes that exploit the fact that highly repetitive collections are formed mostly of documents that are near-copies of others. More concretely, we describe a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated using various document collections. The corresponding experimentation is carefully explained, providing precise details about the parameters that can be tuned for each indexing solution. Finally, note that we also provide uiHRDC as a reproducibility package.
• We summarize the original results and motivate the proposed experimental setup.
• We explain the replication framework, including datasets, query patterns, source code and scripts.
• We detail all configuration parameters for each solution, explaining the better configurations.
• We host the framework at GitHub, and publish it through Mendeley Data.
[ABSTRACT FROM AUTHOR]
- Published
- 2019
32. From BPMN process models to DMN decision models.
- Author
- Bazhenova, Ekaterina, Zerbato, Francesca, Oliboni, Barbara, and Weske, Mathias
- Subjects
- FLOW control (Data transmission systems); BUSINESS process management; GAS separation membranes
- Abstract
The interplay between process and decision models plays a crucial role in business process management, as decisions may be based on running processes and affect process outcomes. Often process models include decisions that are encoded through process control flow structures and data flow elements, thus reducing process model maintainability. The Decision Model and Notation (DMN) was proposed to achieve separation of concerns and to possibly complement the Business Process Model and Notation (BPMN) for designing decisions related to process models. Nevertheless, deriving decision models from process models remains challenging, especially when the same data underlie both process and decision models. In this paper, we explore how and to which extent the data modeled in BPMN processes and used for decision-making may be represented in the corresponding DMN decision models. To this end, we identify a set of patterns that capture possible representations of data in BPMN processes and that can be used to guide the derivation of decision models related to existing process models. Throughout the paper we refer to real-world healthcare processes to show the applicability of the proposed approach.
• A set of BPMN patterns characterizing process-related data used for decision-making.
• Derive a DMN decision model from the data perspective of a BPMN process model.
• Support decision analysts in identifying process data relevant for decision-making.
• Improve understanding of integrated process and decision models through shared data.
[ABSTRACT FROM AUTHOR]
- Published
- 2019
33. Approaches and challenges of privacy preserving search over encrypted data.
- Author
- Siva Kumar, D.V.N. and Santhi Thilagam, P.
- Subjects
- ENCRYPTION protocols; DATA encryption; BOOLEAN searching; PRIVACY; MAINTENANCE costs
- Abstract
More and more data owners are encouraged to outsource their data onto cloud servers to reduce infrastructure and maintenance costs and to gain ubiquitous access to their stored data. However, security is one issue that discourages data owners from adopting cloud servers for data storage. Searchable Encryption (SE) is one of the few ways of assuring the privacy and confidentiality of such data by storing it in encrypted form at the cloud servers. SE enables the data owners and users to search over encrypted data through trapdoors. Most user information requirements are fulfilled either through Boolean or Ranked search approaches. This paper aims at understanding how the confidentiality and privacy of information can be guaranteed while processing single and multi-keyword queries over encrypted data using Boolean and Ranked search approaches. It presents all possible leakages that happen in SE and specifies which privacy-preserving approach to adopt in SE schemes to prevent those leakages, to help practitioners and researchers design and implement secure searchable encryption systems. It also highlights various application scenarios where SE could be utilized. This paper also explores the research challenges and open problems that need to be addressed in the future.
• The security requirements of Searchable Encryption (SE) approaches.
• The possible information leakages in SE and the defensive approaches.
• Security analysis of both Boolean and Ranked search approaches.
• Research challenges for developing deployable solutions.
[ABSTRACT FROM AUTHOR]
- Published
- 2019
34. Knowledge triple mining via multi-task learning.
- Author
- Zhang, Zhao, Zhuang, Fuzhen, Li, Xuebing, Niu, Zheng-Yu, He, Jia, He, Qing, and Xiong, Hui
- Subjects
- EXPERT systems; INFORMATION storage & retrieval systems; ALGORITHMS; LEARNING; CLUSTER analysis (Statistics)
- Abstract
Recent years have witnessed the rapid development of knowledge bases (KBs) such as WordNet, Yago and DBpedia, which are useful resources in AI-related applications. However, most of the existing KBs suffer from incompleteness, and manually adding knowledge to KBs is inefficient. Therefore, automatically mining knowledge becomes a critical issue. To this end, in this paper, we propose a model (S²AMT) to extract knowledge triples, such as <Barack Obama, wife, Michelle Obama>, from the Internet and add them to KBs to support many downstream applications. Particularly, because the seed instances (i.e., labeled positive instances) for every relation are difficult to obtain, our model is capable of mining knowledge triples with limited available seed instances. To be more specific, we treat the knowledge triple mining task for each relation as a single task and use multi-task learning (MTL) algorithms to solve the problem, because MTL algorithms can often get better results than single-task learning (STL) ones with limited training data. Moreover, since finding proper task groups is a critical problem in MTL that directly influences the final results, we adopt a clustering algorithm to find proper task groups to further improve the performance. Finally, we conduct extensive experiments on real-world data sets, and the experimental results clearly validate the performance of our MTL algorithms against STL ones.
• We propose S²AMT to solve the problem of knowledge triple mining with limited seed instances.
• Our framework obtains better performance using MTL methods.
• Our framework jointly uses labeled and unlabeled instances during the training stage.
• We give a fast method to find related tasks to further improve the performance.
• Our work provides a new perspective for knowledge triple mining when having limited seed instances.
[ABSTRACT FROM AUTHOR]
- Published
- 2019
35. A framework for modeling, executing, and monitoring hybrid multi-process specifications with bounded global–local memory.
- Author
- Alman, Anti, Maggi, Fabrizio Maria, Montali, Marco, Patrizi, Fabio, and Rivkin, Andrey
- Subjects
- BUSINESS process modeling; COMORBIDITY; MULTISCALE modeling; ROWING
- Abstract
So far, approaches for business process modeling, enactment and monitoring have mainly been based on process specifications consisting of a single process model. This setting aptly captures monolithic scenarios from domains in which all possible behaviors can be folded into a single model. However, the same strategy cannot be applied to domains where multiple interacting (procedural) processes simultaneously work over the same objects, in the presence of additional (declarative) constraints relating activities from the same or different processes. A relevant example for this setting is that of healthcare, where co-morbid patients may be subject to multiple clinical pathways at once, in the presence of additional, general constraints capturing basic medical knowledge. To fill this gap, we have previously presented the M3 Framework and an accompanying monitoring technique, which allows for a hybrid representation of a process using both procedural and declarative models, and supports the modular creation of multi-process specifications where domain experts can focus on specific procedures and domain constraints without being forced to merge them into one single specification. In this paper, we make significant extensions to this framework, allowing us to go from simple toy examples towards addressing practical real-life scenarios. We achieve this by introducing a richer form of integration between the interacting process components, in particular supporting asynchronous and synchronous activities that may operate over local and global (shared) data variables. This is framed by a discussion of the business meaning of these concepts, the introduction of the corresponding modeling patterns, and the application of our approach to real-life business processes, the latter being the driving force behind this paper.
• A new model for data-aware, hybrid multi-process specifications (HMPS).
• HMPS enable modular business processes with procedural and declarative components.
• Synchronous and asynchronous semantics for both activities and variables in HMPS.
• Formalization of HMPS semantics in the context of process enactment and monitoring.
• HMPS modeling patterns and their application to real-life business processes.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
36. A large reproducible benchmark on text classification for the legal domain based on the ECHR-OD repository.
- Author
- Quemy, Alexandre, Wrembel, Robert, Łopuszyńska, Natalia, Papadakis, George, and Delgado, Agustín D.
- Subjects
- WEB-based user interfaces; LEGAL documents; SCIENTIFIC community; MACHINE learning; CLASSIFICATION algorithms; LEGAL judgments
- Abstract
This work is a companion reproducible paper of our experiments and results reported in a previous work (Quemy and Wrembel, 2022), which introduced an open repository of legal documents, called ECHR-OD, together with a large benchmark of Machine Learning (ML) methods for text classification. ML algorithms are used in various domains, including banking, healthcare, manufacturing, energy management, security, trade, and insurance. However, building reliable ML models is challenging. First, building prediction models with ML algorithms requires massive amounts of pre-processed data, but in practice such datasets are scarce or require a tremendous amount of time to prepare. Second, once a model is built, its performance needs to be assessed; to this end, benchmarks are needed, but their availability is limited as well. Despite the fact that ML algorithms are used in multiple domains, their application to the legal domain has so far received little attention from research communities. This fact motivated us to run a project to build and make available an open repository of judgment documents, called the European Court of Human Rights Open Data (ECHR-OD). In this paper, we describe a step-by-step Extract, Transform, and Load (ETL) process, supported with code snippets, for building ECHR-OD, so that it can be easily reproduced. The process produces (almost) exhaustive datasets that have been transformed, homogenized, re-organized, and cleaned beforehand, and made available in a suitable format for ML algorithms. The ECHR-OD repository makes available tabular descriptive features as well as features extracted from natural language documents, accessible via a web user interface. Moreover, we provide a self-contained and easily reproducible set of experiments assessing ML classification algorithms on the content of the ECHR-OD repository. To the best of our knowledge, the ETL process and the set of experiments form the first fully end-to-end, open, and reproducible benchmark on the prediction of the European Court of Human Rights judgments, from ingesting and pre-processing legal documents to obtaining high-quality ML models. Both components, the ETL and the experiments, leverage Docker for reproducibility. The content of this paper weakly reproduces the original results and provides a new weakly reproducible set of experiments.
• We predict the European Court of Human Rights' decisions using previous judgments.
• The experiments are fully reproducible, including the dataset generation.
• We establish robust baselines using 12 machine learning algorithms.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
37. Process fragments discovery from emails: Functional, data and behavioral perspectives discovery.
- Author
- Elleuch, Marwa, Ismaili, Oumaima Alaoui, Laga, Nassim, and Gaaloul, Walid
- Subjects
- PROCESS mining; EMAIL systems; EMAIL; SCHEDULING; INFORMATION storage & retrieval systems; DATA logging
- Abstract
Significant research work has been conducted in the area of business process (BP) mining, leading to mature solutions for discovering process knowledge. These solutions have generally been limited to the analysis of structured event logs generated by BP management systems (BPMS). Given the recent spread of digital workplaces, there have been several initiatives to extend the scope of these analyses to other information systems (IS) supporting BP execution informally. More precisely, emailing systems have attracted much attention, as they are widely used as alternative tools to collaboratively perform BP activities. However, due to the unstructured nature of email log data, traditional process mining techniques cannot be applied, or at least not directly. Existing approaches that discover BP from emails are usually supervised or at least require significant human intervention. They focused on discovering BP with respect to their behavioral perspective (i.e., the one that defines the conditions for activity execution) while neglecting the discovery of their data perspective (i.e., the one that defines the informational entities manipulated by BP activities). In addition, they did not study how emailing systems are used in the context of BP executions; they assume that emailing systems are used in the same way employees use ordinary BPMS. However, employees actually use emails to perform poorly structured BP fragments (i.e., parts) rather than complete and well-structured ones. These BP fragments are not necessarily defined in advance, as in the case of BPMS. This induces the need to discover the BP functional perspective (i.e., the one that defines what a BP performs and what its activities are). Furthermore, employees use emails with different purposes when talking about BP activities (e.g., informing about, requesting, or planning activity execution). This results in the occurrence of new event types referring to the purpose of considering activities in emails, rather than events referring only to their execution. In this paper, we propose to discover BP from email logs with respect to their functional, data and behavioral perspectives. The paper first formalizes these perspectives. Then, it introduces a completely non-supervised approach for discovering them based on: (i) speech act detection for recognizing the purposes of considering activities in emails, (ii) overlapping clustering of activities to discover their manipulated artifacts (i.e., informational entities), (iii) overlapping clustering of BP elements (i.e., activities, artifacts and activity actors) to discover BP fragments and, (iv) mining sequencing constraints between event types deduced from activities and speech acts to discover the behavioral perspective. Our approach is finally validated using the public Enron email dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2023
38. Business process simulation: Probabilistic modeling of intermittent resource availability and multitasking behavior.
- Author
- López-Pintado, Orlenys and Dumas, Marlon
- Subjects
- PROCESS mining; CALENDAR; SIMULATION methods & models; PROBABILITY theory; ALGORITHMS
- Abstract
In business process simulation, resource availability is typically modeled by assigning a calendar to each resource, e.g., Monday–Friday, 9:00–18:00. Resources are assumed to be always available during each time slot in their availability calendar. This assumption often becomes invalid due to interruptions, breaks, or time-sharing across processes. In other words, existing approaches fail to capture intermittent availability. Another limitation of existing approaches is that they either do not consider multitasking behavior, or if they do, they assume that resources always multitask (up to a maximum capacity) whenever available. However, studies have shown that the multitasking patterns vary across days. This paper introduces a probabilistic approach to model resource availability and multitasking behavior for business process simulation. In this approach, each time slot in a resource calendar has an associated availability probability and a multitasking probability per multitasking level. For example, a resource may be available on Fridays between 14:00–15:00 with 90% probability, and given that they are performing one task during this slot, they may take on a second concurrent task with 60% probability. We propose algorithms to discover probabilistic calendars and probabilistic multitasking capacities from event logs. An evaluation shows that, with these enhancements, simulation models discovered from event logs better replicate the distribution of activities and cycle times, relative to approaches with crisp calendars and monotasking assumptions. [ABSTRACT FROM AUTHOR]
- Published
- 2025
39. Temporal graph processing in modern memory hierarchies.
- Author
- Baumstark, Alexander, Jibril, Muhammad Attahir, and Sattler, Kai-Uwe
- Subjects
- MEMORY; STORAGE; TIME
- Abstract
Updates in graph DBMS lead to structural changes in the graph over time, with different intermediate states. Capturing these changes and their time is one of the main purposes of temporal DBMS. Most DBMSs build their temporal features on top of their non-temporal processing and storage, without considering the memory hierarchy of the underlying system. This leads to slower temporal processing and poor storage utilization. In this paper, we propose a storage and processing strategy for (bi-) temporal graphs using temporal materialized views (TMV) while exploiting the memory hierarchy of a modern system. Further, we show a solution to the query containment problem for certain types of temporal graph queries. Finally, we evaluate the overhead and performance of the presented approach. The results show that using TMV reduces the runtime of temporal graph queries while using less memory. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
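As a rough illustration of the valid-time bookkeeping behind such temporal features, here is a toy sketch of an edge store with an as-of snapshot that can be cached and reused, loosely analogous to a temporal materialized view. The paper targets a graph DBMS exploiting the memory hierarchy; this Python sketch only mirrors the logical idea.

```python
import math

edges = []  # (src, dst, valid_from, valid_to)

def add_edge(src, dst, t):
    edges.append([src, dst, t, math.inf])

def close_edge(src, dst, t):
    for e in edges:
        if e[0] == src and e[1] == dst and e[3] == math.inf:
            e[3] = t  # end the current validity interval

def snapshot(t):
    """Materialize the graph as of time t; cache it like a TMV."""
    return {(s, d) for s, d, f, to in edges if f <= t < to}

add_edge("a", "b", 1); add_edge("b", "c", 2); close_edge("a", "b", 5)
tmv_at_3 = snapshot(3)   # reused by later queries instead of rescanning
print(tmv_at_3)          # {('a', 'b'), ('b', 'c')}
```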
40. Proactive conformance checking: An approach for predicting deviations in business processes.
- Author
-
Grohs, Michael, Pfeiffer, Peter, and Rehse, Jana-Rebecca
- Subjects
- *
MACHINE learning , *PROCESS mining , *REGULATORY compliance , *LEARNING strategies - Abstract
Modern business processes are subject to an increasing number of external and internal regulations. Compliance with these regulations is crucial for the success of organizations. To ensure this compliance, process managers can identify and mitigate deviations between the predefined process behavior and the executed process instances by means of conformance checking techniques. However, these techniques are inherently reactive, meaning that they can only detect deviations after they have occurred. It would be desirable to detect and mitigate deviations before they occur, enabling managers to proactively ensure compliance of running process instances. In this paper, we propose Business Process Deviation Prediction (BPDP), a novel predictive approach that relies on a supervised machine learning model to predict which deviations can be expected in the future of running process instances. BPDP is able to predict individual deviations as well as deviation patterns. Further, it provides the user with a list of potential reasons for predicted deviations. Our evaluation shows that BPDP outperforms existing methods for deviation prediction. Following the idea of action-oriented process mining, BPDP thus enables process managers to prevent deviations in early stages of running process instances. • A new approach to predict individual deviations and deviation patterns. • Addresses challenge of label imbalance by undersampling the training data. • Addresses challenge of action orientation with weighted loss function. • Experimentally derives the best supervised machine learning strategy. • Demonstrates applicability by providing managers with information on non-conformity. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
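The prediction task described above is, at its core, supervised classification over features of running cases. The sketch below is a hedged approximation: the features, labels, and model are invented for illustration, and class_weight="balanced" stands in for the undersampling and weighted-loss strategies the authors mention.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy prefix features per running case: (events so far, elapsed hours,
# distinct resources); label: 1 if the completed case later deviated.
X = [[3, 2.0, 1], [5, 8.0, 2], [2, 1.0, 1], [7, 30.0, 4],
     [4, 3.5, 2], [6, 20.0, 3], [3, 2.5, 1], [8, 40.0, 5]]
y = [0, 1, 0, 1, 0, 1, 0, 1]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# class_weight="balanced" is one common answer to label imbalance.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print(clf.predict_proba(X_te))  # P(deviation) per running case
```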
41. Data Lakehouse: A survey and experimental study.
- Author
-
Harby, Ahmed A. and Zulkernine, Farhana
- Subjects
- *
DATA warehousing , *INFORMATION storage & retrieval systems , *DATA management , *BIG data , *SCIENTIFIC community - Abstract
Efficient big data management is a dire necessity: the exponential growth in data generated by digital information systems must be managed to produce usable knowledge. Structured databases, data lakes, and warehouses have each provided a solution with varying degrees of success. However, a new and superior solution, the data lakehouse, has emerged to extract actionable insights from unstructured data ingested from distributed sources. By combining the strengths of data warehouses and data lakes, the data lakehouse can process and merge data quickly while ingesting and storing high-speed unstructured data, with post-storage transformation and analytics capabilities. The lakehouse architecture offers the necessary features for optimal functionality and has gained significant attention in the big data management research community. In this paper, we compare data lake, warehouse, and lakehouse systems; highlight their strengths and shortcomings; identify the desired features for handling the evolving challenges in big data management and analysis; and propose an advanced data lakehouse architecture. We also demonstrate, through an experimental study, the performance of three state-of-the-art data management systems, namely an HDFS data lake, a Hive data warehouse, and a Delta lakehouse, in managing data for analytical query responses. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
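The lakehouse pattern surveyed above, fast lake-style ingestion combined with warehouse-style consistent reads, can be tried with the open-source deltalake (delta-rs) Python bindings, assuming `pip install deltalake pandas`. This is an illustration of the pattern, not the paper's experimental setup.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "/tmp/events_lakehouse"

# Lake-style: ingest fast-arriving data as-is...
write_deltalake(path, pd.DataFrame({"id": [1, 2], "kind": ["click", "view"]}),
                mode="overwrite")
write_deltalake(path, pd.DataFrame({"id": [3], "kind": ["click"]}),
                mode="append")

# ...warehouse-style: ACID-consistent, versioned reads over the same files.
dt = DeltaTable(path)
print(dt.version())                           # versions enable time travel
print(dt.to_pandas().groupby("kind").size())  # analytical query response
```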
42. Tri-AL: An open source platform for visualization and analysis of clinical trials.
- Author
-
Nahed, Pouyan, Kambar, Mina Esmail Zadeh Nojoo, Taghva, Kazem, and Golab, Lukasz
- Subjects
- *
MACHINE learning , *ALZHEIMER'S disease , *DATA mining , *MEDICAL research personnel , *ONLINE databases , *MEDICAL equipment - Abstract
ClinicalTrials.gov hosts an online database with over 440,000 medical studies (as of 2023) evaluating drugs, supplements, medical devices, and behavioral treatments. Target users include scientists, medical researchers, pharmaceutical companies, and other public and private institutions. Although ClinicalTrials has some filtering ability, it does not provide visualization tools, reporting tools, or historical data; only the most recent state of each trial is visible to users. To fill these functionality gaps, we present Tri-AL: an open-source data platform for clinical trial visualization, information extraction, historical analysis, and reporting. This paper describes the design and functionality of Tri-AL, including a programmable module to incorporate machine learning models and extract disease-specific data from unstructured trial reports, which we demonstrate using Alzheimer's disease reporting as a case study. We also highlight the use of Tri-AL for trial participation analysis in terms of sex, gender, race and ethnicity. The source code is publicly available at https://github.com/pouyan9675/Tri-AL. • We present an open-source data platform to analyze and visualize clinical trial data. • We describe the design of our system. • We illustrate our system using Alzheimer's Disease trials and diversity reporting as case studies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
43. A universal approach for simplified redundancy-aware cross-model querying.
- Author
-
Koupil, Pavel, Crha, Daniel, and Holubová, Irena
- Subjects
- *
CATEGORIES (Mathematics) , *EXPRESSIVE language , *DATA management , *PROOF of concept , *ELECTRONIC data processing - Abstract
Numerous challenges and open problems have appeared with the dawn of multi-model data. In most cases, single-model solutions cannot be straightforwardly extended, and new, efficient approaches must be found. In addition, since there are no standards related to combining and managing multiple models, the situation is even more complicated and confusing for users. This paper deals with the most important aspect of data management — querying. To enable the user to grasp all the popular models, we base our solution on the abstract categorical representation of multi-model data, which can be viewed as a graph. To unify the querying of multi-model data, we enable the user to query the categorical graph using a SPARQL-based model-agnostic query language called MMQL. The query is then decomposed and translated into the languages of the underlying systems. The intermediate results are then combined into the final categorical result, which can be expressed in any selected format. The support for cross-model redundancy enables one to create distinct query plans and choose the optimal one. We also introduce a proof-of-concept implementation of our solution called MM-quecat. • The unifying abstract representation of various data simplifies data processing. • An abstract multi-model query language shields the user from system-specific languages. • An abstract query language with expressive power similar to popular query languages. • Efficient planning of data transformations during multi-model query evaluation. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
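The decompose/translate/combine pipeline described above can be caricatured in a few lines: a single abstract query is split into sub-queries for a document backend and a relational backend, and the partial results are joined. The query, data, and join logic below are toy assumptions; MMQL itself is SPARQL-based and MM-quecat's machinery is far more general.

```python
documents = {  # document store: customer profiles
    "c1": {"name": "Ada", "city": "Prague"},
    "c2": {"name": "Alan", "city": "London"},
}
orders = [  # relational table: (order_id, customer_id, total)
    ("o1", "c1", 120.0),
    ("o2", "c1", 80.0),
    ("o3", "c2", 42.0),
]

# Abstract query: names and order totals of customers in Prague.
# Sub-query 1, translated for the document backend:
prague_ids = {k for k, v in documents.items() if v["city"] == "Prague"}
# Sub-query 2, translated for the relational backend:
prague_orders = [(cid, total) for _, cid, total in orders
                 if cid in prague_ids]
# Combine the partial results into one unified answer:
result = [(documents[cid]["name"], total) for cid, total in prague_orders]
print(result)  # [('Ada', 120.0), ('Ada', 80.0)]
```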
44. An incremental algorithm for repairing denial constraint violations.
- Author
-
Bian, Lingfeng, Yang, Weidong, Xu, Ting, and Tan, Zijing
- Subjects
- *
DATA quality , *ALGORITHMS , *MOTIVATION (Psychology) , *INDEXING - Abstract
Data repairing algorithms are extensively studied for improving data quality. Denial constraints (DCs) are commonly employed to state quality specifications that data should satisfy, and they facilitate data repairing since DCs are general enough to subsume many other dependencies. Data in practice are usually frequently updated, which motivates the quest for efficient incremental repairing techniques that respond to data updates. In this paper, we present the first incremental algorithm for repairing DC violations. Specifically, given a relational instance I consistent with a set Σ of DCs, and a set ΔI of tuple insertions to I, our aim is to find a set ΔI′ of tuple insertions such that Σ is satisfied on I + ΔI′. We first formalize the problem of incremental data repairing with DCs and prove its complexity. We then present techniques that combine auxiliary indexing structures to efficiently identify the DC violations incurred by ΔI w.r.t. Σ, and further develop an efficient repairing algorithm that computes ΔI′ by resolving these violations. Finally, using both real-life and synthetic datasets, we conduct extensive experiments to demonstrate the effectiveness and efficiency of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
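The role of the auxiliary index in incremental checking can be seen in a toy setting: for an FD-style DC such as "two tuples must not share an SSN with different names", an index over the consistent instance I lets each inserted tuple be validated, and repaired, without rescanning I. The repair policy below (align the new tuple with the existing value) is an illustrative assumption, not the paper's algorithm.

```python
index = {}  # ssn -> name, built once over the consistent instance I

def apply_inserts(inserts):
    """Admit each new tuple, repairing values that would violate the DC."""
    admitted = []
    for ssn, name in inserts:
        known = index.get(ssn)
        if known is not None and known != name:
            # Violation found via the index, with no full rescan of I:
            # repair by aligning the new tuple with the existing value.
            name = known
        index.setdefault(ssn, name)
        admitted.append((ssn, name))
    return admitted

index.update({"111": "Ada", "222": "Alan"})
print(apply_inserts([("111", "Ada"), ("222", "Alam"), ("333", "Grace")]))
# -> [('111', 'Ada'), ('222', 'Alan'), ('333', 'Grace')]
```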
45. Unveiling the causes of waiting time in business processes from event logs.
- Author
-
Lashkevich, Katsiaryna, Milani, Fredrik, Chapela-Campa, David, Suvorau, Ihar, and Dumas, Marlon
- Subjects
- *
PROCESS mining , *SOFTWARE development tools , *UPLOADING of data - Abstract
Waiting times in a business process often arise when a case transitions from one activity to another. Accordingly, analyzing the causes of waiting times in activity transitions can help analysts identify opportunities for reducing the cycle time of a process. This paper proposes a process mining approach to decompose observed waiting times in each activity transition into multiple direct causes and to analyze the impact of each identified cause on the process cycle time efficiency. The approach is implemented as a software tool called Kronos that process analysts can use to upload event logs and obtain analysis results of waiting time causes. The proposed approach was empirically evaluated using synthetic event logs to verify its ability to discover different direct causes of waiting times. The applicability of the approach is demonstrated in a real-life process. Interviews with process mining experts confirm that Kronos is useful and easy to use for identifying improvement opportunities related to waiting times. • A process mining approach to discover the causes of waiting time from event logs. • The approach decomposes waiting time in each activity transition into direct causes. • 5 causes considered: batching, prioritization, contention, unavailability, extraneous. • The approach is embodied in a software tool. • The approach helps analysts identify improvement opportunities to reduce waiting time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
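The cause decomposition described above can be pictured as carving one transition's waiting interval into segments attributable to different causes. The rules below cover three of the five causes with simplified logic and invented names; they are assumptions, not Kronos's implementation.

```python
def decompose_waiting(enabled_at, started_at, busy_until, available_from):
    """Split the waiting interval [enabled_at, started_at] (hours) into
    contention / unavailability / extraneous portions."""
    causes = {"contention": 0.0, "unavailability": 0.0, "extraneous": 0.0}
    t = enabled_at
    if busy_until > t:  # resource still busy on another case
        causes["contention"] = min(busy_until, started_at) - t
        t = min(busy_until, started_at)
    if available_from > t:  # resource off calendar (e.g., weekend, break)
        causes["unavailability"] = min(available_from, started_at) - t
        t = min(available_from, started_at)
    causes["extraneous"] = started_at - t  # remaining, unexplained waiting
    return causes

print(decompose_waiting(enabled_at=0.0, started_at=10.0,
                        busy_until=4.0, available_from=7.0))
# -> {'contention': 4.0, 'unavailability': 3.0, 'extraneous': 3.0}
```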
46. SFTe: Temporal knowledge graphs embedding for future interaction prediction.
- Author
-
Jia, Wei, Ma, Ruizhe, Niu, Weinan, Yan, Li, and Ma, Zongmin
- Subjects
- *
KNOWLEDGE graphs , *RECURRENT neural networks , *RECOMMENDER systems , *SOCIAL network analysis , *TIME-varying networks - Abstract
Interaction prediction is a crucial task in the Social Internet of Things (SIoT), serving diverse applications including social network analysis and recommendation systems. However, the dynamic nature of items, users, and their interactions over time poses challenges in effectively capturing and analyzing these changes. Existing interaction prediction models often overlook the temporal aspect and lack the ability to model multi-relational user-item interactions over time. To address these limitations, in this paper, we propose a Structure, Facticity, and Temporal information preservation embedding model (SFTe) to predict future interactions. Our model leverages the advantages of Temporal Knowledge Graphs (TKGs), which can capture both multi-relations and evolution. We begin by modeling user-item interactions over time by constructing a Temporal Interaction Knowledge Graph (TIKG). We then employ Structure Embedding (SE), Facticity Embedding (FE), and Temporal Embedding (TE) to capture topological structure, facticity consistency, and temporal dependence, respectively. In SE, we focus on preserving first-order relationships to capture the topological structure of the TIKG. In the FE component, given the distinct nature of the SIoT, we introduce an attention mechanism to capture the effect of entities with the same additional information for generating subgraph embeddings. Lastly, TE utilizes recurrent neural networks to model the temporal dependencies among subgraphs and capture the evolving dynamics of the interactions over time. Experimental results on standard future interaction prediction demonstrate the superiority of the SFTe model compared with state-of-the-art methods. Our model effectively addresses the challenges of time-aware interaction prediction, showcasing the potential of TKGs to enhance prediction performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
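The temporal-embedding component (TE) described above boils down to a recurrent network over per-snapshot subgraph embeddings whose final state scores a candidate interaction. The PyTorch sketch below is a much-simplified stand-in: dimensions, the GRU choice, and the scoring head are assumptions, not the SFTe architecture.

```python
import torch
import torch.nn as nn

EMB = 16
gru = nn.GRU(input_size=EMB, hidden_size=EMB, batch_first=True)
score_head = nn.Linear(2 * EMB, 1)

# Toy inputs: five temporal snapshots of a user's subgraph embedding,
# plus one candidate item embedding.
snapshots = torch.randn(1, 5, EMB)
item = torch.randn(1, EMB)

_, h = gru(snapshots)        # h: (1, 1, EMB), the temporal user state
user_state = h.squeeze(0)    # -> (1, EMB)
logit = score_head(torch.cat([user_state, item], dim=-1))
print(torch.sigmoid(logit))  # predicted probability of future interaction
```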
47. An efficient approach for discovering Graph Entity Dependencies (GEDs).
- Author
-
Liu, Dehua, Kwashie, Selasi, Zhang, Yidi, Zhou, Guangtong, Bewong, Michael, Wu, Xiaoying, Guo, Xi, He, Keqing, and Feng, Zaiwen
- Subjects
- *
PARALLEL algorithms , *DATA quality , *FACT checking , *DATA management , *SOCIAL networks - Abstract
Graph entity dependencies (GEDs) are novel graph constraints, unifying keys and functional dependencies, for property graphs. They have been found useful in many real-world data quality and data management tasks, including fact checking on social media networks and entity resolution. In this paper, we study the discovery problem of GEDs—finding a minimal cover of valid GEDs in a given graph. We formalise the problem and propose an effective and efficient approach to overcome major bottlenecks in GED discovery. In particular, we leverage existing graph partitioning algorithms to enable fast GED-scope discovery, and employ effective pruning strategies over the prohibitively large space of candidate dependencies. Furthermore, we define an interestingness measure for GEDs based on the minimum description length principle, to score and rank the mined cover set of GEDs. Finally, we demonstrate the scalability and effectiveness of our GED discovery approach through extensive experiments on real-world benchmark graph data sets; and present the usefulness of the discovered rules in different downstream data quality management applications. • A study of the discovery problem of Graph Entity Dependencies (GEDs). • A new and efficient approach for the discovery of GEDs in property graphs. • A minimum-description-length-inspired definition of interestingness of GEDs to rank discovered rules. • A thorough empirical evaluation of the proposed technique, with examples of useful mined rules that are relevant in data quality/management applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
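The minimum-description-length intuition behind the interestingness measure can be made concrete: a rule is worth keeping if encoding the rule plus the facts it does not cover is cheaper than encoding all facts directly. The cost model below is an invented illustration, not the paper's measure.

```python
import math

def code_len(n_choices):
    """Bits to encode one choice among n equally likely options."""
    return math.log2(n_choices) if n_choices > 1 else 0.0

def mdl_gain(n_facts, n_covered, rule_size, vocab=1000):
    baseline = n_facts * code_len(vocab)          # encode every fact raw
    with_rule = (rule_size * code_len(vocab)      # encode the rule itself
                 + (n_facts - n_covered) * code_len(vocab))  # leftover facts
    return baseline - with_rule                   # positive => interesting

print(mdl_gain(n_facts=10_000, n_covered=4_000, rule_size=3))  # compresses
print(mdl_gain(n_facts=10_000, n_covered=5, rule_size=40))     # does not
```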
48. Detecting the adversarially-learned injection attacks via knowledge graphs.
- Author
-
Hao, Yaojun, Wang, Haotian, Zhao, Qingshan, Feng, Liping, and Wang, Jian
- Subjects
- *
KNOWLEDGE graphs , *PRINCIPAL components analysis , *POPULARITY , *DETECTORS - Abstract
Over the past two decades, many studies have devoted a good deal of attention to detecting injection attacks in recommender systems. However, most of these studies focus on detecting heuristically-generated injection attacks, which are fabricated by hand-engineering. In practice, adversarially-learned injection attacks based on optimization methods have been proposed, with enhanced camouflage and threat capabilities. Under adversarially-learned injection attacks, traditional detection models are likely to be fooled. In this paper, a detection method is proposed for adversarially-learned injection attacks via knowledge graphs. Firstly, exploiting the wealth of information in knowledge graphs, item-pairs on the extension hops of knowledge graphs are regarded as implicit preferences of users. Also, item-pair popularity series and the user item-pair matrix are constructed to express users' preferences. Secondly, a word embedding model and principal component analysis are utilized to extract users' initial vector representations from the item-pair popularity series and the item-pair matrix, respectively. Moreover, Variational Autoencoders with an improved R-drop regularization are used to reconstruct the embedding vectors and further identify shilling profiles. Finally, experiments on three real-world datasets indicate that the proposed detector outperforms benchmark methods when detecting adversarially-learned injection attacks. In addition, the detector is evaluated under heuristically-generated injection attacks and also demonstrates outstanding performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
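The detection principle above, model normal profiles and flag those that reconstruct poorly, can be demonstrated with PCA in place of the paper's Variational Autoencoder with R-drop regularization (a deliberate simplification to keep the sketch self-contained). Profiles, dimensions, and the threshold are toy assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
genuine = rng.normal(0.0, 1.0, size=(200, 20))  # organic-looking profiles
attack = np.tile(rng.normal(3.0, 0.1, size=(1, 20)), (10, 1))  # templated
profiles = np.vstack([genuine, attack])

pca = PCA(n_components=5).fit(genuine)  # model only normal behavior
recon = pca.inverse_transform(pca.transform(profiles))
errors = np.linalg.norm(profiles - recon, axis=1)

threshold = np.percentile(errors[:200], 99)  # calibrated on genuine users
flagged = np.where(errors > threshold)[0]
print(flagged)  # the injected profiles (indices 200..209) should dominate
```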
49. FDM: Effective and efficient incident detection on sparse trajectory data.
- Author
-
Han, Xiaolin, Grubenmann, Tobias, Ma, Chenhao, Li, Xiaodong, Sun, Wenya, Wong, Sze Chun, Shang, Xuequn, and Cheng, Reynold
- Subjects
- *
LOCATION data , *GPS receivers , *TRAFFIC monitoring , *DATA mining , *MACHINE learning - Abstract
Incident detection (ID), or the automatic discovery of anomalies from road traffic data (e.g., road sensor and GPS data), enables emergency actions (e.g., rescuing injured people) to be carried out in a timely fashion. Existing ID solutions based on data mining or machine learning often rely on dense traffic data; for instance, sensors installed in highways provide frequent updates of road information. In this paper, we ask the question: can ID be performed on sparse traffic data (e.g., location data obtained from GPS devices equipped on vehicles)? As these data may not be enough to describe the state of the roads involved, they can undermine the effectiveness of existing ID solutions. To tackle this challenge, we borrow an important insight from the transportation area, which uses trajectories (i.e., moving histories of vehicles) to derive incident patterns. We study how to obtain incident patterns from trajectories and devise a new solution (called Filter-Discovery-Match (FDM)) to detect anomalies in sparse traffic data. We have also developed a fast algorithm to support FDM. Experiments on a taxi dataset in Hong Kong and a simulated dataset show that FDM is more effective than state-of-the-art ID solutions on sparse traffic data, and is also efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
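The pattern-then-match shape of FDM can be illustrated with a deliberately tiny stand-in: discover a per-segment traversal-time pattern from historical (sparse) trajectories, then match new traversals against it. The z-score rule and all numbers below are assumptions, not the FDM algorithm.

```python
from statistics import mean, stdev

# Historical traversal times (minutes) per road segment, mined from
# sparse GPS trajectories beforehand.
history = {"seg_7": [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1]}

def is_incident(segment, observed_minutes, z_cut=3.0):
    """Match one new traversal against the segment's discovered pattern."""
    past = history[segment]
    mu, sigma = mean(past), stdev(past)
    return (observed_minutes - mu) / sigma > z_cut  # far slower => incident

print(is_incident("seg_7", 4.4))   # False: within normal variation
print(is_incident("seg_7", 12.0))  # True: anomalous delay on the segment
```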
50. Evaluating the quality of a set of modelling languages used in combination: A method and a tool.
- Author
-
Giraldo, Fáber D., España, Sergio, Giraldo, William J., and Pastor, Óscar
- Subjects
- *
ARTIFICIAL languages , *INFORMATION storage & retrieval systems , *SEMANTICS , *SUBSET selection , *ELECTRONIC data processing - Abstract
Modelling languages have proved to be an effective tool to specify and analyse various perspectives of enterprises and information systems. In addition to modelling language designs, works on model quality and modelling language quality evaluation have contributed to the maturity of the model-driven engineering (MDE) field. Although consolidated knowledge on quality evaluation is still relevant to this scenario, in previous works we have identified misalignments between the topics that academia is addressing and the needs of industry in applying MDE, thus identifying some remaining challenges. In this paper, we focus on the need for a method to evaluate the quality of a set of modelling languages used in combination within an MDE environment. This paper presents MMQEF (Multiple Modelling language Quality Evaluation Framework), describing its foundations, presenting its method components and discussing its trade-offs. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF