Publisher: Springer Berlin Heidelberg / Topic: computer.software_genre and data mining - Searchworks@Jio Institute Digital Library Search Results

1. An Evaluation Framework for Analytical Methods of Integrating Electronic Word-of-Mouth Information: Position Paper

Author: Kazunori Fujimoto
Subjects: Set (abstract data type), Probabilistic classification, Information retrieval, Computer science, Encoding (memory), Models of communication, Probabilistic logic, Position paper, Data mining, computer.software_genre, Representation (mathematics), computer, Decoding methods
Abstract: This paper presents an evaluation framework for analytical methods of integrating eWOM information. This framework involves a communication model that assumes a set of human subjective probabilities called a belief source and includes two translation processes: (1) encoding the belief source into a representation to communicate with a computer; these encoded messages are called eWOM messages, and (2) in the computer, decoding the eWOM messages to estimate the probabilities in the belief source. The efficiency of reducing the difficulty of describing the belief source and the accuracy of reconstructing the belief source are quantified using this model. The evaluation processes are illustrated with an analytical method of integrating eWOM messages for probabilistic classification problems.
Published: 2010

2. A Short Paper on Blind Signatures from Knowledge Assumptions

Author: Lucjan Hanzlik and Kamil Kluczniak
Subjects: Theoretical computer science, Computer science, String (computer science), 0102 computer and information sciences, 02 engineering and technology, Okamoto–Uchiyama cryptosystem, Approx, computer.software_genre, 01 natural sciences, Signature (logic), Random oracle, 010201 computation theory & mathematics, 0202 electrical engineering, electronic engineering, information engineering, Blind signature, 020201 artificial intelligence & image processing, Data mining, Impossibility, computer, Standard model (cryptography)
Abstract: This paper concerns blind signature schemes. We focus on two-move constructions, which imply concurrent security. There are known efficient blind signature schemes based on the random oracle model and on the common reference string model. However, constructing two-move blind signatures in the standard model is a challenging task, as shown by the impossibility results of Fischlin et al. The recent construction by Garg et al. (Eurocrypt’14) bypasses this result by using complexity leveraging, but it is impractical due to the signature size (≈ 100 kB). Fuchsbauer et al. (Crypto’15) presented a more practical construction, but with a security argument based on interactive assumptions. We present a blind signature scheme that is two-move, setup-free and comparable in terms of efficiency with the results of Fuchsbauer et al. Its security is based on a knowledge assumption.
Published: 2017

3. The Research and Application of Fuzzy Entropy Weight Comprehensive Evaluation Method in Paper Quality Evaluation

Author: Baoxiang Liu and Cuilan Mi
Subjects: Fuzzy entropy, Computer science, Test quality, Evaluation methods, Weight distribution, Entropy (information theory), Paper quality, Weight coefficient, Data mining, computer.software_genre, computer, Simulation, Standard deviation
Abstract: Because each index in test quality evaluation is fuzzy, the entropy theory of information is applied to test quality evaluation. Difficulty, degree of discrimination, reliability, validity, and standard deviation are used as the indices that affect test quality. A comprehensive evaluation index system is established, with information entropy used as the weight coefficient of each evaluation index, which effectively resolves the difficulty of weight distribution and makes the weights objective. This is a new test quality evaluation method, and an application example shows that the method is simple, practical, and reliable.
Published: 2011
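
The entropy-based weighting described in this abstract can be illustrated with a minimal sketch. It uses the commonly taught entropy weight formulation (normalize each index column, compute its entropy, weight by one minus entropy); the abstract does not state the authors' exact formulas, so the function name `entropy_weights`, the input layout, and the normalization are all assumptions.

```python
import math

def entropy_weights(matrix):
    """Return one weight per column (index) of a score matrix.

    Assumed layout (hypothetical): rows are evaluated test papers,
    columns are indices such as difficulty or validity. Values must be
    non-negative, each column sum positive, and there must be >= 2 rows.
    """
    n = len(matrix)
    m = len(matrix[0])
    k = 1.0 / math.log(n)  # scales each column entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:  # treat 0 * log(0) as 0
                entropy -= p * math.log(p)
        # An index whose values barely vary has entropy near 1
        # and therefore contributes little weight.
        divergences.append(1.0 - k * entropy)
    spread = sum(divergences)
    if spread <= 0:
        # Every index is uniform: fall back to equal weights.
        return [1.0 / m] * m
    return [d / spread for d in divergences]
```

For example, `entropy_weights([[1, 1], [1, 3]])` gives essentially all of the weight to the second index, because the first index takes the same value for every paper and so carries no discriminating information.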

4. Multi-objective Test Paper Evaluation in the Process of Composing with Computer

Author: Yan Li, Jiqiang Tang, and Min Fu
Subjects: Computer science, Process (computing), Key (cryptography), Objective test, Sample (statistics), Objective evaluation, Data mining, computer.software_genre, computer, Test (assessment)
Abstract: Traditional statistical evaluation cannot be fully applied to test paper composition, because no score sample is available before a test has been held. A multi-objective test paper evaluation is therefore proposed to evaluate computer-based test paper composition. The key idea of the proposed evaluation is to use the constraints defined in the examination outline to establish the multiple objectives of the test, and to use the absolute distance between the temporary paper and the final test paper to compute the degree of approximation. The experiments show that the proposed evaluation can assess computer-based test paper composition, and that a trade-off can be reached between objective evaluation by computer and subjective evaluation by a person.
Published: 2012

5. JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers

Author: Hung Son Nguyen, Sebastian Stawicki, Adam Krasuski, Andrzej Janusz, and Dominik Ślęzak
Subjects: Multi-label classification, Information retrieval, Scope (project management), Test data generation, Computer science, computer.software_genre, CONTEST, Data science, Task (project management), Competition (economics), Explicit semantic analysis, Scalability, Data mining, GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries), computer
Abstract: We summarize the JRS’2012 Data Mining Competition on “Topical Classification of Biomedical Research Papers”, held between January 2, 2012 and March 30, 2012 as an interactive on-line contest hosted on the TunedIT platform ( http://tunedit.org ). We present the scope and background of the challenge task, the evaluation procedure, the progress, and the results. We also present a scalable method for the contest data generation from biomedical research papers.
Published: 2012

6. Experiments with Filtered Detection of Similar Academic Papers

Author: Aharon Tayeb and Yaakov HaCohen-Kerner
Subjects: Measure (data warehouse), Heuristic, Computer science, Fingerprint (computing), Filter (signal processing), Data mining, computer.software_genre, computer
Abstract: In this research, we investigate the issue of efficient detection of similar academic papers. Given a specific paper and a corpus of academic papers, most of the papers from the corpus are filtered out using a fast filter method. Then, 47 methods (baseline methods and combinations of them) are applied to detect similar papers, where 34 of the methods are variants of new methods. These 34 methods are divided into three new method sets: rare words, combinations of at least two methods, and comparison methods between portions of the papers. Results achieved by some of the 34 heuristic methods are better than the results of previous heuristic methods, compared with the results of the “Full Fingerprint” (FF) method, an expensive method that served as an expert. Moreover, the new methods run much faster than the FF method. The most interesting finding is a method called CWA(1) that computes the frequency of rare words that appear only once in both compared papers. This method was found to be an efficient measure for checking whether two papers are similar.
Published: 2012
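
The rare-word idea behind CWA(1) can be sketched roughly as follows. The abstract says only that the method computes the frequency of words appearing once in both compared papers; the tokenization, the normalization by the smaller set, and the names `once_words` and `cwa1_score` are assumptions made for illustration and are not the authors' definition.

```python
import re
from collections import Counter

def once_words(text):
    """Return the set of words that occur exactly once in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {word for word, count in counts.items() if count == 1}

def cwa1_score(text_a, text_b):
    """Share of once-only words common to both texts (assumed normalization)."""
    a = once_words(text_a)
    b = once_words(text_b)
    if not a or not b:
        return 0.0
    # Dividing by the smaller set keeps the score in [0, 1].
    return len(a & b) / min(len(a), len(b))
```

With this normalization, two texts with identical sets of once-only words score 1.0, and texts sharing no such words score 0.0.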

7. Paper Retrieval Based on Specific Paper Features: Chain and Laid Lines

Author: Pavel Paclík, J.C.A. van der Lubbe, M. van Staalduinen, and E. Backer
Subjects: Similarity (geometry), business.industry, Computer science, Image processing, Similarity measure, computer.software_genre, Similitude, Set (abstract data type), Metric (mathematics), Visual Word, Artificial intelligence, Data mining, business, computer
Abstract: This paper presents paper retrieval using two specific paper features: chain lines and laid lines. Paper features are detected in digitized paper images and represented such that they can be used for retrieval. Optimal retrieval performance is achieved by means of a trainable similarity measure for a given set of paper features. By means of these methods, a retrieval system is developed that art experts could use in real time to speed up their paper research.
Published: 2006

8. The Icecite Research Paper Management System

Author: Claudius Korzen and Hannah Bast
Subjects: Conditional random field, Information retrieval, Computer science, media_common.quotation_subject, Digital library, computer.software_genre, Metadata, Upload, Index (publishing), Management system, Quality (business), Data mining, Baseline (configuration management), computer, media_common
Abstract: We present Icecite, a new fully web-based research paper management system (RPMS). Icecite facilitates the following otherwise laborious and time-consuming steps typically involved in literature research: automatic metadata and reference extraction, on-click reference downloading, shared annotations, offline availability, and full-featured search in metadata, full texts, and annotations. None of the many existing RPMSs provides this feature set. For the metadata and reference extraction, we use a rule-based approach combined with an index-based approximate search on a given reference database. An extensive quality evaluation, using DBLP and PubMed as reference databases, shows extraction accuracies of above 95%. We also provide a small user study, comparing Icecite to the state-of-the-art RPMS Mendeley as well as to an RPMS-free baseline.
Published: 2013

9. Application Research of the Genetic Algorithm on the Intelligent Test Paper Composition of Examination Database

Author: Li Jinhui, Zhang Fang, and Li Na
Subjects: Database, Computer science, media_common.quotation_subject, Function (mathematics), Genetic operator, computer.software_genre, Adaptability, Test (assessment), Genetic algorithm, Chromosome encoding, Data mining, computer, Realization (systems), Composition (language), media_common
Abstract: In this paper, after analyzing the common algorithms for test paper composition, we propose a new improved genetic algorithm (GA) suited to the problem of test paper composition. The paper focuses on the design and realization of the test paper composition model, the chromosome encoding method for test paper composition, the adaptability function, and the genetic operators. Experimental results show that this improved genetic algorithm is a practical and effective test paper composition algorithm with high performance and high efficiency.
Published: 2011

10. Recyclable Waste Paper Sorting Using Template Matching

Author: Hassan Basri, Edgar Scavino, Mohammad Osiur Rahman, Mahammad A. Hannan, and Aini Hussain
Subjects: Identification (information), Sorting algorithm, Pixel, Computer science, Template matching, Sorting, RGB color model, Image processing, Data mining, computer.software_genre, Throughput (business), computer
Abstract: This paper explores the application of image processing techniques in recyclable waste paper sorting. In recycling, waste papers are segregated into various grades as they are subjected to different recycling processes. Highly sorted paper streams will facilitate high quality end products, and save processing chemicals and energy. From 1932 to 2009, different mechanical and optical paper sorting methods were developed to meet the demand for paper sorting. Still, in many countries including Malaysia, waste papers are sorted into different grades using manual sorting systems. Due to inadequate throughput and some major drawbacks of mechanical paper sorting systems, the popularity of optical paper sorting systems has increased. Automated paper sorting systems offer significant advantages over human inspection in terms of fatigue, throughput, speed, and accuracy. This research attempts to develop a smart vision sensing system that is able to separate the different grades of paper using Template Matching. For constructing the template database, the RGB components of the pixel values are used to construct an RGBString for the template images. Finally, the paper object grade is identified based on the maximum occurrence of a specific template image in the search image. The classification results from the experiment for White Paper, Old Newsprint Paper and Old Corrugated Cardboard are 96%, 92% and 96%, respectively. The remarkable achievement obtained with the method is the accurate identification and dynamic sorting of all grades of papers using simple image processing techniques.
Published: 2009

11. Screening Paper Runnability in a Web-Offset Pressroom by Data Mining

Author: Ahmad Alzghoul, Magnus Hållander, Antanas Verikas, Adas Gelzinis, and Marija Bacauskiene
Subjects: Offset (computer science), Computer science, business.industry, Data classification, Information and Computer Science, Feature selection, Machine learning, computer.software_genre, Data mapping, Search engine, Test set, Data mining, Artificial intelligence, business, computer, Classifier (UML)
Abstract: This paper is concerned with data mining techniques for identifying the main parameters of the printing press, the printing process and paper affecting the occurrence of paper web breaks in a pressroom. Two approaches are explored. The first one treats the problem as a task of data classification into "break" and "non break" classes. The procedures of classifier design and selection of relevant input variables are integrated into one process based on genetic search. The search process results in a set of input variables providing the lowest average loss incurred in taking decisions. The second approach, also based on genetic search, combines procedures of input variable selection and data mapping into a low dimensional space. The tests have shown that the web tension parameters are amongst the most important ones. It was also found that, provided the basic off-line paper parameters are in an acceptable range, the paper related parameters recorded online contain more information for predicting the occurrence of web breaks than the off-line ones. Using the selected set of parameters, on average, 93.7% of the test set data were classified correctly. The average classification accuracy of the break cases was equal to 76.7%.
Published: 2009

12. Entropy Estimation for Real-Time Encrypted Traffic Identification (Short Paper)

Author: Peter Dorfinger, Wolfgang John, and Georg Panholzer
Subjects: File Transfer Protocol, business.industry, Network packet, Computer science, Traffic identification, Real-time computing, Detector, Encryption, computer.software_genre, Entropy estimation, Entropy (information theory), Data mining, business, computer
Abstract: This paper describes a novel approach to classify network traffic into encrypted and unencrypted traffic. The classifier is able to operate in real-time as only the first packet of each flow is processed. The main metric used for classification is an estimation of the entropy of the first packet payload. The approach is evaluated based on encrypted ground truth traces and on real network traces. Encrypted traffic such as Skype, or encrypted eDonkey traffic are detected as encrypted with probability higher than 94%. Unencrypted protocols such as SMTP, HTTP, POP3 or FTP are detected as unencrypted with probability higher than 99.9%. The presented approach, named real-time encrypted traffic detector (RT-ETD), is well suited to operate as pre-filter for advanced classification approaches to enable their applicability on increased bandwidth.
Published: 2011
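
The metric named in this abstract, the entropy of a packet payload, can be illustrated with a minimal sketch. It computes the plain empirical Shannon entropy of the byte distribution, which is a stand-in and not the estimator used by RT-ETD; the function name `byte_entropy` is hypothetical, and no classification threshold is given because the abstract does not state one.

```python
import math
from collections import Counter

def byte_entropy(payload: bytes) -> float:
    """Empirical Shannon entropy of a payload, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # iterating over bytes yields integers 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Encrypted payloads tend to look close to uniformly random and so score high, while plain-text protocol headers reuse a small set of byte values and score lower. For short payloads the empirical value is bounded by log2 of the payload length, which is one reason a dedicated estimator is useful when only the first packet is available.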

13. A Reliable Classification Method for Paper Currency Based on LVQ Neural Network

Author: Xiaofeng Li, Xuedong Li, Hongling Gou, and Jing Yi
Subjects: Learning vector quantization, Artificial neural network, business.industry, Computer science, Feature vector, Pattern recognition, computer.software_genre, Kernel principal component analysis, Principal component analysis, Classification methods, Artificial intelligence, Data mining, business, Classifier (UML), computer, Lvq neural network
Abstract: To increase the reliability of currency classification, a classification method using neural networks with multi-pattern vectors is proposed in this paper. The data space of the samples is divided into three blocks, which are then further divided into four sub-pattern vectors, and kernel principal component analysis is applied to extract features and assemble feature vectors to train an LVQ neural network classifier. We tested the method on the new fifth edition RMB, covering four input directions of 1 Yuan, 5 Yuan, 10 Yuan and 20 Yuan notes, with up to 800 samples. We conclude that PCA can compress the data, reduce the dimension of the input vectors, and extract the feature vectors effectively, so that a high level of reliability can be achieved using the LVQ network classifier.
Published: 2011

14. S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently

Author: Xiaoyong Du, Pei Li, Jun He, Yuanzhe Cai, and Hongyan Liu
Subjects: SimRank, Computer science, Content analysis, business.industry, Graph (abstract data type), Artificial intelligence, Data mining, business, Machine learning, computer.software_genre, Cluster analysis, computer, Link analysis
Abstract: Both content analysis and link analysis have their advantages in measuring relationships among documents. In this paper, we propose a new method that combines these two methods to compute the similarity of research papers so that we can cluster these papers more accurately. In order to improve the efficiency of similarity calculation, we develop a strategy to deal with the relationship graph separately without affecting the accuracy. We also design an approach to assign different weights to different links to the papers, which can enhance the accuracy of similarity calculation. The experimental results on the ACM Data Set show that our new algorithm, S-SimRank, outperforms other algorithms.
Published: 2008

15. Modelling Citation Networks for Improving Scientific Paper Classification Performance

Author: Mengjie Zhang, Minh Duc Cao, Xiaoying Gao, and Yuejin Ma
Subjects: Computer science, business.industry, Probabilistic logic, Bayesian network, Hyperlink, computer.software_genre, Machine learning, Class (biology), Data set, Naive Bayes classifier, Content analysis, Data mining, Artificial intelligence, Citation, business, computer
Abstract: This paper describes an approach to the use of citation links to improve the scientific paper classification performance. In this approach, we develop two refinement functions, a linear label refinement (LLR) and a probabilistic label refinement (PLR), to model the citation link structures of the scientific papers for refining the class labels of the documents obtained by the content-based Naive Bayes classification method. The approach with the two new refinement models is examined and compared with the content-based Naive Bayes method on a standard paper classification data set with increasing training set sizes. The results suggest that both refinement models can significantly improve the system performance over the content-based method for all the training set sizes and that PLR is better than LLR when the training examples are sufficient.
Published: 2006

16. Data Integration Hub for a Hybrid Paper Search

Author: Geoffrey C. Fox, Seong Joon Yoo, and Jungkee Kim
Subjects: Information retrieval, Concept search, Web search query, Computer science, business.industry, computer.internet_protocol, Search analytics, Semantic search, Unstructured data, computer.software_genre, Metadata, Search engine, Relational database management system, The Internet, Data mining, business, computer, XML, Information integration, Data integration
Abstract: In this paper we describe the design of a hybrid search that combines simple metadata search with a traditional keyword search over unstructured context data. This paradigm provides the inquirer additional options to narrow the search with some semantic aspects through the XML metadata query. We demonstrate a paper search for a case study of the hybrid search, and describe a data integration hub to integrate those data dispersed on the Net.
Published: 2005

17. Research Paper Recommender Systems: A Subspace Clustering Approach

Author: Huan Liu, Ehtesham Haque, Lance Parsons, and Nitin Agarwal
Subjects: Search engine, Information retrieval, Computer science, Scalability, Collaborative filtering, Leverage (statistics), Data mining, Recommender system, Cluster analysis, computer.software_genre, computer, Hash table
Abstract: Researchers from the same lab often spend a considerable amount of time searching for published articles relevant to their current project. Despite having similar interests, they conduct independent, time consuming searches. While they may share the results afterwards, they are unable to leverage previous search results during the search process. We propose a research paper recommender system that avoids such time consuming searches by augmenting existing search engines with recommendations based on previous searches performed by others in the lab. Most existing recommender systems were developed for commercial domains with millions of users. The research paper domain has relatively few users compared to the large number of online research papers. The two major challenges with this type of data are the large number of dimensions and the sparseness of the data. The novel contribution of the paper is a scalable subspace clustering algorithm (SCuBA) that tackles these problems. Both synthetic and benchmark datasets are used to evaluate the clustering algorithm and to demonstrate that it performs better than the traditional collaborative filtering approaches when recommending research papers.
Published: 2005

18. Working Group IV — Analysis — Position Paper: Spatial Data Analysis in 3D GIS

Author: Jiyeong Lee
Subjects: Metadata, Data model, Computer science, Metric (mathematics), Data mining, Data structure, Geometric modeling, computer.software_genre, computer, Spatial analysis, Visualization, Data modeling
Abstract: One of the major challenges of 3D GIS is to support spatial analysis among different types of real 3D objects. The analysis functions in 3D require more complex algorithms than 2D functions, and have a considerable influence on the computational complexity. In order to maintain good performance, not only must the algorithms be implemented efficiently, but the 3D spatial objects must also be represented by a suitable 3D data model. However, it is a difficult task to select an appropriate data structure designed for the characteristics of the applications, for example, objects of interest, resolution, required spatial analysis, etc. (Zlatanova et al. 2004). A model designed for 3D spatial analysis may not exhibit good performance on 3D visualization and navigation. In other words, different data models might be suitable for the execution of specific tasks but not others. In order to maximize efficiency and effectiveness in the provision of operations, Oosterom et al. (2002) proposed multiple topological models maintained in one database by describing the objects, rules and constraints of each model in a metadata table. Metric and position operations such as area or volume computations are realised on the geometric model, while spatial relationship operations such as ‘meet’ and ‘overlap’ are performed on the topological model. However, it is necessary to find out whether the developed 3D data models are designed for 3D spatial analysis.
Published: 2008

19. DAN: An Automatic Segmentation and Classification Engine for Paper Documents

Author: F. G. De Rosa, Alessio Malizia, Stefano Levialdi, and Luigi Cinque
Subjects: Information retrieval, Data acquisition, computer.internet_protocol, Computer science, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Office automation, Automatic segmentation, Segmentation, Data mining, Document processing, computer.software_genre, computer, XML
Abstract: Recognition of paper documents is fundamental to office automation and is becoming an ever more powerful tool in fields where information is still on paper. Document recognition starts from data acquisition, from both journals and entire books, in order to transform them into digital objects. We present a new system, DAN (Document Analysis on Network), for document recognition that follows open-source methodologies and uses an XML description for document segmentation and classification, which turns out to be beneficial in terms of classification precision and general-purpose availability.
Published: 2002

20. Quantifying Information Leakage in Tree-Based Hash Protocols (Short Paper)

Author: Karsten Nohl and David Evans
Subjects: Information privacy, Computer science, business.industry, Hash function, computer.software_genre, Tree (data structure), Attack model, Threat model, Information leakage, Cryptographic hash function, Radio-frequency identification, Data mining, business, computer
Abstract: Radio Frequency Identification (RFID) systems promise large scale, automated tracking solutions but also pose a threat to customer privacy. The tree-based hash protocol proposed by Molnar and Wagner presents a scalable, privacy-preserving solution. Previous analyses of this protocol concluded that an attacker who can extract secrets from a large number of tags can compromise privacy of other tags. We propose a new metric for information leakage in RFID protocols along with a threat model that more realistically captures the goals and capabilities of potential attackers. Using this metric, we measure the information leakage in the tree-based hash protocol and estimate an attacker's probability of success in tracking targeted individuals, considering scenarios in which multiple information sources can be combined to track an individual. We conclude that an attacker has a reasonable chance of tracking tags when the tree-based hash protocol is used.
Published: 2006

21. Document Reverse Engineering: From Paper to XML

Author: Kyong-Ho Lee, Xiao Tang, V. R. McCrary, Yoon-Chul Choy, and Sung-Bae Cho
Subjects: Document Structure Description, XML Encryption, Hierarchy, Information retrieval, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, Well-formed document, XML validation, computer.file_format, Document type definition, computer.software_genre, XML framework, XML Schema (W3C), XML database, Simple API for XML, Document Schema Definition Languages, XML Schema Editor, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Data mining, computer, XML
Abstract: Since XML has the advantage of embedding logical structure information into documents, it is widely used as the universal format for structured documents on the Web. This makes it attractive to convert paper-based documents with logical hierarchy into XML representations automatically. Document image analysis and understanding [1] consists of two phases: geometric and logical structure analysis. Because the two phases take different kinds of data as input, it may not be desirable to apply the same method to them. Targeting technical journal documents with multiple pages, we present a hybridization of knowledge-based and syntactic methods for geometric and logical structure analysis of document images.
Published: 2002

22. Data mining and data visualization: Position paper for the second IEEE workshop on database issues for data visualization

Author: Bhavani Thuraisingham and Georges Grinstein
Subjects: Information retrieval, Database, business.industry, Computer science, computer.software_genre, Data science, Data warehouse, Metadata, Identification (information), Information visualization, Data visualization, Data access, Knowledge extraction, Data mining, business, Cluster analysis, computer
Abstract: The government, corporate, and industrial communities are faced with an ever increasing number of databases. These databases need not only to be managed, but also explored. The first requires secure access to distributed heterogeneous multimedia databases with rich metadata and having to meet timing constraints. The second requires exploratory tools supporting the identification of domain and mission critical elements such as patterns in data access (e.g., security breach determinations), patterns in data (e.g., marketing and clustering), or patterns in transactions (e.g., data compression), to cite a few. Knowledge Discovery in Databases is a relatively new research area that employs a variety of tools to explore and identify structure and patterns in these large databases. Often the data is preprocessed to facilitate such computations (data warehousing). The data is then mined for specific rules that are built incrementally and often steered by users with a specific set of goals in mind.
Published: 1996

23. Using data mining techniques to fight and control epidemics: A scoping review

Author: Soheila Saeedi, Reza Safdari, Marsa Gholamzadeh, Sorayya Rezayi, and Mozhgan Tanhapour
Subjects: medicine.medical_specialty, Review Paper, business.industry, Public health, Biomedical Engineering, Scopus, COVID-19, Bioengineering, Disease, Review, computer.software_genre, Applied Microbiology and Biotechnology, Checklist, Systematic review, Knowledge extraction, Pandemic, Health care, medicine, Data mining, Psychology, business, computer, Pandemics, Biotechnology
Abstract: The main objective of this survey is to study the published articles to determine the most popular data mining methods and the gaps in knowledge. Since the threat of pandemics has raised concerns for public health, data mining techniques were applied by researchers to reveal the hidden knowledge. Web of Science, Scopus, and PubMed databases were selected for systematic searches. Then, all of the retrieved articles were screened in a stepwise process according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist to select appropriate articles. All of the results were analyzed and summarized based on some classifications. Of the 335 citations retrieved, 50 articles were determined to be eligible through a scoping review. The review results showed that the most popular DM method was natural language processing (22%) and the most commonly proposed approach was revealing disease characteristics (22%). Regarding diseases, the most addressed disease was COVID-19. The studies show a predominance of supervised learning techniques (90%). Concerning healthcare scopes, we found infectious disease (36%) to be the most frequent, closely followed by the epidemiology discipline. The most common software packages used in the studies were SPSS (22%) and R (20%). The results revealed that some valuable research has been conducted by employing the capabilities of knowledge discovery methods to understand the unknown dimensions of diseases in pandemics. However, more research is needed in terms of treatment and disease control.
Published: 2021

24. Multi-purpose, multi-level feature modeling of large-scale industrial software systems

Author: Paul Grünbacher, Daniela Rabiser, Herbert Prähofer, Andreas Grimmer, Florian Angerer, Klaus Eder, Michael Petruzelka, and Mario Kromoser
Subjects: Computer science, Modeling language, business.industry, Special Section Paper, Case study, 020207 software engineering, 02 engineering and technology, computer.software_genre, Modularity, Automation, Feature model, Feature modeling, Consistency (database systems), Feature (computer vision), Modelling and Simulation, Modeling and Simulation, 0202 electrical engineering, electronic engineering, information engineering, Product management, Large-scale software systems, 020201 artificial intelligence & image processing, Software system, Data mining, business, computer, Software
Abstract: Feature models are frequently used to capture the knowledge about configurable software systems and product lines. However, feature modeling of large-scale systems is challenging as models are needed for diverse purposes. For instance, feature models can be used to reflect the perspectives of product management, technical solution architecture, or product configuration. Furthermore, models are required at different levels of granularity. Although numerous approaches and tools are available, it remains hard to define the purpose, scope, and granularity of feature models. This paper first reports results and experiences of an exploratory case study on developing feature models for two large-scale industrial automation software systems. We report results on the characteristics and modularity of the feature models, including metrics about model dependencies. Based on the findings from the study, we developed FORCE, a modeling language, and tool environment that extends an existing feature modeling approach to support models for different purposes and at multiple levels, including mappings to the code base. We demonstrate the expressiveness and extensibility of our approach by applying it to the well-known Pick and Place Unit example and an injection molding subsystem of an industrial product line. We further show how our approach supports consistency between different feature models. Our results and experiences show that considering the purpose and level of features is useful for modeling large-scale systems and that modeling dependencies between feature models is essential for developing a system-wide perspective.
Published: 2016

25. Matching events and activities by integrating behavioral aspects and label analysis

Author: Jan Mendling, Claudio Di Ciccio, Thomas Baier, and Mathias Weske
Subjects: Matching (statistics), Process (engineering), Business process, Computer science, 102013 Human-computer interaction, Process mining, 02 engineering and technology, Constraint satisfaction, computer.software_genre, Conformance checking, Business Process Model and Notation, Business process discovery, 020204 information systems, 102001 Artificial intelligence, 0202 electrical engineering, electronic engineering, information engineering, ddc:00, Declare, Business process intelligence, Institut für Informatik und Computational Science, Special Section Paper, Natural language processing, Business process modeling, 502050 Business informatics, 102022 Software development, Event mapping, Modeling and Simulation, 020201 artificial intelligence & image processing, Data mining, computer, Software
Abstract: Nowadays, business processes are increasingly supported by IT services that produce massive amounts of event data during the execution of a process. These event data can be used to analyze the process using process mining techniques to discover the real process, measure conformance to a given process model, or to enhance existing models with performance information. Mapping the produced events to activities of a given process model is essential for conformance checking, annotation and understanding of process mining results. In order to accomplish this mapping with low manual effort, we developed a semi-automatic approach that maps events to activities using insights from behavioral analysis and label analysis. The approach extracts Declare constraints from both the log and the model to build matching constraints to efficiently reduce the number of possible mappings. These mappings are further reduced using techniques from natural language processing, which allow for a matching based on labels and external knowledge sources. The evaluation with synthetic and real-life data demonstrates the effectiveness of the approach and its robustness toward non-conforming execution logs.
Published: 2018

26. Navigating Interpretability Issues in Evolving Fuzzy Systems

Author: Edwin Lughofer
Subjects: Basis (linear algebra), Point (typography), Data stream mining, Computer science, Position paper, Context (language use), Fuzzy control system, Data mining, computer.software_genre, Data science, Fuzzy logic, computer, Interpretability
Abstract: In this position paper, we investigate interpretability issues in the context of evolving fuzzy systems (EFS). Current EFS approaches, developed in recent years, mainly provide methodologies for precise modeling tasks, i.e. relations and system dependencies implicitly contained in on-line data streams are modeled as accurately as possible. This is achieved by permanent dynamic updates and evolution of structural components. Little attention has been paid to the interpretability of these evolved systems, which, however, was originally one fundamental strength of fuzzy models over other (data-driven) model architectures. This paper will present the (few) achievements already made in this direction, discuss new concepts and point out open issues for future research. Various well-known and important interpretability criteria will serve as the basis for our investigations.
Published: 2012

27. When Is a Confidence Measure Good Enough?

Author: Daniel Kondermann, Aura Hernández-Sabaté, Debora Gil, and Patricia Márquez-Valle
Subjects: Measure (data warehouse), Computer science, media_common.quotation_subject, Optical flow, Image processing, computer.software_genre, Data science, Field (computer science), Term (time), Position paper, Quality (business), Data mining, Meaning (existential), computer, media_common
Abstract: Confidence estimation has recently become a hot topic in image processing and computer vision. Yet, several definitions of the term "confidence" exist, which are sometimes used interchangeably. This is a position paper in which we aim to give an overview of existing definitions, thereby clarifying the meaning of the terms used to facilitate further research in this field. Based on these clarifications, we develop a theory to compare confidence measures with respect to their quality.
Published: 2013

28. Ontologies and Similarity

Author: Steffen Staab
Subjects: business.industry, Computer science, Differentia, Short paper, Disjoint sets, computer.software_genre, Intersection, Stepping stone, Similarity (psychology), Artificial intelligence, Data mining, business, computer, Natural language processing
Abstract: Ontologies [9] comprise a definition of concepts describing their commonalities (genus proximum) as well as their differences (differentia specifica). One might think that with the definition of commonalities and differences, the definition of similarities in and for ontologies should follow immediately. Traditionally, however, the contrary is true, because the methodological backgrounds of ontologies, i.e. logics-based representations, and of similarity, i.e. geometry-based representations, have been explored in disjoint communities that have mixed only to a limited extent. In this short paper we survey how our own work touches on the intersection between ontologies and similarity. While this cannot be a comprehensive account of the interrelationship between ontologies and similarity, we intend it to be a stepping stone for inspiration and for indicating entry points for future investigations.
Published: 2011

29. Automatic Recognition Algorithm of Traffic Signs in Road Tunnel

Author: Liu Bing-han and Wang Wei-zhi
Subjects: Engineering, business.industry, Pattern recognition (psychology), Feature extraction, Decision tree, Feature selection, Paper based, Data mining, business, Recognition algorithm, computer.software_genre, computer
Abstract: We conduct feature extraction and feature selection of the pattern of traffic signs based on environmental characteristics of the road tunnel, and the color and shape information of traffic signs, then further accomplish multi-level classification of traffic signs using decision tree method. The method proposed in this paper based on decision tree classification algorithm can convert a complex multi-class problem into several simple classifications. Experimental results show that the algorithm has good recognition results.
Published: 2011

30. Context Semantic Filtering for Mobile Advertisement

Author: Andrés Moreno and Harold Castro
Subjects: Information retrieval, Computer science, Advertising, Semantic filtering, Recommender system, computer.software_genre, Information overload, Semantic similarity, Collaborative filtering, Profiling (information science), Position paper, Data mining, computer, Information filtering system
Abstract: Mobile advertisement causes an information overload problem that is addressed by information filtering systems. Semantic filtering systems stand out in comparison to traditional approaches thanks to their use of ontologies as a knowledge model, which improves the automatic user profiling and content matching processes in filtering. This position paper identifies two enhancement opportunities related to these processes: the formulation of a semantic similarity metric that highlights the importance of the relations and properties present in the knowledge domain, and an extension of the contextual information included so far in filtering systems. The expected result of the work is to improve the overall effectiveness of semantic information filtering systems, tested in the mobile advertisement scenario.
Published: 2010

31. Introduction to 'Rule Transformation and Extraction' Track

Author: Mark H. Linehan and Eric Putrycz
Subjects: Transformation (function), Information retrieval, Computer science, Business rule, Short paper, Data mining, computer.software_genre, Track (rail transport), computer
Abstract: In this short paper, we summarize the "Rule Transformation and Extraction" topic, defining the terms, describing some of the main approaches to the topic, and reviewing the current challenges for both rule transformation and extraction.
Published: 2009

32. How Protective Are Synthetic Data?

Author: Lars Vilhuber and John M. Abowd
Subjects: Information privacy, Computer science, Short paper, Differential privacy, Probability distribution, Confidentiality, Data mining, Conditional probability distribution, computer.software_genre, computer, Synthetic data, Laplace distribution
Abstract: This short paper provides a synthesis of the statistical disclosure limitation and computer science data privacy approaches to measuring the confidentiality protections provided by fully synthetic data. Since all elements of the data records in the release file derived from fully synthetic data are sampled from an appropriate probability distribution, they do not represent "real data," but there is still a disclosure risk. In SDL this risk is summarized by the inferential disclosure probability. In privacy-protected database queries, this risk is measured by the differential privacy ratio. The two are closely related. This result (not new) is demonstrated and examples are provided from recent work.
Published: 2008
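
For readers unfamiliar with the "differential privacy ratio" mentioned in the abstract above, the standard definition of ε-differential privacy is reproduced here as a reminder. The notation (M for the randomized release mechanism, D and D' for neighboring databases differing in one record, S for a set of outputs) is an assumption of this note and is not taken from the paper itself.

```latex
% Standard \varepsilon-differential privacy condition (notation assumed, not from the paper):
% for all neighboring databases D, D' and all measurable output sets S,
\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,]
\]
% Equivalently, wherever the denominator is nonzero, the ratio of the two
% output probabilities is bounded above by e^{\varepsilon}.
```

A smaller ε forces the two output distributions closer together, which is the sense in which the ratio measures protection.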

33. Imprecise Probability as an Approach to Improved Dependability in High-Level Information Fusion

Author: Ronnie Johansson, Alexander Karlsson, and Sten F. Andler
Subjects: Decision support system, Situation awareness, Computer science, business.industry, Bayesian probability, Bayesian network, Machine learning, computer.software_genre, Imprecise probability, Position paper, Dependability, Artificial intelligence, Data mining, business, computer, Reliability (statistics)
Abstract: The main goal of information fusion can be seen as improving human or automatic decision-making by exploiting diversities in information from multiple sources. High-level information fusion aims specifically at decision support regarding situations, often expressed as “achieving situation awareness”. A crucial issue for decision making based on such support is trust, which can be defined as “accepted dependence”, where dependence or dependability is an overall term for many other concepts, e.g., reliability. This position paper reports on ongoing and planned research concerning imprecise probability as an approach to improved dependability in high-level information fusion. We elaborate on high-level information fusion from a generic perspective and present a partial mapping from a taxonomy of dependability to high-level information fusion. Three application domains in which experiments are planned (defense, manufacturing, and precision agriculture) are described. We conclude that high-level information fusion, as an application-oriented research area where precise probability (Bayesian theory) is commonly adopted, provides an excellent evaluation ground for imprecise probability.
Published: 2008

34. SMS and ASP: Hype or TST?

Author: Thomas Eiter
Subjects: World Wide Web, Answer set programming, Reflection (computer programming), Personal account, Computer science, InformationSystems_INFORMATIONSYSTEMSAPPLICATIONS, Position paper, Data mining, Logic program, computer.software_genre, computer, Logic programming, Stable model semantics
Abstract: Twenty years of stable model semantics (SMS) and almost ten years of Answer Set Programming (ASP) are a good reason for a moment of reflection on these important concepts. This position paper gives a personal account of their history, aspects of ASP, and emphasizes the role of theory and practice in this area.
Published: 2008

35. Detecting Conserved RNA Secondary Structures in Viral Genomes: The RADAR Approach

Author: Mugdha Khaladkar and Jason T. L. Wang
Subjects: Web server, Computer science, business.industry, Short paper, Structural alignment, RNA, computer.software_genre, law.invention, ComputingMethodologies_PATTERNRECOGNITION, Viral genomes, law, The Internet, Data mining, Radar, business, Structural motif, computer
Abstract: Conserved regions, or motifs, present among RNA secondary structures serve as a useful indicator for predicting the functionality of the RNA molecules. Automated detection or discovery of these conserved regions is emerging as an important research topic in health and disease informatics. In this short paper we present a new approach for detecting conserved regions in RNA secondary structures by the use of constrained alignment and apply the approach to finding structural motifs in some viral genomes. Our experimental results show that the proposed approach is capable of efficiently detecting conserved regions in the viral genomes and is comparable to existing methods. We implement our constrained structure alignment algorithm into a web server, called RADAR. This web server is fully operational and accessible on the Internet at http://datalab.njit.edu/biodata/rna/RSmatch/server.htm.
Published: 2007

36. Online Mining in Sensor Networks

Author: Shuangfeng Li, Dongqing Yang, Dehui Zhang, Shiwei Tang, Qiong Luo, and Xiuli Ma
Subjects: Work (electrical), Computer science, Position paper, Data mining, Cluster analysis, computer.software_genre, computer, Wireless sensor network
Abstract: Online mining in large sensor networks is just starting to attract interest. Finding patterns in such an environment is both compelling and challenging. The goal of this position paper is to understand the challenges and to identify the research problems in online mining for sensor networks. As an initial step, we identify the following three problems to work on: (1) detection of sensor data irregularities; (2) sensor data clustering; and (3) discovery of sensory attribute correlations. We also outline our preliminary proposal of solutions to these problems.
Published: 2004

37. A Survey of Data Mining Techniques

Author: José A. Sanandrés and Victor Maojo
Subjects: Inductive logic programming, Rule induction, Computer science, Short paper, State (computer science), Data mining, computer.software_genre, Data science, computer
Abstract: In this short paper we summarize a keynote speech, to be given at the ISMDA 2000 conference, about data mining research and tools. We give a brief summary of the main concepts associated with data mining and some of the methods and tools used in the scientific world, mainly those that can be associated with medical applications. Finally, some practical projects and conclusions are presented.
Published: 2000

38. Improving Sentiment Analysis for Social Media Applications Using an Ensemble Deep Learning Language Model

Author: Ahmed Alsayat
Subjects: Word embedding, Computer science, Context (language use), Machine learning, computer.software_genre, Research Article-Computer Engineering and Computer Science, Social media, Sentiment analysis, Ensemble algorithms, Classifier (linguistics), Feature (machine learning), Data mining, Multidisciplinary, Pandemic, business.industry, Deep learning, COVID-19, Coronavirus, Statistical classification, Language model, Artificial intelligence, business, computer
Abstract: As data grow rapidly on social media through users' contributions, especially with the recent coronavirus pandemic, knowledge of user behavior is in high demand. The opinions behind posts on the pandemic are the scope of the tested dataset in this study. Finding the most suitable classification algorithms for this kind of data is challenging. Within this context, models of deep learning for sentiment analysis can introduce detailed representation capabilities and enhanced performance compared to existing feature-based techniques. In this paper, we focus on enhancing the performance of sentiment classification using a customized deep learning model with an advanced word embedding technique and create a long short-term memory (LSTM) network. Furthermore, we propose an ensemble model that combines our baseline classifier with other state-of-the-art classifiers used for sentiment analysis. The contributions of this paper are twofold. (1) We establish a robust framework based on word embedding and an LSTM network that learns the contextual relations among words and understands unseen or rare words in relatively emerging situations such as the coronavirus pandemic by recognizing suffixes and prefixes from training data. (2) We capture and utilize the significant differences in state-of-the-art methods by proposing a hybrid ensemble model for sentiment analysis. We conduct several experiments using our own Twitter coronavirus hashtag dataset as well as public review datasets from Amazon and Yelp. To conclude, a statistical study is carried out indicating that the performance of the proposed models surpasses other models in terms of classification accuracy.
Published: 2021

39. SARS-CoV-2: a systematic review of indoor air sampling for virus detection

Author: Liane Yuri Kondo Nakada, José Roberto Guimarães, João Tito Borges, and Milena Guedes Maniero
Subjects: Impactor, Coronavirus disease 2019 (COVID-19), Indoor air, Health, Toxicology and Mutagenesis, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Air sampler, Sample (statistics), 010501 environmental sciences, computer.software_genre, 01 natural sciences, Biological air sampler, Environmental Factors and the Epidemics of COVID-19, Environmental Chemistry, Humans, Pandemics, Selection (genetic algorithm), 0105 earth and related environmental sciences, Aerosols, Impinger, SARS-CoV-2, Sampling (statistics), COVID-19, General Medicine, Pollution, Cyclone, Virus detection, Air Pollution, Indoor, Environmental science, Data mining, computer
Abstract: In a post-pandemic scenario, indoor air monitoring may be required to safeguard public health, and therefore well-defined methods, protocols, and equipment play an important role. Considering the COVID-19 pandemic, this manuscript presents a literature review on indoor air sampling methods to detect viruses, especially SARS-CoV-2. The review was conducted using the following online databases: Web of Science, Science Direct, and PubMed, with the Boolean operators "AND" and "OR" used to combine the following keywords: air sampler, coronavirus, COVID-19, indoor, and SARS-CoV-2. This review included 25 published papers reporting sampling and detection methods for SARS-CoV-2 in indoor environments. Most of the papers focused on sampling and analysis of viruses in aerosols present in contaminated areas and potential transmission to adjacent areas. Negative results were found in 10 studies, while 15 papers showed positive results in at least one sample. Overall, papers report several sampling devices and methods for SARS-CoV-2 detection, using different approaches for distance, height from the floor, flow rates, and sampled air volumes. Regarding the efficacy of each mechanism as measured by the percentage of investigations with positive samples, the literature review indicates that solid impactors are more effective than liquid impactors or filters, and the combination of various methods may be recommended. As a final remark, determining the sampling method is not a trivial task, as the samplers and the environment influence the presence and viability of viruses in the samples, and thus a case-by-case assessment is required for the selection of sampling systems.
Published: 2021

40. Computing Precision and Recall with Missing or Uncertain Ground Truth

Author: Tao Sun, Bart Lamiroy, Young-Bin Kwon, and Jean-Marc Ogier; Querying Graphics through Analysis and Recognition (QGAR); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Computer Science & Engineering Department (CSE), Lehigh University [Bethlehem]
Subjects: Ground truth, Majority rule, Interpretation (logic), Recall, Computer science, recall, Probabilistic logic, Stability (learning theory), [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 02 engineering and technology, Coherence (statistics), computer.software_genre, performance evaluation, document image analysis, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, precision, 020201 artificial intelligence & image processing, Data mining, Precision and recall, ground truth, computer
Abstract: In this paper we present a way to use precision and recall measures in total absence of ground truth. We develop a probabilistic interpretation of both measures and show that, provided a sufficient number of data sources are available, it offers a viable performance measure to compare methods if no ground truth is available. This paper also shows the limitations of the approach, in case a systematic bias is present in all compared methods, but shows that it maintains a very high level of overall coherence and stability. It opens broader perspectives and can be extended to handling partial or unreliable ground truth, as well as levels of prior confidence in the methods it aims to compare.
Published: 2013
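
To make the idea of scoring methods without ground truth more concrete, here is a minimal sketch that uses a simple majority vote across methods as a stand-in reference. This is an assumption made for illustration only: the paper develops a probabilistic interpretation that is not reproduced here, and the function name `consensus_precision_recall` and the input layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def consensus_precision_recall(
    outputs: Dict[str, Set[str]],
) -> Dict[str, Tuple[float, float]]:
    """Score each method against a majority-vote consensus.

    `outputs` maps a method name to the set of items it detected.
    The consensus (an illustrative stand-in for ground truth, not the
    paper's probabilistic model) contains every item detected by a
    strict majority of the methods.
    """
    # Count how many methods detected each item.
    votes = Counter(item for detected in outputs.values() for item in detected)
    majority = len(outputs) / 2
    consensus = {item for item, n in votes.items() if n > majority}

    scores: Dict[str, Tuple[float, float]] = {}
    for name, detected in outputs.items():
        hits = len(detected & consensus)
        # Guard against empty sets to avoid division by zero.
        precision = hits / len(detected) if detected else 0.0
        recall = hits / len(consensus) if consensus else 0.0
        scores[name] = (precision, recall)
    return scores


if __name__ == "__main__":
    demo = {"a": {"x", "y"}, "b": {"x", "y", "z"}, "c": {"x"}}
    for method, (p, r) in consensus_precision_recall(demo).items():
        print(f"{method}: precision={p:.2f} recall={r:.2f}")
```

As the abstract notes for its own approach, any consensus-based reference shares a weakness: a systematic bias common to all compared methods goes undetected, because the consensus inherits it.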

41. Comparing Business Processes to Determine the Feasibility of Configurable Models: A Case Study

Author: J.J.C.L. Vogelaar, H.M.W. Verbeek, B. Luka, W.M.P. van der Aalst, F. Daniel, K. Barkaoui, S. Dustdar, and Process Science
Subjects: Process management (computing), Similarity (geometry), Process modeling, Standardization, Business process, Computer science, Metric (mathematics), Data mining, Process configuration, computer.software_genre, Industrial engineering, computer, Disadvantage
Abstract: Organizations are looking for ways to collaborate in the area of process management. Common practice until now has been the (partial) standardization of processes. This has the main disadvantage that most organizations are forced to adapt their processes to adhere to the standard. In this paper we analyze and compare the actual processes of ten Dutch municipalities. Configurable process models provide a potential solution for the limitations of classical standardization processes, as they contain all the behavior of the individual models while needing only one model. The question arises, though, where the limits are. It is obvious that one configurable model containing all models that exist is undesirable. But are company-wide configurable models feasible? And what about cross-organizational configurable models: should all partners be considered or just certain ones? In this paper we apply a similarity metric to individual models to find ways of answering questions in this area. In this way we propose a new means of determining beforehand whether configurable models are feasible. Using the selected metric we can identify more desirable partners and processes before computing configurable process models.
Published: 2012

42. Application of Density Clustering Algorithm Based on Greedy Strategy in Hot Spot Mining of Taxi Passengers

Author: Jianglin Luo, Qingqing Wang, and Yiping Bao
Subjects: Density distribution, Computer science, Taxis, Hot spot (veterinary medicine), Noise (video), Data mining, Cluster analysis, computer.software_genre, computer
Abstract: In this paper, a greedy strategy is used to improve the density clustering algorithm so that it can separate noise points and deal with uneven density distributions. To further demonstrate the efficiency of the density clustering algorithm based on the greedy strategy, it is applied to mining hot spots of taxi passengers. First, the large-scale data sets are sampled using reservoir sampling to obtain the effective hot data. Then, the data of 8,000 taxis in an urban area during December 4–8, 2018 are clustered to verify the validity of the proposed algorithm.
Published: 2020
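
The sampling step in the abstract above ("sampled by reservoir") presumably refers to reservoir sampling, which draws a fixed-size uniform sample from a stream in a single pass. The sketch below shows the classic form of that technique (often called Algorithm R); it is not the authors' implementation, and the function name and the `seed` parameter are assumptions added for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, seed: Optional[int] = None
) -> List[T]:
    """Return up to k items drawn uniformly at random from `stream`.

    Only one pass over the stream is needed and at most k items are held
    in memory, which is why the technique suits large-scale data.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical stand-in for a stream of taxi pick-up record IDs.
    sample = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(sample)
```

If the stream holds fewer than k items, the function simply returns all of them in their original order.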

43. Orion: A Generic Model and Tool for Data Mining

Author: Julien Soler, Cédric Buche, and Cindy Even
Subjects: Computer science, Control (management), InformationSystems_DATABASEMANAGEMENT, Behavior Trees, 020206 networking & telecommunications, Context (language use), 02 engineering and technology, computer.software_genre, Range (mathematics), Unified Modeling Language, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, computer, computer.programming_language
Abstract: This paper focuses on the design of autonomous behaviors based on the observation of human behaviors. In this context, the contribution of the Orion model is to combine and take advantage of two approaches: data mining techniques (to extract knowledge from the human) and behavior models (to control the autonomous behaviors). In this paper, the Orion model is described by UML diagrams. More than a model, Orion is an operational tool that makes it possible to represent, transform, visualize and predict data; it also integrates standard operational behavioral models. Orion is illustrated by controlling a bot in the game Unreal Tournament. Thanks to Orion, we can collect data on low-level behaviors through three scenarios performed by human players: movement, long range aiming and close combat. We can easily transform the data and use data mining techniques to learn behaviors from the observation of human players. Orion allows us to build a complete behavior using an extension of a Behavior Tree that integrates ad hoc features in order to manage aspects of behavior that we have not been able to learn automatically.
Published: 2020

44. Towards Improving the Representational Bias of Process Mining

Author: W.M.P. van der Aalst, J.C.A.M. Buijs, B.F. van Dongen, K. Aberer, E. Damiani, T. Dillon, and Process Science
Subjects: Soundness, Process (engineering), Computer science, business.industry, Event (computing), Process mining, Work in process, Machine learning, computer.software_genre, Business process discovery, Genetic algorithm, Information system, Data mining, Artificial intelligence, business, computer
Abstract: Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. Process discovery—discovering a process model from example behavior recorded in an event log—is one of the most challenging tasks in process mining. A variety of process discovery techniques have been proposed. Most techniques suffer from the problem that often the discovered model is internally inconsistent (i.e., the model has deadlocks, livelocks or other behavioral anomalies). This suggests that the search space should be limited to sound models. In this paper, we propose a tree representation that ensures soundness. We evaluate the impact of the search space reduction by implementing a simple genetic algorithm that discovers such process trees. Although the result can be translated to conventional languages, we ensure the internal consistency of the resulting model while mining, thus reducing the search space and allowing for more efficient algorithms.
Published: 2012

45. Efficient Mining Top-k Regular-Frequent Itemset Using Compressed Tidsets

Author: Komate Amphawan, Athasit Surarerks, and Philippe Lenca; Département Logique des Usages, Sciences sociales et Sciences de l'Information (LUSSI); Engineering Laboratory in Theoretical Enumerable System (University of Chulalongkorn) (ELITE); Lab-STICC_TB_CID_DECIDE; Laboratoire des sciences et techniques de l'information, de la communication et de la connaissance (UMR 3192) (Lab-STICC); Télécom Bretagne (now IMT Atlantique); Institut Mines-Télécom [Paris] (IMT); Université européenne de Bretagne - European University of Brittany (UEB); École Nationale d'Ingénieurs de Brest (ENIB); Université de Bretagne Sud (UBS); Université de Brest (UBO); Institut Brestois du Numérique et des Mathématiques (IBNM); École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne); Centre National de la Recherche Scientifique (CNRS)
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Single pass, Association rule learning, Computer science, [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], Regular itemsets, 02 engineering and technology, Top-k itemsets, computer.software_genre, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Set (abstract data type), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB], Representation (mathematics), TRACE (psycholinguistics), Frequent itemsets, Task (computing), Regular pattern, Key (cryptography), 020201 artificial intelligence & image processing, Data mining, computer
Abstract: Association rule discovery based on the support-confidence framework is an important task in data mining. However, the occurrence frequency (support) of a pattern (itemset) may not be a sufficient criterion for discovering interesting patterns. Temporal regularity, which can be a trace of behavior, can serve together with frequency as an important key in several applications. A pattern can be regarded as a regular pattern if it occurs regularly in a user-given period. In this paper, we consider the problem of mining top-k regular-frequent itemsets from transactional databases without a support threshold. A new concise representation, called compressed transaction-ids set (compressed tidset), and a single-pass algorithm, called TR-CT (Top-k Regular frequent itemset mining based on Compressed Tidsets), are proposed to maintain occurrence information of patterns and to discover the k regular itemsets with the highest supports, respectively. Experimental results show that the compressed tidset representation achieves high efficiency in terms of execution time and memory consumption, especially on dense datasets.
Published: 2012

46. Robust Numeric Set Watermarking: Numbers Don’t Lie

Author: Mohan S. Kankanhalli, Josef Pieprzyk, and Gaurav Gupta
Subjects: Scheme (programming language), Theoretical computer science, Computer science, Data_MISCELLANEOUS, Watermark, High capacity, computer.software_genre, Data structure, Set (abstract data type), Distortion, False positive paradox, Data mining, computer, Digital watermarking, computer.programming_language
Abstract: Ever since Cox et al. published their paper, “A Secure, Robust Watermark for Multimedia” in 1996 [6], there has been tremendous progress in multimedia watermarking. The same pattern re-emerged with Agrawal and Kiernan publishing their work “Watermarking Relational Databases” in 2001 [1]. However, little attention has been given to primitive data collections, with only a handful of research works known to the authors [11, 10]. This is primarily due to the absence of an attribute that differentiates marked items from unmarked items during the insertion and detection processes. This paper presents a distribution-independent watermarking model that is secure against secondary watermarking in addition to conventional attacks such as data addition, deletion and distortion. The low false-positive rate and high capacity provide additional strength to the scheme. These claims are backed by experimental results provided in the paper.
Published: 2011

47. Clarification of the slope mass rating parameters assisted by SMRTool, an open-source software

Author: José Luis Pastor, Roberto Tomás, Miguel Cano, Adrián Riquelme, Universidad de Alicante. Departamento de Ingeniería Civil, and Ingeniería del Terreno y sus Estructuras (InTerEs)
Subjects: Computer science, 0211 other engineering and technologies, 02 engineering and technology, Classification of discontinuities, 010502 geochemistry & geophysics, computer.software_genre, 01 natural sciences, Stability (probability), Software, Geomechanics, Rock mechanics, Geomechanical classification, Slope mass rating, Representation (mathematics), Rock mass classification, 021101 geological & geomatics engineering, 0105 earth and related environmental sciences, Matlab, business.industry, Orientation (computer vision), Geology, Geotechnical Engineering and Engineering Geology, Ingeniería del Terreno, Data mining, business, computer
Abstract: Geomechanics classifications are used to perform a preliminary assessment of rock slope stability for different purposes in civil and mining engineering. Among all existing rock mass classifications, slope mass rating (SMR) is one of the most commonly used for slopes. Although SMR is a geomechanics classification applied worldwide, misapprehensions and inaccuracies often arise when it is used professionally and scientifically. Nearly all these miscalculations involve the influence of slope geometry and the dip and direction of the discontinuities. These problems can be overcome by a systematic assessment of SMR, which allows users to understand and visualize the relative orientation between discontinuities and slope. To fulfil this purpose, a complete and detailed definition of the angular relationships between discontinuities and slope is included in this paper, clarifying the assessment of the SMR parameters. Additionally, a Matlab-based open-source software for SMR calculation (SMRTool) is presented, which avoids miscalculations by automating the calculations and showing the graphical representation of slope and discontinuities. Finally, a general explanation of the method for the use of SMR is reviewed, stressing the common sources of error when applying this classification. The performance, benefits and usefulness of SMRTool are also illustrated in this paper through a specific case study. This work has been supported by the University of Alicante under the projects GRE14-04 and GRE17-11, the Spanish Ministry of Economy and Competitiveness (MINECO), the State Agency of Research (AEI) and the European Funds for Regional Development (FEDER) under projects TEC2017-85244-C2-1-P and TIN2014-55413-C2-2-P, and the Spanish Ministry of Education, Culture and Sport under project PRX17/00439 and CAS17/00392.
Published: 2019

48. Identifying Disease-Centric Subdomains in Very Large Medical Ontologies: A Case-Study on Breast Cancer Concepts in SNOMED CT. Or: Finding 2500 Out of 300.000

Author: Milian, K., Aleksovski, Z., Vdovjak, R., ten Teije, A.C.M., van Harmelen, F.A.H., Riano, D., Miksch, S., Peleg, M., Network Institute, Knowledge Representation and Reasoning, and Artificial intelligence
Subjects: SNOMED CT, Information retrieval, Computer science, Disease related concepts, Unified Medical Language System, Mapping medical terminologies, Ontology subsetting, Ontology (information science), Seed queries, computer.software_genre, Medical guideline, Set (abstract data type), SDG 3 - Good Health and Well-being, Complementarity (molecular biology), Identifying ontology subdomain, Upper ontology, Data mining, Medical guidelines, computer, Strengths and weaknesses
Abstract: Modern medical vocabularies can contain up to hundreds of thousands of concepts. In any particular use-case only a small fraction of these will be needed. In this paper we first define two notions of a disease-centric subdomain of a large ontology. We then explore two methods for identifying disease-centric subdomains of such large medical vocabularies. The first method is based on lexically querying the ontology with an iteratively extended set of seed queries. The second method is based on manual mapping between concepts from a medical guideline document and ontology concepts. Both methods include concept-expansion over subsumption and equality relations. We use both methods to determine a breast-cancer-centric subdomain of the SNOMED CT ontology. Our experiments show that the two methods produce a considerable overlap, but they also yield a large degree of complementarity, with interesting differences between the sets of concepts that they return. Analysis of the results reveals strengths and weaknesses of the different methods.
Published: 2010

49. Activity Mining by Global Trace Segmentation

Author: Günther, C.W., Rozinat, A., Aalst, van der, W.M.P., Rinderle-Ma, S., Sadiq, S., Leymann, F., Information Systems IE&IS, and Process Science
Subjects: Process modeling, Event (computing), Computer science, Process (engineering), media_common.quotation_subject, Process mining, computer.software_genre, Segmentation, Quality (business), Data mining, computer, TRACE (psycholinguistics), media_common, Abstraction (linguistics)
Abstract: Process Mining is a technology for extracting non-trivial and useful information from execution logs. For example, there are many process mining techniques to automatically discover a process model describing the causal dependencies between activities. Unfortunately, the quality of a discovered process model strongly depends on the quality and suitability of the input data. For example, the logs of many real-life systems do not refer to the activities an analyst would have in mind, but are on a much more detailed level of abstraction. Trace segmentation attempts to group low-level events into clusters, which represent the execution of a higher-level activity in the (available or imagined) process meta-model. As a result, the simplified log can be used to discover better process models. This paper presents a new activity mining approach based on global trace segmentation. We also present an implementation of the approach, and we validate it using a real-life event log from ASML’s test process.
Published: 2010

50. Collecting, Analyzing, and Publishing Massive Data about the Hypertrophic Cardiomyopathy

Author: Lorenzo Montserrat, Miguel R. Luaces, Diego Seco, and Jose Antonio Cotelo-Lema
Subjects: Exploit, Computer science, business.industry, media_common.quotation_subject, Hypertrophic cardiomyopathy, Document management system, computer.software_genre, medicine.disease, Data science, Workflow, Knowledge base, Publishing, medicine, Quality (business), Data mining, Architecture, business, computer, media_common
Abstract: We present in this paper the architecture and some implementation details of a Document Management System and Workflow to help in the diagnosis of hypertrophic cardiomyopathy, one of the most frequent genetic cardiovascular diseases. The system allows a gradual and collaborative creation of a knowledge base about the mutations associated with this disease. The system manages both the original documents of the scientific papers and the data extracted from these papers by the experts. Furthermore, a semiautomatic report generation module exploits this knowledge base to create high-quality reports about the studied mutations.
Published: 2010

13,591 results
