95 results on '"Tomáš Skopal"'
Search Results
2. Modular framework for similarity-based dataset discovery using external knowledge
- Author
-
Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek, and Tomáš Skopal
- Subjects
Library and Information Sciences, Information Systems
- Abstract
Purpose: Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving the dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods and description models and so forth.
Design/methodology/approach: In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
Findings: The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework.
Originality/value: To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.
- Published
- 2022
- Full Text
- View/download PDF
3. Person Authentication using Visual Representations of Keyboard Typing Dynamics
- Author
-
Ladislav Peška, Patrik Veselý, Tomáš Skopal, and Krisztian Buza
- Published
- 2022
- Full Text
- View/download PDF
4. Videolytics
- Author
-
Tomáš Skopal, Petr Pechman, Dominika Ďurišková, Marek Dobranský, and Vladislav Khachaturian
- Subjects
SQL, Multimedia, Computer science, business.industry, Deep learning, Process (computing), Ranging, computer.software_genre, Visualization, Public space, Analytics, Data analysis, Artificial intelligence, business, computer, computer.programming_language
- Abstract
We present Videolytics, a web-based system for advanced analytics over recorded video streams. Video cameras have become widely used for indoor and outdoor surveillance. Covering even more public space in cities, the cameras serve various purposes ranging from security to traffic monitoring, urban life, and marketing. The goal is to obtain effective and efficient models to process the video data automatically and produce the desired features for data analytics. Videolytics combines the best of deep learning and hand-designed analytical models to create a solution applicable in real-life situations. The architecture of the Videolytics framework is centered around a database of video features and detected objects, where new higher-level objects result from fusion of (lower-level) objects and features already stored in the database. The system provides a number of visualization options, an SQL-based analytics module as well as a real-time surveillance mode.
- Published
- 2021
- Full Text
- View/download PDF
5. Similarity vs. Relevance: From Simple Searches to Complex Discovery
- Author
-
Tomáš Skopal, David Bernhauer, Jakub Klímek, Martin Nečaský, and Petr Škoda
- Subjects
Theoretical computer science, Similarity (network science), Computer science, Nearest neighbor search, Content (measure theory), Joins, Relevance (information retrieval), Function (mathematics), Object (computer science), Similitude
- Abstract
Similarity queries play a crucial role in content-based retrieval. The similarity function itself is regarded as the function of relevance between a query object and objects from the database; the most similar objects are understood as the most relevant. However, such an automatic adoption of similarity as relevance leads to limited applicability of similarity search in domains like entity discovery, where relevant objects are not supposed to be similar in the traditional meaning. In this paper, we propose the meta-model of data-transitive similarity operating on top of a particular similarity model and a database. This meta-model makes it possible to treat directly non-similar objects \(\mathbf{x}\), \(\mathbf{y}\) as similar if there exists a chain of objects \(\mathbf{x}\), \(i_1\), ..., \(i_n\), \(\mathbf{y}\) in which neighboring members are similar enough. Hence, this approach places the similarity in the role of relevance, where objects do not need to be directly similar but still remain relevant to each other (transitively similar). The data-transitive similarity concept allows standard similarity-search methods (queries, joins, rankings, analytics) to be used in more complex tasks, like entity discovery, where relevant results are often complementary or orthogonal to the query, rather than directly similar. Moreover, we show that the data-transitive similarity is inherently self-explainable and non-metric. We discuss the approach in the domain of open dataset discovery.
- Published
- 2021
- Full Text
- View/download PDF
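The chain condition in the abstract of result 5 can be illustrated with a short sketch. This is only one possible reading, not the authors' meta-model: it assumes a fixed similarity threshold and a breadth-first search, and the names `transitive_chain`, `threshold` and `max_links` are hypothetical. The returned chain doubles as the explanation of why two objects are considered related.

```python
from collections import deque

def transitive_chain(x, y, database, sim, threshold, max_links=3):
    """Return a chain [x, i_1, ..., i_n, y] in which every pair of
    neighbors has similarity >= threshold, using at most max_links
    links, or None if no such chain exists. (Illustrative assumption,
    not the aggregation used in the paper.)"""
    queue = deque([[x]])
    visited = {x}
    while queue:
        chain = queue.popleft()
        last = chain[-1]
        # Closing the chain with y adds exactly one more link.
        if sim(last, y) >= threshold:
            return chain + [y]
        # Adding another intermediate object would exceed the link budget.
        if len(chain) >= max_links:
            continue
        for obj in database:
            if obj not in visited and sim(last, obj) >= threshold:
                visited.add(obj)
                queue.append(chain + [obj])
    return None

# Toy data: numbers are "similar enough" when they differ by at most 1.
numbers = [1, 2, 3, 4, 10]
closeness = lambda a, b: 1.0 / (1.0 + abs(a - b))
chain = transitive_chain(1, 4, numbers, closeness, threshold=0.5)
```

Here 1 and 4 are not directly similar at this threshold, yet they are connected through 2 and 3; 10 is not reachable from 1 at all.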
6. On Fusion of Learned and Designed Features for Video Data Analytics
- Author
-
Marek Dobranský and Tomáš Skopal
- Subjects
Computational complexity theory, Computer science, business.industry, Process (engineering), Deep learning, Feature extraction, 020207 software engineering, 02 engineering and technology, Machine learning, computer.software_genre, Set (abstract data type), Public space, 0202 electrical engineering, electronic engineering, information engineering, Data analysis, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Abstraction (linguistics)
- Abstract
Video cameras have become widely used for indoor and outdoor surveillance. Covering more and more public space in cities, the cameras serve various purposes ranging from security to traffic monitoring, urban life, and marketing. However, with the increasing quantity of utilized cameras and recorded streams, manual video monitoring and analysis becomes too laborious. The goal is to obtain effective and efficient artificial intelligence models to process the video data automatically and produce the desired features for data analytics. To this end, we propose a framework for real-time video feature extraction that fuses both learned and hand-designed analytical models and is applicable in real-life situations. Nowadays, state-of-the-art models for various computer vision tasks are implemented by deep learning. However, the exhaustive gathering of labeled training data and the computational complexity of resulting models can often render them impractical. We need to consider the benefits and limitations of each technique and find the synergy between both deep learning and analytical models. Deep learning methods are more suited for simpler tasks on large volumes of dense data while analytical modeling can be sufficient for processing of sparse data with complex structures. Our framework follows those principles by taking advantage of multiple levels of abstraction. In a use case, we show how the framework can be set for an advanced video analysis of urban life.
- Published
- 2021
- Full Text
- View/download PDF
7. Evaluation Framework for Search Methods Focused on Dataset Findability in Open Data Catalogs
- Author
-
David Bernhauer, Tomáš Skopal, Jakub Klímek, Petr Škoda, and Martin Nečaský
- Subjects
0303 health sciences, Matching (statistics), Ground truth, Information retrieval, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Findability, 02 engineering and technology, Domain (software engineering), Set (abstract data type), Metadata, 03 medical and health sciences, Open data, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Query by Example, computer, 030304 developmental biology, computer.programming_language
- Abstract
Many institutions publish datasets as Open Data in catalogs; however, their retrieval remains a problematic issue due to the absence of dataset search benchmarking. We propose a framework for evaluating findability of datasets, regardless of retrieval models used. As task-agnostic labeling of datasets by ground truth turns out to be infeasible in the general domain of open data datasets, the proposed framework is based on evaluation of entire retrieval scenarios that mimic complex retrieval tasks. In addition to the framework, we present a proof-of-concept specification and evaluation on several similarity-based retrieval models and several dataset discovery scenarios within a catalog, using our experimental evaluation tool. Instead of traditional matching of the query with metadata of all the datasets, in similarity-based retrieval the query is formulated using a set of datasets (query by example) and the most similar datasets to the query set are retrieved from the catalog as a result.
- Published
- 2020
- Full Text
- View/download PDF
8. Visualizer of Dataset Similarity Using Knowledge Graph
- Author
-
Petr Škoda, Tomáš Skopal, and Jakub Matějík
- Subjects
Metadata, Open data, Information retrieval, Similarity (network science), Knowledge graph, Computer science, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Findability, Context (language use), 02 engineering and technology, Similitude, Visualization
- Abstract
Many institutions choose to make their datasets available as Open Data. Open Data datasets are described by publisher-provided metadata and are registered in catalogs such as the European Data Portal. In spite of that, findability still remains a major issue. One of the main reasons is that metadata is captured in different contexts and with different background knowledge, so that keyword-based search provided by the catalogs is insufficient. A solution is to use an enriched querying that employs a dataset similarity model built on a shared context represented by a knowledge graph. However, the “black-box” dataset similarity may not fit the user's needs well. If an explainable similarity model is used, then the issue can be tackled by providing users with a visualisation of the dataset similarity. This paper introduces a web-based tool for dataset similarity visualisation called ODIN (Open Dataset INspector). ODIN visualises knowledge graph-based dataset similarity, thus offering an explanation to the user. To understand the similarity, users can discover additional datasets that match their needs or reformulate the query to better reflect the knowledge graph. Last but not least, the user can analyze and/or design the similarity model itself.
- Published
- 2020
- Full Text
- View/download PDF
9. Analysing Indexability of Intrinsically High-Dimensional Data Using TriGen
- Author
-
David Bernhauer and Tomáš Skopal
- Subjects
Clustering high-dimensional data, Reduction (complexity), Quadrilateral, Distribution (mathematics), Computer science, Nearest neighbor search, Metric (mathematics), Search engine indexing, Algorithm, Physics::History of Physics, Curse of dimensionality
- Abstract
The TriGen algorithm is a general approach to transform distance spaces in order to provide both exact and approximate similarity search in metric and non-metric spaces. This paper focuses on the reduction of intrinsic dimensionality using TriGen. Besides the well-known intrinsic dimensionality based on distance distribution, we inspect properties of triangles used in metric indexing (the triangularity) as well as properties of quadrilaterals used in ptolemaic indexing (the ptolemaicity). We also show how LAESA with triangle and ptolemaic filtering behaves on several datasets with respect to the proposed indicators.
- Published
- 2020
- Full Text
- View/download PDF
10. Improving Findability of Open Data Beyond Data Catalogs
- Author
-
Jakub Klímek, Tomáš Skopal, and Martin Nečaský
- Subjects
Measure (data warehouse), Information retrieval, Computer science, 05 social sciences, Findability, 02 engineering and technology, Metadata, Open data, Data model (ArcGIS), Context type, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0501 psychology and cognitive sciences, Representation (mathematics), 050104 developmental & child psychology
- Abstract
There is a vast amount of datasets available as Open Data on the Web. However, it is challenging for consumers to find datasets relevant to their goals. This is because the available metadata in catalogs is not descriptive enough. Nevertheless, datasets exist in various types of contexts not expressed in the metadata. These may include information about the data publisher, the legislation related to dataset publication, etc. In this paper we describe an idea of a data model that enables consumers to better understand the data. We propose to define a formal model for representation of the datasets and their contexts, and we propose to apply existing similarity techniques, adjust them to fit each identified dataset context type and combine them together to measure similarity of datasets in new ways, improving their findability.
- Published
- 2019
- Full Text
- View/download PDF
11. Similarity Search and Applications : 15th International Conference, SISAP 2022, Bologna, Italy, October 5–7, 2022, Proceedings
- Author
-
Tomáš Skopal, Fabrizio Falchi, Jakub Lokoč, Maria Luisa Sapino, Ilaria Bartolini, and Marco Patella
- Subjects
- Information storage and retrieval systems
- Abstract
This book constitutes the refereed proceedings of the 15th International Conference on Similarity Search and Applications, SISAP 2022, held in Bologna, Italy in October 2022. SISAP 2022 is an annual international conference for researchers focusing on similarity search challenges and related theoretical/practical problems, as well as the design of content-based similarity search applications. The 15 full papers presented together with 8 short and 2 doctoral symposium papers were carefully reviewed and selected from 34 submissions. They were organized in topical sections as follows: Applications; Foundations; Indexing and Clustering; Learning; Doctoral Symposium.
- Published
- 2022
12. MultiMedia Modeling : 27th International Conference, MMM 2021, Prague, Czech Republic, June 22–24, 2021, Proceedings, Part II
- Author
-
Jakub Lokoč, Tomáš Skopal, Klaus Schoeffmann, Vasileios Mezaris, Xirong Li, Stefanos Vrochidis, and Ioannis Patras
- Subjects
- Database management, Machine learning, Artificial intelligence, Image processing—Digital techniques, Computer vision, Application software, Computers, Special purpose
- Abstract
The two-volume set LNCS 12572 and 12573 constitutes the thoroughly refereed proceedings of the 27th International Conference on MultiMedia Modeling, MMM 2021, held in Prague, Czech Republic, in June 2021. Of the 211 submitted regular papers, 40 papers were selected for oral presentation and 33 for poster presentation; 16 special session papers were accepted as well as 2 papers for a demo presentation and 17 papers for participation at the Video Browser Showdown 2021. The papers cover topics such as: multimedia indexing; multimedia mining; multimedia abstraction and summarization; multimedia annotation, tagging and recommendation; multimodal analysis for retrieval applications; semantic analysis of multimedia and contextual data; multimedia fusion methods; multimedia hyperlinking; media content browsing and retrieval tools; media representation and algorithms; audio, image, video processing, coding and compression; multimedia sensors and interaction modes; multimedia privacy, security and content protection; multimedia standards and related issues; advances in multimedia networking and streaming; multimedia databases, content delivery and transport; wireless and mobile multimedia networking; multi-camera and multi-view systems; augmented and virtual reality, virtual environments; real-time and interactive multimedia applications; mobile multimedia applications; multimedia web applications; multimedia authoring and personalization; interactive multimedia and interfaces; sensor networks; social and educational multimedia applications; and emerging trends.
- Published
- 2021
13. Structural XML Query Processing
- Author
-
Michal Krátký, Martin Svoboda, Tomáš Skopal, Sherif Sakr, Irena Holubová, Martin Nečaský, and Radim Baca
- Subjects
Document Structure Description, XML Encryption, Information retrieval, General Computer Science, Database, Computer science, Efficient XML Interchange, XML Signature, XML validation, 02 engineering and technology, computer.file_format, computer.software_genre, Theoretical Computer Science, XML database, XML Schema Editor, 020204 information systems, Streaming XML, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, computer
- Abstract
Since the boom in new proposals on techniques for efficient querying of XML data is now over and the research world has shifted its attention toward new types of data formats, we believe that it is crucial to review what has been done in the area to help users choose an appropriate strategy and scientists exploit the contributions in new areas of data processing. The aim of this work is to provide a comprehensive study of the state-of-the-art of approaches for the structural querying of XML data. In particular, we start with a description of labeling schemas to capture the structure of the data and the respective storage strategies. Then we deal with the key part of every XML query processing: a twig query join, XML query algebras, optimizations of query plans, and selectivity estimation of XML queries. To the best of our knowledge, this is the first work that provides such a detailed description of XML query processing techniques that are related to structural aspects and that contains information about their theoretical and practical features as well as about their mutual compatibility and general usability.
- Published
- 2017
- Full Text
- View/download PDF
14. Inferred Social Networks: A Case Study
- Author
-
Martin Svoboda, Petr Pascenko, Irena Holubová, David Bernhauer, and Tomáš Skopal
- Subjects
Social network, business.industry, Computer science, Process (engineering), Nearest neighbor search, Set (psychology), business, Data science, Financial sector, Domain (software engineering)
- Abstract
The behavior, environment, and characteristics of clients form a crucial source of information for various businesses. There exists a number of supervised as well as unsupervised data mining or other approaches that allow analyzing the respective data. In our ongoing project, focusing primarily on the financial sector, we suggest an innovative strategy that will overcome persisting shortcomings of the state-of-the-art methods using an analysis of a social network of clients. In addition, we do not assume the existence of such a network, but from a given set of client financial activities, we are able to infer a social network representing their relationships and behavior. Using real-world data and selected use cases from our domain, we show (a part of) the process of construction of an inferred social network, i.e., what kind of "hidden" information can, for example, be found and exploited.
- Published
- 2019
- Full Text
- View/download PDF
15. Recommender System as the Support for Binaural Audio
- Author
-
Tomáš Skopal and David Bernhauer
- Subjects
Computer science, Human–computer interaction, Collaborative filtering, Virtual reality, Recommender system, User interface, Binaural recording, Task (project management)
- Abstract
Virtual reality devices nowadays can effectively utilise other senses besides vision, too. The most often used secondary sense is hearing, with binaural audio as the VR engine. Currently, practical usage of binaural audio as the source of VR is impossible because of the inaccuracy of a general model. On the other hand, measuring the personalised parameters can be time-consuming. Our task was to prove the possibility of reconstruction of the binaural audio parameters in domestic conditions. We have focused on the design of a user interface that can be used independently of the platform. Our proposed browser-based application uses collaborative filtering as a recommender system. We have proven that sound-based navigation in the axial plane is possible with 6.6° inaccuracy. The gamification and browser-based implementation make it easier for all people to find the best possible parameters. The resulting profile can be used both with a fully VR environment and with semi-VR games.
- Published
- 2019
- Full Text
- View/download PDF
16. Advanced Behavioral Analyses Using Inferred Social Networks: A Vision
- Author
-
Ladislav Peska, Irena Holubová, Tomáš Skopal, David Bernhauer, and Martin Svoboda
- Subjects
050101 languages & linguistics, Social network, business.industry, Computer science, 05 social sciences, Perspective (graphical), 02 engineering and technology, Data science, Order (exchange), Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0501 psychology and cognitive sciences, business, Set (psychology), Reliability (statistics)
- Abstract
The success of many businesses is based on a thorough knowledge of their clients. There exists a number of supervised as well as unsupervised data mining or other approaches that allow analyzing data about clients, their behavior or environment. In our ongoing project focusing primarily on bank clients, we propose an innovative strategy that will overcome shortcomings of the existing methods. From a given set of user activities, we infer their social network in order to analyze user relationships and behavior. For this purpose, not just the traditional direct facts are incorporated, but also relationships inferred using similarity measures and statistical approaches, with both possibly limited measures of reliability and validity in time. Such networks would enable analyses of client characteristics from a new perspective and could provide otherwise impossible insights. However, there are several research and technical challenges making the outlined pursuit novel, complex and challenging, as we outline in this vision paper.
- Published
- 2019
- Full Text
- View/download PDF
17. Non-metric Similarity Search Using Genetic TriGen
- Author
-
Tomáš Skopal and David Bernhauer
- Subjects
Theoretical computer science, Computer science, Nearest neighbor search, Search engine indexing, Genetic variants, 02 engineering and technology, Piecewise linear function, Metric space, Index (publishing), 020204 information systems, Metric (mathematics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Non metric
- Abstract
The metric space model is a popular and extensible model for indexing data for fast similarity search. However, there is often a need for broader concepts of similarities (beyond the metric space model), while these cannot directly benefit from metric indexing. This paper focuses on approximate search in semi-metric spaces using a genetic variant of the TriGen algorithm. The original TriGen algorithm generates metric modifications of semi-metric distance functions, thus allowing metric indexes to index non-metric models. However, “analytic” modifications provided by TriGen are not stable in predicting the retrieval error. In our approach, the genetic variant of TriGen – the TriGenGA – uses genetically learned semi-metric modifiers (piecewise linear functions) that lead to better estimates of the retrieval error. Additionally, the TriGenGA modifiers result in better overall performance than original TriGen modifiers.
- Published
- 2019
- Full Text
- View/download PDF
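The idea of a "metric modification of a semi-metric distance", mentioned in the abstract of result 17, can be sketched as follows. This is a simplified illustration under stated assumptions, not the TriGen or TriGenGA algorithm itself: it assumes a concave fractional-power modifier, a linear scan over candidate weights, and exhaustive checking of all triplets; the names `fp_modifier`, `triangle_error` and `smallest_weight` are hypothetical.

```python
from itertools import combinations

def fp_modifier(w):
    """Concave fractional-power modifier f(d) = d ** (1 / (1 + w)).
    w = 0 leaves distances unchanged; larger w is more concave."""
    return lambda d: d ** (1.0 / (1.0 + w))

def triangle_error(objects, dist, modifier=lambda d: d, eps=1e-9):
    """Fraction of object triplets whose modified distances violate
    the triangle inequality."""
    violated = total = 0
    for a, b, c in combinations(objects, 3):
        sides = sorted(modifier(dist(p, q)) for p, q in ((a, b), (b, c), (a, c)))
        total += 1
        if sides[0] + sides[1] < sides[2] - eps:
            violated += 1
    return violated / total if total else 0.0

def smallest_weight(objects, dist, tolerance=0.0, step=0.25, max_w=10.0):
    """Scan weights upward and return the first one whose modifier keeps
    the violation ratio within tolerance, or None if none is found."""
    w = 0.0
    while w <= max_w:
        if triangle_error(objects, dist, fp_modifier(w)) <= tolerance:
            return w
        w += step
    return None

# Squared Euclidean distance on 1-D points is a semi-metric:
# it is symmetric and non-negative but breaks the triangle inequality.
points = [0, 1, 2, 3, 5]
squared = lambda a, b: (a - b) ** 2
weight = smallest_weight(points, squared)
```

For this toy input the scan stops at the weight whose modifier is the square root, which turns the squared distance back into the ordinary (metric) absolute difference. A nonzero `tolerance` trades exactness for a less concave modifier, which is the kind of approximate search the abstract refers to.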
18. Explainable Similarity of Datasets Using Knowledge Graph
- Author
-
Tomáš Skopal, Martin Nečaský, Petr Škoda, and Jakub Klímek
- Subjects
Metadata, Data portal, Open data, Information retrieval, Knowledge graph, Computer science, Graph (abstract data type), Similitude
- Abstract
There is a large quantity of datasets available as Open Data on the Web. However, it is challenging for users to find datasets relevant to their needs, even though the datasets are registered in catalogs such as the European Data Portal. This is because the available metadata such as keywords or textual description is not descriptive enough. At the same time, datasets exist in various types of contexts not expressed in the metadata. These may include information about the dataset publisher, the legislation related to dataset publication, language and cultural specifics, etc. In this paper we introduce a similarity model for matching datasets. The model assumes an ontology/knowledge graph, such as Wikidata.org, that serves as a graph-based context to which individual datasets are mapped based on their metadata. A similarity of the datasets is then computed as an aggregation over paths among nodes in the graph. The proposed similarity aims at addressing the problem of explainability of similarity, i.e., providing the user a structured explanation of the match which, in a broader sense, is nowadays a hot topic in the field of artificial intelligence.
- Published
- 2019
- Full Text
- View/download PDF
19. Scalable 3D shape retrieval using local features and the signature quadratic form distance
- Author
-
Benjamin Bustos, Ivan Sipiran, Tomáš Skopal, and Jakub Lokoč
- Subjects
Matching (graph theory), business.industry, 020207 software engineering, Pattern recognition, 02 engineering and technology, Computer Graphics and Computer-Aided Design, Set (abstract data type), Discriminative model, Robustness (computer science), Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Representation (mathematics), business, Software, Blossom algorithm, Mathematics
- Abstract
We present a scalable and unsupervised approach for content-based retrieval on 3D model collections. Our goal is to represent a 3D shape as a set of discriminative local features, which is important to maintain robustness against deformations such as non-rigid transformations and partial data. However, this representation brings up the problem of how to compare two 3D models represented by feature sets. For solving this problem, we apply the signature quadratic form distance (SQFD), which is suitable for comparing feature sets. Using SQFD, the matching between two 3D objects involves only their representations, so it is easy to add new models to the collection. A key characteristic of the feature signatures, required by the SQFD, is that the final object representation can be easily obtained in an unsupervised manner. Additionally, as the SQFD is an expensive distance function, to make the system scalable we present a novel technique to reduce the number of features by detecting clusters of key points on a 3D model. Thus, with smaller feature sets, the distance calculation is more efficient. Our experiments on a large-scale dataset show that our proposed matching algorithm not only performs efficiently, but also its effectiveness is better than state-of-the-art matching algorithms for 3D models.
- Published
- 2016
- Full Text
- View/download PDF
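The signature quadratic form distance named in the abstract of result 19 compares two feature signatures of possibly different sizes. The sketch below assumes each signature is a list of (centroid, weight) pairs and uses a Gaussian similarity between centroids with a hypothetical parameter `alpha`; the feature extraction and the clustering-based reduction described in the abstract are not covered.

```python
import math

def gaussian_similarity(c1, c2, alpha=1.0):
    """Similarity of two centroids: exp(-alpha * squared Euclidean distance)."""
    sq = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return math.exp(-alpha * sq)

def sqfd(sig_a, sig_b, similarity=gaussian_similarity):
    """Signature quadratic form distance between two signatures given as
    lists of (centroid, weight) pairs. The weights of the second signature
    are negated and both signatures are concatenated, so no fixed
    dimensionality is required."""
    centroids = [c for c, _ in sig_a] + [c for c, _ in sig_b]
    weights = [w for _, w in sig_a] + [-w for _, w in sig_b]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * similarity(ci, cj)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(total, 0.0))

# Two toy signatures with different numbers of centroids.
sig_1 = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
sig_2 = [((0.0, 0.1), 0.7), ((2.0, 2.0), 0.2), ((1.0, 0.9), 0.1)]
distance = sqfd(sig_1, sig_2)
```

The double loop makes the quadratic cost in the total number of centroids visible, which is why reducing the number of features per object, as the abstract describes, speeds up the distance computation.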
20. Interactive Product Search Based on Global and Local Visual-Semantic Features
- Author
-
Tomáš Grošup, Tomáš Skopal, and Ladislav Peska
- Subjects
Focus (computing), Information retrieval, Computer science, Process (engineering), Interface (Java), business.industry, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Metadata, Search engine, Product (mathematics), 0202 electrical engineering, electronic engineering, information engineering, Web application, 020201 artificial intelligence & image processing, Relevance (information retrieval), business, 0105 earth and related environmental sciences
- Abstract
In this paper, we present a prototype web application of a product search engine of a fashion e-shop. Today, e-shop product metadata consist of text description, simple attributes (price, size, color, fabric, etc.) and visual information (product photo). Search engines used in e-shops mostly provide a text and attribute/category interface for product filtering. In our model, we focus on the visual information applied in an interactive query-by-example scenario. The global visual descriptors may often be ambiguous and may not correspond well with the intended mental query of the user. Therefore, we proposed and evaluated a model and GUI allowing the user to guide the query process by selecting image regions (patches) of interest within the query. In the demo evaluation, we show that allowing the user to specify relevant image patches led to a significant improvement of the results’ relevance in the vast majority of tested queries.
- Published
- 2018
- Full Text
- View/download PDF
21. Advanced Analytics of Large Connected Data Based on Similarity Modeling
- Author
-
Jan Hučín, Irena Holubová, Petr Pascenko, Tomáš Skopal, and Ladislav Peska
- Subjects
business.industry, Computer science, Big data, 02 engineering and technology, Linked data, Data science, Data type, Analytics, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Graph (abstract data type), 020201 artificial intelligence & image processing, Applied research, business, Transaction data, Risk management
- Abstract
Collecting various types of data about users/clients in order to improve the services and competitiveness of companies has a long history. However, these approaches are often based on classical statistical methods and an assumption of limited computational power. In this paper we introduce the vision of our applied research project targeting the financial sector. Our main goal is to develop an automated software solution for similarity modeling over big and semi-structured graph data representing behavior of bank clients. The main aim of similarity models is to improve the decision process in risk management, marketing, security and related areas.
- Published
- 2018
- Full Text
- View/download PDF
22. Efficient extraction of clustering-based feature signatures using GPU architectures
- Author
-
Tomáš Skopal, Martin Kruliš, and Jakub Lokoč
- Subjects
Computer Networks and Communications, business.industry, Computer science, Nearest neighbor search, Multimedia database, Feature extraction, Pattern recognition, 02 engineering and technology, computer.software_genre, Similarity (network science), Hardware and Architecture, Feature (computer vision), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, Cluster analysis, business, Massively parallel, computer, Software
- Abstract
Similarity search and content-based retrieval have become widely used in multimedia database systems that often manage huge data collections. Unfortunately, many effective content-based similarity models cannot be fully utilized for larger datasets, as they are computationally demanding and require massive parallel processing for both feature extraction and query evaluation tasks. In this work, we address the performance issues of effective similarity models based on feature signatures, where we focus on fast feature extraction from image thumbnails using affordable hardware. More specifically, we propose a multi-GPU implementation that increases the extraction speed by two orders of magnitude with respect to a single-threaded CPU implementation. Since the extraction algorithm is not directly parallelizable, we propose a modification of the algorithm embracing the SIMT execution model. We have experimentally verified that our GPU extractor can be successfully used to index large image datasets comprising millions of images. In order to obtain optimal extraction parameters, we employed the GPU extractor in an extensive empirical investigation of the parameter space. The experimental results are discussed from the perspectives of both performance and similarity precision.
- Published
- 2015
- Full Text
- View/download PDF
23. Product Exploration based on Latent Visual Attributes
- Author
-
Gregor Kovalčík, Ladislav Peska, Jakub Lokoč, Tomáš Grošup, and Tomáš Skopal
- Subjects
Information retrieval ,business.industry ,Computer science ,02 engineering and technology ,Convolutional neural network ,Fuzzy logic ,Search engine ,020204 information systems ,Schema (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Web application ,020201 artificial intelligence & image processing ,Product topology ,business - Abstract
In this demo paper, we present a prototype web application of a product search engine for a fashion e-shop. Although e-shop products consist of a full-text description, relational attributes (e.g., price, type, size, color) as well as visual information (product photo), traditional search engines in e-shops only provide full-text and relational attributes for product filtering. In our retrieval model, we also incorporate the visual information into the search by extracting visual-semantic features using deep convolutional neural networks. Furthermore, visual exploration of the product space using the visual-semantic features (multi-example queries) is used to dynamically discover latent visual attributes that could enhance the original relational schema with fuzzy attributes (e.g., a floral pattern on a product). In the demo, we show how these latent attributes could be used to recommend preferred products to the user, and even outfits (e.g., shoes, bag, jacket) that fit a certain visual style.
- Published
- 2017
- Full Text
- View/download PDF
24. Analyzing Mathematical Content to Detect Academic Plagiarism
- Author
-
Norman Meuschke, Felix Hamborg, Moritz Schubotz, Tomáš Skopal, and Bela Gipp
- Subjects
Source code ,Information retrieval ,Computer science ,media_common.quotation_subject ,05 social sciences ,Rank (computer programming) ,Feature selection ,02 engineering and technology ,Data science ,Pipeline (software) ,Test case ,Feature (computer vision) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Plagiarism detection ,ddc:004 ,0509 other social sciences ,050904 information & library sciences ,media_common - Abstract
This paper presents, to our knowledge, the first study on analyzing mathematical expressions to detect academic plagiarism. We make the following contributions. First, we investigate confirmed cases of plagiarism to categorize the similarities of mathematical content commonly found in plagiarized publications. From this investigation, we derive possible feature selection and feature comparison strategies for developing math-based detection approaches and a ground truth for our experiments. Second, we create a test collection by embedding confirmed cases of plagiarism into the NTCIR-11 MathIR Task dataset, which contains approx. 60 million mathematical expressions in 105,120 documents from arXiv.org. Third, we develop a first math-based detection approach by implementing and evaluating different feature comparison approaches using an open source parallel data processing pipeline built using the Apache Flink framework. The best performing approach identifies all but two of our real-world test cases at the top rank and achieves a mean reciprocal rank of 0.86. The results show that mathematical expressions are promising text-independent features to identify academic plagiarism in large collections. To facilitate future research on math-based plagiarism detection, we make our source code and data available.
- Published
- 2017
- Full Text
- View/download PDF
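For readers unfamiliar with the mean reciprocal rank reported in the abstract of entry 24, a minimal Python sketch follows. The function name and the sample ranks are illustrative assumptions, not data or code from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries.

    `ranks` holds, for each query, the 1-based rank at which the correct
    item was retrieved, or None if it was not retrieved at all (counted as 0).
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for four test cases: two hits at rank 1, one at rank 2,
# and one case that was not retrieved.
example_mrr = mean_reciprocal_rank([1, 1, 2, None])
```

A value close to 1 means that the correct source document usually appears at the very top of the result list.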
25. Malware Discovery Using Behaviour-Based Exploration of Network Traffic
- Author
-
Tomáš Grošup, Tomáš Skopal, Tomáš Pevný, Jakub Lokoč, and Přemysl Čech
- Subjects
Focus (computing) ,business.industry ,Computer science ,Fingerprint (computing) ,Code word ,020207 software engineering ,02 engineering and technology ,computer.software_genre ,Domain (software engineering) ,Annotation ,Identification (information) ,Similarity (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Malware ,020201 artificial intelligence & image processing ,Data mining ,business ,computer ,Computer network - Abstract
We present a demo of behaviour-based similarity retrieval in network traffic data. The underlying framework is intended to support domain experts searching for network nodes (computers) infected by malicious software, especially in cases where a single client-server communication may not be sufficient to reliably identify the infection. The focus is on interactive browsing enabling dynamic changes of the retrieval model, which is based on a recently proposed statistical description (fingerprint) of a communication between two network hosts and the bag of features approach. The demo/framework provides unique insight into the data and enables annotation of the data and model modifications during the search for more effective identification of infected hosts.
- Published
- 2017
- Full Text
- View/download PDF
26. On indexing metric spaces using cut-regions
- Author
-
Přemysl Čech, Juraj Moško, Jakub Lokoč, and Tomáš Skopal
- Subjects
Metric space ,Hardware and Architecture ,Computer science ,Nearest neighbor search ,Search engine indexing ,Structural index ,Metric tree ,Equivalence of metrics ,Data mining ,computer.software_genre ,computer ,Software ,Information Systems - Abstract
After two decades of research, the techniques for efficient similarity search in metric spaces have combined virtually all the available tricks resulting in many structural index designs. As the representative state-of-the-art metric access methods (also called metric indexes) that vary in the usage of filtering rules and in structural designs, we could mention the M-tree, the M-Index and the List of Clusters, to name a few. In this paper, we present the concept of cut-regions that could heavily improve the performance of metric indexes that were originally designed to employ simple ball-regions. We show that the shape of cut-regions is far more compact than that of ball-regions, yet preserving simple and concise representation. We present three re-designed metric indexes originating from the above-mentioned ones but utilizing cut-regions instead of ball-regions. We show that cut-regions can be fully utilized in the index structure, positively affecting not only query processing but also the index construction. In the experiments we show that the re-designed metric indexes significantly outperform their original versions. Highlights: The new cut-region formalism that is suitable for simplified description of compact metric regions. New cheap dynamic construction techniques for the PM-tree that can compete with expensive strategies of the original PM-tree (e.g., multi-way leaf selection). Adaptation of M-Index and List of Clusters to operate with cut-regions. Thorough experimental evaluation also including comparison with the state-of-the-art MAMs.
- Published
- 2014
- Full Text
- View/download PDF
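The contrast between ball-regions and cut-regions in the abstract of entry 26 can be sketched in Python. This is a rough illustration under the assumption that a cut-region is a ball additionally intersected with rings centered at global pivots (as in the PM-tree); the paper's exact formalism may differ, and all function and parameter names are hypothetical.

```python
def ball_may_overlap(d_query_center, query_radius, region_radius):
    """Triangle-inequality test: can a query ball intersect a ball-region?"""
    return d_query_center <= query_radius + region_radius


def cut_region_may_overlap(d_query_center, query_radius, region_radius,
                           d_query_pivots, rings):
    """Ball test plus one ring test per pivot (assumed cut-region shape).

    d_query_pivots[i] is the distance from the query to pivot i, and
    rings[i] = (lo, hi) bounds the distances of the region's objects to pivot i.
    """
    if not ball_may_overlap(d_query_center, query_radius, region_radius):
        return False
    for d_qp, (lo, hi) in zip(d_query_pivots, rings):
        # The query ball misses the ring if it lies entirely outside [lo, hi].
        if d_qp - query_radius > hi or d_qp + query_radius < lo:
            return False
    return True
```

A region that passes the ball test can still be pruned by a ring test, which is why a more compact region description can reduce the number of distance computations at query time.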
27. Ptolemaic access methods: Challenging the reign of the metric space model
- Author
-
Tomáš Skopal, Magnus Lie Hetland, Jakub Lokoč, and Christian Beecks
- Subjects
Theoretical computer science ,Triangle inequality ,Computer science ,business.industry ,Nearest neighbor search ,Search engine indexing ,Similarity measure ,Machine learning ,computer.software_genre ,Database index ,Full table scan ,Metric space ,Hardware and Architecture ,Metric (mathematics) ,Artificial intelligence ,business ,computer ,Software ,Information Systems - Abstract
Metric indexing is the state of the art in general distance-based retrieval. Relying on the triangular inequality, metric indexes achieve significant online speed-up beyond a linear scan. Recently, the idea of Ptolemaic indexing was introduced, which substitutes Ptolemy's inequality for the triangular one, potentially yielding higher efficiency for the distances where it applies. In this paper we have adapted several metric indexes to support Ptolemaic indexing, thus establishing a class of Ptolemaic access methods (PtoAM). In particular, we include Ptolemaic Pivot tables, Ptolemaic PM-Trees and the Ptolemaic M-Index. We also show that the most important and promising family of distances suitable for Ptolemaic indexing is the signature quadratic form distance, an adaptive similarity measure which can cope with flexible content representations of multimedia data, among other things. While this distance has shown remarkable qualities regarding the search effectiveness, its high computational complexity underscores the need for efficient search methods. We show that these distances are Ptolemaic metrics and present a study where we apply Ptolemaic indexing methods on real-world image databases, resolving exact queries nearly four times as fast as the state-of-the-art metric solution, and up to three orders of magnitude faster than a sequential scan.
- Published
- 2013
- Full Text
- View/download PDF
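The difference between triangular and Ptolemaic filtering mentioned in the abstract of entry 27 can be illustrated with a small Python sketch. It uses the Euclidean distance, which is a known Ptolemaic metric, rather than the signature quadratic form distance; the function names and the sample points are assumptions made for illustration only.

```python
import math


def triangle_lower_bound(d_qp, d_op):
    """Lower bound on d(q, o) from one pivot p via the triangle inequality."""
    return abs(d_qp - d_op)


def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Lower bound on d(q, o) from two pivots p and s via Ptolemy's inequality:
    d(q,p)*d(o,s) <= d(q,o)*d(p,s) + d(q,s)*d(o,p).
    """
    return abs(d_qp * d_os - d_qs * d_op) / d_ps


# Hypothetical query q, database object o, and pivots p, s on a line.
q, o, p, s = (0.0, 0.0), (4.0, 0.0), (1.0, 0.0), (3.0, 0.0)
tri = triangle_lower_bound(math.dist(q, p), math.dist(o, p))
pto = ptolemaic_lower_bound(math.dist(q, p), math.dist(q, s),
                            math.dist(o, p), math.dist(o, s), math.dist(p, s))
```

An object can be discarded without computing d(q, o) whenever a lower bound already exceeds the query radius, so a tighter bound means fewer expensive distance computations.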
28. Towards efficient indexing of arbitrary similarity
- Author
-
Tomáš Bartoš, Tomáš Skopal, and Juraj Moško
- Subjects
Set (abstract data type) ,Metric space ,Information retrieval ,Theoretical computer science ,Computer science ,Nearest neighbor search ,Similarity (psychology) ,Search engine indexing ,Unstructured data ,Genetic programming ,Software ,Similitude ,Information Systems - Abstract
The popularity of similarity search expanded with the increased interest in multimedia databases, bioinformatics, or social networks, and with the growing number of users trying to find information in huge collections of unstructured data. During the exploration, the users handle database objects in different ways based on the utilized similarity models, ranging from simple to complex models. Efficient indexing techniques for similarity search are required especially for growing databases. In this paper, we study implementation possibilities of the recently announced theoretical framework SIMDEX, the task of which is to algorithmically explore a given similarity space and find possibilities for efficient indexing. Instead of a fixed set of indexing properties, such as metric space axioms, SIMDEX aims to seek alternative properties that are valid in a particular similarity model (database) and, at the same time, provide efficient indexing. In particular, we propose to implement the fundamental parts of SIMDEX by means of genetic programming (GP), which we expect will provide a high-quality resulting set of expressions (axioms) useful for indexing.
- Published
- 2013
- Full Text
- View/download PDF
29. k-NN Classification of Malware in HTTPS Traffic Using the Metric Space Approach
- Author
-
Tomáš Pevný, Jan Kohout, Jakub Lokoč, Tomáš Skopal, and Přemysl Čech
- Subjects
business.industry ,Computer science ,Nearest neighbor search ,Pattern recognition ,02 engineering and technology ,State (functional analysis) ,Intrusion detection system ,computer.software_genre ,Metric space ,ComputingMethodologies_PATTERNRECOGNITION ,020204 information systems ,Metric (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,Malware ,020201 artificial intelligence & image processing ,Artificial intelligence ,Data mining ,False positive rate ,business ,Focus (optics) ,computer - Abstract
In this paper, we present detection of malware in HTTPS traffic using k-NN classification. We focus on the metric space approach for approximate k-NN searches over a dataset of sparse high-dimensional descriptors of network traffic. We show that the classification based on approximate k-NN search using a metric index exhibits a false positive rate reduced by an order of magnitude when compared to the state-of-the-art method, while keeping the classification fast enough.
- Published
- 2016
- Full Text
- View/download PDF
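As a reminder of what k-NN classification means in the abstract of entry 29, here is a minimal Python sketch. It performs an exact brute-force search with the Euclidean distance, whereas the paper relies on approximate search with a metric index; the function name, labels, and sample points are hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, labeled_points, k=3, distance=math.dist):
    """Return the majority label among the k nearest labeled points.

    labeled_points is a list of (vector, label) pairs.
    """
    neighbors = sorted(labeled_points,
                       key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional traffic descriptors with known labels.
training = [
    ((0.0, 0.0), "benign"), ((0.0, 1.0), "benign"),
    ((5.0, 5.0), "malware"), ((5.0, 6.0), "malware"), ((6.0, 5.0), "malware"),
]
prediction = knn_classify((5.0, 5.5), training, k=3)
```

Replacing the sorted scan with an index-backed neighbor search changes only how the neighbors are found; the voting step stays the same.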
30. Multi-sketch Semantic Video Browser
- Author
-
Tomáš Skopal, David Kuboň, Adam Blažek, and Jakub Lokoč
- Subjects
Information retrieval ,Semantic similarity ,Simple (abstract algebra) ,Computer science ,0202 electrical engineering, electronic engineering, information engineering ,Semantic technology ,020207 software engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,Semantic Web Stack ,Sketch - Abstract
This paper presents a tool for interactive filtering and browsing of up to hundreds of hours of video content. In particular, we address the known-item search, i.e., searching for a short video clip known visually or by textual description. Video content is filtered with simple user-defined sketches of the searched scenes consisting of its distinct color regions and significant edges. Furthermore, the filtered content might be browsed with the query-by-example paradigm utilizing either visual or semantic similarity.
- Published
- 2016
- Full Text
- View/download PDF
31. Combining CPU and GPU architectures for fast similarity search
- Author
-
Tomáš Skopal, Martin Kruliš, Jakub Lokoč, and Christian Beecks
- Subjects
Information Systems and Management ,Speedup ,Computer science ,Nearest neighbor search ,Search engine indexing ,Parallel computing ,Data structure ,Similitude ,Database index ,Hardware and Architecture ,Metric (mathematics) ,Central processing unit ,Software ,Information Systems - Abstract
The Signature Quadratic Form Distance on feature signatures represents a flexible distance-based similarity model for effective content-based multimedia retrieval. Although metric indexing approaches are able to speed up query processing by two orders of magnitude, their applicability to large-scale multimedia databases containing billions of images is still a challenging issue. In this paper, we propose a parallel approach that balances the utilization of CPU and many-core GPUs for efficient similarity search with the Signature Quadratic Form Distance. In particular, we show how to process multiple distance computations and other parts of the search procedure in parallel, achieving maximal performance of the combined CPU/GPU system. The experimental evaluation demonstrates that our approach implemented on a common workstation with 2 GPU cards outperforms traditional parallel implementation on a high-end 48-core NUMA server in terms of efficiency almost by an order of magnitude. If we consider also the price of the high-end server that is ten times higher than that of the GPU workstation then, based on price/performance ratio, the GPU-based similarity search beats the CPU-based solution by almost two orders of magnitude. Although proposed for the SQFD, our approach of fast GPU-based similarity search is applicable for any distance function that is efficiently parallelizable in the SIMT execution model.
- Published
- 2012
- Full Text
- View/download PDF
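To make the Signature Quadratic Form Distance from the abstract of entry 31 more concrete, the following Python sketch computes it for two small feature signatures. It assumes the commonly used Gaussian similarity function between centroids and a plain sequential loop; the parameter `alpha`, the function names, and the sample signatures are illustrative assumptions, and no GPU parallelization is attempted.

```python
import math


def gaussian_similarity(c1, c2, alpha=1.0):
    """Similarity of two centroids; equals 1 when they coincide."""
    return math.exp(-alpha * math.dist(c1, c2) ** 2)


def sqfd(sig_a, sig_b, alpha=1.0):
    """SQFD of two signatures given as lists of (centroid, weight) pairs.

    The weights of both signatures are concatenated (the second one negated)
    and evaluated as a quadratic form over the centroid similarity matrix.
    """
    centroids = [c for c, _ in sig_a] + [c for c, _ in sig_b]
    weights = [w for _, w in sig_a] + [-w for _, w in sig_b]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * gaussian_similarity(ci, cj, alpha)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


# Hypothetical single-centroid signatures one unit apart.
a = [((0.0, 0.0), 1.0)]
b = [((1.0, 0.0), 1.0)]
distance_ab = sqfd(a, b)
```

Each evaluation is quadratic in the total number of centroids, and the inner products are independent of one another, which is the kind of workload that maps well to parallel execution.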
32. Non-metric similarity search of tandem mass spectra including posttranslational modifications
- Author
-
Jiří Novák, Jakub Lokoč, Tomáš Skopal, and David Hoksza
- Subjects
Physics ,Tandem mass spectrometry ,Peptide identification ,Similarity search ,Nearest neighbor search ,Search engine indexing ,Parameterized complexity ,Bioinformatics ,Mass spectrometry ,Theoretical Computer Science ,Hausdorff distance ,Posttranslational modifications ,Computational Theory and Mathematics ,Similarity (network science) ,Metric (mathematics) ,Mass spectrum ,Discrete Mathematics and Combinatorics ,Metric access methods ,Algorithm - Abstract
In biological applications, tandem mass spectrometry is a widely used method for determining protein and peptide sequences from an "in vitro" sample. The sequences are not determined directly, but they must be interpreted from the mass spectra, which are the output of the mass spectrometer. This work is focused on a similarity-search approach to mass spectra interpretation, where the parameterized Hausdorff distance (d_HP) is used as the similarity. In order to provide an efficient similarity search under d_HP, the metric access methods and the TriGen algorithm (controlling the metricity of d_HP) are employed. Moreover, the search model based on d_HP supports posttranslational modifications (PTMs) in the query mass spectra, which is typically a problem when an indexing approach is used. Our approach can be utilized as a coarse filter by any other database approach for mass spectra interpretation.
- Published
- 2012
- Full Text
- View/download PDF
33. Beyond the metric space model
- Author
-
Tomáš Skopal and Benjamin Bustos
- Subjects
Metric space ,Information retrieval ,Semantic similarity ,Similarity (network science) ,Nearest neighbor search ,Metric (mathematics) ,Normalized compression distance ,General Medicine ,Multimedia information retrieval ,Data structure ,Mathematics - Abstract
The metric space model has represented a reasonable trade-off concerning the efficiency and effectiveness problem in similarity search. However, complex similarity models that do not satisfy the metric properties have been used in a wide variety of research domains like multimedia information retrieval, digital libraries, biological and chemical databases, time series analysis, and biometry [2]. All these domains require the management of very large data collections, but the algorithms and data structures for searching in metric spaces cannot be used directly, because these domains rely on nonmetric similarity measures. As the term nonmetric simply means that a similarity function does not satisfy some (or all) of the properties of a metric, we restrict its definition to nonmetric similarity functions that are "context-free and static", that is, the similarity between two objects is constant regardless of the context (time, user, query, other objects in the collection, etc.).
- Published
- 2010
- Full Text
- View/download PDF
34. New dynamic construction techniques for M-tree
- Author
-
Tomáš Skopal and Jakub Lokoč
- Subjects
M-tree ,Theoretical computer science ,Logarithm ,Selection (relational algebra) ,Computer science ,Forced reinsertions ,Search engine indexing ,Access method ,Dynamic insertion ,Object (computer science) ,Theoretical Computer Science ,Index (publishing) ,Computational Theory and Mathematics ,Metric (mathematics) ,Discrete Mathematics and Combinatorics ,Metric access methods ,Algorithm - Abstract
Since its introduction in 1997, the M-tree became a respected metric access method (MAM), while remaining, together with its descendants, still the only database-friendly MAM, that is, a dynamic structure persistent in paged index. Although there have been many other MAMs developed over the last decade, most of them require either static or expensive indexing. By contrast, the dynamic M-tree construction allows us to index very large databases in subquadratic time, and simultaneously the index can be maintained up-to-date (i.e., supports arbitrary insertions/deletions). In this article we propose two new techniques improving dynamic insertions in M-tree—the forced reinsertion strategies and so-called hybrid-way leaf selection. Both of the techniques preserve logarithmic asymptotic complexity of a single insertion, while they aim to produce more compact M-tree hierarchies (which leads to faster query processing). In particular, the former technique reuses the well-known principle of forced reinsertions, where the new insertion algorithm tries to re-insert the content of an M-tree leaf that is about to split in order to avoid that split. The latter technique constitutes an efficiency-scalable selection of suitable leaf node wherein a new object has to be inserted. In the experiments we show that the proposed techniques bring a clear improvement (speeding up both indexing and query processing) and also provide a tuning tool for indexing vs. querying efficiency trade-off. Moreover, a combination of the new techniques exhibits a synergic effect resulting in the best strategy for dynamic M-tree construction proposed so far.
- Published
- 2009
- Full Text
- View/download PDF
35. What are the salient keyframes in short casual videos? an extensive user study using a new video dataset
- Author
-
Tomáš Skopal, Klaus Schoeffmann, J. Lansky, Jakub Lokoč, M. Del Fabro, Manfred Jürgen Primus, and Bernd Münzer
- Subjects
Ground truth ,Information retrieval ,Casual ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Automatic summarization ,Annotation ,Salient ,Histogram ,Selection (linguistics) ,Computer vision ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Understanding the saliency of keyframes in short casual/home-made videos containing redundant information is an important step towards the design of successful keyframe selection and summarization techniques for such videos. Therefore, we present an extensive user study focusing on saliency of keyframes in such short redundant videos. In our study, more than 200 users annotated 32 videos, altogether selecting more than 20,000 keyframes. We present the description of the user study and the utilized annotation tool, and we discuss the results. We also provide a preliminary comparison of several popular keyframe selection techniques using the ground truth derived from the annotations.
- Published
- 2015
- Full Text
- View/download PDF
36. Enhanced Signature-Based Video Browser
- Author
-
Tomáš Skopal, Filip Matzner, Adam Blažek, and Jakub Lokoč
- Subjects
Information retrieval ,Homogeneous ,Feature (computer vision) ,Computer science ,Search model ,Signature (logic) ,Linear search ,Video retrieval - Abstract
The success of our Signature-Based Video Browser presented last year at Video Browser Showdown 2014 (now renamed to Video Search Showcase) was mainly based on effective filtering using position-color feature signatures, while browsing in the results comprising matched keyframes was based just on a simple sequential search approach. Since the results can consist of highly similar keyframes (e.g., news studio scenes) making the browsing more difficult, we have enhanced our tool with more advanced browsing techniques considering also homogeneous result sets obtained after filtering phase. Furthermore, we have utilized improved search models based on feature signatures to make the filtering phase more effective.
- Published
- 2015
- Full Text
- View/download PDF
37. MLES: Multilayer Exploration Structure for Multimedia Exploration
- Author
-
Juraj Moško, Přemysl Čech, Jan Lánský, Jakub Lokoč, Tomáš Grošup, and Tomáš Skopal
- Subjects
Structure (mathematical logic) ,Information retrieval ,Multimedia ,Horizontal and vertical ,Computer science ,Nearest neighbor search ,Multimedia database ,Space (commercial competition) ,Object (computer science) ,computer.software_genre ,computer ,Similarity query ,Content based retrieval - Abstract
The traditional content-based retrieval approaches usually use flat querying, where the whole multimedia database is searched for the result of some similarity query with a user-specified query object. However, there are retrieval scenarios (e.g., multimedia exploration) where users may not have a clear search intent in mind and just want to inspect the content of the multimedia collection. In such scenarios, flat querying is not suitable for the first phases of browsing, because it retrieves the most similar objects and does not consider a view on part of a multimedia space from different perspectives. Therefore, we defined a new Multilayer Exploration Structure (MLES) that enables exploration of a multimedia collection at different levels of detail. Using the MLES, we formally defined popular exploration operations (zoom-in/out, pan) to enable horizontal and vertical browsing in the explored space, and we discussed several problems related to the area of multimedia exploration.
- Published
- 2015
- Full Text
- View/download PDF
38. Evaluating Multilayer Multimedia Exploration
- Author
-
Tomáš Skopal, Jakub Lokoč, Přemysl Čech, Juraj Moško, Tomáš Grošup, and Jan Lánský
- Subjects
User studies ,Structure (mathematical logic) ,Lead (geology) ,Multimedia ,Process (engineering) ,Computer science ,Nearest neighbor search ,Object (computer science) ,computer.software_genre ,computer ,Content based retrieval - Abstract
Multimedia exploration is an entertaining approach for multimedia retrieval enabling users to interactively browse and navigate through multimedia collections in a content-based way. The multimedia exploration approach extends the traditional query-by-example retrieval scenario to be a more intuitive approach for obtaining a global overview over an explored collection. However, novel exploration scenarios require many user studies demonstrating their benefits. In this paper, we present results of an extensive user study focusing on the comparison of a 3-layer Multilayer Exploration Structure (MLES) with standard flat k-NN browsing. The results of the user study show that principles of the MLES lead to better effectiveness of the exploration process, especially when searching for a first object of the searched concept in an unknown collection.
- Published
- 2015
- Full Text
- View/download PDF
39. A Web Portal for Effective Multi-model Exploration
- Author
-
Tomáš Grošup, Tomáš Skopal, Přemysl Čech, and Jakub Lokoč
- Subjects
Information retrieval ,Recall ,Similarity (network science) ,Feature (computer vision) ,Computer science ,Nearest neighbor search - Abstract
Over the last decades, various similarity models suitable for specific similarity search tasks have emerged. In this paper, we present a web-based portal that combines two popular similarity models (based on feature signatures and SURF descriptors) in order to improve the recall of multimedia exploration. Compared to a single-model approach, we demonstrate in a game-like fashion that a multi-model approach could provide users with more diverse and still relevant results.
- Published
- 2015
- Full Text
- View/download PDF
40. A new range query algorithm for Universal B-trees
- Author
-
Vaclav Snasel, Tomáš Skopal, Michal Krátký, and Jaroslav Pokorný
- Subjects
Structure (mathematical logic) ,Theoretical computer science ,Relation (database) ,Range query (data structures) ,Computer science ,Window (computing) ,Query optimization ,computer.software_genre ,Hardware and Architecture ,Simple (abstract algebra) ,Data mining ,computer ,Algorithm ,Computer Science::Databases ,Software ,Information Systems - Abstract
In multi-dimensional databases the essential tool for accessing data is the range query (or window query). In this paper we introduce a new algorithm for processing range queries in the universal B-tree (UB-tree), which is an index structure for searching in multi-dimensional databases. The new range query algorithm (called the DRU algorithm) works efficiently, even for processing high-dimensional databases. In particular, using the DRU algorithm many of the UB-tree inner nodes need not be accessed. We explain the DRU algorithm using a simple geometric model, providing a clear insight into the problem. More specifically, the model exploits an interesting relation between the Z-curve and generalized quad-trees. We also present experimental results for the DRU algorithm implementation.
- Published
- 2006
- Full Text
- View/download PDF
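The Z-curve mentioned in the abstract of entry 40 orders multi-dimensional points by interleaving the bits of their coordinates. The Python sketch below shows only this address computation, not the DRU range query algorithm itself; the function name and the choice of which dimension contributes the least significant bit are assumptions made for illustration.

```python
def z_address(coords, bits):
    """Interleave the lowest `bits` bits of each non-negative integer coordinate.

    Bit b of dimension i is placed at position b * len(coords) + i, so the
    first dimension contributes the least significant bit of the address.
    """
    dims = len(coords)
    address = 0
    for bit in range(bits):
        for dim, value in enumerate(coords):
            address |= ((value >> bit) & 1) << (bit * dims + dim)
    return address


# Addresses of the four cells of a 2 x 2 grid, visited in a "Z" shape.
grid_order = sorted([(x, y) for x in range(2) for y in range(2)],
                    key=lambda point: z_address(point, 1))
```

Because nearby points tend to receive nearby addresses, a one-dimensional B-tree over these addresses can serve as a multi-dimensional index, which is the idea underlying the UB-tree.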
41. Towards efficient multimedia exploration using the metric space approach
- Author
-
Tomáš Grošup, Tomáš Skopal, Premysl Cech, and Jakub Lokoč
- Subjects
Class (computer programming) ,Metric space ,Information retrieval ,Multimedia ,Computer science ,Search engine indexing ,computer.software_genre ,computer - Abstract
In this paper, we investigate the content-based multimedia exploration techniques benefiting from the metric space indexing approach. We present two orthogonal approaches for browsing multimedia collections and discuss their strong and weak points. We also provide an implementation of the two approaches in our publicly available demo application where users can try to find as many objects of a predefined class as possible, given a limited time and/or a number of clicks.
- Published
- 2014
- Full Text
- View/download PDF
42. On Effective Known Item Video Search Using Feature Signatures
- Author
-
Adam Blažek, Jakub Lokoč, and Tomáš Skopal
- Subjects
Information retrieval ,Visual perception ,business.industry ,Computer science ,Feature (computer vision) ,Video tracking ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer vision ,Artificial intelligence ,business ,Representation (mathematics) ,Video retrieval - Abstract
In this demo paper, we present a video retrieval and browsing tool inspired by the natural human ability to memorize visual stimuli of color regions in video frames. Our tool utilizes feature signatures that can be used to represent both significant color regions in the key-frames and simple query sketches. As recently shown at the Video Browser Showdown, such a simple representation enables both effective and efficient interactive retrieval and browsing in video.
- Published
- 2014
- Full Text
- View/download PDF
43. Signature-Based Video Browser
- Author
-
Adam Blažek, Tomáš Skopal, and Jakub Lokoč
- Subjects
Visual perception ,Computer science ,business.industry ,Nearest neighbor search ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Signature (logic) ,Feature (computer vision) ,Key (cryptography) ,Preprocessor ,Computer vision ,Artificial intelligence ,business ,Representation (mathematics) - Abstract
In this paper, we present a new signature-based video browser tool relying on the natural human ability to perceive and memorize visual stimuli of color regions in video frames. The tool utilizes feature signatures based on color and position extracted from the key frames in the preprocessing phase. Such content representation helps users draw simple query sketches and also enables effective and efficient processing of the query sketches. Besides simple user-drawn sketches of desired scenes, the tool also supports several additional automatic content-based analysis techniques enabling restrictions to various concepts like faces or shapes.
- Published
- 2014
- Full Text
- View/download PDF
44. Video Retrieval with Feature Signature Sketches
- Author
-
Tomáš Skopal, Adam Blažek, and Jakub Lokoč
- Subjects
Colored ,Range query (data structures) ,Feature (computer vision) ,business.industry ,Computer science ,Nearest neighbor search ,Search engine indexing ,Pattern recognition ,Artificial intelligence ,Pruning (decision trees) ,business ,Signature (logic) ,Vector space - Abstract
In this paper, we present an effective yet efficient approach for known-item search in video data. The approach employs feature signatures based on color distribution to represent video key-frames. At the same time, the feature signatures enable users to intuitively draw simple colored sketches of the desired scene. We describe in detail the video retrieval model and also discuss and carefully optimize its parameters. Furthermore, several indexing techniques suitable for the model are presented and their performance is empirically evaluated in the experiments. Apart from that, we also investigate a bounding-sphere pruning technique suitable for similarity search in vector spaces.
- Published
- 2014
- Full Text
- View/download PDF
45. Real-Time Exploration of Multimedia Collections
- Author
-
Juraj Moško, Tomáš Skopal, Jakub Lokoč, and Tomáš Bartoš
- Subjects
Multimedia ,Computer science ,Process (engineering) ,Nearest neighbor search ,Scalability ,Similarity (psychology) ,State (computer science) ,User interface ,computer.software_genre ,computer - Abstract
With the huge expansion of smart devices and mobile applications, ordinary users are consistently changing the conventional similarity search model. The users want to explore the multimedia data, so the typical query-by-example principle and the well-known keyword searching have become just a part of more complex retrieval processes. The emerging multimedia exploration systems with a robust back-end retrieval system based on state-of-the-art similarity search techniques provide a good solution. They enable an interactive exploration process and implement exploration queries tightly connected with the user interface. However, they do not consider larger response times that might occur. To overcome this, we propose a scalable exploration system RTExp that allows evaluating the similarity queries in near real time depending on user preferences (speed / precision). We describe the building parts of the system and discuss various real-time characteristics for the exploration process. We also provide results from the experimental evaluation of time-limited similarity queries and corresponding exploration operations.
- Published
- 2014
46. Efficient indexing of similarity models with inequality symbolic regression
- Author
Juraj Moško, Tomáš Bartoš, and Tomáš Skopal
- Subjects
Metadata, Information retrieval, Similarity (network science), Nearest neighbor search, Search engine indexing, Genetic programming, Data mining, Symbolic regression, computer.software_genre, computer, Domain (software engineering), Mathematics, Database index
- Abstract
The increasing amount of available unstructured content has introduced a new concept of searching for information: content-based retrieval. The underlying principle is that objects are compared based on their content, which is far more complex than simple text- or metadata-based searching. Many indexing techniques have arisen to provide efficient and effective similarity searching. However, these methods are restricted to a specific domain, such as the metric space model. If this prerequisite is not fulfilled, indexing cannot be used, and each similarity search query degrades to sequential scanning, which is unacceptable for large datasets. Inspired by previous successful results, we decided to apply the principles of genetic programming to the area of database indexing. We developed GP-SIMDEX, a universal framework capable of finding precise and efficient indexing methods for similarity searching for any given similarity data. For this purpose, we introduce the inequality symbolic regression principle and show how it helps the GP-SIMDEX Framework find appropriate results that in most cases outperform the best-known indexing methods.
- Published
- 2013
47. Dynamic multimedia exploration using SIFT matching
- Author
Tomáš Skopal, Jachym Toušek, Lukáš Navrátil, and Jakub Lokoč
- Subjects
Fluency, Information retrieval, Multimedia, Computer science, Image database, Schema (psychology), Multimedia database, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scale-invariant feature transform, computer.software_genre, computer, Similitude
- Abstract
In this demo paper, we focus on dynamic multimedia exploration techniques, which are an intuitive, effective and entertaining way to present a pre-selected subset of a multimedia database to users. More specifically, we present an exploration schema employing a similarity model based on SIFT descriptors that can be used to explore an image database according to regions in the images. We also provide a simple mechanism to reduce the number of non-relevant SIFT descriptors in the query image. Reducing the number of SIFT descriptors in the query image improves the speed and fluency of the exploration process, as demonstrated in our demo application.
- Published
- 2013
48. Designing Similarity Indexes with Parallel Genetic Programming
- Author
Tomáš Skopal and Tomáš Bartoš
- Subjects
Development (topology), Similarity (geometry), Theoretical computer science, Triangle inequality, Computer science, Database object, Metric (mathematics), Search engine indexing, Genetic programming, Data mining, computer.software_genre, computer, Pivot table
- Abstract
The increasing diversity of unstructured databases drives the development of advanced indexing techniques, as the metric indexing model does not fit general similarity models. Once the most critical postulate, namely the triangle inequality, does not hold, the metric model produces notable errors during query evaluation. To overcome this situation and to obtain higher-quality results, we want to discover better indexing models for databases using arbitrary similarity measures. However, each database is unique in a specific way, so we outline an automatic way of exploring the best indexing method. We introduce an exploration approach using parallel genetic programming principles in a multi-threaded environment built upon the recently introduced SIMDEX Framework. Furthermore, we introduce the smart pivot table, an intelligent indexing method capable of incorporating the obtained results. We supplement the theoretical background with experiments showing the improvements achieved in comparison to single-threaded evaluations.
- Published
- 2013
49. Efficient Extraction of Feature Signatures Using Multi-GPU Architecture
- Author
Martin Kruliš, Jakub Lokoč, and Tomáš Skopal
- Subjects
Speedup, Computer science, business.industry, Nearest neighbor search, Feature extraction, Search engine indexing, Pattern recognition, computer.software_genre, Feature (computer vision), Scalability, Data mining, Artificial intelligence, Cluster analysis, business, Throughput (business), computer
- Abstract
Recent popular applications, such as online video analysis or image exploration techniques utilizing content-based retrieval, create a serious demand for fast and scalable feature extraction implementations. One of the promising content-based retrieval models is based on feature signatures and the signature quadratic form distance. Although the model has proved competitive in terms of effectiveness, the slow feature extraction, which comprises costly k-means clustering, limits the model to preprocessing steps. In this paper, we present a highly efficient multi-GPU implementation of the feature extraction process, reaching a speedup of more than two orders of magnitude over a classical CPU platform and a peak throughput that exceeds 8 thousand signatures per second. Such an implementation allows requested batches of frames or images to be extracted online without annoying delays. Moreover, besides online extraction tasks, our GPU implementation can also be used in a traditional preprocessing and training phase. For example, fast extraction allows indexing of huge databases or inspecting a significantly larger parameter space when searching for a similarity model configuration that is optimal with respect to both efficiency and effectiveness.
- Published
- 2013
50. On Scalable Approximate Search with the Signature Quadratic Form Distance
- Author
Tomáš Grošup, Tomáš Skopal, and Jakub Lokoč
- Subjects
Similarity (network science), business.industry, Quadratic form, Nearest neighbor search, Metric (mathematics), Pattern recognition, Artificial intelligence, business, Space (mathematics), Measure (mathematics), Signature (logic), Mathematics, Curse of dimensionality
- Abstract
The signature quadratic form distance and feature signatures have become a respected similarity space for effective content-based retrieval. Furthermore, the similarity space is configurable by a parameter alpha that affects both retrieval precision and intrinsic dimensionality, and thus interesting trade-offs can be achieved when a metric index is used for exact search. In this paper, we combine this configurable model with state-of-the-art approximate search techniques developed for the M-Index. In the experiments, we show that employing a configuration resulting in the best effectiveness of the measure also leads to very competitive approximate search effectiveness when using the M-Index, regardless of the high intrinsic dimensionality of the corresponding similarity space.
- Published
- 2013