On Revealing Shared Conceptualization Among Open Datasets
- Source :
- SSRN Electronic Journal.
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
- Abstract :
- Openness and transparency initiatives are not only milestones of scientific progress but have also influenced many organizations and industries. Under this influence, government institutions worldwide have published a large number of datasets through open data portals. Government data covers diverse subjects, and the volume of available data grows every year. Published data is expected to be both accessible and discoverable, and portals rely on the metadata accompanying datasets to achieve this. However, part of the metadata is often missing, which reduces users' ability to find the information they need, and the problem worsens as the number of published datasets grows. The approach described in this paper aims to mitigate this problem by implementing knowledge structures and algorithms that propose the best-matching category for an uncategorized dataset. Our aim is twofold: to enrich dataset metadata by suggesting an appropriate category, and to increase the dataset's visibility and discoverability. The approach relies on user-provided information about open datasets, namely the descriptions contained in dataset tags. Because dataset tags show low consistency due to their origin, we present a method of optimizing their usage by means of semantic similarity measures based on natural language processing mechanisms. The optimization reduces the number of distinct tag values used to describe datasets. Once optimized, the tags are used to reveal the shared conceptualization arising from their usage by means of Formal Concept Analysis. We demonstrate the advantage of our proposal by comparing the concept lattices generated with Formal Concept Analysis before and after the optimization process, and we use the generated structure as a knowledge base for categorizing uncategorized open datasets. Finally, we present a categorization mechanism based on the generated knowledge base that uses semantic similarity measures to propose a suitable category for an uncategorized dataset.
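The before-and-after comparison described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the dataset names, the tags, and the hand-written `merge` map (a stand-in for the semantic similarity step) are all hypothetical, and the brute-force concept enumeration is only practical for toy inputs. It treats datasets as objects and tags as attributes, then counts the formal concepts before and after near-duplicate tags are merged.

```python
from itertools import combinations


def concepts(context):
    """Return all formal concepts (extent, intent) of a dataset -> tags context."""
    objects = set(context)
    all_tags = set().union(*context.values())

    def extent(tags):
        # Datasets that carry every tag in `tags`.
        return frozenset(o for o in objects if tags <= context[o])

    def intent(objs):
        # Tags shared by every dataset in `objs` (all tags for the empty set).
        if not objs:
            return frozenset(all_tags)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    found = set()
    # Closing every subset of objects yields every concept (toy sizes only).
    for r in range(len(objects) + 1):
        for subset in combinations(sorted(objects), r):
            b = intent(subset)
            found.add((extent(b), b))
    return found


# Hypothetical raw context: inconsistent user-provided tags.
raw = {
    "d1": {"transport", "bus"},
    "d2": {"transportation", "bus"},
    "d3": {"budget", "finance"},
    "d4": {"finances", "budget"},
}

# Hypothetical result of a similarity-based merge, written by hand here.
merge = {"transportation": "transport", "finances": "finance"}
optimized = {d: {merge.get(t, t) for t in tags} for d, tags in raw.items()}

print(len(concepts(raw)), "concepts before optimization")
print(len(concepts(optimized)), "concepts after optimization")
```

In this toy case, merging reduces the number of distinct tags from six to four, and datasets that previously shared only one tag now fall into the same concept, which is the kind of structural simplification the lattice comparison is meant to expose.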
- Subjects :
- Information retrieval
Conceptualization
Computer Networks and Communications
business.industry
Computer science
02 engineering and technology
Discoverability
Human-Computer Interaction
Metadata
Consistency (database systems)
Open data
Semantic similarity
Knowledge base
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Formal concept analysis
020201 artificial intelligence & image processing
business
Software
- ISSN :
- 1556-5068
- Database :
- OpenAIRE
- Journal :
- SSRN Electronic Journal
- Accession number :
- edsair.doi.dedup.....3827d73defe34fb408cbd54a7ea94f00
- Full Text :
- https://doi.org/10.2139/ssrn.3770603