36 results for "Similarity Estimation"
Search Results
2. Sequential Cooperative Distillation for Imbalanced Multi-Task Learning.
- Author
-
Feng, Quan, Yao, Jia-Yu, Xie, Ming-Kun, Huang, Sheng-Jun, and Chen, Song-Can
- Subjects
COGNITIVE psychology ,GROUP work in education ,DISTILLATION ,TASK performance ,COOPERATION - Abstract
Multi-task learning (MTL) can boost the performance of individual tasks by mutual learning among multiple related tasks. However, when these tasks assume diverse complexities, their corresponding losses involved in the MTL objective inevitably compete with each other and ultimately make the learning biased towards simple tasks rather than complex ones. To address this imbalanced learning problem, we propose a novel MTL method that can equip multiple existing deep MTL model architectures with a sequential cooperative distillation (SCD) module. Specifically, we first introduce an efficient mechanism to measure the similarity between tasks, and group similar tasks into the same block to allow their cooperative learning from each other. Based on this, the grouped task blocks are sorted in a queue to determine the learning sequence of the tasks according to their complexities estimated with the defined performance indicator. Finally, a distillation between the individual task-specific models and the MTL model is performed block by block, in complex-to-simple order, achieving a balance between competition and cooperation among learning multiple tasks. Extensive experiments demonstrate that our method is significantly more competitive than state-of-the-art methods, ranking first in average performance across multiple datasets with improvements of 12.95% and 3.72% over OMTL and MTLKD, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
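The SCD abstract above groups similar tasks into blocks before distillation, but the listing does not spell out the grouping rule. As a hedged illustration only (the cosine measure, the threshold, and the greedy first-fit assignment are all assumptions, not the authors' mechanism), similarity-based task grouping might look like:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length task vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def group_tasks(task_vectors, threshold=0.8):
    """Greedily place each task into the first block whose representative
    (the block's first member) it resembles; otherwise open a new block."""
    blocks = []  # each block is a list of task indices
    for i, vec in enumerate(task_vectors):
        for block in blocks:
            if cosine(task_vectors[block[0]], vec) >= threshold:
                block.append(i)
                break
        else:
            blocks.append([i])
    return blocks
```

For example, two near-parallel task vectors land in one block while an orthogonal one opens its own, after which the blocks could be sorted by an estimated complexity indicator as the abstract describes.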
3. Sequential Cooperative Distillation for Imbalanced Multi-Task Learning
- Published
- 2024
- Full Text
- View/download PDF
4. Data Allocation with Neural Similarity Estimation for Data-Intensive Computing
- Author
-
Vamosi, Ralf, Schikuta, Erich, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Groen, Derek, editor, de Mulatier, Clélia, editor, Paszynski, Maciej, editor, Krzhizhanovskaya, Valeria V., editor, Dongarra, Jack J., editor, and Sloot, Peter M. A., editor
- Published
- 2022
- Full Text
- View/download PDF
5. Re-Identify Deformable Targets for Visual Tracking
- Author
-
Zhang, Runqing, Fan, Chunxiao, Ming, Yue, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ma, Huimin, editor, Wang, Liang, editor, Zhang, Changshui, editor, Wu, Fei, editor, Tan, Tieniu, editor, Wang, Yaonan, editor, Lai, Jianhuang, editor, and Zhao, Yao, editor
- Published
- 2021
- Full Text
- View/download PDF
6. Deep Adaptively-Enhanced Hashing With Discriminative Similarity Guidance for Unsupervised Cross-Modal Retrieval.
- Author
-
Shi, Yufeng, Zhao, Yue, Liu, Xin, Zheng, Feng, Ou, Weihua, You, Xinge, and Peng, Qinmu
- Subjects
- *
LABOR costs , *INFORMATION theory , *LEARNING ability , *MODAL logic , *INFORMATION design - Abstract
Cross-modal hashing, which leverages hash functions to project high-dimensional data from different modalities into a compact common Hamming space, has shown immeasurable potential in cross-modal retrieval. To ease labor costs, unsupervised cross-modal hashing methods have been proposed. However, existing unsupervised methods still suffer from two factors in the optimization of hash functions: 1) similarity guidance: they barely give a clear definition of whether two data points are similar or not, leading to residual redundant information; 2) optimization strategy: they ignore the fact that the similarity learning abilities of different hash functions differ, which makes the hash function of one modality weaker than that of the other. To alleviate these limitations, this paper proposes an unsupervised cross-modal hashing method that trains hash functions with discriminative similarity guidance and an adaptively-enhanced optimization strategy, termed Deep Adaptively-Enhanced Hashing (DAEH). Specifically, to estimate the similarity relations with discriminability, Information Mixed Similarity Estimation (IMSE) is designed by integrating information from distance distributions and the similarity ratio. Moreover, an Adaptive Teacher Guided Enhancement (ATGE) optimization strategy is also designed, which employs information theory to discover the weaker hash function and utilizes an extra teacher network to enhance it. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed DAEH against the state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. Inferring context with reliable collaborators: a novel similarity estimation method for recommender systems.
- Author
-
Ali, Waqar, Kumar, Jay, and Shao, Jie
- Subjects
RECOMMENDER systems ,ACQUISITION of data ,PRIVACY - Abstract
Additional context information is vital for context-aware recommender systems. The whole paradigm of context-aware recommender systems is built upon the availability of contextual features. Apart from the significance of context, we highlight a key issue for the existing context-aware recommendation paradigm: if the user environment does not provide contextual features such as time, location, or companion due to privacy constraints, or if the data collection system is unable to record contextual attributes due to legal or technical concerns, then the existing paradigm has no uniform mechanism to deal with this situation. In this research, we address these challenges and propose a novel item-context similarity (ICS) model capable of adaptively generating reliable collaborators for a subject user on a subject item. Additionally, ICS is fused into a weighting model called contextually reliable collaborators (CRC) that considers the current item context, the nonlinear relationship between candidate collaborators, and the asymmetry between the rating preferences of users to finally generate a rating prediction. Experiments show that neighbors computed through ICS are more reliable than those from classical similarity estimation methods and that the ICS-based CRC model outperforms state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. LSGDDN-LCD: An appearance-based loop closure detection using local superpixel grid descriptors and incremental dynamic nodes.
- Author
-
Zhang, Baosheng, Xian, Yelan, and Ma, Xiaoguang
- Subjects
- *
ONLINE databases , *MAPS , *COST - Abstract
Loop Closure Detection (LCD) is an essential component of visual Simultaneous Localization and Mapping (SLAM) systems. It enables the recognition of previously visited scenes to eliminate pose and map estimate drifts arising from long-term exploration. However, current appearance-based LCD methods face significant challenges, including high computational costs, viewpoint changes, and dynamic objects in scenes. This paper introduces an online appearance-based LCD using a Local Superpixel Grid Descriptor (LSGD) and Dynamic Nodes (DN), i.e., LSGDDN-LCD, to find similarities between scenes via handcrafted features extracted from the LSGD. Additionally, we propose an adaptive mechanism to group similar scenes, called Dynamic Nodes, which incrementally adjusts the database in an online manner, allowing efficient online retrieval of previously viewed images without the need for pre-training. Experimental results confirm that LSGDDN-LCD significantly improves LCD precision-recall and efficiency, and outperforms several state-of-the-art (SOTA) approaches on public and our own datasets, indicating its great potential as a generic LCD framework. Our implementation of the LSGDDN-LCD approach and the datasets are open-sourced on GitHub (https://github.com/BaoshengZhang0/LSGDDN-LCD.git). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Predicting Remaining Useful Life with Similarity-Based Priors
- Author
-
Soons, Youri, Dijkman, Remco, Jilderda, Maurice, Duivesteijn, Wouter, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Berthold, Michael R., editor, Feelders, Ad, editor, and Krempl, Georg, editor
- Published
- 2020
- Full Text
- View/download PDF
10. Skin Melanoma Assessment Using Kapur’s Entropy and Level Set—A Study with Bat Algorithm
- Author
-
Rajinikanth, V., Satapathy, Suresh Chandra, Dey, Nilanjan, Fernandes, Steven Lawrence, Manic, K. Suresh, Howlett, Robert James, Series Editor, Jain, Lakhmi C., Series Editor, Satapathy, Suresh Chandra, editor, Bhateja, Vikrant, editor, and Das, Swagatam, editor
- Published
- 2019
- Full Text
- View/download PDF
11. Min-Hash Sketches
- Author
-
Cohen, Edith and Kao, Ming-Yang, editor
- Published
- 2016
- Full Text
- View/download PDF
12. Improving Locality Sensitive Hashing Based Similarity Search and Estimation for Kernels
- Author
-
Chakrabarti, Aniket, Bandyopadhyay, Bortik, Parthasarathy, Srinivasan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Frasconi, Paolo, editor, Landwehr, Niels, editor, Manco, Giuseppe, editor, and Vreeken, Jilles, editor
- Published
- 2016
- Full Text
- View/download PDF
13. Improving image similarity estimation via global distance distribution information.
- Author
-
Liao, Lixin, Zhao, Yao, Wei, Shikui, and Zhao, Yufeng
- Subjects
- *
ENTROPY (Information theory) , *SCALABILITY , *THEORY of distributions (Functional analysis) , *COMPUTER vision , *IMAGE processing - Abstract
Estimating the similarity between two images or image patches is at the heart of many computer vision problems including content-based image retrieval, image registration, and scene recognition. However, commonly used distance-based similarity estimation is not always reliable due to the limitations in both image understanding techniques and distance metrics. In this paper, we present a scheme to improve the similarity estimation under the image search scenario. To this end, we explore the discriminative capability underlying the global distance distribution obtained by querying an auxiliary image dataset in an unsupervised manner. According to the results of motivational experiments, we discover that global distance distributions have the desired capability in distinguishing inter-class images, which can be applied to enhance the original distance metric. Following this finding, we propose a novel approach to incorporate the global distance distribution into the original distance metric to improve the reliability of the similarity estimation. One key novelty of this approach is to model the global distance distribution as a Rayleigh distribution and then represent the difference between two distributions by the relative entropy. In this way, the difference between two global distance distributions can be calculated in an extremely efficient way. We also demonstrate that the Rayleigh distribution leads to consistent performance compared to the real distribution. Extensive experiments on three public datasets with various image representations and distance metrics show that the enhanced similarity estimation remarkably outperforms the original one. Furthermore, the proposed approach shows the desired scalability for handling large-scale image search scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
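The abstract above models global distance distributions as Rayleigh distributions and compares them via relative entropy; for two Rayleigh distributions the KL divergence has a simple closed form, which is why the comparison is "extremely efficient". A minimal sketch of that computation (the maximum-likelihood fit and any fusion with the original distance metric are assumptions beyond what the abstract states):

```python
from math import log

def rayleigh_sigma2(samples):
    """MLE of the Rayleigh scale parameter sigma^2 from positive distance samples."""
    return sum(x * x for x in samples) / (2 * len(samples))

def rayleigh_kl(sigma2_p, sigma2_q):
    """Closed-form relative entropy KL(p || q) between two Rayleigh
    distributions with scale parameters sigma2_p and sigma2_q:
    KL = ln(sigma2_q / sigma2_p) + sigma2_p / sigma2_q - 1."""
    return log(sigma2_q / sigma2_p) + sigma2_p / sigma2_q - 1.0
```

Because each distribution collapses to a single fitted scalar, comparing two global distance distributions costs O(1) after the fit.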
14. Similarity Estimation for HTML Code Blocks
- Author
-
Simona Ramanauskaitė and Kiril Griazev
- Subjects
HTML ,data similarity ,similarity estimation ,Social Sciences - Abstract
Data mining from web pages is being adopted more and more frequently in business. While analyzing the current situation, we observe that, on the one hand, solutions for mining structured data from web pages exist; on the other hand, a scientific dataset for unstructured data that would allow creating and testing new data selection methods does not. This limits the development and research of unstructured web data mining; we therefore propose a method for HTML code block similarity estimation. The method combines both data and structure comparison and allows a quantitative presentation of the similarity of two HTML code blocks.
- Published
- 2018
- Full Text
- View/download PDF
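The method above combines data and structure comparison of HTML code blocks into one quantitative similarity. A hedged sketch of that idea, using Python's difflib on the tag sequence (structure) and the text content (data); the 50/50 weighting and the parsing details are illustrative assumptions, not the paper's method:

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class _BlockParser(HTMLParser):
    """Collects the tag sequence (structure) and the text content (data)."""
    def __init__(self):
        super().__init__()
        self.tags, self.texts = [], []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())

def html_block_similarity(block_a, block_b, w_struct=0.5):
    """Weighted combination of structural and textual similarity, in [0, 1]."""
    pa, pb = _BlockParser(), _BlockParser()
    pa.feed(block_a)
    pb.feed(block_b)
    struct = SequenceMatcher(None, pa.tags, pb.tags).ratio()
    data = SequenceMatcher(None, " ".join(pa.texts), " ".join(pb.texts)).ratio()
    return w_struct * struct + (1 - w_struct) * data
```

Identical blocks score 1.0; blocks with the same markup but different text, or vice versa, land strictly between the two component scores.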
15. Similarity Estimation
- Author
-
Sakr, Sherif, editor and Zomaya, Albert Y., editor
- Published
- 2019
- Full Text
- View/download PDF
16. A Comparative Study of Different Distances for Similarity Estimation
- Author
-
Li, Zhong, Ding, Qiaolin, Zhang, Weihua, and Chen, Ran, editor
- Published
- 2011
- Full Text
- View/download PDF
17. Efficient Duplicate Record Detection Based on Similarity Estimation
- Author
-
Li, Mohan, Wang, Hongzhi, Li, Jianzhong, Gao, Hong, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Chen, Lei, editor, Tang, Changjie, editor, Yang, Jun, editor, and Gao, Yunjun, editor
- Published
- 2010
- Full Text
- View/download PDF
18. Similarity Estimation Using Bayes Ensembles
- Author
-
Emrich, Tobias, Graf, Franz, Kriegel, Hans-Peter, Schubert, Matthias, Thoma, Marisa, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Gertz, Michael, editor, and Ludäscher, Bertram, editor
- Published
- 2010
- Full Text
- View/download PDF
19. QoS-Based Concurrent User-Service Grouping for Web Service Recommendation.
- Author
-
Senthil Kumar, S. and Margret Anouncia, S.
- Abstract
The recent tremendous growth of web services for sharing programs, data, and resources requires an optimal recommendation strategy. The major issues observed in existing recommendation strategies are scalability, sparsity, and cold start. The employment of matrix factorization (MF) models addresses these issues effectively, but increases the scalability burden on the system. This paper proposes a new framework comprising web service grouping, distance estimation, service utilization level estimation, and item-to-item comparison (Pearson Correlation Coefficient (PCC)) to improve recommendation performance. Users are grouped according to the Haversine distance formulation to reduce the complexity of recommending relevant web services against complex queries. The locations and fields of service utilization in the proposed work provide effective recommendation performance. A comparative analysis between the proposed novel recommendation framework and existing techniques assures the effectiveness of the proposed approach in web service recommendation. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
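The framework above rests on two concrete formulas: the Haversine distance for grouping users by location and the Pearson Correlation Coefficient (PCC) for item-to-item comparison. Both are standard and can be sketched directly (the surrounding grouping and recommendation logic is not reproduced):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Users within some Haversine radius of each other would fall into the same group, and PCC then compares rating vectors within a group.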
20. THE APPLICATION OF AUTOMATED CORRELATION OPTIMIZED WARPING TO THE QUALITY EVALUATION OF Radix Puerariae thomsonii: CORRECTING RETENTION TIME SHIFT IN THE CHROMATOGRAPHIC FINGERPRINTS
- Author
-
Long Jiao, Shan Bing, Xiaofei Wang, Xue Zhiwei, and Hua Li
- Subjects
automated correlation optimized warping ,chromatographic fingerprint ,Radix Puerariae thomsonii ,principal component analysis ,similarity estimation ,Chemistry ,QD1-999 - Abstract
The application of automated correlation optimized warping (ACOW) to the correction of retention time shift in the chromatographic fingerprints of Radix Puerariae thomsonii (RPT) was investigated. Twenty-seven samples were extracted from 9 batches of RPT products. The fingerprints of the 27 samples were established by the HPLC method. Because there is a retention time shift in the established fingerprints, the quality of these samples cannot be correctly evaluated by using similarity estimation and principal component analysis (PCA). Thus, the ACOW method was used to align these fingerprints. In the ACOW procedure, the warping parameters, which have a significant influence on the alignment result, were optimized by an automated algorithm. After correcting the retention time shift, the quality of these RPT samples was correctly evaluated by similarity estimation and PCA. It is demonstrated that ACOW is a practical method for aligning the chromatographic fingerprints of RPT. The combination of ACOW, similarity estimation, and PCA is shown to be a promising method for evaluating the quality of Traditional Chinese Medicine.
- Published
- 2015
- Full Text
- View/download PDF
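ACOW warps chromatographic fingerprints piecewise, with warping parameters chosen by an automated algorithm; that full procedure is beyond a short sketch. The toy below shows only the underlying idea of correlation-optimized alignment: a rigid integer shift chosen to maximize correlation with a reference fingerprint (a deliberate simplification, not the ACOW algorithm):

```python
def correlation(x, y):
    """Pearson correlation of two equal-length signals; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def best_shift(reference, sample, max_shift=5):
    """Return the integer shift that maximizes the correlation between the
    overlapping regions of `reference` and `sample`."""
    best_s, best_r = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            ref, smp = reference[s:], sample[:len(sample) - s]
        else:
            ref, smp = reference[:len(reference) + s], sample[-s:]
        if len(ref) < 2:
            continue
        r = correlation(ref, smp)
        if r > best_r:
            best_s, best_r = s, r
    return best_s
```

After realigning each sample fingerprint this way, similarity estimation and PCA can be applied as the abstract describes; real ACOW instead optimizes segment lengths and slack parameters for a piecewise warp.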
21. Annual runoff prediction using a nearest-neighbour method based on cosine angle distance for similarity estimation.
- Author
-
GUANGHUA QIN, HONGXIA LI, XIN WANG, QINGYAN HE, and SHENQI LI
- Subjects
RUNOFF ,HYDROLOGICAL forecasting ,ESTIMATION theory ,NEAREST neighbor analysis (Statistics) ,MEASUREMENT of distances ,EUCLIDEAN distance - Abstract
The Nearest Neighbour Method (NNM) is a data-driven and non-parametric scheme established on the similarity characteristics of hydrological phenomena. One of the important parts of NNM is choosing a proper distance measure. The Euclidean distance (EUD) is a commonly used distance measure, which represents the absolute distance of a spatial point and is directly related to the coordinates of the point, but is not sensitive to the direction of the feature vector. This paper uses the cosine angle distance (CAD) as the similarity measure, which better reflects differences in direction, and compares it to EUD. The technique is applied to annual runoff at YiChang station on the Yangtze River. The results show that the NNM with CAD performs better than with EUD. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
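The comparison above is between the Euclidean distance (EUD), which is magnitude-sensitive, and the cosine angle distance (CAD), which responds to the direction of the feature vector. A minimal sketch of a nearest-neighbour lookup under either measure (the two-dimensional feature vectors are illustrative, not runoff data):

```python
from math import sqrt

def euclidean(u, v):
    """Absolute (magnitude-sensitive) distance between two feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_angle_distance(u, v):
    """1 - cosine similarity: small when the vectors point the same way,
    regardless of their magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return 1 - dot / (nu * nv)

def nearest_neighbour(query, candidates, dist):
    """Index of the candidate closest to the query under the given measure."""
    return min(range(len(candidates)), key=lambda i: dist(query, candidates[i]))
```

Note how the two measures can disagree: a candidate parallel to the query but far away wins under CAD, while a nearby candidate in a different direction wins under EUD.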
22. User-Centric Similarity Search.
- Author
-
Georgoulas, Konstantinos, Vlachou, Akrivi, Doulkeridis, Christos, and Kotidis, Yannis
- Subjects
- *
MARKETING research , *CONSUMER preferences , *EUCLIDEAN distance , *QUERY (Information retrieval system) , *EMAIL systems , *NEAREST neighbor analysis (Statistics) - Abstract
User preferences play a significant role in market analysis. In the database literature, there has been extensive work on query primitives, such as the well-known top-k
- Published
- 2017
- Full Text
- View/download PDF
23. Financial supply chain analysis with borrower identification in smart lending platform.
- Author
-
Mitra, Rony, Goswami, Adrijit, and Tiwari, Manoj Kumar
- Subjects
- *
SUPPLY chains , *LOANS , *PEER-to-peer lending , *DEBT-to-equity ratio , *CAPITAL costs , *BOOSTING algorithms - Abstract
The popularity of the online peer-to-peer (P2P) lending platform in the financial supply chain (FSC) has grown tremendously in the past few years. However, it is quite challenging to develop an efficient financial supply chain for Micro, Small, and Medium-Sized Enterprises (MSMEs) by providing credits and services. In this research, an FSC model is analyzed by emphasizing the retailer and supplier relationship. The retailer predicts the market demand to maximize profit and places orders with the suppliers accordingly, and the suppliers utilize their production capabilities within a limited monetary value. In such a case, the supplier may also need to apply for loans to meet the production commitment. With an intelligent lending platform, applicants can get loans quickly and more conveniently from the lenders. Still, identifying defaulting borrowers is a difficult task on a peer-to-peer lending platform. The main focus of the lenders is to maximize profit and minimize risk by giving loans to non-defaulting suppliers. The present study describes the importance of the financial parameter (λ) of the retailers and suppliers (MSMEs) and its impact on the related supply chain. We provide proof of the dependency of the financial parameter on the cost of debt and the debt-to-equity ratio by developing two propositions. We propose an innovative k-Random Boosting Classifiers (k-RBC) algorithm for identifying potentially good and bad borrowers. The time complexity of the k-RBC algorithm is O(n² log n). The results obtained from the study show a significant improvement over the outputs of existing approaches on the same datasets. Our algorithm gives 90% accuracy in identifying potentially good borrowers, whereas existing algorithms achieve up to 87% accuracy. Furthermore, the importance of the borrowers' features and their impact on the lending platform in FSC are analyzed.
• Defines the importance of financial parameters and their impact on the supply chain.
• Provides proof of the dependency of the financial parameter on the cost of debt.
• Proves the dependency of the financial parameter on the debt-to-equity ratio.
• Proposes the k-Random Boosting Classification model to identify defaulters.
• Identifies crucial features to enhance the probability of loan approval. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
24. Robust Object Tracking Based on Principal Component Analysis and Local Sparse Representation.
- Author
-
Liu, Haicang, Li, Shutao, and Fang, Leyuan
- Subjects
- *
PRINCIPAL components analysis , *GAUSSIAN distribution , *HISTOGRAMS , *BAYESIAN analysis , *GAUSSIAN function - Abstract
Object tracking methods based on principal component analysis (PCA) are effective against object change caused by illumination variation and motion blur. However, when the object is occluded, the tracking result of PCA-based methods will drift away from the target. In this paper, we propose a new robust object tracking method based on PCA and local sparse representation (LSR). First, candidates are reconstructed through the PCA subspace model in a global manner. To handle occlusion, a patch-based similarity estimation strategy is proposed for the PCA subspace model. In the patch-based strategy, the PCA representation error map is divided into patches to estimate the similarity between target and candidate while accounting for occlusion. Second, the LSR is introduced to detect the occluded patches of the object and estimate the similarity through the residual error in the sparse coding. Finally, the two similarities of each candidate from the PCA subspace model and the LSR model are fused to predict the tracking result. The experimental results demonstrate that the proposed tracking method performs favorably against several state-of-the-art methods on challenging image sequences. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
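The patch-based strategy above divides the PCA representation error map into patches so that occluded patches can be discounted when estimating similarity. A hedged toy version (the patch size, occlusion threshold, and exponential similarity mapping are assumptions for illustration, not the paper's exact formulation):

```python
from math import exp

def patch_similarity(error_map, patch_size=2, occlusion_threshold=4.0):
    """Split a square reconstruction-error map into patches; patches whose
    total error exceeds the threshold are treated as occluded and skipped.
    Similarity is exp(-mean error) over the remaining patches."""
    n = len(error_map)
    errs = []
    for r in range(0, n, patch_size):
        for c in range(0, n, patch_size):
            e = sum(error_map[i][j]
                    for i in range(r, min(r + patch_size, n))
                    for j in range(c, min(c + patch_size, n)))
            if e <= occlusion_threshold:
                errs.append(e)
    return exp(-sum(errs) / len(errs)) if errs else 0.0
```

A candidate whose error is concentrated in one occluded patch keeps a high similarity, which is the behaviour the paper's strategy is designed to achieve.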
25. CaS: Collection-aware Segmentation
- Author
-
Costa, Raquel, Fonseca, Manuel J., and Ferreira, Alfredo
- Subjects
Automatic Segmentation ,3D Object Collections ,3D Object Segmentation ,Similarity Estimation - Abstract
Over the years, segmentation has proven to be a challenge due to its subjectivity. Segmentation depends not only on the domain at hand but, above all, on the interpretation that humans make of the object. For each context, various specific solutions have been proposed, with different objectives, limitations, and advantages. In this work, we propose to overcome some of these limitations using the Collection-aware Segmentation (CaS) algorithm. This algorithm identifies object segments in collections based on their individuality within that collection. To this end, we carried out a set of tests to understand how people segment objects in a collection. From the results of these tests, we developed the Adapted-CaS and Geons-augmented CaS algorithms. Experimental evaluations with users showed that the proposed approach produces segmentations that are meaningful to humans. (20º EPCG: Encontro Português de Computação Gráfica, long papers, pp. 59-66.)
- Published
- 2021
- Full Text
- View/download PDF
26. FuncGNN : A graph neural network approach to program similarity
- Author
-
Nair, Aravind, Roy, Avijit, and Meinke, Karl
- Abstract
Background: Program similarity is a fundamental concept, central to the solution of software engineering tasks such as software plagiarism, clone identification, code refactoring and code search. Accurate similarity estimation between programs requires an in-depth understanding of their structure, semantics and flow. A control flow graph (CFG) is a graphical representation of a program which captures its logical control flow and hence its semantics. A common approach is to estimate program similarity by analysing CFGs using graph similarity measures, e.g. graph edit distance (GED). However, graph edit distance is an NP-hard problem and computationally expensive, making the application of graph similarity techniques to complex software programs impractical. Aim: This study intends to examine the effectiveness of graph neural networks to estimate program similarity, by analysing the associated control flow graphs. Method: We introduce funcGNN, which is a graph neural network trained on labeled CFG pairs to predict the GED between unseen program pairs by utilizing an effective embedding vector. To our knowledge, this is the first time graph neural networks have been applied on labeled CFGs for estimating the similarity between high-level language programs. Results: We demonstrate the effectiveness of funcGNN to estimate the GED between programs, and our experimental analysis shows that it achieves a lower error rate (1.94 × 10⁻³), runs 23 times faster than the quickest traditional GED approximation method, and scales better than state-of-the-art methods. Conclusion: funcGNN possesses the inductive learning ability to infer program structure and generalise to unseen programs. The graph embedding of a program proposed by our methodology could be applied to several related software engineering problems (such as code plagiarism and clone identification), thus opening multiple research directions.
- Published
- 2020
- Full Text
- View/download PDF
27. Fitness approximation for bot evolution in genetic programming.
- Author
-
Esparcia-Alcázar, Anna and Moravec, Jaroslav
- Subjects
- *
ALGORITHMS , *APPROXIMATION theory , *VIDEO games , *ESTIMATION theory , *PHENOTYPES - Abstract
Estimating the fitness value of individuals in an evolutionary algorithm, in order to reduce the computational expense of actually calculating the fitness, has been a classical pursuit of practitioners. One area which could benefit from progress in this endeavour is bot evolution, i.e. the evolution of non-playing characters in computer games. Because assigning a fitness value to a bot (or rather, the decision tree that controls its behaviour) requires playing the game, the process is very costly. In this work, we introduce two major contributions to speed up this process in the computer game Unreal Tournament 2004™. The first is a method for fitness value approximation in genetic programming which is based on the idea that individuals that behave in a similar fashion will have a similar fitness. Thus, similarity of individuals is taken at the performance level, in contrast to commonly employed approaches which are either based on similarity of genotypes or, less frequently, phenotypes. The approximation performs a weighted average of the fitness values of a number of individuals, attaching a confidence level which is based on similarity estimation. The latter is the second contribution of this work, namely a method for estimating the similarity between individuals. This involves carrying out a number of tests consisting of playing a 'static' version of the game (with fixed inputs) with the individuals whose similarity is under evaluation and comparing the results. Because the tests involve a limited version of the game, the computational expense of the similarity estimation plus that of the fitness approximation is much lower than that of directly calculating the fitness. The success of the fitness approximation by similarity estimation method for bot evolution in UT2K4 allows us to expect similar results in environments that share the same characteristics. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
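The fitness approximation above is a weighted average of the fitness values of similar individuals, with a confidence level derived from the similarity estimates. A minimal sketch of that aggregation step (the similarity values themselves would come from the paper's static-game tests, which are not reproduced; taking the maximum similarity as the confidence is an illustrative assumption):

```python
def approximate_fitness(similarities, fitnesses):
    """Similarity-weighted average of known fitness values, plus a
    confidence equal to the highest similarity to any evaluated individual.
    Returns (None, 0.0) when no individual is similar at all."""
    total = sum(similarities)
    if total == 0:
        return None, 0.0
    estimate = sum(s * f for s, f in zip(similarities, fitnesses)) / total
    return estimate, max(similarities)
```

When the confidence falls below some cutoff, the caller would fall back to playing the full game to obtain a true fitness value.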
28. A Gene-Free Formulation of Classical Quantitative Genetics Used to Examine Results and Interpretations Under Three Standard Assumptions.
- Author
-
Taylor, Peter
- Abstract
Quantitative genetics (QG) analyses variation in traits of humans, other animals, or plants in ways that take account of the genealogical relatedness of the individuals whose traits are observed. 'Classical' QG, where the analysis of variation does not involve data on measurable genetic or environmental entities or factors, is reformulated in this article using models that are free of hypothetical, idealized versions of such factors, while still allowing for defined degrees of relatedness among kinds of individuals or 'varieties.' The gene-free formulation encompasses situations encountered in human QG as well as in agricultural QG. This formulation is used to describe three standard assumptions involved in classical QG and provide plausible alternatives. Several concerns about the partitioning of trait variation into components and its interpretation, most of which have a long history of debate, are discussed in light of the gene-free formulation and alternative assumptions. That discussion is at a theoretical level, not dependent on empirical data in any particular situation. Additional lines of work to put the gene-free formulation and alternative assumptions into practice and to assess their empirical consequences are noted, but lie beyond the scope of this article. The three standard QG assumptions examined are: (1) partitioning of trait variation into components requires models of hypothetical, idealized genes with simple Mendelian inheritance and direct contributions to the trait; (2) all other things being equal, similarity in traits for relatives is proportional to the fraction shared by the relatives of all the genes that vary in the population (e.g., fraternal or dizygotic twins share half of the variable genes that identical or monozygotic twins share); (3) in analyses of human data, genotype-environment interaction variance (in the classical QG sense) can be discounted.
The concerns about the partitioning of trait variation discussed include: the distinction between traits and underlying measurable factors; the possible heterogeneity in factors underlying the development of a trait; the kinds of data needed to estimate key empirical parameters; and interpretations based on contributions of hypothetical genes; as well as, in human studies, the labeling of residual variance as a non-shared environmental effect; and the importance of estimating interaction variance. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
29. Near-duplicate document detection with improved similarity measurement.
- Author
-
Yuan, Xin-pan, Long, Jun, Zhang, Zu-ping, and Gui, Wei-hua
- Abstract
To quickly find documents with high similarity in existing document sets, a fingerprint group merging retrieval algorithm is proposed to address both sides of the problem: the similarity threshold cannot be set too low, yet using fewer fingerprints leads to low accuracy. It can be proved that the fingerprint group merging retrieval algorithm improves the efficiency of similarity retrieval even at a lower similarity threshold. Experiments with a lower similarity threshold r=0.7 and a high fingerprint count k=400 demonstrate that the CPU time cost decreases from 1,921 s to 273 s. Theoretical analysis and experimental results verify the effectiveness of this method. [ABSTRACT FROM AUTHOR]
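As a generic illustration of fingerprint-based similarity retrieval (a MinHash-style sketch with illustrative names and parameters, not the paper's group-merging algorithm): each document is reduced to k fingerprints, and similarity is estimated as the fraction of fingerprints that match.

```python
import hashlib

def shingles(text, n=3):
    """Split text into overlapping word n-grams (shingles)."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def minhash_fingerprints(text, k=64):
    """Compute k MinHash fingerprints by salting one hash function k times."""
    fps = []
    for seed in range(k):
        fps.append(min(
            int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingles(text)))
    return fps

def estimate_similarity(fp_a, fp_b):
    """Fraction of matching fingerprints approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)

a = minhash_fingerprints("the quick brown fox jumps over the lazy dog")
b = minhash_fingerprints("the quick brown fox jumps over the lazy cat")
c = minhash_fingerprints("completely different text about similarity retrieval")
assert estimate_similarity(a, b) > estimate_similarity(a, c)
```

With more fingerprints the estimate tightens, which mirrors the abstract's trade-off: fewer fingerprints are cheaper to compare but lose accuracy.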
- Published
- 2012
- Full Text
- View/download PDF
30. Similarity Estimation with Non-Transitive LSH
- Author
-
Lewis, Robert R.
- Subjects
- Computer Science, Locality Sensitive Hashing, Dice Coefficient, Non-Transitive, Hash Collision Relation, Similarity Estimation, LSHable
- Abstract
The question of which spaces admit Locality Sensitive Hash families is of great importance to the field of data-analytical hashing (such spaces are called LSHable). This question has been restated and examined with each update to the definition. Particularly important to this work is the definition proposed by Charikar, defined over similarity measures, which is used in the seminal paper to give the first formal proofs about LSHability. The study of Collision Probability Functions (CPFs) has led to the relaxation of some of the requirements. This thesis focuses on the ideas presented in the papers on Distance Sensitive Hashing and Asymmetric Hashing. In particular, this thesis takes the idea that LSH families need only have well-defined CPFs to be studied generally, and explores the benefits of such a view of hashing while relaxing the traditional LSH requirements further. This thesis proposes a novel definition of hashing by analyzing a recently developed concept in the field known as non-transitive hashing. This thesis expands theoretically on the notion of non-transitive hashing in order to develop a more general framework for LSH. This thesis shows that the original argument for the non-LSHability of a space introduced by Charikar no longer applies when non-transitive hash functions are permitted. Some similarity measures that seem unable to be hashed per Charikar's triangle inequality argument can be hashed when the notion of hash collisions is expanded non-transitively. This thesis justifies the use of non-transitive hashing by examining known applications that violate transitivity. The fundamental concepts of OR-amplification and hash signatures can, and often do, violate transitivity. To accommodate this change, this thesis also suggests a change in notation for non-transitive hash collisions, namely using the traditional symbol for a mathematical relation, ~, instead of =.
The relation may depend on the hash scheme, with its complexity factored into the analysis of the algorithms that use it. In order to well-define one particularly large class of non-transitive hashing applications, this thesis proposes the Boolean Hash Collision Relation, giving a consistent theoretical account of what constitutes hash families built via amplification of other families. Using this non-transitive approach to hash function collision, this thesis shows that certain spaces are strongly LSHable even though the corresponding dissimilarity violates the triangle inequality. This thesis explores other similarity measures that might be LSHable or approximately hashed using non-transitive methods, and applies those ideas to provide new classes of non-transitive hash families with asymmetric properties, introduced here by way of NOT-amplification. Together with the familiar OR-amplification and AND-amplification, this thesis introduces the Boolean-collision relation to generalize the notion of amplification and describe a large class of non-transitive hash families. Using this relation for hash comparison, this thesis provides an approximate hash of the Sørensen coefficient and the Sokal-Sneath measure using non-transitive hashing.
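The non-transitive collision relation ~ described above can be made concrete with a small sketch (names and parameters are illustrative, not taken from the thesis): under OR-amplification, two signatures "collide" if they agree in at least one band, and this relation is reflexive and symmetric but not transitive.

```python
import hashlib

def minhash_signature(s, bands=3):
    """MinHash signature over a set: one salted min-hash per band."""
    return [min(int(hashlib.md5(f"{b}:{x}".encode()).hexdigest(), 16) for x in s)
            for b in range(bands)]

def collides(sig_a, sig_b):
    """OR-amplified collision relation '~': signatures agree in at least one
    band. Reflexive and symmetric, but NOT transitive."""
    return any(x == y for x, y in zip(sig_a, sig_b))

# Hand-crafted 3-band signatures witnessing the non-transitivity of '~':
sig_x, sig_y, sig_z = [1, 2, 3], [1, 5, 6], [4, 5, 7]
assert collides(sig_x, sig_y)      # x ~ y (band 0 agrees)
assert collides(sig_y, sig_z)      # y ~ z (band 1 agrees)
assert not collides(sig_x, sig_z)  # yet x !~ z, so '~' is not an equivalence
```

Because equality of full signatures is transitive while "agree in some band" is not, writing collisions with ~ rather than = makes the amplified relation's weaker structure explicit.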
- Published
- 2021
31. A Systematic Review on Minwise Hashing Algorithms
- Author
-
Tang, Jingjing and Tian, Yingjie
- Published
- 2016
- Full Text
- View/download PDF
32. Information Retrieval and Filtering of Multimedia Mineral Data.
- Author
-
Cakmakov, Dusan and Davcev, Danco
- Abstract
Most written materials consist of multimedia (MM) information because, besides text, they usually contain images. Present information retrieval and filtering systems use only the text parts of documents or, at best, images represented by keywords or captions. Why not use both the text and image features of documents, and thus exploit the documents' information content more completely in the retrieval or filtering process? Can such an approach increase the effectiveness of retrieval and filtering? At an abstract level there is very little difference between retrieval and filtering. In this paper, we discuss some possible similarities and differences between them at the application level, drawing on experiments in the retrieval and filtering of multimedia mineral information. [ABSTRACT FROM AUTHOR]
- Published
- 1995
- Full Text
- View/download PDF
33. Effective single-cell clustering through ensemble feature selection and similarity measurements.
- Author
-
Jeong, Hyundoo and Khunlertgit, Navadon
- Subjects
- FEATURE selection, ALGORITHMS, RNA sequencing, GENE expression profiling, SOURCE code, COMPUTATIONAL complexity
- Abstract
Single-cell RNA sequencing technologies have revolutionized biomedical research by providing an effective means to profile gene expression in individual cells. One of the first fundamental steps in the in-depth analysis of single-cell sequencing data is cell type classification and identification. Computational methods such as clustering algorithms have been utilized and are gaining popularity because they can save considerable resources and time for experimental validation. Although selecting the optimal features (i.e., genes) is essential to obtaining accurate and reliable single-cell clustering results, the computational complexity and the dropout events that introduce zero-inflated noise make this process very challenging. In this paper, we propose an effective single-cell clustering algorithm based on ensemble feature selection and similarity measurements. We initially identify a set of potential features, then measure cell-to-cell similarity based on subsets of these potentials drawn through multiple feature sampling approaches. We construct an ensemble network based on the cell-to-cell similarities. Finally, we apply a network-based clustering algorithm to obtain single-cell clusters. We evaluate the performance of the proposed algorithm through multiple assessments on real-world single-cell RNA sequencing datasets with known cell types. The results show that the proposed algorithm identifies accurate and consistent single-cell clusters. Moreover, the proposed algorithm takes relative expression as input, so it can easily be adopted by existing analysis pipelines. The source code has been made publicly available at https://github.com/jeonglab/scCLUE. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
34. A lightweight privacy preserving SMS-based recommendation system for mobile users
- Author
-
Andrea Vitaletti, Lorenzo Bergamini, Elisa Baglioni, Luca Becchetti, Giuseppe Persiano, Luca Filipponi, and Ugo Maria Colesanti
- Subjects
jaccard coefficient ,User information ,Social network ,Computer science ,business.industry ,Recommender system ,privacy ,mobile applications ,similarity estimation ,World Wide Web ,Mobile phone ,Ask price ,Mobile database ,Mobile search ,GSM services ,business - Abstract
In this paper we propose a fully decentralized approach for recommending new contacts in the social network of mobile phone users. Compared with existing solutions, our approach has several distinguishing features. In particular, the application we propose does not assume any centralized coordination: it transparently collects and processes user information that is accessible on any mobile phone, such as the call log, the contact list, or the inbox/outbox of short messages, and exchanges it with other users. This information is used to recommend new friendships to other users. Furthermore, the information needed to perform recommendation is collected and exchanged between users in a privacy-preserving way. Finally, the information necessary to implement the application is exchanged transparently and opportunistically, using the residual space in standard short messages occasionally exchanged between users. As a consequence, we do not ask users to change their habits in using SMS.
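As a rough illustration of the privacy-preserving similarity idea (a hypothetical sketch, not the paper's SMS-based protocol): each device can exchange salted hashes of its contact identifiers rather than raw phone numbers, and the Jaccard coefficient of two hashed sets still estimates the overlap of the users' social circles.

```python
import hashlib

def hashed_contacts(contacts, salt="shared-app-salt"):
    """Hash each contact identifier so raw phone numbers never leave the
    device; only short digests are exchanged. (Illustrative names only.)"""
    return {hashlib.sha256(f"{salt}:{c}".encode()).hexdigest()[:16]
            for c in contacts}

def jaccard(a, b):
    """Jaccard coefficient of two hashed contact sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

alice = hashed_contacts(["+331111", "+332222", "+333333", "+334444"])
bob   = hashed_contacts(["+332222", "+333333", "+334444", "+335555"])
carol = hashed_contacts(["+336666", "+337777"])

# Recommend a new friendship when the overlap of social circles is high.
assert jaccard(alice, bob) == 0.6    # 3 shared / 5 distinct contacts
assert jaccard(alice, carol) == 0.0
```

Because the digests are deterministic, two devices computing them independently will agree on matches without ever revealing the underlying numbers to each other.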
- Published
- 2010
- Full Text
- View/download PDF
35. Ontology mapping specification in description logics for cooperative systems
- Author
-
Thibault Poulain, Nadine Cullot, Kokou Yétongnon, Laboratoire Electronique, Informatique et Image [UMR6306] (Le2i), Université de Bourgogne (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Arts et Métiers (ENSAM), Arts et Métiers Sciences et Technologies, HESAM Université (HESAM)-HESAM Université (HESAM)-Arts et Métiers Sciences et Technologies, HESAM Université (HESAM)-HESAM Université (HESAM)-AgroSup Dijon - Institut National Supérieur des Sciences Agronomiques, de l'Alimentation et de l'Environnement, Laboratoire Electronique, Informatique et Image ( Le2i ), Université de Bourgogne ( UB ) -AgroSup Dijon - Institut National Supérieur des Sciences Agronomiques, de l'Alimentation et de l'Environnement-Centre National de la Recherche Scientifique ( CNRS ), and Cullot, Nadine
- Subjects
Ontology (information science) ,computer.software_genre ,01 natural sciences ,030507 speech-language pathology & audiology ,03 medical and health sciences ,semantic web ,Description logic ,[INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB] ,Semantic integration ,ontology ,0101 mathematics ,Semantic Web ,similarity estimation ,Mathematics ,Information retrieval ,[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB] ,business.industry ,010102 general mathematics ,Mappings ,Mappings, ontologies, web sémantique, calcul de similarité estimation ,Mappings, ontology, semantic web, similarity estimation ,[ INFO.INFO-DB ] Computer Science [cs]/Databases [cs.DB] ,Artificial intelligence ,Web service ,0305 other medical science ,business ,Merge (version control) ,computer - Abstract
The rapid development of the semantic Web is associated with the specification of various ontologies which formally represent agreements of communities of people on specific domains or tasks. The same knowledge formalized by different people leads to heterogeneous representations. Cooperative systems, which aim to make knowledge from various sources available in spite of their heterogeneities, need to align, merge or integrate the ontologies used to model the information sources. Furthermore, the resolution of differences among ontologies is necessary to process queries or use web services in distributed heterogeneous environments. Mapping discovery is a key issue in allowing efficient resolution of heterogeneity. We develop an architecture for mapping different systems associated with ontologies. In this paper we present the key components and the underlying concepts of a framework and approach for comparing and matching different ontologies.
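Mapping discovery between ontologies often begins with lexical similarity between concept names. A minimal sketch (illustrative only; the paper's description-logic-based approach is richer) using trigram Jaccard similarity:

```python
def ngrams(label, n=3):
    """Character n-grams of a normalized concept label."""
    s = label.lower().replace("_", " ")
    return {s[i:i + n] for i in range(max(1, len(s) - n + 1))}

def name_similarity(a, b):
    """Trigram Jaccard similarity between two ontology concept names."""
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb)

def candidate_mappings(onto_a, onto_b, threshold=0.4):
    """Propose concept pairs whose lexical similarity clears a threshold."""
    return [(x, y) for x in onto_a for y in onto_b
            if name_similarity(x, y) >= threshold]

pairs = candidate_mappings(["Author", "Publication"],
                           ["Writer", "Paper", "Publications"])
assert ("Publication", "Publications") in pairs
assert ("Author", "Paper") not in pairs
```

Lexical matching alone misses synonymy ("Author" vs. "Writer"), which is exactly why structural and logic-based evidence is layered on top in real mapping systems.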
Keywords: Mappings, ontology, semantic web, similarity estimation. Journal des Sciences Pour l'Ingénieur, Vol. 7, 2006: pp. 64-71
- Published
- 2006
36. Omnibus outlier detection in sensor networks using windowed locality sensitive hashing
- Author
-
Yannis Kotidis, Antonios Deligiannakis, Nikos Giatrakos, and Minos Garofalakis
- Subjects
Computer Networks and Communications ,Computer science ,Data stream mining ,Sensor network ,Real-time computing ,Probabilistic logic ,Locality-sensitive hashing ,Reduction (complexity) ,Hardware and Architecture ,Outlier ,Locality sensitive hashing ,Anomaly detection ,Enhanced Data Rates for GSM Evolution ,Similarity estimation ,Wireless sensor network ,Software ,Streaming window model - Abstract
- Abstract
Wireless Sensor Networks (WSNs) have become an integral part of cutting-edge technological paradigms such as the Internet of Things (IoT), which incorporates a variety of smart application scenarios. WSNs include tiny sensors (motes), with constrained hardware capabilities and limited power supply, that can collaboratively function in an unsupervised manner for a long period of time. Their purpose is to continuously monitor quantities of interest and provide answers to application queries. Sensor data streams are inherently spatiotemporal in nature, both because mote measurements form multidimensional time series and because of the spatial reference on the data based on the realm sensed by a mote. Motes are designed to be inexpensive, and thus sensory hardware is prone to temporary or permanent failures yielding faulty measurements. Such measurements may unpredictably forge a query answer, while truthful but abnormal mote samples may indicate ongoing phenomena. Therefore, outlier detection in sensor networks is of utmost importance. With limited power supply, and communication being by far the main culprit in energy drain, outlier detection techniques in WSNs should achieve an appropriate balance between reducing communication and providing real-time, continuously updated outlier reports. Prior works employ probabilistic or best-effort approaches to accomplish the task, which either unpredictably compromise outlier detection accuracy or fail to explicitly tune the amount of communicated data. In this work, we introduce an omnibus outlier detection solution over spatiotemporally referenced sensor data that is capable of: (a) directly trading communication reduction for outlier detection quality with predictable accuracy guarantees, (b) accommodating both uni- and multi-dimensional outlier definitions, (c) operating under various streaming window models and (d) incorporating a wide variety of similarity measures to judge outliers.
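A rough sketch of the windowed-LSH idea (illustrative only, not the paper's in-network protocol): hash each reading into a bucket with random-hyperplane LSH, keep a count-based sliding window of bucket keys, and flag a reading as an outlier when too few window members share its bucket.

```python
import random

random.seed(42)
DIM, BITS, WINDOW = 3, 4, 20
PLANES = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def lsh_key(v):
    """Random-hyperplane (SimHash) bucket key: one sign bit per plane."""
    return tuple(int(sum(p[i] * v[i] for i in range(DIM)) >= 0) for p in PLANES)

def windowed_outliers(stream, min_support=2):
    """Slide a count-based window over the stream; report a reading as an
    outlier if fewer than `min_support` window members share its LSH bucket.
    (Illustrative sketch; `min_support` and WINDOW are assumed parameters.)"""
    window, outliers = [], []
    for t, v in enumerate(stream):
        window.append(lsh_key(v))
        if len(window) > WINDOW:
            window.pop(0)
        if len(window) >= WINDOW and window.count(lsh_key(v)) < min_support:
            outliers.append(t)
    return outliers

# A stream of steady readings with one faulty spike at index 25.
stream = [[20.0, 1.0, 0.5]] * 25 + [[-20.0, -1.0, -0.5]] + [[20.0, 1.0, 0.5]] * 10
outliers = windowed_outliers(stream)
assert outliers == [25]
```

Only compact bucket keys, not raw measurements, need to travel through the network, which is the communication-reduction lever the abstract refers to.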
Presented at: Future Generation Computer Systems