Back to Search
Start Over
Similarity Estimation with Non-Transitive LSH
- Publication Year :
- 2021
-
Abstract
- The question, what spaces admit Locality Sensitive Hash families is of great importance to the field of data-analytical hashing (such spaces are called LSH-able). This question has been restated and examined with each update to the definition. Particularly important to this work is the definition proposed by Charikar defined over similarity measures that is used in the seminal paper to give the first formal proofs about LSHability. The study of Collision Probability Functions(CPFs) has led to the relaxation of some of the requirements. This thesis focuses on the ideas presented in the papers on Distance Sensitive Hashing and Asymmetric Hashing. In particular this thesis will take the idea that LSH families only need to have well-defined CPFs to be studied generally and explore the benefits of such a view of Hashing while relaxing the traditional LSH requirements further.This thesis proposes a novel definition of hashing by analyzing a recently developed concept in the field of hashing known as non-transitive hashing. This thesis will expand theoretically on the notion of non-transitive hashing in order to develop a more general framework for LSH. This thesis shows that the original argument for non-LSHability of a space introduced by Charikar no longer applies when non-transitive hash functions are permitted. Some similarity measures that seem unable to be hashed per the triangle inequality argument of Charikar can be when the notion of hash collisions is expanded non-transitively.This thesis will justify the use of non-transitive hashing by examining known applications that violate transitivity. The fundamental concepts of Or-Amplification and hash signatures can and often do violate transitivity. To accommodate this change this thesis also suggests a change in notation for talking about non-transitive hash collisions, namely using the traditional symbol for mathematical relation,~, instead of =. The relation might depend on the hash scheme and its complexity factored into the analysis of the algorithms one might use it for. In order to well define one particularly large class of non-transitive hashing applications, this thesis proposes the Boolean Hash Collision Relation in order to give a consistent theoretical understanding of what constitutes hash families built via Amplification of other families.Using this non-transitive approach to hash function collision, this thesis will show that certain spaces are strongly LSHable even though the corresponding dissimilarity violates the triangle inequality. This thesis will explore other similarity measures that might be LSHable or approximately hashed using non-transitive methods. This thesis applies those ideas to provide new classes of non-transitive hash families with asymmetric properties introduced in this thesis by way of NOT-amplification. Together with the familiar OR-Amplification and AND-Amplification, this thesis introduces the Boolean-collision relation to generalize the notion of Amplification and describe a large class of non-transitive hash families. Using this relation for hash comparison this thesis will provide an approximate hash of Sørensen Coefficient and Sokal-Sneath using non-transitive hashing.
Details
- Language :
- English
- Database :
- OpenDissertations
- Publication Type :
- Dissertation/ Thesis
- Accession number :
- ddu.oai.etd.ohiolink.edu.ucin162323979030229