great post and my favorite CS topic, LSH is particularly relevant to machine lea...

great post and my favorite CS topic, LSH is particularly relevant to machine learning because of its use in indexing of embeddings, which are now omnipresent in ML (from word2vec to transformers to image and graph embeddings etc, etc.)

it is now supported in ElasticSearch KNN index (they use HNSWLIB but you can call it a descendant of original LSH in a way)

check out ANN benchmarks [0] for comparison of LSH performance to other state of the art methods like proximity graphs/HNSWLIB [1] and quantization/SCANN [2]

As an introduction LSH (with MinHash) is also described in detail in the book "Mining Of Massive Datasets", ch.3, "Finding Similar items", highly recommended [3]

if you want to play with LSH, python "annoy" library is the best place to start [4]

[0] https://github.com/erikbern/ann-benchmarks

[1] https://github.com/google-research/google-research/tree/mast...

[2] https://github.com/nmslib/hnswlib

[3] http://infolab.stanford.edu/~ullman/mmds

[4] https://github.com/spotify/annoy