
Great post, and my favorite CS topic. LSH is particularly relevant to machine learning because it is used to index embeddings, which are now omnipresent in ML (from word2vec to transformers to image and graph embeddings, etc.).

Approximate nearest-neighbor search is now supported in the Elasticsearch k-NN index (it uses an HNSW index rather than LSH proper, but you can call that a descendant of the original LSH in a way).

Check out ANN benchmarks [0] for a comparison of LSH performance with other state-of-the-art methods such as proximity graphs (HNSWLIB [1]) and quantization (ScaNN [2]).

As an introduction, LSH (with MinHash) is also described in detail in the book "Mining of Massive Datasets", ch. 3, "Finding Similar Items". Highly recommended [3].
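
For anyone who wants the gist of MinHash before opening the book, here is a rough sketch in plain Python. It is my own illustration, not code from the book: the function names, the 128 hash functions, the 32 bands, and the `(a*x + b) mod p` hash family are all assumptions picked for the example.

```python
import hashlib
import random

_PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus (an assumption of this sketch)


def _stable_hash(token):
    """Hash a string to an integer that is stable across runs (built-in hash() is salted)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % _PRIME


def shingles(text, k=5):
    """Return the set of k-character shingles of a whitespace-normalized string."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(items, num_hashes=128, seed=0):
    """For each hash function, keep the minimum hash value over a non-empty set of strings."""
    rng = random.Random(seed)  # same seed => same hash functions for every set
    coeffs = [(rng.randrange(1, _PRIME), rng.randrange(_PRIME)) for _ in range(num_hashes)]
    base = [_stable_hash(x) for x in items]
    return [min((a * h + b) % _PRIME for h in base) for a, b in coeffs]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature, bands=32):
    """Split a signature into bands; two sets become candidates if any band key matches."""
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]


if __name__ == "__main__":
    a = shingles("locality sensitive hashing finds similar items quickly")
    b = shingles("locality sensitive hashing finds similar documents quickly")
    sig_a, sig_b = minhash_signature(a), minhash_signature(b)
    print("true Jaccard:", len(a & b) / len(a | b))
    print("estimate:    ", estimate_jaccard(sig_a, sig_b))
    print("shared bands:", len(set(band_keys(sig_a)) & set(band_keys(sig_b))))
```

The banding step is what makes it LSH rather than just a similarity estimator: you only compare pairs that collide in at least one band, instead of all pairs.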

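If you want to play with LSH, the Python "annoy" library is the best place to start [4]. Strictly speaking it builds random-projection trees rather than hash tables, but the idea is closely related.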
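
To get a feel for the idea before reaching for a library, here is a minimal sketch of random-hyperplane LSH for cosine similarity in plain Python. It is not how Annoy or any other library mentioned here is implemented; the function names and parameters are made up for illustration, and a single hash table like this will miss neighbors that a real index (multiple tables or trees) would catch.

```python
import math
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random hyperplanes through the origin (Gaussian normals)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes)


def build_index(vectors, planes):
    """Group vector ids by signature; vectors at a small angle tend to share a bucket."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, planes), []).append(idx)
    return buckets


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def query(vec, vectors, planes, buckets, top_k=5):
    """Look only at the query's bucket, then rank those candidates exactly."""
    candidates = buckets.get(signature(vec, planes), [])
    return sorted(candidates, key=lambda i: cosine(vec, vectors[i]), reverse=True)[:top_k]


if __name__ == "__main__":
    rng = random.Random(42)
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
    planes = make_hyperplanes(dim=16, num_bits=8)
    buckets = build_index(vectors, planes)
    print("buckets used:", len(buckets))
    print("candidates for item 0:", query(vectors[0], vectors, planes, buckets))
```

More bits means smaller buckets (faster, lower recall); fewer bits means bigger buckets (slower, higher recall). That is the same speed/recall tradeoff the ANN benchmarks plot.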

[0] https://github.com/erikbern/ann-benchmarks

[1] https://github.com/nmslib/hnswlib

[2] https://github.com/google-research/google-research/tree/mast...

[3] http://infolab.stanford.edu/~ullman/mmds

[4] https://github.com/spotify/annoy



http://infolab.stanford.edu/~ullman/mmds returns a 401 for me, presumably because directory indexing is turned off.

http://infolab.stanford.edu/~ullman/mmds/book.pdf is the book; http://infolab.stanford.edu/~ullman/mmds/bookL.pdf is the hyperlinked book; and http://www.mmds.org/ seems to be the official homepage.



