
Malware detection uses it as a heuristic to link specific “variants” to each other. Basically, LSH (locality-sensitive hashing) is great for comparing corpora that differ only slightly.

If you hash the data with something like SHA-256 and even one byte is different, you get a radically different hash (which is by design). With LSH you get a measurement that says “corpus A has a 90% overlap with corpus B”, or, in the single-bit-flip case, a very high similarity score.
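
To make the contrast concrete, here's a rough sketch in Python. It uses MinHash over 4-byte shingles as the LSH scheme, which is my assumption for illustration; I'm not claiming it's what any particular malware tool uses. The function names, the shingle size, and the 64-hash signature length are all made up for the example.

```python
import hashlib


def sha256_bit_difference(a: bytes, b: bytes) -> float:
    """Fraction of the 256 digest bits that differ between the two inputs."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1") / 256


def shingles(data: bytes, k: int = 4) -> set:
    """All overlapping k-byte windows of the input (assumes len(data) >= k)."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def minhash_signature(data: bytes, num_hashes: int = 64, k: int = 4) -> list:
    """MinHash signature: for each seeded hash, keep the smallest shingle hash.

    Seeding is simulated by prefixing a 4-byte counter to each shingle
    (an illustrative choice, not a standard).
    """
    grams = shingles(data, k)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        signature.append(min(hashlib.sha256(salt + g).digest() for g in grams))
    return signature


def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """Share of matching signature slots, which estimates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical corpus: some generated text, then a copy with one bit flipped.
original = "\n".join(f"record {i}: payload {i * i}" for i in range(200)).encode()
flipped = bytearray(original)
flipped[100] ^= 0x01
modified = bytes(flipped)

print("SHA-256 bits changed:", sha256_bit_difference(original, modified))
print("MinHash similarity:  ", estimated_similarity(
    minhash_signature(original), minhash_signature(modified)))
```

The SHA-256 number should land near one half (roughly half the bits flip), while the MinHash estimate should stay close to 1, since only the handful of shingles covering the flipped byte change.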


