Ok, big shout out to monort [0] for the link to the video [1].
This is just a quick overview from a single viewing of the video, but it's called "funnel hashing". The idea is to split the table into exponentially smaller subarrays, so the first chunk is n/m, the second is n/(m^2), etc., until you get down to a single element. Call them A0, A1, etc., so |A0| = n/m, |A1| = n/(m^2), etc., k levels in total.
Try inserting into A0 c times. If it fails, try inserting into A1 c times. If it fails, go down the "funnel" until you find a free slot.
Call \delta the fraction of slots that are empty (I'm unclear if this is a parameter that gets set at hash table creation or one that's dynamically updated). Setting c = log(1/\delta) and k = log(1/\delta) gives worst-case complexity O(log^2(1/\delta)).
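To make the cascade concrete, here's a toy sketch of that scheme as I understand it (my own code, not from the paper; the shrink factor m=2 and c=3 probes per level are arbitrary illustrative choices, and real funnel hashing has more structure than this):

```python
class FunnelHashSketch:
    """Toy funnel-hashing sketch: exponentially shrinking levels,
    c probe attempts per level, cascading down on failure."""

    def __init__(self, n, c=3, m=2):
        # Level sizes: n/m, n/m^2, ..., down to a single slot.
        self.levels = []
        size = n
        while size >= 1:
            self.levels.append([None] * max(1, size // m))
            size //= m
        self.c = c

    def insert(self, key):
        # Try c probes in each level; on failure, fall to the next level.
        for depth, level in enumerate(self.levels):
            for attempt in range(self.c):
                slot = hash((key, depth, attempt)) % len(level)
                if level[slot] is None:
                    level[slot] = key
                    return True
        return False  # funnel exhausted; a real table would resize or rehash

    def contains(self, key):
        # Search retraces the same probe sequence as insertion.
        for depth, level in enumerate(self.levels):
            for attempt in range(self.c):
                slot = hash((key, depth, attempt)) % len(level)
                if level[slot] == key:
                    return True
        return False
```

Note that search and insert walk the same probe sequence, which matters for the greedy/non-greedy discussion below.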
This circumvents Yao's result by not being greedy. Yao's result holds true for greedy insertion and search policies and the above is non-greedy, as it's cascading down the funnels.
There are probably many little hairy details to work out but that's the idea, as far as I've been able to understand it. People should let me know if I'm way off base.
This very much reminds me of the "Distinct Elements in Streams" idea by Chakraborty, Vinodchandran and Meel[2].
Actually they propose two algorithms: Funnel Hashing and Elastic Hashing. Funnel Hashing is "greedy" and defeats Yao's conjecture, which concerns greedy hash mechanisms. Elastic Hashing is "non-greedy" and provides a better amortized time than greedy algorithms.
That it circumvents Yao's conjecture by being non-greedy contradicts the article. Is the article wrong, or is your understanding of the paper? I don't know; just want to see if you're noticing something the article's authors don't.
> Farach-Colton, Krapivin and Kuszmaul wanted to see if that same limit also applied to non-greedy hash tables. They showed that it did not by providing a counterexample, a non-greedy hash table with an average query time that’s much, much better than log x.
One thing I don't understand from watching the video is what happens in the (very rare) case that you get collisions all the way down the funnel. I assume this is related to the "One special final level to catch a few keys" (around 14:41 in the video), but given that it has to be fixed size, this can also get full. What do you do in that case?
Thanks! So I guess the best recourse then is to resize the table? Seems like it should be part of the analysis, even if it's low probability of it happening. I haven't read the paper, though, so no strong opinion here...
(By the way, the text fragment does work somewhat in Firefox. Not on the first load, but if you load it, then focus the URL field and press enter.)
Yeah, I presume so. At least that's what Swiss Tables do. The paper is focused more on the asymptotics than on real-world hardware performance, so I can see why they chose not to handle such edge cases.
This bothered me too when reading it, and the sample implementations I've found so far just bail out. I thought one of the benefits of hash tables was that they don't have a predefined size?
The hash tables a programmer interacts with generally do have a fixed size at any given moment, but resize on demand. A fixed size is inherent to open addressing style hash tables -- how else could one even talk about how full a hash table is?
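For concreteness, here's a minimal open-addressing set along those lines: its array is fixed-size at every moment, but it doubles once the load factor passes a threshold (a sketch; the 0.5 threshold and linear probing are my own illustrative choices, not anything from the paper):

```python
class ResizingOpenAddressing:
    """Minimal open-addressing set with linear probing that doubles
    its backing array when the load factor would exceed 0.5."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.count = 0

    def _probe(self, key, slots):
        # Linear probing: scan from the home slot until we hit
        # the key or an empty slot.
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i] != key:
            i = (i + 1) % len(slots)
        return i

    def add(self, key):
        if (self.count + 1) / len(self.slots) > 0.5:
            self._resize()
        i = self._probe(key, self.slots)
        if self.slots[i] is None:
            self.slots[i] = key
            self.count += 1

    def _resize(self):
        # Allocate a bigger fixed-size array and rehash everything.
        old = self.slots
        self.slots = [None] * (2 * len(old))
        for key in old:
            if key is not None:
                self.slots[self._probe(key, self.slots)] = key

    def __contains__(self, key):
        return self.slots[self._probe(key, self.slots)] == key
```

The load-factor bound is what the paper's \delta (fraction of empty slots) is talking about: the analysis holds for a table at a given fullness, and resizing is how practical tables stay below that fullness.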
Quite a neat idea that could be useful for memory-constrained environments.
[Shameless plug]:
If you are into hashtables, you might want to check out Dandelion Hashtable [0]. We use it in our next-generation databases, and it was published in HPDC'24. It is currently the fastest in-memory hashtable in practice.
It improves closed addressing via bounded-cacheline chaining to surpass 1B in-memory requests/second on a commodity server.
What the author says there is "What we just showed was that we can achieve a worst-case expected probe complexity of log squared one over delta with a greedy algorithm. And we don't have too much time to go over the non-greedy algorithm but...".
The funnel hashing described in the video is greedy. The video doesn't cover the non-greedy elastic hashing.
"Greedy" means that the search and insertion do the same probe sequence, and insertion just uses the first free slot in that sequence.
[0] https://news.ycombinator.com/item?id=43007860
[1] https://www.youtube.com/watch?v=ArQNyOU1hyE
[2] https://arxiv.org/pdf/2301.10191