Information about Cuckoo Hashing

Enlarge picture
Cuckoo hashing example. The arrows show the alternative location of each key. A new item would be inserted in the location of A by moving A to its alternative location, currently occupied by B, and moving B to its alternative location which is currently vacant. Insertion of a new item in the location of H would not succeed: Since H is part of a cycle (together with W), the new item would get kicked out again.


Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001. [1]

The basic idea is to use two hash functions instead of only one. This provides two possible locations in the hash table for each key.

When a new key is inserted, a greedy approach is used: The new key is inserted in one of its two possible locations, "kicking out", that is, displacing any key that might already reside in this location. This displaced key is then inserted in its alternative location, again kicking out any key that might reside there, until a vacant position is found, or the procedure enters an infinite loop. In the latter case, the hash table is rebuilt using new hash functions.

Lookup requires just inspection of two locations in the hash table, which takes constant time in the worst case (see Big O notation). This is in contrast to many other hash table algorithms, which may not have a constant bound on the time to do a lookup.

It can also be shown that insertions succeed in expected constant time [1], even considering the possibility of having to rebuild the table, as long as the number of keys is kept below half of the capacity of the hash table.

The basic version of cuckoo hashing has load factor limited to 49%.

Generalizations of cuckoo hashing that use more than 2 alternative hash functions can be expected to utilize a larger part of the capacity of the hash table efficiently while sacrificing some lookup and insertion speed. Using just three hash functions increases the load to 91%. Another generalization of cuckoo hashing consists in using more than one key per bucket. Using just 2 keys per bucket permits a load factor above 80%.

(Other algorithms that use multiple hash functions include the Bloom filter). Cuckoo hashing can be used to implement a data structure equivalent to a Bloom filter.

The name derives from the behavior of some species of cuckoo, where the mother bird pushes eggs out of another bird's nest to lay her own. A study by Zukowski et al. [2] has showed that cuckoo hashing is much faster than chained hashing for small, cache-resident hash tables on modern processors. K. Ross [3] has shown bucketized versions of cuckoo hashing (variants that use buckets that contain more than one key) to be faster than conventional methods also for large hash tables, when space utilization is high. However as of 2007 cuckoo hashing remains widely unknown outside the research community.

See also

References

Computer programming (often shortened to programming or coding) is the process of writing, testing, and maintaining the source code of computer programs. The source code is written in a programming language.
..... Click the link for more information.
In computer science, a hash collision is a situation that occurs when two distinct inputs into a hash function produce identical outputs.

All hash functions have potential collisions, though with a well-designed hash function, collisions should occur less often (compared
..... Click the link for more information.
A hash function [1] is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital "fingerprint" of the data. The algorithm "chops and mixes" (i.e.
..... Click the link for more information.
In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number).
..... Click the link for more information.
greedy algorithm is any algorithm that follows the problem solving metaheuristic of making the locally optimum choice at each stage with the hope of finding the global optimum.
..... Click the link for more information.
infinite loop is a sequence of instructions in a computer program which loops endlessly, either due to the loop having no terminating condition or having one that can never be met.
..... Click the link for more information.
In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number).
..... Click the link for more information.
A hash function [1] is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital "fingerprint" of the data. The algorithm "chops and mixes" (i.e.
..... Click the link for more information.
In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithm's usage of computational resources (usually running time or memory).
..... Click the link for more information.
In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number).
..... Click the link for more information.
The Bloom filter, conceived by Burton H. Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not.
..... Click the link for more information.
Cuculidae
Vigors, 1825

Genera

See text.

The cuckoos are a family, Cuculidae, of near passerine birds. The order Cuculiformes, in addition to the cuckoos, also includes the turacos (family Musophagidae, sometimes treated as a separate
..... Click the link for more information.
CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations.
..... Click the link for more information.
20th century - 21st century - 22nd century
1970s  1980s  1990s  - 2000s -  2010s  2020s  2030s
2004 2005 2006 - 2007 - 2008 2009 2010

2007 by topic:
News by month
Jan - Feb - Mar - Apr - May - Jun
..... Click the link for more information.
A Perfect hash function of a set S is a hash function which maps different keys (elements) in S to different numbers. A perfect hash function with values in a range of size some constant times the number of elements in S can be used for efficient lookup operations, by placing the
..... Click the link for more information.
Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location.[1]
..... Click the link for more information.
Double hashing is a computer programming technique used in hashing to resolve hash collisions, cases when two different values to be searched for produce the same hash key. It is a popular collision-resolution technique in open-addressed hash tables.
..... Click the link for more information.
In computer science, a hash collision is a situation that occurs when two distinct inputs into a hash function produce identical outputs.

All hash functions have potential collisions, though with a well-designed hash function, collisions should occur less often (compared
..... Click the link for more information.
A hash function [1] is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital "fingerprint" of the data. The algorithm "chops and mixes" (i.e.
..... Click the link for more information.
Quadratic probing is a scheme in computer programming for resolving collisions in hash tables.

Quadratic probing operates by taking the original hash value and adding successive values of an arbitrary quadratic polynomial to the starting value.
..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter