Tutorial 15 · Engine Programming

Hash Tables from Scratch

The data structure behind entity maps, asset registries, string interning tables, and shader variant caches. We build each collision-resolution strategy from scratch, animate probe sequences live so clustering is visible, walk through the SIMD metadata trick that makes Swiss tables the current state of the art, and end on the engine case studies that decide which variant ships.

Time~50 min LevelJunior to mid; review for senior PrereqsYou can read C++ or pseudocode. Arrays, pointers, modular arithmetic. Big O (read the prior tutorial if it feels rusty). HardwareNone. A feel for cache lines helps in §11.

01Why hash tables

A sorted array gives you O(log n) lookup via binary search. A hash table gives you O(1) expected lookup, O(1) insert, O(1) delete. That constant factor is one or two cache misses on modern hardware, roughly 10 to 100 nanoseconds depending on whether the table fits in L2. For the data structures an engine hits every frame (the entity-to-component map in an ECS, the string interning table, the shader permutation cache), the difference between O(log n) and O(1) is the difference between a microsecond and a handful of nanoseconds per lookup.

Hash tables show up wherever you need a key-value mapping with fast access:

Entity maps. In an ECS, each entity ID maps to a row of component data. That mapping is a hash table keyed on entity ID. Unreal's TMap, EASTL's hash_map, and most custom ECS implementations use open-addressing hash tables for this.
Asset registries. The content browser maps asset paths (strings) to metadata. String-keyed hash tables with interned keys dominate here.
Shader variant caches. A shader may compile into thousands of permutations (feature flags, platform toggles). The key is a bitmask or a small struct; the value is a compiled shader object. Lookup happens on the render thread, so it must be fast.
String interning. The engine maps every unique string to a single canonical pointer. Subsequent comparisons become pointer equality (one instruction) instead of strcmp. Unreal's FName system is a hash table.
Deduplication. Content pipelines deduplicate textures, meshes, and animation curves by content hash. The dedup index is a hash table.

What you'll have by the end

A working mental model of every hash table variant that ships in a real engine. Division and multiplication hashing, FNV-1a, xxHash, wyhash. Separate chaining and why it's cache-hostile. Open addressing by linear probing, quadratic probing, and double hashing, with the clustering analysis and Knuth's expected-probe-length formulas. Robin Hood hashing and its variance-reduction invariant. Swiss tables: the SIMD metadata byte trick, H1/H2 split, group probing. Deletion strategies (tombstones, backward shift, rehash). Load factor policies, incremental rehashing (the Redis approach), and the amortized O(1) proof sketch. Cache-line alignment and prefetching. The engine case studies: TMap, EASTL, ECS entity lookup, string interning, shader caches.

02A short history

Hashing predates modern computers by about a decade (Luhn's 1953 memo at IBM). The key dates:

1953

Luhn's IBM memorandum. Hans Peter Luhn writes an internal IBM memo describing a scheme for placing records into "buckets" addressed by a hash of the key^[1]. This is the earliest known description of hash addressing. The memo also appears to contain one of the earliest references to linked lists (chaining).

1957

Peterson's open-addressing analysis. W. Wesley Peterson publishes "Addressing for Random-Access Storage" in the IBM Journal of Research and Development^[2]. The paper gives the first published analysis of open addressing: expected probe lengths for the index-table method and the sorted-file method.

1963

Knuth's hashing analysis. Donald Knuth circulates his analysis of hashing algorithms, later published in The Art of Computer Programming, Volume 3, Section 6.4^[3]. The section runs about 46 pages covering division hashing, multiplication hashing, expected probe lengths under uniform hashing, and the birthday-paradox collision bounds that determine load-factor policies.

1985

Robin Hood hashing. Pedro Celis, Per-Ake Larson, and J. Ian Munro present "Robin Hood Hashing" at the 26th IEEE Symposium on Foundations of Computer Science^[4]. The insight: when a new key's probe distance exceeds the incumbent's, swap them. This reduces variance in probe length dramatically, with successful searches averaging roughly 2.13 probes even in a full table.

1991

FNV hash. Glenn Fowler and Phong Vo propose FNV hashing in comments on the IEEE POSIX P1003.2 standard; Landon Curt Noll improves it^[5]. FNV-1a becomes the go-to non-cryptographic hash for small keys: one XOR and one multiply per byte.

2012

xxHash. Yann Collet publishes xxHash^[6], a non-cryptographic hash function optimized for throughput on bulk data. Widely adopted in checksum and deduplication pipelines.

2017

Swiss tables. Matt Kulukundis presents "Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step" at CppCon^[7]. The design uses SIMD instructions to probe 16 metadata bytes in parallel. Released as absl::flat_hash_map in Google's Abseil library. Rust's standard HashMap migrated from Robin Hood hashing to a Swiss-table clone (hashbrown) in Rust 1.36^[8].

2019

wyhash. Wang Yi publishes wyhash^[9], which tops the SMHasher benchmark as the fastest quality hash at the time. Adopted as the default hash in Zig, used as the non-AES fallback in Go's runtime since 1.17, and used in several game engines.

03Hash functions

A maps a key of arbitrary size to a fixed-size integer. The table uses that integer (modulo the table size) as a slot index. Two properties matter:

Uniformity. The hash should distribute keys uniformly across the output range. Non-uniform hashes pile keys into the same slots, increasing probe lengths.
The . Flipping one input bit should flip roughly half the output bits. A hash that fails this test (e.g., the identity function on sequential integers) creates patterns in the slot assignments that interact badly with power-of-2 table sizes.

Integer hashing

Division method: h(k) = k mod m. Simple, but the choice of m matters. If m is a power of 2, k mod m is just the low bits of k, and sequential keys land in sequential slots (no mixing). If m is prime, the distribution is better, but prime-modulo division is slow (an integer divide).

Fibonacci hashing (multiplication method): h(k) = (k * 2654435769) >> (32 - log2(m)). The magic constant is 2³²/φ, truncated. Multiplication by an irrational fraction scrambles the bits; the right-shift extracts the top bits, which have better avalanche than the low bits. Used in Linux kernel hash tables.

String hashing

FNV-1a^[5]: XOR each byte into the hash, then multiply by a prime. One XOR and one multiply per byte. Good avalanche on short strings. The 64-bit variant uses offset basis 0xcbf29ce484222325 and prime 0x00000100000001B3.

xxHash^[6]: processes 32 bytes per iteration using four accumulators with multiply-rotate-accumulate rounds. Throughput-optimized for bulk data (gigabytes per second on modern hardware). The XXH3 variant adds a 128-bit secret key for keyed hashing.

wyhash^[9]: extremely fast on short keys (under 32 bytes), making it ideal for hash-table use. Uses wide multiplies (64-bit * 64-bit = 128-bit) to mix bits. Used as the non-AES fallback hash in Go's runtime since 1.17; default on platforms without AES hardware support.

For string interning, the hash function must be deterministic across program runs (no random seed) and fast on strings up to roughly 256 bytes. FNV-1a is the common choice. For hash tables with DoS resistance (network-facing code), use a keyed hash (SipHash, wyhash with a random seed) to prevent hash flooding.

04Chaining

Separate chaining is the textbook's first collision resolution. Each slot holds a pointer to a linked list of entries that hash to that slot. Insert appends to the list. Lookup walks the list comparing keys. Delete removes the node.

The expected list length is the α = n/m. Under the SUHA, an unsuccessful search examines α entries on average; a successful search examines roughly 1 + α/2^[3].

Chaining tolerates load factors above 1.0 (more keys than slots). Java's HashMap defaults to α = 0.75 and re-chains at that threshold, but correctness doesn't depend on staying below 1.0.

The problem is cache behavior. Each lookup dereferences a pointer to the list head, then follows next pointers through heap-allocated nodes scattered across memory. Each pointer chase is a potential DRAM miss (roughly 100 ns on modern hardware^[10]). At α = 0.75, the average chain is short, but even one miss per lookup adds up at engine-frame frequency. Open addressing (next section) keeps all entries in a flat array, giving the CPU prefetcher something to work with.

Chaining still ships in two contexts: when the load factor is unpredictably high (some network protocol parsers allow unbounded keys per bucket with a hard cap on total size), and when the values are too large to move during Robin Hood swaps. For everything else, open addressing wins on modern hardware.

05Open addressing: linear probing

stores all entries directly in the table array. No linked lists, no heap allocations, no pointer chasing. When a key hashes to an occupied slot, the table probes forward through the array until it finds an empty slot (for insert) or the target key (for lookup).

Linear probing is the simplest variant: the for slot h is h, h+1, h+2, ... (mod m). Insert walks forward until it finds an empty slot. Lookup walks forward until it finds the key or an empty slot (proof that the key isn't present).

Knuth's analysis^[3] gives the expected number of probes under uniform hashing:

KNUTH §6.4 Unsuccessful: 12 (1 + 1 (1 - α) 2) Successful: 12 (1 + 1 1 - α)

At α = 0.5, a successful search averages 1.5 probes. At α = 0.75, it averages 2.5. At α = 0.9, it averages 5.5. The unsuccessful-search formula blows up faster: at α = 0.9, it's 50.5 probes. This is why open-addressing tables resize well before α = 1.

The failure mode is primary clustering. When several keys hash to nearby slots, their probe runs merge into a single long cluster. New keys that hash anywhere into the cluster join it, making it longer. Clusters grow quadratically in expectation^[3]. The widget below makes this visible: as the load factor increases, the table fills with long contiguous runs of occupied slots.

target load factor 0.50

slots filled

longest cluster

avg probe (insert)

key to insert 42

step time 300 ms

probes taken

slot found

strategy

linear

06Quadratic probing and double hashing

Linear probing's clustering problem comes from the step size: +1 every time. Two mitigations:

Quadratic probing: the probe sequence is h, h+1, h+3, h+6, h+10, ... (adding 1, 2, 3, 4, ... to the previous offset, i.e., triangular numbers). This breaks primary clustering because keys that hash to different slots follow different probe paths. It introduces a weaker problem called secondary clustering: keys that hash to the same slot still follow the same sequence, but keys at different slots don't pile up.

There is a coverage guarantee: if the table size m is a power of 2, the triangular-number sequence h, h+1, h+3, h+6, ... visits every slot exactly once before repeating. This follows from the fact that the differences 0, 1, 3, 6, ..., m(m-1)/2 are all distinct mod m when m is a power of 2. Python's dict uses a different open-addressing scheme: a perturbation-based recurrence (j = 5*j + 1 + perturb) that mixes in upper hash bits progressively.

Double hashing: the step size is a second hash function h₂(key). The probe sequence is h₁(key), h₁(key) + h₂(key), h₁(key) + 2·h₂(key), ... This eliminates both primary and secondary clustering: each key gets a unique step size. Under uniform hashing, the expected probe count matches the theoretical ideal: 1/(1 - α) for unsuccessful search, (1/α) ln(1/(1 - α)) for successful search.

The downside: double hashing's non-sequential probe pattern defeats the CPU prefetcher. Linear probing's sequential scan loads the next cache line automatically; double hashing jumps around the array, incurring a cache miss on most probes. In practice, linear probing with a good hash function and a low load factor (α ≤ 0.7) often beats double hashing on wall-clock time despite having higher probe counts, because each probe is a cache hit instead of a miss.

07Robin Hood hashing

Robin Hood hashing^[4] is linear probing with one added rule: during insert, if the new key's probe distance exceeds the incumbent key's probe distance at the current slot, swap them. The displaced incumbent continues probing from where the new key took over. The invariant: "steal from the rich (short probe distance), give to the poor (long probe distance)."

The effect on variance is dramatic. Standard linear probing has high variance in probe length: a few keys land in long clusters while most keys sit in their home slot. Robin Hood hashing redistributes the pain. The maximum probe distance drops, and the variance in probe length becomes near-zero. Successful searches average roughly 2.13 probes even at 100% load^[4].

The win for lookups: because the maximum probe distance is bounded and small, an unsuccessful search can terminate early. If you're searching for a key and the current slot's probe distance is less than what your search key's probe distance would be at this position, the key cannot be further in the table. Stop immediately.

Rust's standard HashMap used Robin Hood hashing with backward-shift deletion from 2014 until Rust 1.36 (2019), when it was replaced by the Swiss-table-based hashbrown crate^[8]. The migration was motivated by Swiss tables' better performance on short-key lookups (the SIMD metadata probe is faster than comparing probe distances).

step time 500 ms

inserts done

swaps this insert

max probe dist

08Swiss tables (flat hash maps)

Google's Swiss table design^[7] is the current state of the art for general-purpose hash maps. The core idea: store one metadata byte per slot in a separate array, and use SSE2 instructions to compare 16 metadata bytes in one instruction. A lookup that would take 16 separate comparisons in a conventional table takes one _mm_cmpeq_epi8.

The H1/H2 split

The hash of a key is split into two parts:

H1 (upper 57 bits): the slot index. Determines which group of 16 slots to check first.
H2 (lower 7 bits): stored in the metadata byte for that slot. A compact fingerprint used to filter candidates before comparing the full key.

Each metadata byte is one of three states: empty (0x80 = 0b10000000), deleted/tombstone (0xFE), or occupied (0b0xxxxxxx, where the 7 low bits are H2). The high bit distinguishes empty/deleted from occupied, and the 7-bit H2 fingerprint filters out roughly 127/128 of non-matching keys before the table compares the full key.

The SIMD probe

A lookup works as follows:

Compute H1 and H2 from the key's hash.
Use H1 to find the starting group (16 consecutive metadata bytes, aligned to a 16-byte boundary).
Load the 16 metadata bytes into a 128-bit SSE2 register.
Broadcast H2 into all 16 lanes of a second register.
Execute _mm_cmpeq_epi8: each lane that matches produces 0xFF; non-matches produce 0x00.
Extract a 16-bit bitmask with _mm_movemask_epi8. Each set bit is a candidate slot.
For each candidate, compare the full key. On a match, return the value.
If no match, check if any metadata byte in the group was "empty." If so, the key is absent. If the group was full or contained only tombstones, advance to the next group (quadratic probing on groups) and repeat.

The result: most lookups check one group (16 slots) with two SIMD instructions and zero or one full-key comparison. At α = 0.875 (the default maximum load factor in Abseil), the expected number of groups probed is roughly 1.5.

lookup key 37

H2 matches

full-key comparisons

result

09Deletion strategies

Deletion in open-addressing tables is harder than insertion. You cannot simply clear a slot, because that breaks probe chains: a lookup for a key whose probe passed through the now-empty slot would stop too early and falsely report "not found."

Four strategies:

Tombstones. Mark the deleted slot as "deleted" instead of "empty." Lookups treat tombstones as occupied (skip over them); inserts treat them as empty (reuse them). Simple, but tombstones accumulate and degrade lookup performance over time. Swiss tables use this (the 0xFE metadata byte).
Backward shift deletion. After removing a key, scan forward and shift back any subsequent key whose home slot is at or before the deleted position. Maintains the linear-probing invariant without tombstones. Used by Robin Hood tables (Rust's pre-hashbrown HashMap used this). More expensive per delete, but lookups never encounter stale markers.
Robin Hood deletion. A variant of backward shift: after removing a key, walk forward; any key with probe distance > 0 shifts back one slot, decrementing its probe distance. Terminates when a key is already at its home slot or an empty slot is found.
Rehash on delete. After deleting, reinsert every key in the cluster. Correct but expensive. Only viable for tiny tables or when deletions are rare.

Tombstone accumulation is the most common real-world bug. A table with many inserts and deletes fills with tombstones. Lookups scan more and more tombstone-occupied slots. The fix is periodic compaction (rehash everything into a fresh table, dropping tombstones) or use backward-shift deletion.

10Load factor and resizing

When the load factor exceeds a threshold, the table allocates a larger array and reinserts every key. The threshold depends on the probing strategy:

Linear probing: resize at α = 0.5 to 0.7. Higher than 0.7 and clustering dominates.
Robin Hood: can run at α = 0.9+ due to bounded variance, but wall-clock degrades.
Swiss tables (Abseil): resize at α = 7/8 = 0.875. The SIMD probe tolerates high load because most lookups still check only one group.
Chaining: resize at α = 0.75 (Java's default). Can exceed 1.0, but performance degrades linearly.

Growth policies

Power-of-2 growth (double the table size): modulo reduces to a bitmask (h & (m-1)), avoiding integer division. Requires a hash function with good avalanche; a bad hash combined with power-of-2 modulo creates systematic collisions. Most production tables use this.

Prime growth (next prime after doubling): better distribution with mediocre hash functions, but the modulo operation requires a division (tens of nanoseconds). GCC's std::unordered_map uses prime sizes.

Incremental rehashing (the Redis approach)

A full rehash of n keys is O(n). For a real-time system (a game server, an in-memory database), a single O(n) stall is unacceptable. Redis^[11] solves this with incremental rehashing: it keeps two tables (ht[0] and ht[1]) simultaneously. When a resize triggers, the new table is allocated but entries are migrated one bucket at a time, during regular insert/delete/lookup operations. Each operation moves a small batch of entries from the old table to the new one. During the migration, lookups check both tables. The migration field rehashidx tracks progress; when it reaches the end, the old table is freed.

Amortized O(1) proof sketch

Each insert costs O(1) expected. A resize that doubles the table costs O(n) but happens only after n inserts since the last resize. Spread the O(n) cost over the n inserts: each insert pays O(1) amortized. The accounting argument: charge each insert $3 instead of $1. Use $1 for the insert, save $2. By the time the table is full, you have $2n saved, which pays for the O(n) resize.

highlight load factor 0.70

linear (successful)

uniform (successful)

chaining (successful)

11Cache behavior and memory layout

The reason open addressing dominates on modern hardware: cache lines. A typical cache line is 64 bytes. A linear-probing table with 8-byte entries fits 8 entries per cache line. A probe that walks 4 consecutive slots likely hits within the same cache line (one miss for the first, then three hits). A chaining table dereferences a pointer to a heap-allocated node, which is almost certainly on a different cache line, so every hop is a miss.

Swiss tables push this further. The metadata array is separate from the key-value array: 16 metadata bytes fit in a single 16-byte chunk (one quarter of a cache line). Loading and comparing the group metadata is fast because the data is tiny and contiguous. The full-key comparison, which touches the key-value array, happens at most once or twice per lookup.

Prefetching. Some implementations issue a software prefetch for the key-value slot as soon as the H2 match is found, overlapping the prefetch latency with the bitmask extraction. This hides 50 to 100 ns of memory latency on a table that doesn't fit in L2.

Alignment. Aligning the metadata array to a 16-byte boundary ensures the SSE2 load is a single aligned load instead of a cross-line split. Abseil's implementation enforces this. On ARM with NEON, the same trick applies with vceqq_u8 instead of _mm_cmpeq_epi8.

12Game engine case studies

1 · Unreal's TMap

TMap<K,V> is implemented as a TSet<TPair<K,V>> backed by a sparse array^[12]. The hash function is GetTypeHash(), customizable per key type. The default for integers is a multiply-shift; for strings, CityHash is available. Open addressing with linear probing. The load factor threshold is 0.7 by default. TMap is the single most-used container in Unreal gameplay code after TArray.

2 · EASTL hash_map

Electronic Arts' EASTL^[13] provides hash_map and fixed_hash_map (pre-allocated, no heap allocation during gameplay). The design uses separate chaining with small-buffer optimization: each bucket inlines one entry, spilling to a linked list only on collision. The fixed variant is critical for console development where dynamic allocation during gameplay is forbidden.

3 · Entity-to-component lookup in ECS

In a sparse-set ECS (EnTT and similar), each component type has a dense array of component data and a sparse array mapping entity IDs to indices into the dense array. The sparse array is direct-indexed (not hashed): the entity ID is the array index, giving O(1) lookup with no collision resolution. In archetype-based ECS systems (Flecs, Unity DOTS, Bevy), the entity-to-archetype mapping is typically a hash table keyed on entity ID, using open addressing. The hash function is often just the entity ID itself (identity hash), because entity IDs are already well-distributed integers.

4 · String interning (FName)

Unreal's FName system hashes the string once, stores it in a global name table (a hash table of unique strings), and thereafter uses the table index as the name's identity. Comparing two FNames is an integer comparison, not a string comparison. The name table uses separate chaining with a fixed bucket count (historically 65536 buckets). Looking up an FName by string is one hash, one bucket lookup, and on average less than one chain walk (the table is sized so the average chain is short).

5 · Shader permutation caches

A material with 12 boolean features generates up to 4096 shader permutations. The compiled-shader cache maps a permutation key (a bitmask or a small struct of flags) to a compiled PSO. The key is small and the hash is cheap (Fibonacci hash on the bitmask). The table is typically a flat open-addressing table with power-of-2 size. Lookups happen on the render thread at draw-call time, so they must complete in under 100 ns.

Implementation: open-addressing hash table

open-addressing hash table

// Minimal open-addressing hash table with linear probing.
// Tombstone deletion. Power-of-2 table size. Fibonacci hash.
struct FlatMap {
    enum SlotState : uint8_t { EMPTY, OCCUPIED, DELETED };

    struct Slot {
        SlotState state = EMPTY;
        uint64_t key;
        uint64_t value;
    };

    std::vector<Slot> slots;
    size_t count = 0;
    size_t mask;                                // table_size - 1

    explicit FlatMap(size_t capacity = 16)
        : slots(capacity), mask(capacity - 1) {}

    // Fibonacci hash: multiply by 2^64 / phi, take top bits.
    size_t hash(uint64_t key) const {
        return (key * 11400714819323198485ULL) >> (64 - __builtin_ctzll(mask + 1));
    }

    void insert(uint64_t key, uint64_t value) {
        if (count * 10 > (mask + 1) * 7) rehash();  // resize at alpha = 0.7
        size_t idx = hash(key);
        while (slots[idx].state == OCCUPIED) {
            if (slots[idx].key == key) {              // update existing
                slots[idx].value = value;
                return;
            }
            idx = (idx + 1) & mask;                   // linear probe
        }
        slots[idx] = { OCCUPIED, key, value };
        count++;
    }

    uint64_t* find(uint64_t key) {
        size_t idx = hash(key);
        while (slots[idx].state != EMPTY) {
            if (slots[idx].state == OCCUPIED && slots[idx].key == key)
                return &slots[idx].value;
            idx = (idx + 1) & mask;
        }
        return nullptr;
    }

    void erase(uint64_t key) {
        size_t idx = hash(key);
        while (slots[idx].state != EMPTY) {
            if (slots[idx].state == OCCUPIED && slots[idx].key == key) {
                slots[idx].state = DELETED;           // tombstone
                count--;
                return;
            }
            idx = (idx + 1) & mask;
        }
    }

    void rehash() {
        size_t newSize = (mask + 1) * 2;
        std::vector<Slot> old = std::move(slots);
        slots.assign(newSize, Slot{});
        mask = newSize - 1;
        count = 0;
        for (auto& slot : old)
            if (slot.state == OCCUPIED) insert(slot.key, slot.value);
    }
};

// Minimal open-addressing hash table with linear probing.
// Tombstone deletion. Power-of-2 table size. Fibonacci hash.
const FIB: u64 = 11400714819323198485;

enum Slot {
    Empty,
    Deleted,
    Occupied { key: u64, value: u64 },
}

struct FlatMap {
    slots: Vec<Slot>,
    count: usize,
    mask: usize,
}

impl FlatMap {
    fn new(capacity: usize) -> Self {
        let slots = (0..capacity).map(|_| Slot::Empty).collect();
        FlatMap { slots, count: 0, mask: capacity - 1 }
    }

    fn hash(&self, key: u64) -> usize {
        let shift = 64 - (self.mask + 1).trailing_zeros();
        (key.wrapping_mul(FIB) >> shift) as usize
    }

    fn insert(&mut self, key: u64, value: u64) {
        if self.count * 10 > (self.mask + 1) * 7 {
            self.rehash();                  // resize at alpha = 0.7
        }
        let mut idx = self.hash(key);
        loop {
            match &self.slots[idx] {
                Slot::Occupied { key: k, .. } if *k == key => {
                    self.slots[idx] = Slot::Occupied { key, value };
                    return;
                }
                Slot::Empty | Slot::Deleted => {
                    self.slots[idx] = Slot::Occupied { key, value };
                    self.count += 1;
                    return;
                }
                _ => idx = (idx + 1) & self.mask,   // linear probe
            }
        }
    }

    fn find(&self, key: u64) -> Option<u64> {
        let mut idx = self.hash(key);
        loop {
            match &self.slots[idx] {
                Slot::Occupied { key: k, value } if *k == key => {
                    return Some(*value);
                }
                Slot::Empty => return None,
                _ => idx = (idx + 1) & self.mask,
            }
        }
    }

    fn rehash(&mut self) {
        let new_size = (self.mask + 1) * 2;
        let old = std::mem::replace(
            &mut self.slots,
            (0..new_size).map(|_| Slot::Empty).collect(),
        );
        self.mask = new_size - 1;
        self.count = 0;
        for slot in old {
            if let Slot::Occupied { key, value } = slot {
                self.insert(key, value);
            }
        }
    }
}

What's intentionally missing

This implementation skips: iterator support, heterogeneous lookup (looking up a &str in a String-keyed table), tombstone compaction (periodic rehash to clear accumulated tombstones), SIMD metadata (Swiss table optimization), Robin Hood displacement, prefetch hints, and thread safety. A production hash table (abseil, hashbrown, boost::unordered_flat_map) handles all of these. The code above is the skeleton; the widgets above show the probing behaviors that differentiate the variants.

13Pitfalls

Hash flooding (DoS). An attacker crafts keys that all hash to the same slot. Expected O(1) becomes O(n). Defense: use a keyed hash function (SipHash, wyhash with a random per-table seed). Rust's default HashMap uses SipHash with a random seed for this reason.
Bad hash functions with power-of-2 tables. A hash that returns sequential integers (identity hash on sequential IDs) combined with a power-of-2 table size maps sequential keys to sequential slots: primary clustering at load factor 0.0. The fix: use a hash with good avalanche (Fibonacci multiply, wyhash) or use prime table sizes.
Iterator invalidation. Most hash tables invalidate all iterators on insert (because insert may trigger a rehash that moves every entry). Calling erase() inside a for-each loop over the table is undefined behavior in C++ std::unordered_map. Use the return value of erase() (an iterator to the next element) to continue iteration.
Tombstone accumulation. A table with many insert/delete cycles fills with tombstones. Lookup performance degrades because probes skip tombstones but don't stop at them. Fix: periodic rehash to compact, or use backward-shift deletion.
Power-of-2 modulo with bad hashes. If h(k) = k and m = 2^p, then h(k) & (m-1) is just the low p bits. If keys share low bits (e.g., pointers aligned to 8 bytes), they all map to the same few slots. Fix: mix the bits before masking (Fibonacci multiply, or h >> 16 ^ h on 32-bit hashes).
Using mutable fields as hash keys. If you insert a key, then modify the fields the hash depends on, the entry becomes unreachable: the lookup hashes to a different slot than the one the entry occupies. Unreal's documentation explicitly warns about this for TMap.

14What's next

Perfect hashing. If the key set is known at compile time (keyword tables, opcode maps), a perfect hash function maps each key to a unique slot with zero collisions. gperf and the CMPH library generate them. O(1) worst case, not just expected.
Cuckoo hashing. Two tables, two hash functions. Each key has exactly two candidate slots (one per table). Insert displaces the incumbent if both candidates are occupied, kicking it to its alternate slot. Worst-case O(1) lookup (check two slots, done). Resize is rare but O(n) when it happens.
Concurrent hash maps. Lock-free and lock-striped variants: Java's ConcurrentHashMap, Intel TBB's concurrent_hash_map, folly's AtomicHashMap. The main challenge is resizing without blocking readers.
GPU hash tables. SlabHash (Ashkiani et al., 2018) and the CUDA hash table in cuCollections. GPU hash tables must tolerate high contention across thousands of threads and avoid warp divergence on probe-chain walks.

For the engine programmer, the most useful next step is to profile your game's hottest hash table. Measure the load factor, the average probe length, and the cache-miss rate. If the table is chained, benchmark a switch to open addressing. If it's open-addressing with a bad hash, try Fibonacci hashing. If it's already a Swiss table, you're at the current frontier.

Sources

Hans Peter Luhn. Internal IBM memorandum on hash addressing, January 1953. Not publicly available; described in the IEEE Spectrum article "Hans Peter Luhn and the Birth of the Hashing Algorithm." spectrum.ieee.org. The earliest known description of hash-based bucket addressing and linked-list chaining.
W. Wesley Peterson. "Addressing for Random-Access Storage." IBM Journal of Research and Development 1(2):130-146, April 1957. DOI: 10.1147/rd.12.0130. ieeexplore.ieee.org/document/5392733. First published analysis of open-addressing probe lengths and collision resolution.
Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Section 6.4 "Hashing." Addison-Wesley, 1973 (2nd edition 1998). ISBN 0-201-89685-0. gwern.net (excerpt). The reference analysis of hashing: division/multiplication methods, expected probe lengths for linear probing, double hashing, and chaining under SUHA.
Pedro Celis, Per-Ake Larson, J. Ian Munro. "Robin Hood Hashing (Preliminary Report)." In Proc. 26th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 281-288, 1985. semanticscholar.org. Introduces the displacement invariant: steal from the rich (low probe distance), give to the poor (high probe distance). Achieves constant expected probe count even at full load.
Glenn Fowler, Landon Curt Noll, Kiem-Phong Vo. "FNV Hash." IETF Internet-Draft (informational). ietf.org/archive/id/draft-eastlake-fnv-21.html. The FNV-1a hash specification: XOR each byte then multiply by a prime. Public domain. The go-to non-cryptographic hash for short keys since 1991.
Yann Collet. "xxHash: Extremely fast non-cryptographic hash algorithm." GitHub repository, 2012-present. github.com/Cyan4973/xxHash. Throughput-optimized hash using four accumulators with multiply-rotate rounds. XXH3 variant adds a 128-bit secret key.
Matt Kulukundis. "Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step." CppCon 2017. youtube.com/watch?v=ncHmEUmJZf4. See also: abseil.io/about/design/swisstables. The Swiss table design: SIMD metadata bytes, H1/H2 split, group probing. Released as absl::flat_hash_map.
Amanieu d'Antras et al. "hashbrown: Rust port of Google's SwissTable hash map." GitHub repository. github.com/rust-lang/hashbrown. Adopted as the backing implementation of Rust's standard HashMap since Rust 1.36 (2019), replacing the previous Robin Hood implementation.
Wang Yi. "wyhash: The fastest quality hash function." GitHub repository, 2019. github.com/wangyi-fudan/wyhash. Uses wide multiply (64x64=128) for mixing. Non-AES fallback hash in Go's runtime (since 1.17). Default hash in Zig, V, and Nim. Topped the SMHasher benchmark on release.
Peter Norvig. "Teach Yourself Programming in Ten Years," approximate timing table. norvig.com/21-days.html. Memory-hierarchy latency ballparks: L1 ~0.5 ns, L2 ~7 ns, DRAM ~100 ns. Used to estimate per-probe cost.
Salvatore Sanfilippo et al. "Redis source: dict.c." GitHub repository. github.com/redis/redis/blob/unstable/src/dict.c. Incremental rehashing implementation: two hash tables, one-bucket-at-a-time migration during regular operations, rehashidx progress tracking.
Epic Games. "TMap." Unreal Engine documentation. dev.epicgames.com/documentation/unreal-engine/map-containers. Documents TMap as a TSet of TPairs backed by a sparse array, with GetTypeHash customization and O(1) add/remove/find.
Paul Pedriana et al. "EASTL: Electronic Arts Standard Template Library." N2271, ISO/IEC JTC1/SC22/WG21, 2007. open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html. See also: github.com/electronicarts/EASTL. The design paper and open-source release of EA's game-oriented STL replacement, including hash_map and fixed_hash_map.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms, 4th ed. MIT Press, 2022. ISBN 9780262046305. mitpress.mit.edu/9780262046305. Chapter 11 covers hash tables, chaining, open addressing, and the SUHA analysis.