Spatial Partitioning
Every major engine ships an acceleration structure. Frustum culling, broad-phase collision, raycasting, range queries, nearest-neighbor lookups: none of them can afford to test every object against every other. The structures that make these queries sublinear are uniform grids, quadtrees, octrees, BSP trees, bounding volume hierarchies, and spatial hash maps. We build each one from scratch, animate the construction, fire rays through a live BVH, and end on the case studies that ship in Bullet, Unreal, Unity, Nanite, and RTX.
01Why partition space
Testing every object against every other is O(n²). A physics scene with 5,000 dynamic bodies produces 12.5 million pair tests per frame. At 60 fps on a single core that is roughly 750 million tests per second, each one requiring at least an AABB overlap check (six comparisons). Most of those pairs are nowhere near each other. The purpose of spatial partitioning is to skip them.
The split between broad phase and narrow phase exists for exactly this reason. The broad phase uses a spatial data structure to reduce n² candidate pairs to something closer to n or n log n, then the narrow phase runs the expensive per-pair geometry tests only on the surviving candidates.
The same structures serve other queries. Frustum culling walks the tree and prunes entire branches whose bounding volumes lie outside the camera frustum. Raycasting (mouse picking, line-of-sight checks, bullet traces) traverses the tree along the ray and skips volumes the ray does not intersect. Range queries ("give me every unit within 50 meters of this position") descend into nodes that overlap the query sphere and skip the rest. K-nearest-neighbor queries do the same with a shrinking search radius.
Without an acceleration structure, each of those queries is O(n). With one, the average case drops to O(log n) for tree-based structures or O(1) for hash-based structures, at the cost of O(n) or O(n log n) construction time and O(n) storage.
A working understanding of every spatial data structure that ships in a modern engine. Uniform grids and when they win. Quadtrees with insert, remove, and range query. Octrees (pointer-based and linear via Morton codes). BSP trees and why Doom used them. BVHs with the surface area heuristic. Stack-based ray-BVH traversal with early-out. Dynamic AABB trees with fat margins (the Box2D approach). Spatial hash maps. And the case studies: Bullet's DbvtBroadphase, Unreal's octree, Unity's internal BVH, Nanite's cluster culling hierarchy, RTX TLAS/BLAS.
02A short history
Spatial data structures emerged from three separate communities: computer graphics (visibility), computational geometry (range searching), and database indexing (spatial queries). The key dates for engine work:
03Uniform grids
The simplest spatial structure: divide the world into a fixed grid of equal-sized cells. Each cell holds a list of objects whose centers (or bounding volumes) overlap that cell. Insert is O(1): hash the object's position to a cell index, append. Query is O(1) in the number of cells checked: compute which cells the query region overlaps, iterate their contents.
The spatial hash function maps cell coordinates to a flat array index. A typical choice: (cellX * 73856093) ^ (cellY * 19349663) % tableSize. The primes are arbitrary; what matters is that they produce few collisions for the expected cell coordinate range.
Grids win when the objects are roughly uniform in size and roughly uniform in distribution. Particle systems are the canonical example: thousands of similar-sized particles spread over the simulation domain. 2D games where entities are all within a factor of two in bounding-box size (top-down shooters, roguelikes, RTS unit collision) also fit well.
The failure mode is the large-object problem. An object that spans many cells must be inserted into all of them, and every query that touches any of those cells finds it. A single large terrain chunk in a grid of small cells can appear in hundreds of cells, multiplying both insertion cost and query noise. Hierarchical grids (multiple grid resolutions, one per object-size class) mitigate this at the cost of multiple lookups per query.
For uniform distributions with similar-sized objects, a grid is faster than any tree. There is no recursion, no tree traversal, no pointer chasing. The hash table lookup is one indirection. On a cache-cold path, that single indirection beats the three to eight indirections a balanced BVH requires to reach a leaf. Particle systems, 2D bullet-hell games, and SPH fluid sims almost always use grids.
04Quadtrees
A quadtree recursively subdivides 2D space into four quadrants. Each internal node has exactly four children (NW, NE, SW, SE). Objects are inserted into the deepest node that fully contains them. When a leaf exceeds a capacity threshold, it subdivides.
Two flavors exist. A point quadtree stores point data; each split plane passes through a stored point. A region quadtree splits space uniformly at the midpoint of each axis, regardless of where the points are. Region quadtrees are what engines use: the split positions are implicit (computable from the node's bounds), so the tree stores no split-plane data.
Depth limits prevent degenerate input (many objects at the same position) from producing infinite recursion. A typical max depth of 8 gives 2&sup8; = 256 leaf cells per axis, fine resolution for most 2D scenes. The per-leaf capacity (max objects before subdivision) is usually between 4 and 16.
The widget below is a live quadtree. Click to place entities. The tree subdivides in real time.
Grid vs quadtree: range query comparison
The practical question is how many cells or nodes each structure checks to answer a range query. A grid checks every cell the query rectangle overlaps, regardless of whether those cells contain objects. A quadtree skips entire empty subtrees. For clustered distributions (most game scenes), the quadtree checks fewer nodes. For perfectly uniform distributions, the grid can be faster due to zero traversal overhead.
struct AABB { float minX, minY, maxX, maxY; };
struct QuadTreeNode {
AABB bounds;
int depth;
int maxDepth;
int maxPerLeaf;
std::vector<Entity*> items;
std::unique_ptr<QuadTreeNode> children[4]; // NW, NE, SW, SE
bool contains(float px, float py) const {
return px >= bounds.minX && px < bounds.maxX
&& py >= bounds.minY && py < bounds.maxY;
}
void subdivide() {
float midX = (bounds.minX + bounds.maxX) * 0.5f;
float midY = (bounds.minY + bounds.maxY) * 0.5f;
children[0] = makeNode({bounds.minX, bounds.minY, midX, midY}, depth + 1);
children[1] = makeNode({midX, bounds.minY, bounds.maxX, midY}, depth + 1);
children[2] = makeNode({bounds.minX, midY, midX, bounds.maxY}, depth + 1);
children[3] = makeNode({midX, midY, bounds.maxX, bounds.maxY}, depth + 1);
// Re-insert existing items into children
for (Entity* entity : items) {
for (auto& child : children) {
if (child->contains(entity->posX, entity->posY)) {
child->insert(entity);
break;
}
}
}
items.clear();
}
void insert(Entity* entity) {
if (!contains(entity->posX, entity->posY)) return;
// Leaf: store here if under capacity or at max depth
if (!children[0] && (items.size() < maxPerLeaf || depth >= maxDepth)) {
items.push_back(entity);
return;
}
if (!children[0]) subdivide();
for (auto& child : children) {
if (child->contains(entity->posX, entity->posY)) {
child->insert(entity);
return;
}
}
}
// Range query: collect all entities whose positions fall inside queryBox.
void query(const AABB& queryBox, std::vector<Entity*>& results) const {
// Prune: if this node does not overlap the query, skip entirely
if (queryBox.maxX < bounds.minX || queryBox.minX > bounds.maxX ||
queryBox.maxY < bounds.minY || queryBox.minY > bounds.maxY) return;
for (Entity* entity : items) {
if (entity->posX >= queryBox.minX && entity->posX <= queryBox.maxX &&
entity->posY >= queryBox.minY && entity->posY <= queryBox.maxY) {
results.push_back(entity);
}
}
if (children[0]) {
for (const auto& child : children) {
child->query(queryBox, results);
}
}
}
};
struct AABB { min_x: f32, min_y: f32, max_x: f32, max_y: f32 }
struct QuadTreeNode {
bounds: AABB,
depth: u32,
max_depth: u32,
max_per_leaf: usize,
items: Vec<Entity>,
children: Option<Box<[QuadTreeNode; 4]>>,
}
impl QuadTreeNode {
fn contains(&self, px: f32, py: f32) -> bool {
px >= self.bounds.min_x && px < self.bounds.max_x
&& py >= self.bounds.min_y && py < self.bounds.max_y
}
fn insert(&mut self, entity: Entity) {
if !self.contains(entity.pos_x, entity.pos_y) { return; }
if self.children.is_none()
&& (self.items.len() < self.max_per_leaf
|| self.depth >= self.max_depth)
{
self.items.push(entity);
return;
}
if self.children.is_none() { self.subdivide(); }
let children = self.children.as_mut().unwrap();
for child in children.iter_mut() {
if child.contains(entity.pos_x, entity.pos_y) {
child.insert(entity);
return;
}
}
}
fn query(&self, query_box: &AABB, results: &mut Vec<&Entity>) {
// Prune: no overlap with this node
if query_box.max_x < self.bounds.min_x
|| query_box.min_x > self.bounds.max_x
|| query_box.max_y < self.bounds.min_y
|| query_box.min_y > self.bounds.max_y
{ return; }
for entity in &self.items {
if entity.pos_x >= query_box.min_x
&& entity.pos_x <= query_box.max_x
&& entity.pos_y >= query_box.min_y
&& entity.pos_y <= query_box.max_y
{
results.push(entity);
}
}
if let Some(children) = &self.children {
for child in children.iter() {
child.query(query_box, results);
}
}
}
}
05Octrees
An octree is a quadtree in three dimensions. Each node splits into eight children (2³). Insert, remove, and query follow the same logic, with an extra axis.
Two storage layouts are common. A pointer-based octree stores eight child pointers per node, same as the quadtree above. A linear octree encodes each node's position as a Morton code and stores all nodes in a flat sorted array. The Morton code (also called the Z-order curve) interleaves the bits of the x, y, z cell coordinates. Parent-child and neighbor relationships are computed by bit shifts, not pointer dereferences.
Linear octrees matter for GPU work. Sparse voxel octrees (SVOs) encode voxelized geometry as a linear octree in GPU memory. NVIDIA's GVDB library and the academic SVO literature use Morton codes for spatial locality. Laine and Karras (2010) describe efficient SVO traversal on GPUs at NVIDIA Research.
For CPU-side engine use, pointer-based octrees with a depth limit of 6 to 8 levels are standard. Unreal Engine uses an octree (TOctree2) for its spatial query system (proximity queries, component overlap tests). The pointer-based layout is simpler to update incrementally when objects move.
06BSP trees
A BSP tree partitions space with arbitrary hyperplanes, not axis-aligned splits. Each internal node stores a plane (in 2D, a line; in 3D, a plane equation ax + by + cz + d = 0). Objects or polygon fragments are classified to the front or back half-space; objects that straddle the plane are split.
The rendering trick: given a viewpoint, traverse the BSP tree in a specific order. If the viewpoint is in front of a node's plane, draw the back subtree first, then the node's polygon, then the front subtree. This gives a back-to-front ordering suitable for the painter's algorithm, without a depth buffer.
Doom (1993) and Quake (1996) depended on BSP trees. John Carmack chose them for Doom's renderer because the 386 lacked the fill rate for a Z-buffer at acceptable resolution[5]. The BSP was built offline by the map compiler (NODES lump in the WAD file). Doom's renderer reversed the traversal to front-to-back, skipping already-drawn columns to avoid overdraw (more efficient than the original back-to-front painter's approach on constrained hardware). Quake extended BSP with a PVS (potentially visible set), precomputed per BSP leaf. The PVS lookup skipped entire sections of the level with zero runtime cost.
BSP trees are mostly historical for rendering (modern engines use BVH-based frustum culling and GPU occlusion queries). They remain relevant for CSG (constructive solid geometry) operations: boolean union, intersection, and difference of 3D meshes. Unreal's BSP-based level geometry workflow still exists, though static meshes have replaced it for most content.
07Bounding volume hierarchies
A BVH is a tree of bounding volumes. Each leaf holds one object (or a small cluster). Each internal node holds the bounding volume that encloses all objects in its subtree. The bounding volume is almost always an AABB because AABB-AABB overlap is six comparisons and AABB-ray intersection (the slab method) is fast and branchless.
Construction is top-down or bottom-up. Top-down: start with all objects, pick a split (axis and position), partition into left and right, recurse. Bottom-up (agglomerative): start with one leaf per object, repeatedly merge the two nodes whose combined bounding volume is smallest. Top-down is simpler and faster to build. Bottom-up can produce tighter trees but is O(n²) in the naive implementation.
Top-down split strategies:
- Median split. Sort objects along the longest axis, split at the median. Produces balanced trees. Simple, O(n log n) per level.
- Midpoint split. Split at the spatial midpoint of the node's bounding volume. Faster than median (no sort), but can produce unbalanced trees if objects are clustered.
- SAH split. Evaluate a cost function at many candidate split positions, pick the one that minimizes expected traversal cost. The surface area heuristic (next section).
The widget below shows a BVH being constructed step by step. Each step picks an axis, evaluates a split, and partitions the objects. The bounding boxes of the resulting nodes are drawn.
In RTX hardware ray tracing, the BVH has two levels. The TLAS (top-level acceleration structure) stores instance transforms and pointers to BLASes. The BLAS (bottom-level acceleration structure) stores the actual triangle geometry for each mesh. Rebuilding the TLAS per frame is cheap (it contains only instance data). Rebuilding a BLAS is expensive (full triangle-level SAH), so static geometry builds the BLAS once and reuses it.
08Surface area heuristic
The SAH cost model estimates the expected number of node intersection tests a random ray will perform. The key insight, from Goldsmith and Salmon (1987)[4]: the probability that a uniformly random ray hits a convex bounding volume is proportional to its surface area. In 2D, surface area is the perimeter; in 3D, the actual surface area.
For a node with bounding volume A containing children L and R:
SA(X) is the surface area of bounding volume X. NL and NR are the object counts in the left and right children. Ctrav is the cost of traversing one internal node; Cisect is the cost of intersecting one leaf object. The ratio SA(L)/SA(A) is the conditional probability that a ray hitting A also hits L.
The builder evaluates this cost for many candidate splits (every object boundary on each axis, or a fixed number of bins), picks the split with the lowest cost, and recurses. If no split is cheaper than making the node a leaf, the node becomes a leaf.
Binned SAH (Wald 2007[7]) approximates the exact evaluation by distributing objects into a fixed number of bins (typically 8 to 32) along each axis. The cost is evaluated at each bin boundary, not at every object boundary. This reduces the per-level cost from O(n log n) to O(n), making full SAH BVH construction practical for per-frame rebuilds on scenes with tens of thousands of primitives.
09BVH traversal
The canonical BVH traversal for ray queries is stack-based depth-first search. Push the root. Pop a node. Test the ray against the node's AABB. If it misses, discard. If it hits and the node is a leaf, test the ray against the leaf's objects. If it hits and the node is internal, push both children. The stack depth is bounded by the tree height (typically 20 to 30 for a million-primitive scene).
The slab method for ray-AABB intersection computes entry and exit times on each axis independently, then checks if the intervals overlap. It is branchless, SIMD-friendly, and the standard in production ray tracers.
// Slab-method ray-AABB intersection.
// Returns true if the ray [origin, origin + dir * tmax] hits the box.
// On hit, tmin is the entry distance (may be negative if origin is inside).
bool rayAABB(const Vec3& origin, const Vec3& invDir,
const AABB& box, float& tmin, float tmax)
{
// invDir = 1.0 / direction, precomputed to avoid division per test
float t1 = (box.minX - origin.x) * invDir.x;
float t2 = (box.maxX - origin.x) * invDir.x;
tmin = std::min(t1, t2);
tmax = std::min(tmax, std::max(t1, t2));
t1 = (box.minY - origin.y) * invDir.y;
t2 = (box.maxY - origin.y) * invDir.y;
tmin = std::max(tmin, std::min(t1, t2));
tmax = std::min(tmax, std::max(t1, t2));
t1 = (box.minZ - origin.z) * invDir.z;
t2 = (box.maxZ - origin.z) * invDir.z;
tmin = std::max(tmin, std::min(t1, t2));
tmax = std::min(tmax, std::max(t1, t2));
return tmax >= std::max(tmin, 0.0f);
}
// Slab-method ray-AABB intersection.
// Returns Some(tmin) on hit, None on miss.
fn ray_aabb(origin: Vec3, inv_dir: Vec3, aabb: &AABB, t_max: f32) -> Option<f32> {
let t1x = (aabb.min_x - origin.x) * inv_dir.x;
let t2x = (aabb.max_x - origin.x) * inv_dir.x;
let mut tmin = t1x.min(t2x);
let mut tmax = t_max.min(t1x.max(t2x));
let t1y = (aabb.min_y - origin.y) * inv_dir.y;
let t2y = (aabb.max_y - origin.y) * inv_dir.y;
tmin = tmin.max(t1y.min(t2y));
tmax = tmax.min(t1y.max(t2y));
let t1z = (aabb.min_z - origin.z) * inv_dir.z;
let t2z = (aabb.max_z - origin.z) * inv_dir.z;
tmin = tmin.max(t1z.min(t2z));
tmax = tmax.min(t1z.max(t2z));
if tmax >= tmin.max(0.0) { Some(tmin) } else { None }
}
Two optimizations matter in practice:
- Ordered traversal (near child first). When both children are hit, push the far child first and the near child second. The near child is popped and tested first, and any hit in the near child tightens tmax, which causes the far child's test to fail more often. This early-out can skip 30-50% of node tests on typical scenes.
- Packet traversal. Test a bundle of coherent rays against the same node at once, using SIMD. If all rays in the packet miss a node, skip it. Ray packets exploit the fact that primary rays from adjacent pixels are nearly parallel and hit similar parts of the BVH.
The widget below fires rays through a 2D BVH. Click to cast a ray from the left edge. The traversal animates step by step: green boxes are hits (the ray intersects the AABB), red boxes with X marks are misses (early-out).
10Spatial hashing
A spatial hash is a grid stored as a hash table instead of a dense 2D/3D array. Insert: compute the cell coordinates from the object's position, hash them to a table index, append the object to that bucket. Query: compute which cells overlap the query region, look up each bucket, test the contents.
Cell size selection matters. If cells are too large, each cell contains too many objects and the query degenerates to brute force within the cell. If cells are too small, objects span multiple cells (the large-object problem again). A common heuristic: set the cell size to twice the average object radius, so most objects fit in a single cell.
Multi-level hashing uses several hash tables at different cell sizes (e.g., 1x, 4x, 16x). Small objects go in the fine grid, large objects in the coarse grid. A query checks all levels. This handles mixed-size objects without the straddle problem, at the cost of multiple lookups.
struct SpatialHash {
float cellSize;
int tableSize;
std::vector<std::vector<Entity*>> buckets;
SpatialHash(float cellSize, int tableSize)
: cellSize(cellSize), tableSize(tableSize),
buckets(tableSize) {}
// Hash cell coordinates to a bucket index
int hash(int cellX, int cellY) const {
// Large primes reduce collision rate for typical coordinate ranges
return ((cellX * 73856093) ^ (cellY * 19349663)) % tableSize;
}
int toCell(float coord) const {
return static_cast<int>(std::floor(coord / cellSize));
}
void insert(Entity* entity) {
int cx = toCell(entity->posX);
int cy = toCell(entity->posY);
int idx = ((hash(cx, cy) % tableSize) + tableSize) % tableSize;
buckets[idx].push_back(entity);
}
// Range query: find all entities within queryBox.
void query(const AABB& queryBox,
std::vector<Entity*>& results) const
{
int minCX = toCell(queryBox.minX);
int maxCX = toCell(queryBox.maxX);
int minCY = toCell(queryBox.minY);
int maxCY = toCell(queryBox.maxY);
for (int cx = minCX; cx <= maxCX; ++cx) {
for (int cy = minCY; cy <= maxCY; ++cy) {
int idx = ((hash(cx, cy) % tableSize) + tableSize)
% tableSize;
for (Entity* entity : buckets[idx]) {
// Fine check: is this entity actually in the query?
if (entity->posX >= queryBox.minX
&& entity->posX <= queryBox.maxX
&& entity->posY >= queryBox.minY
&& entity->posY <= queryBox.maxY)
{
results.push_back(entity);
}
}
}
}
}
void clear() {
for (auto& bucket : buckets) bucket.clear();
}
};
struct SpatialHash {
cell_size: f32,
table_size: usize,
buckets: Vec<Vec<Entity>>,
}
impl SpatialHash {
fn new(cell_size: f32, table_size: usize) -> Self {
SpatialHash {
cell_size,
table_size,
buckets: vec![Vec::new(); table_size],
}
}
fn hash(&self, cell_x: i32, cell_y: i32) -> usize {
let raw = (cell_x.wrapping_mul(73856093))
^ (cell_y.wrapping_mul(19349663));
(raw.unsigned_abs() as usize) % self.table_size
}
fn to_cell(&self, coord: f32) -> i32 {
(coord / self.cell_size).floor() as i32
}
fn insert(&mut self, entity: Entity) {
let cx = self.to_cell(entity.pos_x);
let cy = self.to_cell(entity.pos_y);
let idx = self.hash(cx, cy);
self.buckets[idx].push(entity);
}
fn query(&self, query_box: &AABB) -> Vec<&Entity> {
let mut results = Vec::new();
let min_cx = self.to_cell(query_box.min_x);
let max_cx = self.to_cell(query_box.max_x);
let min_cy = self.to_cell(query_box.min_y);
let max_cy = self.to_cell(query_box.max_y);
for cx in min_cx..=max_cx {
for cy in min_cy..=max_cy {
let idx = self.hash(cx, cy);
for entity in &self.buckets[idx] {
if entity.pos_x >= query_box.min_x
&& entity.pos_x <= query_box.max_x
&& entity.pos_y >= query_box.min_y
&& entity.pos_y <= query_box.max_y
{
results.push(entity);
}
}
}
}
results
}
}
11Loose bounds and margin
Tight bounding volumes cause excessive re-insertion in dynamic scenes. When an object moves by one pixel, its AABB changes, and the tree must remove and re-insert it. If the object moves every frame (which is most dynamic objects), the tree is rebuilt from scratch every frame.
Loose octrees (Ulrich 2000[6]) expand each node's bounds by a factor of two. A node that would normally cover a 10x10 region instead covers 20x20, centered on the same position. The expansion guarantees that any object whose center is in the original region, and whose radius is at most half the original region size, fits entirely within the expanded bounds without straddling children. Insert is O(log n) with no splits, no straddle, and no multi-cell registration.
Fat AABBs in Box2D's dynamic tree[10] use a different approach. Each leaf's AABB is enlarged by a fixed margin (Catto uses roughly 0.1 meters in Box2D's default settings). The object moves freely within the fat AABB without triggering a tree update. Only when the object's tight AABB leaves the fat AABB does the tree remove and re-insert the leaf, recomputing a new fat AABB centered on the current position.
The trade-off: larger margins mean fewer re-insertions (good for update cost), but looser bounding volumes mean more false positives in queries (bad for query cost). Catto's GDC 2019 talk[10] measures this empirically and finds that the margin should be roughly proportional to the object's expected displacement per frame.
12Queries
Every spatial structure supports the same four query types, but each query traverses the tree differently:
| Query | Test per node | Traversal | Engine use |
|---|---|---|---|
| Range (AABB overlap) | AABB-AABB overlap (6 comparisons) | Descend into every child whose AABB overlaps the query AABB. Prune the rest. | Broad-phase collision, proximity triggers, area-of-effect abilities. |
| Ray | Ray-AABB intersection (slab method) | Stack-based DFS with ordered traversal (near child first). tmax tightens as closer hits are found. | Mouse picking, line-of-sight, bullet traces, GPU ray tracing. |
| Frustum | AABB-frustum test (6 plane tests, or SAT) | Same as range query but with a frustum instead of an AABB. Entire subtrees outside the frustum are culled. | Frustum culling for the render list. Runs every frame. |
| K-nearest-neighbor | Distance to AABB (min-distance bounding) | Priority queue ordered by minimum possible distance. Prune any node whose min-distance exceeds the current k-th nearest. | AI perception ("find the 3 closest enemies"), LOD selection, point-cloud queries. |
The key difference is how aggressively each query can prune. Ray queries get tighter with every hit (tmax shrinks). K-NN queries get tighter as the k-th neighbor gets closer. Range and frustum queries have fixed bounds and prune only by spatial overlap.
13Dynamic updates
Static scenes build the acceleration structure once and query it many times. Dynamic scenes must update the structure every frame. Three strategies:
- Rebuild from scratch. Throw away the old tree, build a new one from the current object positions. Simple, produces optimal trees, but O(n log n) per frame. Practical for scenes under roughly 10,000 objects with fast builders (binned SAH).
- Refit. Walk the existing tree bottom-up and recompute each node's bounding volume from its children. O(n) per frame. Preserves the tree topology, so if objects move a lot the tree degrades (bounding volumes overlap, query performance drops). Refitting is what most real-time renderers do for deformable meshes in a BLAS.
- Incremental insert/remove. When an object moves, remove it from the tree, re-insert at the new position. The tree stays balanced if removal and insertion maintain invariants (rotations, as in Box2D's dynamic tree). Cost per update is O(log n). The fat-AABB trick (ยง11) amortizes the number of updates.
Erin Catto's dynamic AABB tree[10] in Box2D combines incremental insert/remove with fat margins and tree rotations. When a leaf is re-inserted, the tree walks up from the insertion point and applies AVL-style rotations to keep the tree height-balanced (the insertion path uses surface area cost to choose which subtree to descend into). The result is a broadphase that handles thousands of dynamic bodies at well under a millisecond per frame.
The widget below shows 15 objects moving. Compare "fat AABB" mode (objects move freely within their margins, re-insertion happens only when the tight box leaves the fat box) to "tight refit" mode (every object triggers a refit every frame). The refit counter shows the difference.
14Case studies
Bullet Physics: DbvtBroadphase
Bullet's default broadphase is btDbvtBroadphase[11], a pair of dynamic AABB trees (one for static objects, one for dynamic). The btDbvt class is a dynamic bounding volume tree with incremental insertion, removal, and node optimization (tree rotations to reduce total surface area). Objects can move between the static and dynamic trees. Insert and remove are O(log n); tree optimization is amortized over multiple frames.
Unreal Engine: TOctree2
Unreal uses a pointer-based octree (TOctree2 in the engine source) for spatial queries: component overlap tests, proximity queries, and the audio occlusion system. The octree is rebuilt from scratch each time a significant change occurs (level load, large batch of spawns). Individual object movement triggers incremental remove-and-reinsert. Nanite's cluster culling uses a separate BVH over cluster groups, built offline.
Unity: internal BVH
Unity's physics broadphase (PhysX integration) uses a BVH internally. The DOTS (Data-Oriented Technology Stack) physics package exposes a BVH for custom spatial queries (CollisionWorld.BuildBroadphase). The NavMesh system uses Recast/Detour (same as Unreal and Godot).
Nanite: cluster culling hierarchy
Nanite (Unreal Engine 5) organizes mesh clusters into a BVH where each node stores a bounding sphere and a projected-error metric. The GPU culling pass traverses this hierarchy, testing each node against the view frustum and the screen-space error threshold. Clusters that are too small to matter at the current LOD are culled. The hierarchy is built offline during asset import, not at runtime.
RTX TLAS/BLAS
NVIDIA's RTX ray tracing[9] exposes a two-level BVH through the Vulkan and DirectX ray tracing APIs. The BLAS contains per-mesh triangle geometry, built once (or rebuilt for deformable meshes). The TLAS contains instance transforms and BLAS references, rebuilt or refit every frame. RT Cores in the GPU hardware accelerate the ray-AABB and ray-triangle intersection tests. The driver builds the BVH using SAH (the exact algorithm is proprietary, but NVIDIA's published patents describe binned SAH variants).
15Pitfalls
- Wrong structure for the workload. A BVH for a uniform particle system is slower than a grid. A grid for a scene with mixed-size objects (1-meter crates and 500-meter terrain chunks) hits the large-object problem. Match the structure to the data distribution and object-size variance.
- Deep trees from degenerate input. A quadtree with all objects at the same position subdivides to max depth with one object per leaf chain. Set a depth limit. Same issue with BVH: collinear objects produce a degenerate linked list instead of a balanced tree. SAH handles this better than median split (it chooses to make a leaf when no split helps).
- Large objects spanning many cells. In a grid or quadtree, an object that straddles four cells is stored in all four. An object that spans 100 cells is stored in 100. Loose bounds (ยง11) or hierarchical grids fix this.
- Update cost vs query cost. Fat margins reduce update cost but increase query cost (looser bounds, more false positives). Tight bounds reduce query cost but increase update cost (more re-insertions). The right margin depends on the ratio of queries to updates per frame and the expected object velocity.
- Forgetting to refit after deformation. If the underlying geometry changes (skeletal animation, cloth simulation, destruction) and the BVH is not refitted, queries return stale or incorrect results. Refitting is cheap (O(n)), but forgetting it is a common source of "physics not matching visuals" bugs.
16What's next
GPU BVH construction is the current frontier. Karras (2012) published a parallel radix-sort-based BVH builder that constructs a tree from Morton codes entirely on the GPU. Apetrei (2014) extended it with a fast LBVH (linear BVH) variant. Modern GPU ray tracers (OptiX, DXR, Vulkan RT) use these algorithms internally, but the driver implementations are proprietary.
Parallel tree building on CPU is also active. Intel's Embree uses a hybrid approach: binned SAH at the top levels (for tree quality), then Morton-code LBVH at the bottom levels (for speed). This gives near-SAH quality at near-LBVH speed, building millions of primitives per second on a multi-core CPU.
Learned index structures for spatial data are an emerging research direction. Kraska et al. (2018) showed that neural networks can replace B-tree indexes for 1D data. Extending this to spatial data (replacing R-trees or BVHs with learned models that predict which region of space contains relevant objects) is an active area, though nothing has shipped in a game engine yet.
Compressed octrees for voxel data are relevant to voxel engines and sparse volume rendering. Efficient SVO (sparse voxel octree) compression using DAGs (directed acyclic graphs that share identical subtrees) can reduce storage by an order of magnitude for repetitive voxel scenes. Kampe et al. (2013) published the foundational work on SVO DAGs.
17Sources
- Henry Fuchs, Zvi M. Kedem, Bruce F. Naylor. "On Visible Surface Generation by A Priori Tree Structures." Computer Graphics (SIGGRAPH '80 Proceedings), 14(3):124-133, July 1980. ResearchGate. The foundational BSP tree paper. Introduces binary space partitioning for front-to-back polygon rendering.
- Donald Meagher. "Geometric Modeling Using Octree Encoding." Computer Graphics and Image Processing, 19(2):129-147, June 1982. ScienceDirect. Introduces octree encoding for 3D solid modeling. Storage proportional to surface area; linear-time boolean operations.
- Antonin Guttman. "R-Trees: A Dynamic Index Structure for Spatial Searching." Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, pp. 47-57, 1984. ACM DL. The R-tree: a B+-tree variant for spatial indexing. Ancestor of every AABB tree in game physics.
- Jeffrey Goldsmith, John Salmon. "Automatic Creation of Object Hierarchies for Ray Tracing." IEEE Computer Graphics and Applications, 7(5):14-20, May 1987. IEEE Xplore. First automatic BVH construction algorithm. Introduced surface area as a proxy for ray-hit probability.
- Fabien Sanglard. Game Engine Black Book: Doom. Self-published, 2018. fabiensanglard.net/gebbdoom. Detailed technical analysis of Doom's BSP-based renderer, including the node builder and the runtime traversal.
- Thatcher Ulrich. "Loose Octrees." In Game Programming Gems, Charles River Media, 2000. tulrich.com/geekstuff/partitioning.html. Expand node bounds by 2x to eliminate the straddle problem. Objects always fit in exactly one node.
- Ingo Wald. "On fast Construction of SAH-based Bounding Volume Hierarchies." Proceedings of the IEEE Symposium on Interactive Ray Tracing, pp. 33-40, 2007. PDF. Binned SAH: approximate the cost function with bins for O(n) per-level construction. Enables per-frame BVH rebuilds.
- Ingo Wald, Solomon Boulos, Peter Shirley. "Ray Tracing Deformable Scenes Using Dynamic Bounding Volume Hierarchies." ACM Transactions on Graphics, 26(1), January 2007. ACM DL. Demonstrates BVH refitting for deformable scenes with minimal quality loss compared to full rebuilds.
- NVIDIA. "NVIDIA RTX: Ray Tracing Technology." Developer documentation, 2018-present. NVIDIA Blog. RT Cores accelerate ray-AABB and ray-triangle intersection in hardware. The two-level TLAS/BLAS structure separates per-instance transforms from per-mesh geometry.
- Erin Catto. "Dynamic Bounding Volume Hierarchies." GDC 2019. PDF. The broadphase behind Box2D. Dynamic AABB tree with fat margins, incremental insertion/removal, and tree rotations. The standard reference for 2D physics broadphase.
- Erwin Coumans. "Bullet Physics Library." Open source, 2003-present. GitHub. btDbvtBroadphase: dual dynamic AABB trees (static + dynamic). The default broadphase for Bullet, used in Blender, Godot, and many shipped games.